date:20070312

Re: [BUG} usb-serial regression in 2.6.21-rc2-git3

2007-03-12 Thread Mark Lord


Oliver Neukum wrote:

On Monday, 12 March 2007 15:56, Mark Lord wrote:

Still no improvement on the b0rken usb-serial resume support in -rc*.


Have you applied the fixes in Greg's current tree?


I don't know anything about any "current tree"
other than the 2.6.21-rc3-git* on kernel.org.

Greg hasn't suggested anything for this yet,
mostly because he's been busy keeping the 2.6.20.xx
point releases sane.

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: bug in select() in linux

2007-03-12 Thread Lluís Batlle

Wouldn't it be better for all of us if select() did not block for
writing unless the socket's write buffer is full? That would
be consistent with the select() specification.

2007/3/12, Alistair John Strachan <[EMAIL PROTECTED]>:

On Monday 12 March 2007 15:02, Lluís Batlle wrote:
> Oh, of course you're right. I was inside too much layers to think of
> the tcp protocol, and I did not pay attention to it.
>
> Maybe something could be added to the manpage anyway.
>
> The bad thing is that there's no way I can use a socket for writing
> using select() if that connection has been half-closed by the other
> end. Moo.

This question comes up from time to time. I think the answer is
ultimately "select() sucks, use poll()".

I can't exactly remember the details, but I believe POLLHUP or POLLOUT as
flags do what you want.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: bug in select() in linux

2007-03-12 Thread Alistair John Strachan

On Monday 12 March 2007 15:02, Lluís Batlle wrote:
> Oh, of course you're right. I was inside too much layers to think of
> the tcp protocol, and I did not pay attention to it.
>
> Maybe something could be added to the manpage anyway.
>
> The bad thing is that there's no way I can use a socket for writing
> using select() if that connection has been half-closed by the other
> end. Moo.

This question comes up from time to time. I think the answer is 
ultimately "select() sucks, use poll()".

I can't exactly remember the details, but I believe POLLHUP or POLLOUT as 
flags do what you want.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
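
A minimal sketch of the poll() approach suggested above, under stated assumptions: the helper name peer_has_closed() is hypothetical and not from the thread, and it relies on MSG_DONTWAIT, which Linux provides. It waits for read readiness and treats end-of-file (or POLLHUP/POLLERR) as "the other end has closed", which is the information a select() for write does not deliver.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the thread): report whether the peer of a
 * connected stream socket has closed its sending side.
 * Returns 1 if the peer has closed or the connection is dead, 0 if it is
 * still open (or the wait timed out), -1 on error.
 */
int peer_has_closed(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char byte;
	ssize_t r;
	int n;

	n = poll(&pfd, 1, timeout_ms);
	if (n < 0)
		return -1;	/* poll() itself failed, errno is set */
	if (n == 0)
		return 0;	/* nothing happened within the timeout */
	if (pfd.revents & POLLNVAL)
		return -1;	/* fd is not an open descriptor */
	if (pfd.revents & (POLLHUP | POLLERR))
		return 1;	/* hangup or pending error on the socket */

	/* Readable: peek, so that real data stays queued for the caller. */
	r = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
	if (r == 0)
		return 1;	/* end-of-file: the peer sent its FIN */
	if (r < 0)
		return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
	return 0;		/* ordinary data is waiting */
}
```

A caller could run this check before each write and skip the write when it returns 1. It only reports that the peer has finished sending; whether the peer still accepts data is not known until a write is attempted.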

Re: Keyboard stops working after *lock [Was: 2.6.21-rc2-mm1]

2007-03-12 Thread Alan Stern

On Mon, 12 Mar 2007, Jiri Slaby wrote:

> Bisecting figured out the culprit:
> Commit: 17230acdc71137622ca7dfd789b3944c75d39404
> Author: Alan Stern <[EMAIL PROTECTED]> Mon, 19 Feb 2007 15:52:45 -0500
> 
>  UHCI: Eliminate asynchronous skeleton Queue Headers
> 
>  This patch (as856) attempts to improve the performance of uhci-hcd by
>  removing the asynchronous skeleton Queue Headers.  They don't contain
>  any useful information but the controller has to read through them at
>  least once every millisecond, incurring a non-zero DMA overhead.
> 
>  Now all the asynchronous queues are combined, along with the period-1
>  interrupt queue, into a single list with a single skeleton QH.  The
>  start of the low-speed control, full-speed control, and bulk sublists
>  is determined by linear search.  Since there should rarely be more
>  than a couple of QHs in the list, the searches should incur a much
>  smaller total load than keeping the skeleton QHs.
> 
>  Signed-off-by: Alan Stern <[EMAIL PROTECTED]>
>  Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> 
> 
> -mm minus (only) this one is OK.

Okay, here's how to track this down.  I assume that even after the
keyboard stops working you can access the machine via a network
connection.

So turn on CONFIG_USB_DEBUG, CONFIG_USB_MON, and CONFIG_DEBUG_FS.  Then
modprobe uhci-hcd with debug=2, and mount a debugfs filesystem.  Before
using the keyboard, start a cat process to capture the usbmon output for
the keyboard's bus (see the instructions for usbmon in
Documentation/usb/usbmon.txt).

After hanging the keyboard, get a copy of the appropriate controller's 
file in the uhci/ subdirectory of the debugfs filesystem.  Post it along 
with the usbmon log, and I'll try to figure out what happened.

Alan Stern


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri

On Mon, Mar 12, 2007 at 07:31:48PM +0530, Srivatsa Vaddagiri wrote:
> not so. This in-fact lets vservers and containers to work with each
> other. So:

s/containers/cpusets

-- 
Regards,
vatsa

Re: [BUG} usb-serial regression in 2.6.21-rc2-git3

2007-03-12 Thread Oliver Neukum

On Monday, 12 March 2007 15:56, Mark Lord wrote:
> Still no improvement on the b0rken usb-serial resume support in -rc*.

Have you applied the fixes in Greg's current tree?

Regards
Oliver


Re: do_generic_mapping_read performance issue

2007-03-12 Thread Jan Kara

On Mon 12-03-07 15:39:00, Nick Piggin wrote:
> On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> >   Hi,
> > 
> > > Hi, I am encountering a performance problem, which I have tracked into 
> > > the 
> > > Linux kernel. The problem occurs with my experimental web server that 
> > > uses 
> > > sendfile to repeatedly transmit files.  The files are based on the static 
> > > portion of the SPECweb99 fileset and range in size to model a reasonable 
> > > workload.  With this workload, a significant number of the requests are 
> > > for files of size 4 KB or less.
> > > 
> > > I have determined that the performance problems occurs in the function
> > > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > > Here is the specific code fragment:
> > > 
> > > /*
> > >  * When (part of) the same page is read multiple times
> > >  * in succession, only mark it as accessed the first time.
> > >  */
> > > if (prev_index != index)
> > > mark_page_accessed(page);
> >   Actually, the code is like that certainly for two years :).
> 
> Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
> stage (ie. read from the start of a page -- obviously that also has holes).
  Yes, at least in 2.6.12-rc5 which is the first one in git :).

> > > I was wondering if anyone could explain why the call to 
> > > mark_page_accessed 
> > > is conditional? That is, what problem it is trying to solve. It would 
> > > seem 
> > > that in many scenarios, if the same page is accessed repeatedly, then it 
> > > would be appropriate to keep that page cached.
> >   I also don't know why the condition is there but it's there at least
> > for two years so I'm not sure anybody remembers ;). Nick, do you have
> > an idea?
> 
> Yeah it is there because that is basically how our "use once" detection
> handles the case where an app does not read in chunks that are PAGE_SIZE
> multiples and PAGE_SIZE aligned.
  OK, I see. Then I'm not sure the check does more good than bad. Because
if we happen to reread the same chunk several times, then the check does a
wrong thing...

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
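
To make the trade-off discussed above concrete, here is a toy userspace model, not kernel code. The names count_page_marks() and TOY_PAGE_SIZE are hypothetical, and the model assumes a single open file whose previous page index is remembered between reads. It counts how often the "prev_index != index" test would let mark_page_accessed() run.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size for the model */

/*
 * Toy model (an assumption, not kernel code): replay a sequence of read
 * offsets against one open file and count how often the
 * "prev_index != index" test would allow mark_page_accessed() to run.
 */
unsigned int count_page_marks(const unsigned long *offsets, size_t n)
{
	unsigned long prev_index = (unsigned long)-1;	/* no page read yet */
	unsigned int marks = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long index = offsets[i] / TOY_PAGE_SIZE;

		if (prev_index != index)
			marks++;	/* stands in for mark_page_accessed(page) */
		prev_index = index;
	}
	return marks;
}
```

In this model, four 1 KB reads at offsets 0, 1024, 2048 and 3072 count as one access, which is the "use once" case Nick describes. Four complete re-reads of the same small file at offset 0 also count as only one access, which is the case Jan is worried about.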

Re: Possible "struct pid" leak from tty_io.c

2007-03-12 Thread Catalin Marinas


On 09/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:

If I can manage to focus on this, it looks like the information I need to
start fixing this.


I had a look at the second leak reported. It seems to be caused by the
same proc_set_tty() call but, in this case, there is no
disassociate_tty() call for the task (and the patch I posted is not
enough). Maybe something like the patch below (no thorough testing):

diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index db91398..ea6ca7d 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -3854,7 +3854,14 @@ EXPORT_SYMBOL(tty_devnum);

 void proc_clear_tty(struct task_struct *p)
 {
+	struct tty_struct *tty;
+
 	spin_lock_irq(&p->sighand->siglock);
+	tty = p->signal->tty;
+	if (tty) {
+		put_pid(tty->session);
+		put_pid(tty->pgrp);
+	}
 	p->signal->tty = NULL;
 	spin_unlock_irq(&p->sighand->siglock);
 }

--
Catalin

Re: bug in select() in linux

2007-03-12 Thread Lluís Batlle


Oh, of course you're right. I was buried in too many layers to think of
the TCP protocol, and I did not pay attention to it.

Maybe something could be added to the manpage anyway.

The bad thing is that there's no way I can use a socket for writing
using select() if that connection has been half-closed by the other
end. Moo.

Thanks a lot for your time,
Lluís

2007/3/12, Alan Cox <[EMAIL PROTECTED]>:

> I've tried a select() for write against a closed tcp socket (closed by
> the other side), and the select call _blocks_.

A TCP close from the remote node is single sided. It sends a FIN and we
ACK the FIN, that means only that the remote node has completed *sending*.

> Any write() call to that socket will _not block_, and will return with EPIPE.

We don't know this until you try the write. At that point the other end
if it has closed entirely will issue an RST and terminate the connection
in full.

I don't therefore believe that this is actually a bug.

Alan



Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri

On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
> > 3. This next leads me to think that 'tasks' file in each directory doesnt make
> >sense for containers. In fact it can lend itself to error situations (by
> >administrator/script mistake) when some tasks of a container are in one
> >resource class while others are in a different class.
> >
> > Instead, from a containers pov, it may be usefull to write
> > a 'container id' (if such a thing exists) into the tasks file
> > which will move all the tasks of the container into
> > the new resource class. This is the same requirement we
> > discussed long back of moving all threads of a process into new
> > resource class.
> 
> I think you need to give a more concrete example and use case of what
> you're trying to propose here. I don't really see what advantage
> you're getting.

Ok, this is what I had in mind:


mount -t container -o ns /dev/namespace
mount -t container -o cpu /dev/cpu

Let's say we have the namespaces/resource-groups created as under:

/dev/namespace
    |-- prof
    |    |-- tasks <- (T1, T2)
    |    |-- container_id <- 1 (doesn't exist today perhaps)
    |
    |-- student
         |-- tasks <- (T3, T4)
         |-- container_id <- 2 (doesn't exist today perhaps)

/dev/cpu
    |-- prof
    |    |-- tasks
    |    |-- cpu_limit (40%)
    |
    |-- student
         |-- tasks
         |-- cpu_limit (20%)


Is it possible to create the above structure in container patches? 
/me thinks so.

If so, then accidentally someone can do this:

echo T1 > /dev/cpu/prof/tasks
echo T2 > /dev/cpu/student/tasks

with the result that tasks of the same container are now in different
resource classes.

That's why, in the case of containers, I felt we shouldn't allow individual
tasks to be written to the tasks file.

Or rather, it may be nice to say :

echo "cid 2" > /dev/cpu/prof/tasks 

and have all tasks belonging to container id 2 move to the new resource
group.



-- 
Regards,
vatsa
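
The "cid 2" idea above could be sketched as a small parser for what gets written to a tasks file. This is an illustration under stated assumptions: parse_tasks_write(), struct move_request and the "cid N" syntax are hypothetical, container ids may not exist at all in the container patches, and the real write handler would look different.

```c
#include <stdio.h>

/* What a write to the (hypothetical) tasks file asks for. */
enum move_kind { MOVE_INVALID, MOVE_TASK, MOVE_CONTAINER };

struct move_request {
	enum move_kind kind;
	unsigned int id;	/* pid for MOVE_TASK, container id for MOVE_CONTAINER */
};

/*
 * Accept either a bare task id ("1234") or a container id ("cid 2").
 * Anything else, including trailing garbage, is rejected.
 */
struct move_request parse_tasks_write(const char *buf)
{
	struct move_request req = { MOVE_INVALID, 0 };
	unsigned int id;
	char trailing;

	if (sscanf(buf, "cid %u %c", &id, &trailing) == 1) {
		req.kind = MOVE_CONTAINER;	/* move every task of that container */
		req.id = id;
	} else if (sscanf(buf, "%u %c", &id, &trailing) == 1) {
		req.kind = MOVE_TASK;		/* move one task only */
		req.id = id;
	}
	return req;
}
```

A handler built this way could refuse MOVE_TASK requests for tasks that belong to a container, so that all tasks of one container always end up in the same resource class.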

Re: refcounting drivers' data structures used in sysfs buffers

2007-03-12 Thread Alan Stern

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > > Why? What's wrong with simply calling kref_get/put?
> > 
> > It's the same old problem: the race between unbind and sysfs I/O.  What
> > good does holding a reference to the private data structure do if the
> > show/store method gets called after the driver has been unbound from the
> > device?  dev_get_drvdata() will no longer provide a valid pointer to the
> > private data, so the method will have no way to access it.  Hence the
> > method needs another argument.
> 
> It does half the job. You can make sure the driver is not asked to access
> freed memory.
> It is true that a driver will have to mark that device "disconnected"
> and return errors if that device's attributes are referenced, but this can
> be done internally.

No, you're missing the point.  Let's say driver A's disconnect() is
called, so the driver marks its private data structure as "disconnected"
and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
device, and it does its own dev_set_drvdata().  Then a user still holding
an open sysfs file reference for driver A calls a show() or store()  
method.  The method will do dev_get_drvdata(), receiving the pointer to
driver B's private data.  Now you're in trouble, because A's method will
think it owns B's private data!

> Yes, this is a bit more complicated.
> {rant mode}
> Who came up with the idea of making life simpler by adding a code path?
> All these problems were already solved for device nodes. Ioctl is ugly, but
> at least a known code path.
> {rant off}

I'll let Greg give the complete answer.  :-)  Bear in mind, however, that
the aim was probably to make life simpler for userspace -- which does not
mean making life simpler for the kernel.

(Incidentally, I'm not so sure that all these problems really were solved 
by ioctl on device nodes.  I bet you could find plenty of cases where 
ioctl races with disconnect if you looked.)

Alan Stern
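
The driver A / driver B scenario described above can be replayed with a toy userspace model. Everything here is a stand-in: struct toy_device, struct toy_priv and the show_* helpers are hypothetical names, not kernel API, and a real implementation would also hold a reference on the private data while the file is open.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; all names here are hypothetical. */
struct toy_device { void *drvdata; };

struct toy_priv {
	const char *owner;	/* which driver allocated this structure */
	int disconnected;	/* set by that driver's disconnect() */
};

/* State kept for one open sysfs file. */
struct toy_open_file {
	struct toy_device *dev;
	struct toy_priv *priv;	/* captured when the file was opened */
};

void toy_bind(struct toy_device *dev, struct toy_priv *priv)
{
	priv->disconnected = 0;
	dev->drvdata = priv;		/* like dev_set_drvdata(dev, priv) */
}

void toy_unbind(struct toy_device *dev, struct toy_priv *priv)
{
	priv->disconnected = 1;
	dev->drvdata = NULL;		/* like dev_set_drvdata(dev, NULL) */
}

struct toy_open_file toy_open(struct toy_device *dev)
{
	struct toy_open_file f = { dev, dev->drvdata };
	return f;
}

/* Racy variant: looks the private data up again through the device. */
struct toy_priv *show_via_drvdata(const struct toy_open_file *f)
{
	return f->dev->drvdata;
}

/* Variant with an extra argument: uses the pointer captured at open time. */
int show_via_argument(const struct toy_open_file *f, struct toy_priv **out)
{
	if (f->priv == NULL || f->priv->disconnected)
		return -ENODEV;
	*out = f->priv;
	return 0;
}
```

If driver A is unbound and driver B bound while the file stays open, show_via_drvdata() hands A's method a pointer to B's data, whereas show_via_argument() notices the disconnect and returns -ENODEV.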


Re: [PATCH v2] Bitbanging i2c bus driver using the GPIO API

2007-03-12 Thread Haavard Skinnemoen

On Mon, 12 Mar 2007 15:34:57 +0100
Haavard Skinnemoen <[EMAIL PROTECTED]> wrote:

> > > + bit_data->udelay= 5,/* 100 kHz */
> > > + bit_data->timeout   = HZ / 10,  /* 100 ms */  
> > 
> > Can we add these udelay/timeout to struct i2c_gpio_platform_data? And
> > let customer to choose these according their specific requirement. We
> > use Kconfig to do this, but Jean and David don't like the idea, -:(  
> 
> Yeah, they need to be a bit more configurable than they currently are.
> And I think it makes sense to pass them from the board setup code, since
> this is where things depending on board-specific details (signal quality
> issues, pullup resistor values, etc.) are supposed to go.

By the way, timeout seems to be hardcoded to 100 jiffies in the
i2c-algo-bit driver, so there's probably not much point passing it from
the board code when it's going to be overridden anyway. I'll add just a
udelay parameter to the platform struct for  now.

Haavard
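
A rough sketch of how board code might pick the udelay value, assuming (as the "5 /* 100 kHz */" comment in the patch suggests) that udelay is the half period of the SCL clock in microseconds. The struct toy_i2c_gpio_platform_data and the helper i2c_udelay_for_khz() are hypothetical names for illustration only, not the final layout of the platform struct.

```c
/* Hypothetical board-side view of the platform data with a udelay field. */
struct toy_i2c_gpio_platform_data {
	unsigned int sda_pin;
	unsigned int scl_pin;
	unsigned int udelay;	/* assumed: half SCL period in microseconds */
};

/*
 * Half period in microseconds for a target bus clock in kHz.
 * One full period is 1000/khz microseconds, so half of it is 500/khz.
 * Rounds up so that the bus never runs faster than requested.
 * Returns 0 for an invalid (zero) frequency.
 */
unsigned int i2c_udelay_for_khz(unsigned int khz)
{
	if (khz == 0)
		return 0;
	return (500 + khz - 1) / khz;
}
```

With this rule a 100 kHz bus gives udelay = 5, matching the value hardcoded in the patch, and a 400 kHz request rounds up to 2.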

Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-03-12 Thread Jeff Garzik


Andi Kleen wrote:

in Linux. Apparently in some cases sata_nv does DMA on an already freed and then
reused mapping.


Any data or additional info on that?  Did you discover this by tracking 
the DMA API software routines, or something lower level (like a bus 
analyzer)?


libata handles all the DMA allocation and mapping and cleanup for 
sata_nv, so any software problem would affect the whole of libata.


But it's possible that the nForce SATA chip has DMA padding needs that 
are different from those provided by libata-core (grep for "pad"), which 
could create a situation where the hardware continues DMA'ing past the 
end of the DMA area.


Jeff





Re: [ck] Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-12 Thread jos poortvliet

On Monday 12 March 2007, Al Boldi wrote:
> Con Kolivas wrote:
> > > > The higher priority one always get 6-7ms whereas the lower priority
> > > > one runs 6-7ms and then one larger perfectly bound expiration amount.
> > > > Basically exactly as I'd expect. The higher priority task gets
> > > > precisely RR_INTERVAL maximum latency whereas the lower priority task
> > > > gets RR_INTERVAL min and full expiration (according to the virtual
> > > > deadline) as a maximum. That's exactly how I intend it to work. Yes I
> > > > realise that the max latency ends up being longer intermittently on
> > > > the niced task but that's -in my opinion- perfectly fine as a
> > > > compromise to ensure the nice 0 one always gets low latency.
> > >
> > > I think, it should be possible to spread this max expiration latency
> > > across the rotation, should it not?
> >
> > There is a way that I toyed with of creating maps of slots to use for
> > each different priority, but it broke the O(1) nature of the virtual
> > deadline management. Minimising algorithmic complexity seemed more
> > important to maintain than getting slightly better latency spreads for
> > niced tasks. It also appeared to be less cache friendly in design. I
> > could certainly try and implement it but how much importance are we to
> > place on latency of niced tasks? Are you aware of any usage scenario
> > where latency sensitive tasks are ever significantly niced in the real
> > world?
>
> It only takes one negatively nice'd proc to affect X adversely.

Then, maybe, we should start nicing X again, like we did/had to do until a few 
years ago? Or should we just wait until X gets fixed (after all, development 
goes faster than ever)? Or is this really the scheduler's fault?

> Thanks!
>
> --
> Al
>
> ___
> http://ck.kolivas.org/faqs/replying-to-mailing-list.txt
> ck mailing list - mailto: [EMAIL PROTECTED]
> http://vds.kolivas.org/mailman/listinfo/ck



-- 
Disclaimer:

Everything I do, think and say is based on the world view I have now.
I am not responsible for changes to the world, or to my view of it,
nor for any resulting behaviour of my own.
Everything I say is meant kindly, unless explicitly stated otherwise.



Re: [BUG} usb-serial regression in 2.6.21-rc2-git3

2007-03-12 Thread Mark Lord


Still no improvement on the b0rken usb-serial resume support in -rc*.

Today I got this oops on resume from RAM.
Slightly tainted kernel this time (vmware), but not previously
on similar crashes.  I cannot yet get it to "crash on demand",
so you'll just have to live with it this time.

All USB was dead until after a reboot this time.

pl2303 5-6.3:1.0: PM: resume from 2, parent 5-6.3 still 2
usbdev5.23_ep81: PM: resume from 0, parent 5-6.3:1.0 still 2
usbdev5.23_ep02: PM: resume from 0, parent 5-6.3:1.0 still 2
usbdev5.23_ep83: PM: resume from 0, parent 5-6.3:1.0 still 2
usbdev5.23: PM: resume from 0, parent 5-6.3 still 2
usb 5-6.4: PM: resume from 2, parent 5-6 still 2
usbdev5.24_ep00: PM: resume from 0, parent 5-6.4 still 2
usbhid 5-6.4:1.0: PM: resume from 2, parent 5-6.4 still 2
usbdev5.24_ep81: PM: resume from 0, parent 5-6.4:1.0 still 2
usbdev5.24: PM: resume from 0, parent 5-6.4 still 2
Restarting tasks ... done.
ata1.00: configured for UDMA/100
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
usb 5-6: USB disconnect, address 22
usb 5-6.3: USB disconnect, address 23
BUG: unable to handle kernel paging request at virtual address fffb
printing eip:
c01823cb
*pde = 2067
*pte = 
Oops:  [#1]
PREEMPT 
Modules linked in: hci_usb radeon drm vmnet(P) vmmon(P) nfsd exportfs lockd nfs_acl sunrpc acpi_cpufreq cpufreq_ondemand cpufreq_powersave cpufreq_userspace cpufreq_stats freq_table cpufreq_conservative ac fan button thermal video battery container processor rfcomm l2cap bluetooth cfq_iosched deflate zlib_deflate twofish twofish_common serpent blowfish des cbc ecb blkcipher aes xcbc sha256 sha1 crypto_null af_key af_packet sbp2 usbhid hid pl2303 usbserial pcmcia mousedev snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss ipw2200 snd_pcm ieee80211 ieee80211_crypt snd_timer ohci1394 psmouse firmware_class sdhci mmc_core yenta_socket rsrc_nonstatic ieee1394 serio_raw b44 mii pcmcia_core snd pcspkr soundcore ehci_hcd snd_page_alloc uhci_hcd ahci usbcore intel_agp agpgart sg sr_mod cdrom unix

CPU:0
EIP:0060:[]Tainted: P   VLI
EFLAGS: 00010286   (2.6.21-rc3-git4 #5)
EIP is at sysfs_hash_and_remove+0x18/0x111
eax: fff3   ebx: c030b3d4   ecx:    edx: fff3
esi: fff3   edi: fff3   ebp: f62af618   esp: f74e5dec
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process khubd (pid: 1955, ti=f74e4000 task=f74d6a50 task.ti=f74e4000)
Stack: c02d15d8 fff3 c0158595 c030b3d4 fff3 fff3 f62af618 c01843d9 
  c030b3c8 f411f8e0 c0184422 f411f878 0001 f2ca4880 c02005ca f411f8e0 
  c01fba1f f411f878 0001 f2ca4880  c01fba4e f2ca4880 f8a15d0e 
Call Trace:

[] lookup_one_len+0x4d/0x5c
[] remove_files+0x15/0x1e
[] sysfs_remove_group+0x40/0x56
[] device_pm_remove+0x1d/0x5a
[] device_del+0x167/0x18e
[] device_unregister+0x8/0x10
[] destroy_serial+0x80/0xcc [usbserial]
[] destroy_serial+0x0/0xcc [usbserial]
[] kref_put+0x5f/0x6e
[] usb_serial_disconnect+0x81/0xaa [usbserial]
[] kref_put+0x5f/0x6e
[] usb_unbind_interface+0x2a/0x59 [usbcore]
[] __device_release_driver+0x6e/0x8b
[] device_release_driver+0x1d/0x32
[] bus_remove_device+0x71/0x81
[] device_del+0x134/0x18e
[] usb_disable_device+0x5c/0xbb [usbcore]
[] usb_disconnect+0x82/0x104 [usbcore]
[] usb_disconnect+0x70/0x104 [usbcore]
[] hub_thread+0x30b/0x9db [usbcore]
[] schedule+0x47c/0x51f
[] autoremove_wake_function+0x0/0x33
[] hub_thread+0x0/0x9db [usbcore]
[] kthread+0x9b/0xbf
[] kthread+0x0/0xbf
[] kernel_thread_helper+0x7/0x10
===
Code: 8b 40 08 a8 08 74 08 5b 5e 5f e9 6c ac 0f 00 5b 5e 5f c3 55 57 56 53 83 ec 0c 85 c0 89 44 24 04 89 14 24 0f 84 ed 00 00 00 89 c2 <8b> 40 08 85 c0 0f 84 e0 00 00 00 8b 52 50 83 c0 68 89 54 24 08 
EIP: [] sysfs_hash_and_remove+0x18/0x111 SS:ESP 0068:f74e5dec


lid switch hangs notebook

2007-03-12 Thread Antonio Mignolli


Hi,
I'm running Slackware 10.2 on an HP/Compaq nx5000.

With kernels <= 2.6.17.3 I didn't have problems.
Starting from 2.6.19, if I close the notebook's lid
or press the lid switch,
then after a couple of times, or after a few seconds, the OS hangs
completely. The only thing to do is a hard power-off
by holding the power button for 5 seconds
(a short press should do a clean shutdown -h).
I've tried 2.6.19, 2.6.19.1, 2.6.19.2, 2.6.20, 2.6.20.1, 2.6.20.2,
and all of them show the problem.

I don't have KERNEL_DEBUG set, so I don't have any debug information.
I will be happy to enable it and provide more info if someone tells
me which debug parameters I should set.
There's nothing in /var/log/messages or /var/log/syslog,
except for the message "LID switch", which I added
with a logger command in /etc/acpi/acpi_handler.sh.
I added it after noticing the problem, to be sure the event was
detected, and it was.

Re: RSDL-mm 0.28

2007-03-12 Thread Ray Lee


On 3/12/07, David Schwartz <[EMAIL PROTECTED]> wrote:

In no case is much of anything guaranteed, of course. (What can you do if
there's no other process to yield to?)


Perhaps if sched_yield()'s effects were cumulative inside a timeslice,
then eventually the calling task would get pushed far enough down that
others would run.

Ray

bug in select() in linux

2007-03-12 Thread Lluís Batlle


Hello,

I've found a problem in the select() call. The manpage states:
"those in writefds will be watched to see if a write will not block"

I've tried a select() for write against a closed tcp socket (closed by
the other side), and the select call _blocks_.

Any write() call to that socket will _not block_, and will return with EPIPE.

I've seen this happening in 2.4.20 and 2.6.16.29 (xen patched) at least.

Maybe it's a glibc problem - you may know better.

Could you please keep me in carbon copy? I'm not sure I want to
subscribe to LKML.

Regards,
Lluís.

Re: [patch] futex: PI state locking fix

2007-03-12 Thread Ingo Molnar


* Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> > this patch has been tested in -rt. Must-have for v2.6.21.
> 
> Does that fix this:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=224262

yeah, could very much be related.

Ingo

Re: libata extension

2007-03-12 Thread Mark Lord


Vitaliyi wrote:

Good Day

Say I want to make an extended set of ATA commands available to
userspace for building diagnostic tools.
I need 0x40 (read verify) and 0x32 (write long) with error handling,
for example. I tried the ide driver through ioctls, but it seems to
lack functionality and is full of gotchas. Furthermore, it oopses
sometimes.


Use the SCSI SG_IO ioctl() with opcode=ATA_16,
which gives you access to the ATA Passthrough mechanism.
This will work for most ATA commands.

I already use it in hdparm and in some other utilities
for scanning/repairing drives.

A notable exception is the READ/WRITE LONG opcodes,
which require an extra kernel patch from me,
awaiting merge into libata some year.

Cheers

-ml
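
A minimal sketch of the SG_IO route for the READ VERIFY (0x40) case, under stated assumptions. The helper names build_read_verify_cdb() and sg_read_verify() and the local macros are hypothetical; the CDB byte layout follows the ATA PASS-THROUGH (16) command as commonly documented for SAT, uses the non-data protocol with a 28-bit LBA, and has not been checked against real hardware here. This is not the code used in hdparm.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define ATA_16_OPCODE		0x85	/* SCSI ATA PASS-THROUGH (16) */
#define ATA_PROTO_NON_DATA	3	/* protocol value: no data transfer */
#define ATA_CMD_READ_VERIFY	0x40	/* READ VERIFY SECTOR(S), 28-bit LBA */
#define ATA_DEV_LBA		0x40	/* LBA addressing bit in the device register */

/* Fill a 16-byte CDB that carries one READ VERIFY taskfile. */
void build_read_verify_cdb(uint8_t cdb[16], uint32_t lba, uint8_t count)
{
	memset(cdb, 0, 16);
	cdb[0]  = ATA_16_OPCODE;
	cdb[1]  = ATA_PROTO_NON_DATA << 1;	/* protocol lives in bits 4:1 */
	cdb[6]  = count;			/* sector count (7:0) */
	cdb[8]  = lba & 0xff;			/* LBA (7:0) */
	cdb[10] = (lba >> 8) & 0xff;		/* LBA (15:8) */
	cdb[12] = (lba >> 16) & 0xff;		/* LBA (23:16) */
	cdb[13] = ATA_DEV_LBA | ((lba >> 24) & 0x0f);	/* device + LBA (27:24) */
	cdb[14] = ATA_CMD_READ_VERIFY;
}

/* Issue the command through SG_IO. Returns 0 on success, -1 on any failure. */
int sg_read_verify(int fd, uint32_t lba, uint8_t count)
{
	uint8_t cdb[16];
	uint8_t sense[32];
	sg_io_hdr_t hdr;

	build_read_verify_cdb(cdb, lba, count);
	memset(sense, 0, sizeof(sense));
	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';
	hdr.dxfer_direction = SG_DXFER_NONE;	/* READ VERIFY moves no data */
	hdr.cmd_len = sizeof(cdb);
	hdr.cmdp = cdb;
	hdr.mx_sb_len = sizeof(sense);
	hdr.sbp = sense;
	hdr.timeout = 10000;			/* milliseconds */

	if (ioctl(fd, SG_IO, &hdr) < 0)
		return -1;
	if (hdr.status || hdr.host_status || hdr.driver_status)
		return -1;	/* a real tool would decode the sense data here */
	return 0;
}
```

A diagnostic tool would open the disk device node, call sg_read_verify() over the LBA ranges of interest, and inspect the sense buffer on failure to find the failing sector.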

Re: _proxy_pda still makes linking modules fail

2007-03-12 Thread Jeremy Fitzhardinge

Andi Kleen wrote:
>> Rusty's pda->per_cpu patch will deal with this once and for all; have
>> 
>
> Not on x86-64.
>   

Have you considered dropping pda in x86-64?  Segment based percpu
doesn't really have any disadvantages.

>> you picked it up yet?
>> 
>
> Not yet.
>   

There will be interactions with my paravirt+xen patches, so I'm
wondering if I should rebase onto those patches, or should we apply them
later and fix everything up?

J

Re: [patch] futex: PI state locking fix

2007-03-12 Thread Chuck Ebbert

Ingo Molnar wrote:
> Subject: [patch] futex: PI state locking fix
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> testing of -rt by IBM uncovered a locking bug in wake_futex_pi(): the PI 
> state needs to be locked before we access it.
> 
> this patch has been tested in -rt. Must-have for v2.6.21.
> 

Does that fix this:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=224262

[] rt_mutex_next_owner+0x2f/0x40
kernel/rt_mutex_common.c:rt_mutex_top_waiter():74:
BUG_ON(w->lock != lock);
[] do_futex+0x94d/0xbe3
 inlined: futex_unlock_pi()
 inlined: wake_futex_pi()
 kernel/futex.c:wake_futex_pi():569:
new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
[] sys_futex+0x11d/0x130
[] syscall_call+0x7/0xb



Re: [SOUND] hda_intel: build fix

2007-03-12 Thread Takashi Iwai

At Mon, 12 Mar 2007 13:53:51 +,
Ralf Baechle wrote:
> 
> On Mon, Mar 12, 2007 at 12:04:30PM +0100, Takashi Iwai wrote:
> 
> > It's no big problem to remove const in these cases, but allowing const
> > with __devinitdata seems the right fix to me...
> 
> Gccs derives the readability of a section used with __attribute(section())
> from the first use, which in case of this driver was a non-const use, so
> gcc made .init.data a r/w section.  Later uses were marked with const,
> so did conflict.  Having to ensure that all members of a section are const
> or are not const is painful, so this is clearly less than desirable
> behaviour on gcc's side.  I think gcc picking the most permissive
> attributes for a section, that is r/w in this case would be far preferable.
> 
> Here is a small test case btw:
> 
> int foo __attribute__ ((__section__ (".init.data"))) = 23;
> const int bar __attribute__ ((__section__ (".init.data"))) = 42;
> 
> Now I'm not a great fan of the patch I've posted but it reflects what real
> world gcc is doing so for the time being I don't see much of a chance to
> The Right Thing (TM).  And the gain from const in this case will be small
> anyway.

Fair enough.  I agree that removing const is the only reasonable fix
right now.   But from semantics, const is a good thing, and people may
try to add it again later if we get rid of it now.  So how about
commenting it out as /*const*/ in each place, as a reminder that the
removal is intentional?

Also, in your patch to ice1712, you don't have to remove const from the
code in the snd_ice1712_read_eeprom() and snd_ice1712_probe() functions.
Those should work with const pointers.
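
A minimal user-space sketch of the /*const*/ convention suggested above. The section name demo_init_data and all demo_* identifiers are made up for illustration; the driver itself uses __devinitdata, which is not reproduced here. It assumes gcc (or clang) on an ELF target.

```c
#include <assert.h>

/* Hypothetical section name for this user-space demo; the kernel's
 * __devinitdata places objects in ".init.data" instead. */
#define DEMO_INITDATA __attribute__((__section__("demo_init_data")))

/* First use is non-const, so the compiler treats the section as read/write. */
static int demo_table_a[] DEMO_INITDATA = { 1, 2, 3 };

/* Would ideally be const, but that can conflict with the r/w section above;
 * the commented-out qualifier records that the omission is deliberate. */
static /*const*/ int demo_table_b[] DEMO_INITDATA = { 4, 5, 6 };

/* Sum both tables, so that both objects are actually referenced. */
int demo_sum(void)
{
	int sum = 0;
	unsigned int i;

	for (i = 0; i < 3; i++)
		sum += demo_table_a[i] + demo_table_b[i];
	assert(sum > 0);
	return sum;
}
```

Because both objects have the same qualifiers, the section type conflict from the test case does not arise, and the comment tells a later reader not to "fix" the missing const.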


Takashi

Re: [PATCH v2] Bitbanging i2c bus driver using the GPIO API

2007-03-12 Thread Haavard Skinnemoen

On Mon, 12 Mar 2007 18:07:59 +0800
"Wu, Bryan" <[EMAIL PROTECTED]> wrote:
> >   static struct i2c_gpio_platform_data i2c_gpio_data = {
> > .sda_pin= GPIO_PIN_FOO,
> > .scl_pin= GPIO_PIN_BAR,
> >   };
> 
> Is this usage right, because 3 flags are added to this structure as
> below:
> 
> struct i2c_gpio_platform_data {
>   unsigned int sda_pin;
>   unsigned int scl_pin;
>   unsigned int sda_is_open_drain:1;
>   unsigned int scl_is_open_drain:1;
>   unsigned int scl_is_output_only:1;
> };

Well, it is the simplest possible example. The last 3 fields will be 0,
which is a valid configuration.
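
As a small illustration of why the flags end up as 0: members that a designated initializer does not name are implicitly zero-initialized. The pin numbers below are placeholders, and i2c_gpio_flags_clear() is a helper invented for this sketch, not part of the driver.

```c
#include <assert.h>

/* Placeholder pin numbers; real values come from the board's GPIO map. */
#define GPIO_PIN_FOO 10
#define GPIO_PIN_BAR 11

struct i2c_gpio_platform_data {
	unsigned int sda_pin;
	unsigned int scl_pin;
	unsigned int sda_is_open_drain:1;
	unsigned int scl_is_open_drain:1;
	unsigned int scl_is_output_only:1;
};

/* Members not named in the initializer are implicitly zero. */
static struct i2c_gpio_platform_data i2c_gpio_data = {
	.sda_pin = GPIO_PIN_FOO,
	.scl_pin = GPIO_PIN_BAR,
};

/* Hypothetical helper: returns 1 when all three flag bits are clear. */
int i2c_gpio_flags_clear(const struct i2c_gpio_platform_data *pdata)
{
	assert(pdata != 0);
	return !pdata->sda_is_open_drain &&
	       !pdata->scl_is_open_drain &&
	       !pdata->scl_is_output_only;
}
```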

> Thanks a lot,  I will drop our GPIO based I2C driver and try this one on
> our platform.

I hope it works for you.

> > +   if (!pdata->scl_is_output_only)
> > +   bit_data->getscl = i2c_gpio_getscl,
> > +
> > +   bit_data->getsda= i2c_gpio_getsda,
> > +   bit_data->udelay= 5,/* 100 kHz */
> > +   bit_data->timeout   = HZ / 10,  /* 100 ms */
> 
> Can we add these udelay/timeout to struct i2c_gpio_platform_data? And
> let customer to choose these according their specific requirement. We
> use Kconfig to do this, but Jean and David don't like the idea, -:(

Yeah, they need to be a bit more configurable than they currently are.
And I think it makes sense to pass them from the board setup code, since
this is where things depending on board-specific details (signal quality
issues, pullup resistor values, etc.) are supposed to go.
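
One possible shape for this, sketched in user space. The demo_* structures, the "0 means use the default" rule, and the DEMO_HZ value are all assumptions for illustration; the patch under discussion hard-codes both values.

```c
#include <assert.h>

#define DEMO_HZ 250	/* assumed tick rate for this sketch */

/* Hypothetical board-supplied timing; 0 means "use the driver default". */
struct demo_i2c_gpio_timing {
	int udelay;	/* bit delay in microseconds */
	int timeout;	/* clock-stretch timeout in ticks */
};

/* Stand-in for the two i2c-algo-bit fields the driver fills in. */
struct demo_bit_data {
	int udelay;
	int timeout;
};

/* Copy board-specific timing, falling back to the values in the patch. */
void demo_apply_timing(const struct demo_i2c_gpio_timing *pdata,
		       struct demo_bit_data *bit_data)
{
	assert(pdata != 0 && bit_data != 0);
	bit_data->udelay = pdata->udelay ? pdata->udelay : 5;
	bit_data->timeout = pdata->timeout ? pdata->timeout : DEMO_HZ / 10;
}
```

Board code that leaves both members unset keeps the current behaviour, so existing platform data would not need to change.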

Haavard

Re: do_generic_mapping_read performance issue

2007-03-12 Thread Nick Piggin

On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
>   Hi,
> 
> > Hi, I am encountering a performance problem, which I have tracked into the 
> > Linux kernel. The problem occurs with my experimental web server that uses 
> > sendfile to repeatedly transmit files.  The files are based on the static 
> > portion of the SPECweb99 fileset and range in size to model a reasonable 
> > workload.  With this workload, a significant number of the requests are 
> > for files of size 4 KB or less.
> > 
> > I have determined that the performance problem occurs in the function
> > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > Here is the specific code fragment:
> > 
> > /*
> >  * When (part of) the same page is read multiple times
> >  * in succession, only mark it as accessed the first time.
> >  */
> > if (prev_index != index)
> > mark_page_accessed(page);
>   Actually, the code has certainly been like that for two years :).

Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
stage (ie. read from the start of a page -- obviously that also has holes).

> > The implication of this code is that for files of size less than or equal 
> > to a single page, the page associated with such a file is likely to get 
> > evicted from the cache regardless of how frequently it is accessed.  The 
> > reason is that after the first access, prev_index is always zero and index 
> > can only be zero. Hence, mark_page_accessed is never called after the 
> > first time the file is requested.  As a result, the page is evicted from 
> > the cache no matter how frequently it is used.  By changing the kernel to 
> > always call mark_page_accessed for these files, the server throughput is 
> > increased by as much as 20%.
>   Your analysis seems to be right. But to observe this behaviour you have
> to have the file open and just always reread it using the same file
> descriptor, don't you? That's probably not too common...
> 
> > I was wondering if anyone could explain why the call to mark_page_accessed 
> > is conditional? That is, what problem it is trying to solve. It would seem 
> > that in many scenarios, if the same page is accessed repeatedly, then it 
> > would be appropriate to keep that page cached.
>   I also don't know why the condition is there, but it has been there for
> at least two years, so I'm not sure anybody remembers ;). Nick, do you have
> an idea?

Yeah it is there because that is basically how our "use once" detection
handles the case where an app does not read in chunks that are PAGE_SIZE
multiples and PAGE_SIZE aligned.
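
A simplified user-space model of that check, to make both effects visible. Everything named demo_* is invented for this sketch, and the initial "no previous page" value is an assumption of the model rather than a statement about the kernel code.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL

/* Per-open-file state; stands in for the remembered previous page index. */
struct demo_file {
	unsigned long prev_index;
};

/* Per-page state; counts calls that would reach mark_page_accessed(). */
struct demo_page {
	unsigned int accessed;
};

void demo_file_init(struct demo_file *f)
{
	f->prev_index = (unsigned long)-1;	/* model: no page read yet */
}

/* Read len bytes starting at pos; pages[] is indexed by page number. */
void demo_read(struct demo_file *f, struct demo_page *pages,
	       unsigned long pos, unsigned long len)
{
	unsigned long index, last;

	assert(len > 0);
	last = (pos + len - 1) / DEMO_PAGE_SIZE;
	for (index = pos / DEMO_PAGE_SIZE; index <= last; index++) {
		/* Same test as the quoted fragment. */
		if (f->prev_index != index)
			pages[index].accessed++;
		f->prev_index = index;
	}
}
```

In this model, four 1 KB reads of one page count as a single access, which is the intended "use once" behaviour. Re-reading a one-page file from offset 0 through the same struct also counts only once, which is the case reported above.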


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Mike Galbraith

On Mon, 2007-03-12 at 22:23 +1100, Con Kolivas wrote:

> Mike the cpu is being proportioned out perfectly according to fairness as I 
> mentioned in the prior email, yet X is getting the lower latency scheduling. 
> I'm not sure within the bounds of fairness what more would you have happen to 
> your liking with this test case?

It has been said that "perfection is the enemy of good".  The two
interactive tasks receiving 40% cpu while two niced background jobs
receive 60% may well be perfect, but it's damn sure not good.

-Mike


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Theodore Tso <[EMAIL PROTECTED]> wrote:

> What we probably need in the long-term, and not just for high 
> precision wakeups, is we need a way for waiters (either in the kernel 
> or in userspace) to specify a desired precision in their timers.  Is 
> it, "wake me up in a second, exactly", or "wake me up in a second, 
> plus or minus 10ms"?  (or 50ms?  or 100ms?).

such a facility exists already, see round_jiffies() and 
round_jiffies_relative(). There's some short blurb about it at:

http://kernelnewbies.org/LinuxChanges#head-513ceda14f5d8cf5b8a7c81d7e3821543141ecb0

> This becomes especially important if we want the tickless code to 
> really shine as far as power management is concerned. [...]

yes. That's why we also implemented /proc/timer_stat, and this was 
measured and a few higher-frequency fuzzy waiters were converted to use 
round_jiffies(). Some other waiters were fixed in user-space. It's all 
dependent on actual measurements and circumstances.
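
The batching idea can be sketched in user space as follows. This is only a model of the concept: demo_round_up_to_second() is a made-up helper that always rounds up, and it does not reproduce the kernel's actual rounding rule in round_jiffies().

```c
#include <assert.h>

/* Round an absolute tick count up to the next whole-second boundary,
 * so that timers expiring within the same second fire together. */
unsigned long demo_round_up_to_second(unsigned long ticks, unsigned long hz)
{
	unsigned long rem;

	assert(hz > 0);
	rem = ticks % hz;
	return rem ? ticks - rem + hz : ticks;
}
```

With hz = 1000, timers due at ticks 1001, 1500 and 1999 would all be moved to tick 2000, giving one wakeup instead of three.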

Ingo

Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin

On Mon, Mar 12, 2007 at 10:12:14AM -0400, Theodore Tso wrote:
> On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote:
> > On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
> > > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
> > > > Ingo Molnar <[EMAIL PROTECTED]> writes:
> > > > > 
> > > > > the only correct approach is the use of hrtimers, and a patch exists
> > > > > for that - see below. This has been included in -rt for quite some time.
> > > > 
> > > > But isn't that bad for power management? You'll likely get more
> > > > idle wakeups, won't you?
> > > 
> > > > Why so? It becomes more precise, but only once.
> > 
> > When it's clustered around the jiffies interval then wakeups from
> > multiple processes will be somewhat batched. With a precise wakeup you'll
> > get wakeups all over the jiffies period, won't you?
> 
> What we probably need in the long-term, and not just for high
> precision wakeups, is we need a way for waiters (either in the kernel
> or in userspace) to specify a desired precision in their timers.  Is
> it, "wake me up in a second, exactly", or "wake me up in a second,
> plus or minus 10ms"?   (or 50ms?  or 100ms?).

Would this work, or will it just create more confusion for the API user?
I mean, all sleeps can only guarantee "no less than".

Would a binary flag (as exact as possible / relaxed if needed) be enough?
Or perhaps a ternary one (exact/relaxed/batched), where relaxed could
add an extra jiffy or so, and batched is so relaxed that it may stretch
the wait to as much as double the value of the timeout.
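
A sketch of what such a ternary flag could mean, in user space. The enum and demo_latest_expiry() are hypothetical names; the "one extra jiffy" and "double the timeout" bounds are taken from the paragraph above and are not an existing interface.

```c
#include <assert.h>

/* Hypothetical precision classes for a sleep request. */
enum demo_timer_precision {
	DEMO_EXACT,	/* as exact as possible */
	DEMO_RELAXED,	/* may add an extra jiffy or so */
	DEMO_BATCHED	/* may take up to double the timeout */
};

/* Latest acceptable expiry, in ticks from now, for a requested timeout.
 * The earliest expiry is always the timeout itself ("no less than"). */
unsigned long demo_latest_expiry(unsigned long timeout,
				 enum demo_timer_precision precision)
{
	switch (precision) {
	case DEMO_EXACT:
		return timeout;
	case DEMO_RELAXED:
		return timeout + 1;
	case DEMO_BATCHED:
		return timeout * 2;
	}
	assert(!"unknown precision class");
	return timeout;
}
```

The window between the timeout and the value returned here is what a timer core would be free to use for batching wakeups.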

> This becomes especially important if we want the tickless code to
> really shine as far as power management is concerned.  Unfortunately,
> the POSIX timer abstraction doesn't give this kind of flexibility
> easily, so it's going to be a while before we see significant
> userspace adoption of such a kernel feature, but I think it's
> something that would be still worthwhile to add.

But given that we know most userspace API timeouts are broadly just an
"equal to or greater", then we could add another timeout flag to specify
it is a userspace timeout, and make that controllable by sysctl.

Sure it isn't ideal, but for those who really want power / hypervisor
savings, it could be useful.

BTW. my futex man page says timeout's contents "describe the maximum duration
of the wait". Surely that should be *minimum*? Michael cc'ed.

Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen

> This becomes especially important if we want the tickless code to
> really shine as far as power management is concerned.  Unfortunately,
> the POSIX timer abstraction doesn't give this kind of flexibility
> easily, so it's going to be a while before we see significant
> userspace adoption of such a kernel feature, but I think it's
> something that would be still worthwhile to add.

I suspect it would be overkill to specify that on every sleep operation.
99% of applications won't care and for the 1% leftover a global per 
process setting should be fine.
I think a single prctl() and a global sysctl as default would be enough.
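
The lookup this implies is small; a user-space sketch, with every name invented for illustration (no such prctl() option or sysctl is defined in this thread):

```c
#include <assert.h>

#define DEMO_SLACK_UNSET (-1L)

/* System-wide default slack in microseconds; stands in for the sysctl. */
long demo_sysctl_timer_slack_us = 0;

/* Per-process setting; stands in for a value set through prctl(). */
struct demo_task {
	long timer_slack_us;
};

/* The per-process value wins when set; otherwise use the global default. */
long demo_effective_slack(const struct demo_task *task)
{
	assert(task != 0);
	if (task->timer_slack_us != DEMO_SLACK_UNSET)
		return task->timer_slack_us;
	return demo_sysctl_timer_slack_us;
}
```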

-Andi


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-12 Thread Al Boldi

Con Kolivas wrote:
> > > The higher priority one always get 6-7ms whereas the lower priority
> > > one runs 6-7ms and then one larger perfectly bound expiration amount.
> > > Basically exactly as I'd expect. The higher priority task gets
> > > precisely RR_INTERVAL maximum latency whereas the lower priority task
> > > gets RR_INTERVAL min and full expiration (according to the virtual
> > > deadline) as a maximum. That's exactly how I intend it to work. Yes I
> > > realise that the max latency ends up being longer intermittently on
> > > the niced task but that's -in my opinion- perfectly fine as a
> > > compromise to ensure the nice 0 one always gets low latency.
> >
> > I think, it should be possible to spread this max expiration latency
> > across the rotation, should it not?
>
> There is a way that I toyed with of creating maps of slots to use for each
> different priority, but it broke the O(1) nature of the virtual deadline
> management. Minimising algorithmic complexity seemed more important to
> maintain than getting slightly better latency spreads for niced tasks. It
> also appeared to be less cache friendly in design. I could certainly try
> and implement it but how much importance are we to place on latency of
> niced tasks? Are you aware of any usage scenario where latency sensitive
> tasks are ever significantly niced in the real world?

It only takes one negatively nice'd proc to affect X adversely.


Thanks!

--
Al


Re: do_generic_mapping_read performance issue

2007-03-12 Thread Jan Kara

  Hi,

> Hi, I am encountering a performance problem, which I have tracked into the 
> Linux kernel. The problem occurs with my experimental web server that uses 
> sendfile to repeatedly transmit files.  The files are based on the static 
> portion of the SPECweb99 fileset and range in size to model a reasonable 
> workload.  With this workload, a significant number of the requests are 
> for files of size 4 KB or less.
> 
> I have determined that the performance problem occurs in the function
> do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> Here is the specific code fragment:
> 
> /*
>  * When (part of) the same page is read multiple times
>  * in succession, only mark it as accessed the first time.
>  */
> if (prev_index != index)
> mark_page_accessed(page);
  Actually, the code has certainly been like that for two years :).

> The implication of this code is that for files of size less than or equal 
> to a single page, the page associated with such a file is likely to get 
> evicted from the cache regardless of how frequently it is accessed.  The 
> reason is that after the first access, prev_index is always zero and index 
> can only be zero. Hence, mark_page_accessed is never called after the 
> first time the file is requested.  As a result, the page is evicted from 
> the cache no matter how frequently it is used.  By changing the kernel to 
> always call mark_page_accessed for these files, the server throughput is 
> increased by as much as 20%.
  Your analysis seems to be right. But to observe this behaviour you have
to have the file open and just always reread it using the same file
descriptor, don't you? That's probably not too common...

> I was wondering if anyone could explain why the call to mark_page_accessed 
> is conditional? That is, what problem it is trying to solve. It would seem 
> that in many scenarios, if the same page is accessed repeatedly, then it 
> would be appropriate to keep that page cached.
  I also don't know why the condition is there, but it has been there for
at least two years, so I'm not sure anybody remembers ;). Nick, do you have
an idea?

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs

Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!)

2007-03-12 Thread Michael S. Tsirkin

> Quoting Ingo Molnar <[EMAIL PROTECTED]>:
> Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup 
> detected on CPU#0!)
> 
> 
> * Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> 
> > > could you turn on CONFIG_SLAB_DEBUG as well?
> > > 
> > > that should catch certain types of use-after-free accesses, and 
> > > lockdep will also warn if a still locked object is freed.
> > 
> > Hmm, no, this does not look like use-after-free. I enabled 
> > CONFIG_SLAB_DEBUG, and I still see the same message, so the memory was 
> > not overwritten by slab debugger.
> 
> that's still not conclusive - the memory might not have been allocated 
> by slab again to detect it. Your magic-number check definitely shows 
> some sort of corruption going on, right?

Not necessarily in such a direct way.

I currently think we are somehow getting neighbours where
neigh->dev points to a loopback device - that's type 772,
and this seems to make sense.
I printed out the device name and sure enough it is "lo".

Is it true that sticking the following

static int ipoib_neigh_setup_dev(struct net_device *dev,
                                 struct neigh_parms *parms)
{
        parms->neigh_destructor = ipoib_neigh_destructor;

        return 0;
}

in dev->neigh_setup, as ipoib does, guarantees that neighbour->dev will point to
the current device for any neighbour which ipoib_neigh_destructor gets?

That's the assumption IPoIB makes, and it seems broken in this instance.

How could that be?

-- 
MST

Re: [PATCH] BUILD_BUG_ON_ZERO -> BUILD_BUG_OR_ZERO

2007-03-12 Thread Stefan Richter

Robert P. J. Day wrote:
> On Mon, 12 Mar 2007, Stefan Richter wrote:
>> Rusty Russell wrote:
>> > OTOH, BUILD_BUG_OR_ZERO says what happens: either it's a build bug, or
>> > it's zero.
>>
>> What about ZERO_UNLESS_BUILD_BUG_ON(e)? It's long though...
> 
> how often is this going to be used?  it's not like the tree is
> currently awash in calls to BUILD_BUG_ON_ZERO as it is.

Most of the time it will be hidden as a macro-in-a-macro, like in
ARRAY_SIZE().  So the length of the name doesn't matter much.  But then,
the _name_ itself doesn't matter much because authors of public macros
are the primary user group, not John Driverhacker.
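
For readers who have not met the macro-in-a-macro pattern, here is a user-space approximation. The DEMO_* macros are written for this sketch and are not copied from the kernel headers; they rely on the GCC/Clang extensions __typeof__ and __builtin_types_compatible_p.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates to 0, or fails to compile when e is non-zero
 * (the array type would then have a negative size). */
#define DEMO_BUILD_BUG_ON_ZERO(e) (sizeof(char[1 - 2 * !!(e)]) - 1)

/* Non-zero when a is a pointer rather than a true array. */
#define DEMO_IS_POINTER(a) \
	__builtin_types_compatible_p(__typeof__(a), __typeof__(&(a)[0]))

/* Element count of an array; refuses to compile for a plain pointer. */
#define DEMO_ARRAY_SIZE(a) \
	(sizeof(a) / sizeof((a)[0]) + DEMO_BUILD_BUG_ON_ZERO(DEMO_IS_POINTER(a)))

static const int demo_values[] = { 3, 1, 4, 1 };

size_t demo_count(void)
{
	size_t n = DEMO_ARRAY_SIZE(demo_values);

	/* DEMO_ARRAY_SIZE(&demo_values[0]) would be rejected at build time. */
	assert(n > 0);
	return n;
}
```

The driver author only ever writes DEMO_ARRAY_SIZE(); the name of the inner macro is visible just to whoever maintains the outer one, which is the point made above.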
-- 
Stefan Richter
-=-=-=== --== -==--
http://arcgraph.de/sr/

Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code

2007-03-12 Thread Srivatsa Vaddagiri

On Sun, Mar 11, 2007 at 12:38:43PM -0700, Paul Jackson wrote:
> The primary reason for the cpuset double locking, as I recall, was because
> cpusets needs to access cpusets inside the memory allocator.  

"needs to access cpusets" - can you be more specific?

Being able to safely walk cpuset->parent list - is it the only access
you are talking of or more?

-- 
Regards,
vatsa

Re: [PATCH v2] Bitbanging i2c bus driver using the GPIO API

2007-03-12 Thread Haavard Skinnemoen

On Sat, 10 Mar 2007 21:15:50 +0100
Jean Delvare <[EMAIL PROTECTED]> wrote:

> I like the idea very much. Would this let us get rid of i2c-ixp2000?
> i2c-ixp4xx? scx200_i2c? Other drivers?

Any platform that implements the generic gpio api should be able to use
this driver. So yes, I hope we might be able to get rid of a few
existing bitbanging drivers.

> > +/*
> > + * Bitbanging i2c bus driver using the GPIO API
> > + *
> > + * Copyright (C) 2006 Atmel Corporation
> 
> I'm told we're in year 2007 ;)

I'm also told that copyright protection lasts infinitely long in
practice ;)

I'll update it. I probably just copied it blindly from a different
driver.

> > +int i2c_gpio_getsda(void *data)
> > +{
> > +   struct i2c_gpio_platform_data *pdata = data;
> > +
> > +   return gpio_get_value(pdata->sda_pin);
> > +}
> 
> 
> What value will you get if the SDA pin is open-drain and currently in
> output mode? Are such GPIO pins actually able to detect that the pin is
> low while they are not themselves driving it low?

I guess that depends on the GPIO controller. But being able to read the
pin state even when the pin is configured as an output is a
prerequisite for using this driver with "open drain" pins, so if the
hardware doesn't support this, the board code should just set
{sda,scl}_is_open_drain to zero.
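
A toy model of the fallback for pins without open-drain support, assuming the usual bit-banging approach of releasing the line by switching the pin to input. The demo_* names and the structure are made up for this sketch and do not come from the patch.

```c
#include <assert.h>

/* Toy model of one GPIO line on a pulled-up I2C bus. */
struct demo_pin {
	int is_output;		/* 1 while this side drives the pin */
	int out_value;		/* level driven when is_output is set */
	int slave_pulls_low;	/* 1 while another device holds the line low */
};

/* Level seen on the wire: low if anyone drives low, else pulled high. */
int demo_line_level(const struct demo_pin *pin)
{
	assert(pin != 0);
	if (pin->slave_pulls_low)
		return 0;
	if (pin->is_output)
		return pin->out_value;
	return 1;
}

/* Set SDA without open-drain support: release the line by switching
 * to input for "high", and drive it only for "low". */
void demo_setsda_dir(struct demo_pin *pin, int state)
{
	assert(pin != 0);
	if (state) {
		pin->is_output = 0;
	} else {
		pin->is_output = 1;
		pin->out_value = 0;
	}
}
```

Because the pin is an input whenever the master wants the line high, reading it back shows whether a slave is holding it low, without needing the controller to sense an output pin.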

> > +   bit_data->udelay= 5,/* 100 kHz */
> 
> Actually, no, i2c-algo-bit has a 1/3-2/3 duty cycle, so a complete
> cycle is 3 times the udelay value. So udelay=5 gives you 66 kHz. If
> someone wants to fix that...

Ok. I guess we should move this parameter into the platform data struct
anyway.
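
The arithmetic behind the 66 kHz figure, as a small sketch. demo_scl_khz() is a helper invented here; it only assumes what Jean states, namely that a complete clock cycle takes three udelay periods.

```c
#include <assert.h>

/* Approximate SCL frequency in kHz when one full clock cycle takes
 * three udelay periods (the 1/3-2/3 duty cycle described above). */
unsigned int demo_scl_khz(unsigned int udelay_us)
{
	assert(udelay_us > 0);
	/* cycle time = 3 * udelay microseconds; 1000 / microseconds = kHz */
	return 1000u / (3u * udelay_us);
}
```

With udelay = 5 the cycle is 15 us, so about 66 kHz; with the suggested udelay = 50 it is 150 us, so about 6 kHz.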

> Also, I wouldn't recommend such a low value when SCL cannot be sensed,
> if a slave stretches the line even very briefly, you won't notice and
> havoc will ensue. udelay=50 sounds more reasonable for such half-baked
> busses.

Makes sense.

> > +   ret = platform_driver_probe(&i2c_gpio_driver, i2c_gpio_probe);
> > +   if (ret)
> > +   printk("i2c-gpio: probe failed: %d\n", ret);
> 
> Add KERN_ERR or similar.

Will do.
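
For reference, in kernels of this era the log level is just a string prefix that the preprocessor glues onto the format string. A user-space sketch of the resulting message, using a stand-in DEMO_KERN_ERR macro and a hypothetical formatting helper instead of printk():

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for KERN_ERR, which is the string literal "<3>" in 2.6 kernels. */
#define DEMO_KERN_ERR "<3>"

/* Build the text that printk(KERN_ERR "...") would be handed. */
int demo_format_probe_error(char *buf, size_t len, int ret)
{
	assert(buf != 0 && len > 0);
	/* Adjacent string literals are concatenated at compile time. */
	return snprintf(buf, len,
			DEMO_KERN_ERR "i2c-gpio: probe failed: %d\n", ret);
}
```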

> Would you mind also adding yourself to MAINTAINERS for this driver? I
> would appreciate it.

Sure. I'm hoping this driver won't cause that much maintenance overhead
anyway since all the complicated stuff is in i2c-algo-bit. But I agree
it needs a maintainer.

Haavard

Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Theodore Tso

On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote:
> On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
> > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
> > > Ingo Molnar <[EMAIL PROTECTED]> writes:
> > > > 
> > > > the only correct approach is the use of hrtimers, and a patch exists
> > > > for that - see below. This has been included in -rt for quite some time.
> > > 
> > > But isn't that bad for power management? You'll likely get more
> > > idle wakeups, won't you?
> > 
> > > Why so? It becomes more precise, but only once.
> 
> When it's clustered around the jiffies interval then wakeups from
> multiple processes will be somewhat batched. With a precise wakeup you'll
> get wakeups all over the jiffies period, won't you?

What we probably need in the long-term, and not just for high
precision wakeups, is we need a way for waiters (either in the kernel
or in userspace) to specify a desired precision in their timers.  Is
it, "wake me up in a second, exactly", or "wake me up in a second,
plus or minus 10ms"?   (or 50ms?  or 100ms?).

This becomes especially important if we want the tickless code to
really shine as far as power management is concerned.  Unfortunately,
the POSIX timer abstraction doesn't give this kind of flexibility
easily, so it's going to be a while before we see significant
userspace adoption of such a kernel feature, but I think it's
something that would be still worthwhile to add.

Regards,

- Ted

Re: [RFC PATCH 1/3] Add ability to keep track of callers of symbol_(get|put)

2007-03-12 Thread Trent Piepho

On Sun, 11 Mar 2007, Andrew Morton wrote:
> > On Sat, 10 Mar 2007 02:31:35 -0200 Mauro Carvalho Chehab <[EMAIL PROTECTED]> wrote:
> > From: Trent Piepho <[EMAIL PROTECTED]>
> >
> > When a module uses symbol_get() to increase the ref count of another
> > module, there is no record what module called symbol_get().  A module
> > can
> > show up as having other users, but there is no way to tell who those
> > users are.
> >
> > This adds that ability to symbol_put() and symbol_get().
>
> One day I'll write a script which unwordwraps patches and then you'll all
> need to find new ways of torturing me.
>
> This patch needed rather a lot of help in the coding-style department.
> Hopefully Rusty can comment on the content, because I'm all exhausted from
> cleaning it up.

New version attached.  Coding-style should be fixed, and hopefully this
will not be word-wrapped.

There were some bugs with NULL as the user in symbol_(get|put), that should
be fixed now.

I've added an error message for when a module tries to symbol_put() a
module that it's not using.  This should keep the putted module's ref count
from being set to -1, which is what happens now.  It should also make it a
lot easier to track down where the extra symbol_put()s are coming from.

From: Trent Piepho <[EMAIL PROTECTED]>

Add ability to keep track of callers of symbol_(get|put)

When a module uses symbol_get() to increase the ref count of another
module, there is no record what module called symbol_get().  A module can
show up as having other users, but there is no way to tell who those
users are.

This adds that ability to symbol_put() and symbol_get().

__symbol_get() and __symbol_put() gain another parameter, which specifies
the module that is doing the getting or putting.  symbol_put_addr() is
renamed to __symbol_put_addr() and has the same parameter added.  The
module can be NULL, in which case the symbol's owner's refcount is
incremented without recording who did it, as was the case before.

The macros symbol_get(), symbol_put(), and symbol_put_addr() will use
THIS_MODULE as the getter/putter and so don't have an extra parameter.  A
macro symbol_put_user() is added that allows specifying the putting
module.

The module_use structure that keeps track of one module's use of another
gains a count member.  The module_use will not go away until the count
goes down to zero.  The count wasn't necessary before because a module
could only use another module once, when the module was linked in, and
un-use that module once, when it was unloaded.

When a module calls symbol_get() to get a symbol from module that owns
the symbol, the ref count of the owning module is _not_ incremented if
the getting module was already listed as using the owning module.
Rather, the count of that module_use is incremented.

When a module is loaded and the kernel module linker is resolving
symbols, it will not increment the module_use count for each symbol used,
but will just leave it at one.  We don't count each symbol resolved,
because during module unloading we wouldn't know how many times to
decrement the module_use count.

When the module is unloaded, the module_use count will only be
decremented by one, which should bring it to zero.  If it's not zero,
then the remaining count is the number of symbol_get()s the module did
that were unmatched with a symbol_put().
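
The counting rules described above can be modelled in user space roughly as follows. This is a sketch of the bookkeeping only: the demo_* types and the fixed-size table are invented for illustration and do not mirror the data structures in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_USERS 8

struct demo_module;

/* Records that `user` holds `count` outstanding gets on the owner. */
struct demo_module_use {
	const struct demo_module *user;
	unsigned int count;
};

struct demo_module {
	const char *name;
	unsigned int refcount;
	struct demo_module_use uses[DEMO_MAX_USERS];
};

static struct demo_module_use *demo_find_use(struct demo_module *owner,
					      const struct demo_module *user)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_USERS; i++)
		if (owner->uses[i].count && owner->uses[i].user == user)
			return &owner->uses[i];
	return NULL;
}

/* Returns 0 on success, -1 if the use table is full. */
int demo_symbol_get(struct demo_module *owner, const struct demo_module *user)
{
	struct demo_module_use *use;
	size_t i;

	assert(owner != NULL);
	if (user == NULL) {		/* anonymous get: refcount only */
		owner->refcount++;
		return 0;
	}
	use = demo_find_use(owner, user);
	if (use) {			/* known user: bump the use count only */
		use->count++;
		return 0;
	}
	for (i = 0; i < DEMO_MAX_USERS; i++) {
		if (owner->uses[i].count == 0) {
			owner->uses[i].user = user;
			owner->uses[i].count = 1;
			owner->refcount++;	/* first use takes the reference */
			return 0;
		}
	}
	return -1;
}

/* Returns 0 on success, -1 if `user` was not using `owner`. */
int demo_symbol_put(struct demo_module *owner, const struct demo_module *user)
{
	struct demo_module_use *use;

	assert(owner != NULL);
	if (user == NULL) {
		if (owner->refcount == 0)
			return -1;
		owner->refcount--;
		return 0;
	}
	use = demo_find_use(owner, user);
	if (!use)
		return -1;	/* unmatched put: report it, don't underflow */
	if (--use->count == 0)
		owner->refcount--;	/* last put drops the reference */
	return 0;
}
```

Two gets by the same user leave the owner's refcount at one, and a third put from that user is rejected instead of driving the count negative.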

Signed-off-by: Trent Piepho <[EMAIL PROTECTED]>

diff --git a/include/linux/module.h b/include/linux/module.h
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -169,9 +169,10 @@ struct notifier_block;
 #ifdef CONFIG_MODULES

 /* Get/put a kernel symbol (calls must be symmetric) */
-void *__symbol_get(const char *symbol);
+void *__symbol_get(const char *symbol, struct module *user);
 void *__symbol_get_gpl(const char *symbol);
-#define symbol_get(x) ((typeof(&x))(__symbol_get(MODULE_SYMBOL_PREFIX #x)))
+#define symbol_get(x) ((typeof(&x))(__symbol_get(MODULE_SYMBOL_PREFIX #x, \
+   THIS_MODULE)))

 #ifndef __GENKSYMS__
 #ifdef CONFIG_MODVERSIONS
@@ -388,9 +389,11 @@ extern void __module_put_and_exit(struct

 #ifdef CONFIG_MODULE_UNLOAD
 unsigned int module_refcount(struct module *mod);
-void __symbol_put(const char *symbol);
-#define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x)
-void symbol_put_addr(void *addr);
+void __symbol_put(const char *symbol, struct module *user);
+#define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x, THIS_MODULE)
+#define symbol_put_user(x,u) __symbol_put(MODULE_SYMBOL_PREFIX #x, (u))
+void __symbol_put_addr(void *addr, struct module *user);
+#define symbol_put_addr(x) __symbol_put_addr((x), THIS_MODULE)

 /* Sometimes we know we already have a refcount, and it's easier not
to handle the error case (which only happens with rmmod --wait). */
diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -516,30 +516,54 @@ struct module_use
 {
struct list_head list;
struct module

Re: [patch 4/4] [TULIP] Rev tulip version

2007-03-12 Thread Jeff Garzik


Pekka Enberg wrote:

Hi,

On 3/12/07, Valerie Henson <[EMAIL PROTECTED]> wrote:

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip_core.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip_core.c
@@ -17,11 +17,11 @@

 #define DRV_NAME   "tulip"
 #ifdef CONFIG_TULIP_NAPI
-#define DRV_VERSION    "1.1.14-NAPI" /* Keep at least for test */
+#define DRV_VERSION    "1.1.15-NAPI" /* Keep at least for test */
 #else
-#define DRV_VERSION    "1.1.14"
+#define DRV_VERSION    "1.1.15"
 #endif
-#define DRV_RELDATE    "May 11, 2002"
+#define DRV_RELDATE    "Feb 27, 2007"


Why not just drop this? What purpose does a per-module revision have
for in-kernel drivers anyway?


It's the maintainer's call.  Sometimes it eases parsing bug reports, and 
tracking changes as your drivers get backported to various enterprise 
operating systems(tm).  Sometimes it just gets in the way.


Jeff




Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri

On Wed, Mar 07, 2007 at 03:59:19PM -0600, Serge E. Hallyn wrote:
> > containers patches uses just a single pointer in the task_struct, and
> > all tasks in the same set of containers (across all hierarchies) will
> > share a single container_group object, which holds the actual pointers
> > to container state.
> 
> Yes, that's why this consolidation doesn't make sense to me.
> 
> Especially considering again that we will now have nsproxies pointing to
> containers pointing to... nsproxies.

nsproxies needn't point to containers. It (or as Herbert pointed -
nsproxy->pid_ns) can have direct pointers to resource objects (whatever
struct container->subsys[] points to).


-- 
Regards,
vatsa

RE: [ck] Re: RSDL v0.30 cpu scheduler for ... 2.6.18.8 kernel

2007-03-12 Thread Fortier,Vincent [Montreal]

> On Monday 12 March 2007 22:21, Michael Gerdau wrote:
> > > > And here are the backported RSDL 0.30 patches in case any of you
> > > > would still be running an older 2.6.18.8 kernel ...
> >
> > The original mail doesn't seem to have made it to the ck-list.
> >
> > Where would I find the backported RSDL 0.30 patches for 2.6.18.8 ?
> 
> It might have been filtered due to containing attachments, or 
> because the author wasn't a member of the mailing list, sorry.
> 

I've found a place to put these patches:
http://linux-dev.qc.ec.gc.ca/kernel/rsdl/2.6.18.8-rsdl-0.30.patch
http://linux-dev.qc.ec.gc.ca/kernel/rsdl/2.6.18.8-rsdl-0.29-0.30.patch


There is also a Debian Etch x86_64 2.6.18-rsdl-0.30 kernel available at:
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/linux-headers-2.6.18.8-rsdl-0.30-amd64-envcan_2.6.18.8-rsdl-0.30-003_amd64.deb
http://linux-dev.qc.ec.gc.ca/kernel/debian/etch/x86_64/linux-image-2.6.18.8-rsdl-0.30-amd64-envcan_2.6.18.8-rsdl-0.30-003_amd64.deb


Hope this helps!

-vin

Re: [SOUND] hda_intel: build fix

2007-03-12 Thread Ralf Baechle

On Mon, Mar 12, 2007 at 12:04:30PM +0100, Takashi Iwai wrote:

> It's no big problem to remove const in these cases, but allowing const
> with __devinitdata seems the right fix to me...

Gcc derives the writability of a section used with __attribute__((section()))
from the first use, which in the case of this driver was a non-const use, so
gcc made .init.data a r/w section.  Later uses were marked const, so they
conflicted.  Having to ensure that all members of a section are either const
or non-const is painful, so this is clearly less than desirable behaviour on
gcc's side.  I think gcc picking the most permissive attributes for a
section, that is r/w in this case, would be far preferable.

Here is a small test case btw:

int foo __attribute__ ((__section__ (".init.data"))) = 23;
const int bar __attribute__ ((__section__ (".init.data"))) = 42;

Now I'm not a great fan of the patch I've posted but it reflects what real
world gcc is doing, so for the time being I don't see much of a chance to
do The Right Thing (TM).  And the gain from const in this case will be small
anyway.

  Ralf
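
The workaround described above can be sketched as follows: every object placed
in the section is left non-const, so gcc only ever sees one (writable) section
type. This is a minimal user-space sketch that assumes an ELF target and
gcc-style attributes; sum_init_data() is a hypothetical helper added only so
the objects are used.

```c
/* Both objects are non-const, so the first and all later uses agree on a
 * writable .init.data and gcc reports no section type conflict. */
int foo __attribute__ ((__section__ (".init.data"))) = 23;
int bar __attribute__ ((__section__ (".init.data"))) = 42;

/* Hypothetical helper: reads both objects back. */
int sum_init_data(void)
{
	return foo + bar;
}
```

Marking bar const again should bring the conflict back on the gcc versions
discussed in this thread.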

Re: Style Question

2007-03-12 Thread Cong WANG


2007/3/12, David Schwartz <[EMAIL PROTECTED]>:


> NULL has the same bit pattern as the number zero. (I'm not saying the bit
> pattern is all zeroes. And I am not even sure if NULL ought to have the
> same pattern as zero.) So C++ could use (void *)0, if it would let itself :p

They don't have to have the same bit pattern. There's no logical reason a
NULL pointer couldn't have all bits set and the number zero have all bits
cleared.

Casts are permitted to change the bit pattern. For example '(float) 7' can
result in a different bit pattern than '7' and similarly '(void *) 0' can
result in a different bit pattern from '0'.

As a trivial example, consider an LP64 system. NULL will have the bit
pattern of 64 zero bits, while '0' will have the bit pattern of 32 zero
bits.

DS


I agree. The C99 standard just says:

6.3.2.3 Pointers

"3  An integer constant expression with the value 0, or such an
expression cast to type void *, is called a null pointer constant. If
a null pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function."
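
A small sketch of what that paragraph does and does not promise. Comparing
against a null pointer constant is defined by the language; the object
representation is not. The function names are hypothetical and only serve
the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Portable: the comparison is defined by the language, whatever bits the
 * implementation uses to represent a null pointer. */
int is_null(const void *p)
{
	return p == NULL;
}

/* Implementation-specific: reports whether a null pointer happens to be
 * stored as all-zero bytes here.  The standard does not require it. */
int null_is_all_zero_bits(void)
{
	void *p = NULL;
	unsigned char zeros[sizeof p] = {0};

	return memcmp(&p, zeros, sizeof p) == 0;
}
```

On common Linux targets the second function returns 1, but code that relies
on that (for example memset() to create null pointers) is depending on the
platform rather than on the standard.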

Please pull git390 'for-linus' branch

2007-03-12 Thread Martin Schwidefsky

Please pull from 'for-linus' branch of

git://git390.osdl.marist.edu/pub/scm/linux-2.6.git for-linus

to receive the following updates:

 arch/s390/kernel/compat_wrapper.S |   17 +
 arch/s390/kernel/debug.c  |2 +-
 arch/s390/kernel/early.c  |   10 --
 arch/s390/kernel/ipl.c|9 -
 arch/s390/kernel/syscalls.S   |3 ++-
 drivers/s390/cio/qdio.c   |   26 ++
 include/asm-s390/ipl.h|5 +
 include/asm-s390/unistd.h |3 ++-
 8 files changed, 57 insertions(+), 18 deletions(-)

Heiko Carstens (3):
  [S390] memory detection: fix off by one bug.
  [S390] Wire up compat_sys_epoll_pwait.
  [S390] Wire up sys_utimes.

Jean Delvare (1):
  [S390] strlcpy is smart enough

Michael Holzheu (1):
  [S390] reboot from and dump to SCSI under z/VM fails.

Ursula Braun (1):
  [S390] cio: qdio slsb setup

diff --git a/arch/s390/kernel/compat_wrapper.S b/arch/s390/kernel/compat_wrapper.S
index 9790129..32a69a1 100644
--- a/arch/s390/kernel/compat_wrapper.S
+++ b/arch/s390/kernel/compat_wrapper.S
@@ -1665,3 +1665,20 @@ sys_getcpu_wrapper:
llgtr   %r3,%r3 # unsigned *
llgtr   %r4,%r4 # struct getcpu_cache *
jg  sys_getcpu
+
+   .globl  compat_sys_epoll_pwait_wrapper
+compat_sys_epoll_pwait_wrapper:
+   lgfr%r2,%r2 # int
+   llgtr   %r3,%r3 # struct compat_epoll_event *
+   lgfr%r4,%r4 # int
+   lgfr%r5,%r5 # int
+   llgtr   %r6,%r6 # compat_sigset_t *
+   llgf%r0,164(%r15)   # compat_size_t
+   stg %r0,160(%r15)
+   jg  compat_sys_epoll_pwait
+
+   .globl  compat_sys_utimes_wrapper
+compat_sys_utimes_wrapper:
+   llgtr   %r2,%r2 # char *
+   llgtr   %r3,%r3 # struct compat_timeval *
+   jg  compat_sys_utimes
diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c
index eca3fe5..dca6eaf 100644
--- a/arch/s390/kernel/debug.c
+++ b/arch/s390/kernel/debug.c
@@ -268,7 +268,7 @@ debug_info_alloc(char *name, int pages_per_area, int nr_areas, int buf_size,
rc->level  = level;
rc->buf_size   = buf_size;
rc->entry_size = sizeof(debug_entry_t) + buf_size;
-   strlcpy(rc->name, name, sizeof(rc->name)-1);
+   strlcpy(rc->name, name, sizeof(rc->name));
memset(rc->views, 0, DEBUG_MAX_VIEWS * sizeof(struct debug_view *));
memset(rc->debugfs_entries, 0 ,DEBUG_MAX_VIEWS *
sizeof(struct dentry*));
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index afca1c6..5e47936 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -141,9 +141,9 @@ static noinline __init void detect_machine_type(void)
machine_flags |= 4;
 }
 
+#ifdef CONFIG_64BIT
 static noinline __init int memory_fast_detect(void)
 {
-
unsigned long val0 = 0;
unsigned long val1 = 0xc;
int ret = -ENOSYS;
@@ -161,9 +161,15 @@ static noinline __init int memory_fast_detect(void)
if (ret || val0 != val1)
return -ENOSYS;
 
-   memory_chunk[0].size = val0;
+   memory_chunk[0].size = val0 + 1;
return 0;
 }
+#else
+static inline int memory_fast_detect(void)
+{
+   return -ENOSYS;
+}
+#endif
 
 #define ADDR2G (1UL << 31)
 
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index d125a4e..f731185 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -839,7 +839,7 @@ static int __init reipl_ccw_init(void)
}
reipl_block_ccw->hdr.len = IPL_PARM_BLK_CCW_LEN;
reipl_block_ccw->hdr.version = IPL_PARM_BLOCK_VERSION;
-   reipl_block_ccw->hdr.blk0_len = sizeof(reipl_block_ccw->ipl_info.ccw);
+   reipl_block_ccw->hdr.blk0_len = IPL_PARM_BLK0_CCW_LEN;
reipl_block_ccw->hdr.pbt = DIAG308_IPL_TYPE_CCW;
/* check if read scp info worked and set loadparm */
if (SCCB_VALID)
@@ -880,8 +880,7 @@ static int __init reipl_fcp_init(void)
} else {
reipl_block_fcp->hdr.len = IPL_PARM_BLK_FCP_LEN;
reipl_block_fcp->hdr.version = IPL_PARM_BLOCK_VERSION;
-   reipl_block_fcp->hdr.blk0_len =
-   sizeof(reipl_block_fcp->ipl_info.fcp);
+   reipl_block_fcp->hdr.blk0_len = IPL_PARM_BLK0_FCP_LEN;
reipl_block_fcp->hdr.pbt = DIAG308_IPL_TYPE_FCP;
reipl_block_fcp->ipl_info.fcp.opt = DIAG308_IPL_OPT_IPL;
}
@@ -930,7 +929,7 @@ static int __init dump_ccw_init(void)
}
dump_block_ccw->hdr.len = IPL_PARM_BLK_CCW_LEN;
dump_block_ccw->hdr.version = IPL_PARM_BLOCK_VERSION;
-   dump_block_ccw->hdr.blk0_len = sizeof(reipl_block_ccw->ipl_info.ccw);
+   dump_block_ccw->hdr.blk0_len =
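
For the debug.c hunk above ("strlcpy is smart enough"): the "-1" can go
because strlcpy() already reserves room for the terminating NUL within the
size it is given. A minimal user-space sketch of those semantics follows;
sketch_strlcpy() is a hypothetical stand-in, not the kernel's own code.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most size - 1 bytes and always NUL-terminates when size > 0.
 * Returns strlen(src) so the caller can detect truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = (len >= size) ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

With an 8-byte buffer, passing sizeof(buf) keeps 7 characters plus the NUL,
while passing sizeof(buf) - 1 needlessly drops one more character.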

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Alistair John Strachan

On Monday 12 March 2007 13:25, Frank van Maarseveen wrote:
[snip]
> So, are /dev/hd* going to disappear in a few years? iow, does it make
> sense to _slowly_ start to migrate to /dev/sd*?

How would you propose doing this? I'm sure modern distros with an 
initrd/initramfs probably already do some sort of root detection. Doesn't fix 
the fstab issue, but I suppose this could be auto-generated too.

> The problem is there's no plan B in case of any troubles except rename
> everything back again to boot an old kernel.

I doubt this matters for distributors, as they'll simply switch over when you 
upgrade the distro, and the earliest supported kernel will be the one that 
shipped with the newer version.

I accept that it's a bit of a drag, but it's better to have a standard naming 
convention for all disks, isn't it?

Glad this is working for you.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri

On Fri, Mar 09, 2007 at 02:06:03PM -0800, Paul Jackson wrote:
> >  if you create a 'resource container' to limit the
> >  usage of a set of resources for the processes
> >  belonging to this container, it would be kind of
> >  defeating the purpose, if you'd allow the processes
> >  to manipulate their limits, no?
> 
> Wrong - this is not the only way.
> 
> For instance in cpusets, -any- task in the system, regardless of what
> cpuset it is currently assigned to, might be able to manipulate -any-
> cpuset in the system.
> 
> Yes -- some sufficient mechanism is required to keep tasks from
> escalating their resources or capabilities beyond an allowed point.
> 
> But that mechanism might not be strictly based on position in some
> hierarchy.
> 
> In the case of cpusets, it is based on the permissions on files in
> the cpuset file system (normally mounted at /dev/cpuset), versus
> the current privileges and capabilities of the task.
> 
> A root privileged task in the smallest leaf node cpuset can manipulate
> every cpuset in the system.  This is an ordinary and common occurrence.

This assumes that you can see the global vfs namespace right?

What if you are inside a container/vserver which restricts your vfs
namespace? I.e. /dev/cpuset as seen from one container is not the same as
what is seen from another container. Is that an unrealistic scenario? IMHO
not. It in fact lets vservers and containers work with each other. So:

/dev/cpuset
|- C1   <- Container A bound to this
|  |- C11
|  |- C12
|
|- C2   <- Container B bound to this
   |- C21
   |- C22


C1 and C2 are two exclusive cpusets and containers/vservers A and B are bound 
to C1/C2 respectively.

From inside container/vserver A, if you were to look at /dev/cpuset, it will
-appear- as if you are in the top cpuset (with just C11 and C12 as child
cpusets). Tasks in A cannot modify C2 at all (since they have no visibility
of it).

Similarly if you were to look at /dev/cpuset from inside B, it will list only 
C21/C22 with tasks in container B not being able to see C1 at all.

:)
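
The visibility rule in the example above can be sketched as a subtree test:
a task whose view of /dev/cpuset is rooted at some cpuset can only reach
that cpuset and its descendants. This is a sketch under that assumption;
struct cpuset_node and cpuset_visible() are hypothetical names, not kernel
interfaces.

```c
#include <stddef.h>

struct cpuset_node {
	const char *name;
	struct cpuset_node *parent;	/* NULL for the global top cpuset */
};

/* Returns 1 if target is view_root itself or lies below it, 0 otherwise. */
int cpuset_visible(const struct cpuset_node *view_root,
		   const struct cpuset_node *target)
{
	for (; target; target = target->parent)
		if (target == view_root)
			return 1;
	return 0;
}
```

With container A's view rooted at C1, C11 and C12 are reachable while C2
and C21 are not; a task with the global view still reaches everything.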


-- 
Regards,
vatsa

Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!)

2007-03-12 Thread Ingo Molnar


* Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:

> > could you turn on CONFIG_SLAB_DEBUG as well?
> > 
> > that should catch certain types of use-after-free accesses, and 
> > lockdep will also warn if a still locked object is freed.
> 
> Hmm, no, this does not look like use-after-free. I enabled 
> CONFIG_SLAB_DEBUG, and I still see the same message, so the memory was 
> not overwritten by slab debugger.

that's still not conclusive - the memory might not have been allocated 
by slab again to detect it. Your magic-number check definitely shows 
some sort of corruption going on, right?

Ingo
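
Why the slab debug result is not conclusive can be sketched with a poison
check. Freed objects are filled with a poison byte, and a stale write is
only noticed if the poison is inspected before the memory is handed out
again. The sketch assumes 0x6b as the free-poison byte and ignores the
distinct end marker a real slab debugger may use; the names are
hypothetical.

```c
#include <stddef.h>

#define SKETCH_POISON_FREE 0x6b	/* assumed free-poison byte */

/* Returns 1 if every byte still carries the free-poison pattern, 0 if
 * something wrote to the object after it was freed. */
int still_poisoned(const unsigned char *obj, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (obj[i] != SKETCH_POISON_FREE)
			return 0;
	return 1;
}
```

If the object is reallocated and initialized by its new owner before the
stale access happens, there is no poison left to check, which is the case
described above.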

[S390] memory detection: fix off by one bug.

2007-03-12 Thread Martin Schwidefsky

From: Heiko Carstens <[EMAIL PROTECTED]>

[S390] memory detection: fix off by one bug.

diag 260 returns the address of the last addressable byte and not the
size of memory. Since we want the size we have to add 1 to the return
value.
Disable diag 260 for non z/Arch mode since it doesn't work there
anyway.

Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
Signed-off-by: Martin Schwidefsky <[EMAIL PROTECTED]>
---

 arch/s390/kernel/early.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff -urpN linux-2.6/arch/s390/kernel/early.c linux-2.6-patched/arch/s390/kernel/early.c
--- linux-2.6/arch/s390/kernel/early.c  2007-03-12 13:52:25.0 +0100
+++ linux-2.6-patched/arch/s390/kernel/early.c  2007-03-12 13:52:58.0 +0100
@@ -141,9 +141,9 @@ static noinline __init void detect_machi
machine_flags |= 4;
 }
 
+#ifdef CONFIG_64BIT
 static noinline __init int memory_fast_detect(void)
 {
-
unsigned long val0 = 0;
unsigned long val1 = 0xc;
int ret = -ENOSYS;
@@ -161,9 +161,15 @@ static noinline __init int memory_fast_d
if (ret || val0 != val1)
return -ENOSYS;
 
-   memory_chunk[0].size = val0;
+   memory_chunk[0].size = val0 + 1;
return 0;
 }
+#else
+static inline int memory_fast_detect(void)
+{
+   return -ENOSYS;
+}
+#endif
 
 #define ADDR2G (1UL << 31)
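
The arithmetic behind the fix, as a tiny sketch: an inclusive last-byte
address is one less than the size. size_from_last_byte() is a hypothetical
helper, not part of the patch.

```c
/* last_byte is the address of the last addressable byte (inclusive),
 * which is what the commit message says diag 260 returns. */
unsigned long size_from_last_byte(unsigned long last_byte)
{
	return last_byte + 1;
}
```

For a machine with 128 MB, the last byte is at 0x7ffffff and the size is
0x8000000.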
 

[S390] Wire up compat_sys_epoll_pwait.

2007-03-12 Thread Martin Schwidefsky

From: Heiko Carstens <[EMAIL PROTECTED]>

[S390] Wire up compat_sys_epoll_pwait.

Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
Signed-off-by: Martin Schwidefsky <[EMAIL PROTECTED]>
---

 arch/s390/kernel/compat_wrapper.S |   11 +++
 arch/s390/kernel/syscalls.S   |2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff -urpN linux-2.6/arch/s390/kernel/compat_wrapper.S linux-2.6-patched/arch/s390/kernel/compat_wrapper.S
--- linux-2.6/arch/s390/kernel/compat_wrapper.S 2007-03-12 13:52:25.0 +0100
+++ linux-2.6-patched/arch/s390/kernel/compat_wrapper.S 2007-03-12 13:53:00.0 +0100
@@ -1665,3 +1665,14 @@ sys_getcpu_wrapper:
llgtr   %r3,%r3 # unsigned *
llgtr   %r4,%r4 # struct getcpu_cache *
jg  sys_getcpu
+
+   .globl  compat_sys_epoll_pwait_wrapper
+compat_sys_epoll_pwait_wrapper:
+   lgfr%r2,%r2 # int
+   llgtr   %r3,%r3 # struct compat_epoll_event *
+   lgfr%r4,%r4 # int
+   lgfr%r5,%r5 # int
+   llgtr   %r6,%r6 # compat_sigset_t *
+   llgf%r0,164(%r15)   # compat_size_t
+   stg %r0,160(%r15)
+   jg  compat_sys_epoll_pwait
diff -urpN linux-2.6/arch/s390/kernel/syscalls.S linux-2.6-patched/arch/s390/kernel/syscalls.S
--- linux-2.6/arch/s390/kernel/syscalls.S   2007-03-12 13:52:25.0 +0100
+++ linux-2.6-patched/arch/s390/kernel/syscalls.S   2007-03-12 13:53:00.0 +0100
@@ -320,4 +320,4 @@ SYSCALL(sys_tee,sys_tee,sys_tee_wrapper)
 SYSCALL(sys_vmsplice,sys_vmsplice,compat_sys_vmsplice_wrapper)
 NI_SYSCALL /* 310 sys_move_pages */
 SYSCALL(sys_getcpu,sys_getcpu,sys_getcpu_wrapper)
-SYSCALL(sys_epoll_pwait,sys_epoll_pwait,sys_ni_syscall)
+SYSCALL(sys_epoll_pwait,sys_epoll_pwait,compat_sys_epoll_pwait_wrapper)
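
What the wrapper instructions do to each 31-bit-mode argument can be
sketched in C. This is only a model of the three widening rules used in the
patch (lgfr, llgtr, llgf), and the function names are hypothetical.

```c
#include <stdint.h>

/* lgfr: sign-extend the low 32 bits (a compat int) to 64 bits. */
int64_t compat_int(uint64_t reg)
{
	uint32_t low = (uint32_t)reg;

	return (low & 0x80000000u) ? (int64_t)low - 0x100000000LL
				   : (int64_t)low;
}

/* llgtr: keep the low 31 bits (a compat user pointer), zero the rest. */
uint64_t compat_ptr_bits(uint64_t reg)
{
	return reg & 0x7fffffffULL;
}

/* llgf: zero-extend an unsigned 32-bit value such as compat_size_t. */
uint64_t compat_uint(uint64_t reg)
{
	return reg & 0xffffffffULL;
}
```

The upper halves of the registers are not trustworthy when a 31-bit task
enters the kernel, which is why every argument is widened explicitly before
jumping to the 64-bit implementation.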

[S390] reboot from and dump to SCSI under z/VM fails.

2007-03-12 Thread Martin Schwidefsky

From: Michael Holzheu <[EMAIL PROTECTED]>

[S390] reboot from and dump to SCSI under z/VM fails.

We used wrong length values for ipl and dump hardware structures.
Since z/VM checks the ipl parameters more accurately than LPAR,
the operations fail there.

Signed-off-by: Michael Holzheu <[EMAIL PROTECTED]>
Signed-off-by: Martin Schwidefsky <[EMAIL PROTECTED]>
---

 arch/s390/kernel/ipl.c |9 -
 include/asm-s390/ipl.h |5 +
 2 files changed, 9 insertions(+), 5 deletions(-)

diff -urpN linux-2.6/arch/s390/kernel/ipl.c linux-2.6-patched/arch/s390/kernel/ipl.c
--- linux-2.6/arch/s390/kernel/ipl.c2007-03-12 13:52:25.0 +0100
+++ linux-2.6-patched/arch/s390/kernel/ipl.c2007-03-12 13:53:01.0 +0100
@@ -839,7 +839,7 @@ static int __init reipl_ccw_init(void)
}
reipl_block_ccw->hdr.len = IPL_PARM_BLK_CCW_LEN;
reipl_block_ccw->hdr.version = IPL_PARM_BLOCK_VERSION;
-   reipl_block_ccw->hdr.blk0_len = sizeof(reipl_block_ccw->ipl_info.ccw);
+   reipl_block_ccw->hdr.blk0_len = IPL_PARM_BLK0_CCW_LEN;
reipl_block_ccw->hdr.pbt = DIAG308_IPL_TYPE_CCW;
/* check if read scp info worked and set loadparm */
if (SCCB_VALID)
@@ -880,8 +880,7 @@ static int __init reipl_fcp_init(void)
} else {
reipl_block_fcp->hdr.len = IPL_PARM_BLK_FCP_LEN;
reipl_block_fcp->hdr.version = IPL_PARM_BLOCK_VERSION;
-   reipl_block_fcp->hdr.blk0_len =
-   sizeof(reipl_block_fcp->ipl_info.fcp);
+   reipl_block_fcp->hdr.blk0_len = IPL_PARM_BLK0_FCP_LEN;
reipl_block_fcp->hdr.pbt = DIAG308_IPL_TYPE_FCP;
reipl_block_fcp->ipl_info.fcp.opt = DIAG308_IPL_OPT_IPL;
}
@@ -930,7 +929,7 @@ static int __init dump_ccw_init(void)
}
dump_block_ccw->hdr.len = IPL_PARM_BLK_CCW_LEN;
dump_block_ccw->hdr.version = IPL_PARM_BLOCK_VERSION;
-   dump_block_ccw->hdr.blk0_len = sizeof(reipl_block_ccw->ipl_info.ccw);
+   dump_block_ccw->hdr.blk0_len = IPL_PARM_BLK0_CCW_LEN;
dump_block_ccw->hdr.pbt = DIAG308_IPL_TYPE_CCW;
dump_capabilities |= IPL_TYPE_CCW;
return 0;
@@ -954,7 +953,7 @@ static int __init dump_fcp_init(void)
}
dump_block_fcp->hdr.len = IPL_PARM_BLK_FCP_LEN;
dump_block_fcp->hdr.version = IPL_PARM_BLOCK_VERSION;
-   dump_block_fcp->hdr.blk0_len = sizeof(dump_block_fcp->ipl_info.fcp);
+   dump_block_fcp->hdr.blk0_len = IPL_PARM_BLK0_FCP_LEN;
dump_block_fcp->hdr.pbt = DIAG308_IPL_TYPE_FCP;
dump_block_fcp->ipl_info.fcp.opt = DIAG308_IPL_OPT_DUMP;
dump_capabilities |= IPL_TYPE_FCP;
diff -urpN linux-2.6/include/asm-s390/ipl.h linux-2.6-patched/include/asm-s390/ipl.h
--- linux-2.6/include/asm-s390/ipl.h2007-03-12 13:52:45.0 +0100
+++ linux-2.6-patched/include/asm-s390/ipl.h2007-03-12 13:53:01.0 +0100
@@ -14,9 +14,13 @@
 #define IPL_PARM_BLK_FCP_LEN (sizeof(struct ipl_list_hdr) + \
  sizeof(struct ipl_block_fcp))
 
+#define IPL_PARM_BLK0_FCP_LEN (sizeof(struct ipl_block_fcp) + 8)
+
 #define IPL_PARM_BLK_CCW_LEN (sizeof(struct ipl_list_hdr) + \
  sizeof(struct ipl_block_ccw))
 
+#define IPL_PARM_BLK0_CCW_LEN (sizeof(struct ipl_block_ccw) + 8)
+
 #define IPL_MAX_SUPPORTED_VERSION (0)
 
 #define IPL_PARMBLOCK_START((struct ipl_parameter_block *) \
@@ -58,6 +62,7 @@ struct ipl_block_ccw {
u8  vm_flags;
u8  reserved3[3];
u32 vm_parm_len;
+   u8  reserved4[80];
 } __attribute__((packed));
 
 struct ipl_parameter_block {

[S390] Wire up sys_utimes.

2007-03-12 Thread Martin Schwidefsky

From: Heiko Carstens <[EMAIL PROTECTED]>

[S390] Wire up sys_utimes.

Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
Signed-off-by: Martin Schwidefsky <[EMAIL PROTECTED]>
---

 arch/s390/kernel/compat_wrapper.S |6 ++
 arch/s390/kernel/syscalls.S   |1 +
 include/asm-s390/unistd.h |3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff -urpN linux-2.6/arch/s390/kernel/compat_wrapper.S linux-2.6-patched/arch/s390/kernel/compat_wrapper.S
--- linux-2.6/arch/s390/kernel/compat_wrapper.S 2007-03-12 13:53:01.0 +0100
+++ linux-2.6-patched/arch/s390/kernel/compat_wrapper.S 2007-03-12 13:53:02.0 +0100
@@ -1676,3 +1676,9 @@ compat_sys_epoll_pwait_wrapper:
llgf%r0,164(%r15)   # compat_size_t
stg %r0,160(%r15)
jg  compat_sys_epoll_pwait
+
+   .globl  compat_sys_utimes_wrapper
+compat_sys_utimes_wrapper:
+   llgtr   %r2,%r2 # char *
+   llgtr   %r3,%r3 # struct compat_timeval *
+   jg  compat_sys_utimes
diff -urpN linux-2.6/arch/s390/kernel/syscalls.S linux-2.6-patched/arch/s390/kernel/syscalls.S
--- linux-2.6/arch/s390/kernel/syscalls.S   2007-03-12 13:53:01.0 +0100
+++ linux-2.6-patched/arch/s390/kernel/syscalls.S   2007-03-12 13:53:02.0 +0100
@@ -321,3 +321,4 @@ SYSCALL(sys_vmsplice,sys_vmsplice,compat
 NI_SYSCALL /* 310 sys_move_pages */
 SYSCALL(sys_getcpu,sys_getcpu,sys_getcpu_wrapper)
 SYSCALL(sys_epoll_pwait,sys_epoll_pwait,compat_sys_epoll_pwait_wrapper)
+SYSCALL(sys_utimes,sys_utimes,compat_sys_utimes_wrapper)
diff -urpN linux-2.6/include/asm-s390/unistd.h linux-2.6-patched/include/asm-s390/unistd.h
--- linux-2.6/include/asm-s390/unistd.h 2007-02-04 19:44:54.0 +0100
+++ linux-2.6-patched/include/asm-s390/unistd.h 2007-03-12 13:53:02.0 +0100
@@ -250,8 +250,9 @@
 /* Number 310 is reserved for new sys_move_pages */
 #define __NR_getcpu311
 #define __NR_epoll_pwait   312
+#define __NR_utimes313
 
-#define NR_syscalls 313
+#define NR_syscalls 314
 
 /* 
  * There are some system calls that are not present on 64 bit, some

Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Theodore Tso

On Mon, Mar 12, 2007 at 10:23:06PM +1100, Con Kolivas wrote:
> > > > We are getting good interactive response with a fair scheduler yet
> > > > you seem intent on overloading it to find fault with it.
> > >
> > > I'm not trying to find fault, I'm TESTING AND REPORTING.  Was.
> >
> > Con, could you please take Mike's report of this regression seriously
> > and address it? Thanks,
> 
> Sure. 
> 
> Mike the cpu is being proportioned out perfectly according to fairness as I 
> mentioned in the prior email, yet X is getting the lower latency scheduling. 
> I'm not sure within the bounds of fairness what more would you have happen 
> to your liking with this test case?

Con,

I think what we're discovering is that a "fair scheduler" is
not going to cut it.  After all, running X and ripping CD's and MP3
encoding them is not exactly an esoteric use case.  And like it or
not, "nice" defaults to 4.

I suspect Mike is right; the only way to deal with this
regression is some scheduler hints from the desktop subsystem (i.e., X
and friends).  Yes, X is broken, it's horrible, yadda, yadda, yadda.
It's also what everyone is using, and it's a fact of life.  Just like
we occasionally have had to work around ISA braindamage, and x86
architecture braindamage, and ACPI braindamage all inflicted on us by
Intel.  This is just life, and sometimes the clean, elegant solution
is not enough.

Regards,

- Ted

P.S.  The other solution that might perhaps work is that we need to
change the meaning of what the nice value does.  If we consider "nice"
to be the scheduler hint (from the other direction), then maybe any
niced process should only run a very tiny amount if there are any
non-nice processes ready to run, and that the relative nice values are
used when two niced processes are competing for the CPU.
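
The rule proposed in the P.S. can be sketched as a selection function. This
is a simplification under stated assumptions: "a very tiny amount" is
modelled as "not at all while a non-niced task is ready", non-niced means
nice <= 0, and fairness among the non-niced tasks is not modelled. The
struct and function names are hypothetical.

```c
#include <stddef.h>

struct task {
	int nice;	/* > 0 means the task was niced */
	int ready;	/* non-zero if runnable */
};

/* Returns the index of the task to run next, or -1 if none is ready. */
int pick_next(const struct task *tasks, size_t n)
{
	int best_niced = -1;

	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].ready)
			continue;
		if (tasks[i].nice <= 0)
			return (int)i;	/* any non-niced task wins outright */
		if (best_niced < 0 || tasks[i].nice < tasks[best_niced].nice)
			best_niced = (int)i;
	}
	/* Only niced tasks are ready: relative nice values decide. */
	return best_niced;
}
```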

[S390] cio: qdio slsb setup

2007-03-12 Thread Martin Schwidefsky

From: Ursula Braun <[EMAIL PROTECTED]>

[S390] cio: qdio slsb setup

Make sure set_slsb problems are handled correctly in
qdio_do_qdio_fill_input() and qdio_do_qdio_fill_output.

Signed-off-by: Ursula Braun <[EMAIL PROTECTED]>
Signed-off-by: Martin Schwidefsky <[EMAIL PROTECTED]>
---

 drivers/s390/cio/qdio.c |   26 ++
 1 files changed, 18 insertions(+), 8 deletions(-)

diff -urpN linux-2.6/drivers/s390/cio/qdio.c linux-2.6-patched/drivers/s390/cio/qdio.c
--- linux-2.6/drivers/s390/cio/qdio.c   2007-03-12 13:52:36.0 +0100
+++ linux-2.6-patched/drivers/s390/cio/qdio.c   2007-03-12 13:52:56.0 +0100
@@ -210,9 +210,11 @@ again:
goto again;
}
if (rc < 0) {
-QDIO_DBF_TEXT3(1,trace,"sqberr");
-sprintf(dbf_text,"%2x,%2x,%d,%d",tmp_cnt,*cnt,ccq,q_no);
-QDIO_DBF_TEXT3(1,trace,dbf_text);
+   QDIO_DBF_TEXT3(1,trace,"sqberr");
+   sprintf(dbf_text,"%2x,%2x",tmp_cnt,*cnt);
+   QDIO_DBF_TEXT3(1,trace,dbf_text);
+   sprintf(dbf_text,"%d,%d",ccq,q_no);
+   QDIO_DBF_TEXT3(1,trace,dbf_text);
q->handler(q->cdev,QDIO_STATUS_ACTIVATE_CHECK_CONDITION|
QDIO_STATUS_LOOK_FOR_ERROR,
0, 0, 0, -1, -1, q->int_parm);
@@ -1250,7 +1252,6 @@ qdio_is_inbound_q_done(struct qdio_q *q)
if (!no_used) {
QDIO_DBF_TEXT4(0,trace,"inqisdnA");
QDIO_DBF_HEX4(0,trace,&no_used,sizeof(void*));
-   QDIO_DBF_TEXT4(0,trace,dbf_text);
return 1;
}
if (irq->is_qebsm) {
@@ -3371,10 +3372,15 @@ qdio_do_qdio_fill_input(struct qdio_q *q
unsigned int count, struct qdio_buffer *buffers)
 {
struct qdio_irq *irq = (struct qdio_irq *) q->irq_ptr;
+   int tmp = 0;
+
qidx &= (QDIO_MAX_BUFFERS_PER_Q - 1);
if (irq->is_qebsm) {
-   while (count)
-   set_slsb(q, &qidx, SLSB_CU_INPUT_EMPTY, &count);
+   while (count) {
+   tmp = set_slsb(q, &qidx, SLSB_CU_INPUT_EMPTY, &count);
+   if (!tmp)
+   return;
+   }
return;
}
for (;;) {
@@ -3390,11 +3396,15 @@ qdio_do_qdio_fill_output(struct qdio_q *
 unsigned int count, struct qdio_buffer *buffers)
 {
struct qdio_irq *irq = (struct qdio_irq *) q->irq_ptr;
+   int tmp = 0;
 
qidx &= (QDIO_MAX_BUFFERS_PER_Q - 1);
if (irq->is_qebsm) {
-   while (count)
-   set_slsb(q, &qidx, SLSB_CU_OUTPUT_PRIMED, &count);
+   while (count) {
+   tmp = set_slsb(q, &qidx, SLSB_CU_OUTPUT_PRIMED, &count);
+   if (!tmp)
+   return;
+   }
return;
}
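
The shape of the fix, as a user-space sketch: a loop that waits for a helper
to drive a counter to zero has to stop when the helper reports no progress,
otherwise a failing helper spins forever. sketch_set_state() is a
hypothetical stand-in for set_slsb(), assumed to return the number of
buffers it handled and 0 on failure.

```c
/* Hypothetical stand-in for set_slsb(): handles up to batch buffers,
 * decrements *count accordingly, returns how many it handled. */
static unsigned int sketch_set_state(unsigned int *count, unsigned int batch)
{
	unsigned int n = (batch < *count) ? batch : *count;

	*count -= n;
	return n;
}

/* Returns the number of buffers left unprocessed (0 on full success). */
unsigned int fill_buffers(unsigned int count, unsigned int batch)
{
	while (count) {
		unsigned int done = sketch_set_state(&count, batch);

		if (!done)
			break;	/* no progress: bail out instead of spinning */
	}
	return count;
}
```

With batch set to 0 the helper never makes progress; without the check the
loop would not terminate.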
 

Re: [discuss] [PATCH 4/4 TRY#3] optimize and simplify get_cycles_sync()

2007-03-12 Thread Joerg Roedel

On Mon, Mar 12, 2007 at 02:29:43PM +0100, Michael Matz wrote:
> Hi Joerg,
> 
> On Mon, 12 Mar 2007, Joerg Roedel wrote:
> 
> > > >+#define RDTSCP ".byte 0x0f, 0x01, 0xf9"
> > > >+alternative_io_two("cpuid\nrdtsc",
> > > >+   "rdtsc", X86_FEATURE_SYNC_RDTSC,
> > > >+   ".byte 0x0f, 0x01, 0xf9", X86_FEATURE_RDTSCP,
> > > >  
> > > 
> > > why not use the RDTSCP macro here?
> > 
> > Does this macro exist?
> 
> Look carefully at your patch again, or at least the four quoted lines 
> above.  You've added it yourself, in exactly the form you'd need in the 
> alternative_io_two() call :-)

Hmmkay, thanks for opening my eyes :-)
I considered defining this macro while writing this patch, but decided
against it because the X86_FEATURE_RDTSCP on the same line should
document the opcode sufficiently. I just forgot to remove that
#define :)

Thanks again,
Joerg

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG



2.6.20.2: kernel BUG at fs/nfs/write.c:505!

2007-03-12 Thread Stresslinux Kernel

Hello List,

running the following command

/sbin/grub-install --root-directory=/mnt --no-floppy /dev/sda

from a nfsroot system with kernel 2.6.20.2 (x86_64) results in:

[ cut here ]
kernel BUG at fs/nfs/write.c:505!
invalid opcode:  [1] SMP
CPU 0
Modules linked in: ipv6 genrtc
Pid: 1464, comm: grub-install Not tainted 2.6.20.2 #1
RIP: 0010:[]  [] nfs_wait_on_requests_locked+0x43/0xb2
RSP: 0018:81007d669ca8  EFLAGS: 00010246
RAX: f0ba RBX:  RCX: 802917aa
RDX:  RSI: 81007d669cb0 RDI: 810002f5f9d8
RBP: 810002f5f898 R08: 0001 R09: 0286
R10: 7fff R11: 0286 R12: 74d83948d2310001
R13:  R14:  R15: 
FS:  2b24eb8214a0() GS:809d6000() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00596d88 CR3: 7d594000 CR4: 06e0
Process grub-install (pid: 1464, threadinfo 81007d668000, task 81007e885840)
Stack:  810002f5f898 802917aa  810002f5fb58
 0020 810002f5f898 81007d669da8 80331245
 002e 81007d669e48 0020 
Call Trace:
 [] bd_forget+0x8d/0x8e
 [] nfs_sync_mapping_wait+0xbe/0x1ec
 [] nfs_sync_mapping_range+0x80/0xa1
 [] nfs_getattr+0x2e/0x9b
 [] vfs_getattr+0x1d/0x2b
 [] vfs_lstat_fd+0x2f/0x47
 [] do_page_fault+0x279/0x572
 [] do_sigaction+0x6b/0x1b0
 [] sys_newlstat+0x19/0x31
 [] error_exit+0x0/0x84
 [] system_call+0x7e/0x83


Code: 0f 0b eb fe f0 ff 41 44 c7 85 18 01 00 00 01 00 00 00 48 8b
RIP  [] nfs_wait_on_requests_locked+0x43/0xb2
 RSP 


The nfsroot contains a debootstrapped version of Debian 3.1 amd64.


My hardware:
an MSI barebone system -> http://www.msi-computer.de/produkte/bare_idx_view.php?Prod_id=515

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  :   Intel(R) Pentium(R) 4 CPU 3.06GHz
stepping: 9
cpu MHz : 3059.077
cache size  : 1024 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl tm2 cid cx16 xtpr lahf_lm
bogomips: 6123.63
clflush size: 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  :   Intel(R) Pentium(R) 4 CPU 3.06GHz
stepping: 9
cpu MHz : 3059.077
cache size  : 1024 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 1
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl tm2 cid cx16 xtpr lahf_lm
bogomips: 6118.50
clflush size: 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

:00:00.0 Host bridge: Intel Corporation 82915G/P/GV/GL/PL/910GL Memory Controller Hub (rev 0e)
:00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 0e)
:00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 (rev 04)
:00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #2 (rev 04)
:00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #3 (rev 04)
:00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #4 (rev 04)
:00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller (rev 04)
:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d4)
:00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 04)
:00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 04)
:00:1f.2 IDE interface: Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA Controller (rev 04)
:00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 04)
:02:06.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)



Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.

2007-03-12 Thread Jarek Poplawski

On Mon, Mar 12, 2007 at 02:36:46PM +0200, Pekka Enberg wrote:
> On 3/12/07, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> >So, maybe it's less evil to check those NULLs where possible and add
> >some WARN_ONs here and there...
> 
> No, it's much better to oops rather than paper over a bug.
> 

I'm not sure I understand your intentions - do you mean it is
always better? In my opinion an oops is right only when it avoids
some danger. And here there is no danger: some routing info,
which is internal multipath_xxx data and can be handled
safely, will simply not go into its cache.
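
The "check the NULL and carry on" option can be sketched in user-space
C as follows. This is only an illustration under stated assumptions:
struct route_cache, cache_add() and the injected allocator are
hypothetical names, not code from multipath_wrandom.c, and the failure
counter merely marks the spot where a WARN_ON-style report could go.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a route cache; not kernel structures. */
struct route_entry {
    int dest;
    struct route_entry *next;
};

struct route_cache {
    struct route_entry *head;
    unsigned long alloc_failures;   /* how often caching was skipped */
};

/* The allocator is a parameter so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

/*
 * Add a destination to the cache. On allocation failure the entry is
 * simply not cached: the failure is counted and -1 is returned, and
 * the existing cache contents stay untouched.
 */
static int cache_add(struct route_cache *c, int dest, alloc_fn alloc)
{
    struct route_entry *e;

    assert(c != NULL && alloc != NULL);
    e = alloc(sizeof(*e));
    if (e == NULL) {
        c->alloc_failures++;
        return -1;
    }
    e->dest = dest;
    e->next = c->head;
    c->head = e;
    return 0;
}

/* Allocator that always fails, to simulate memory pressure. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

Calling cache_add(&c, 2, failing_alloc) leaves the list as it was and
bumps alloc_failures. Whether that is preferable to an oops is exactly
the point under discussion in this thread.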

Jarek P.

Re: [discuss] [PATCH 4/4 TRY#3] optimize and simplify get_cycles_sync()

2007-03-12 Thread Michael Matz

Hi Joerg,

On Mon, 12 Mar 2007, Joerg Roedel wrote:

> > >+#define RDTSCP ".byte 0x0f, 0x01, 0xf9"
> > >+  alternative_io_two("cpuid\nrdtsc",
> > >+ "rdtsc", X86_FEATURE_SYNC_RDTSC,
> > >+ ".byte 0x0f, 0x01, 0xf9", X86_FEATURE_RDTSCP,
> > >  
> > 
> > why not use the RDTSCP macro here?
> 
> Does this macro exist?

Look carefully at your patch again, or at least the four quoted lines 
above.  You've added it yourself, in exactly the form you'd need in the 
alternative_io_two() call :-)
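
Since the macro expands to a plain string literal, it can be passed
wherever the raw ".byte" string was written. A rough user-space model
of that point is sketched below. The FEAT_* values, struct alt_insn,
pick_insn() and the priority order are assumptions made for
illustration; this is not the kernel's alternative_io_two() mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as in the quoted patch. */
#define RDTSCP ".byte 0x0f, 0x01, 0xf9"

/* The macro and the spelled-out opcode string have the same length. */
static_assert(sizeof(RDTSCP) == sizeof(".byte 0x0f, 0x01, 0xf9"),
              "RDTSCP must expand to the raw opcode string");

/* Hypothetical feature bits, standing in for the X86_FEATURE_* flags. */
enum {
    FEAT_SYNC_RDTSC = 1 << 0,
    FEAT_RDTSCP     = 1 << 1
};

struct alt_insn {
    const char *insn;       /* assembler text for this alternative */
    unsigned int feature;   /* feature bit that enables it */
};

/* Return the instruction text to use for a given set of CPU features. */
static const char *pick_insn(unsigned int cpu_features)
{
    /* Assumed priority: RDTSCP first, then a synchronising RDTSC. */
    static const struct alt_insn alts[] = {
        { RDTSCP,  FEAT_RDTSCP },
        { "rdtsc", FEAT_SYNC_RDTSC },
    };
    size_t i;

    for (i = 0; i < sizeof(alts) / sizeof(alts[0]); i++)
        if (cpu_features & alts[i].feature)
            return alts[i].insn;
    return "cpuid\nrdtsc";  /* default sequence from the patch */
}
```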

> I couldn't found it in the current git tree. And the rdtscp macros in 
> msr.h use the plain opcode too.

Ciao,
Michael.

Re: [PATCH 1/7] revoke: special mmap handling

2007-03-12 Thread Pekka J Enberg

Hi Honza,

On Mon, 12 Mar 2007, Jan Kara wrote:
> > +#define VM_REVOKED 0x0400  /* Mapping has been revoked */
> > +
>   Is it intended to conflict with VM_ALWAYSDUMP? I'd guess not and if
> yes, it definitely deserves a comment...

Peter Zijlstra spotted this also and it has been fixed in the patches I 
sent to Andrew yesterday.

Pekka

Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!)

2007-03-12 Thread Michael S. Tsirkin

> Quoting Ingo Molnar <[EMAIL PROTECTED]>:
> Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup 
> detected on CPU#0!)
> 
> 
> * Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> 
> > > So either there are other sites that instanciate those objects and
> > > forget about the lock init, or the object is corrupted (use after free?)
> > 
> > OK, thanks for the hint. So I added this:
> 
> > And sure enough it triggers:
> > 
> > [  858.503010] ipoib_neigh_destructor lock c0687880 wrong type 772 
> > !!
> 
> could you turn on CONFIG_SLAB_DEBUG as well?
> 
> that should catch certain types of use-after-free accesses, and lockdep 
> will also warn if a still locked object is freed.

Hmm, no, this does not look like use-after-free.
I enabled CONFIG_SLAB_DEBUG, and I still see the same message, so
the memory was not overwritten by slab debugger.


-- 
MST

Re: [PATCH] kthread_should_stop_check_freeze

2007-03-12 Thread Cedric Le Goater

Oleg Nesterov wrote:
> On 03/12, Rafael J. Wysocki wrote:
>> On Monday, 12 March 2007 09:14, Pavel Machek wrote:
>>> Can we get better name for this function?
>> Well, I took the name from the Oleg's message.  Can you please suggest
>> something?
> 
> Well, kthread_should_stop_check_freeze() is really awful, I agree :)
> We need something better, but I can't suggest anything.

not much better, but what about kthread_should_stop_or_freeze() ?

cheers,

C.

Re: 2.6.21-rc3-mm2 hangs my opteron during bootup, ACPI?

2007-03-12 Thread Luming Yu


try acpi=off please.

On 3/12/07, Helge Hafting <[EMAIL PROTECTED]> wrote:

I went from 2.6.18-rc5-mm1 to 2.6.21-rc3-mm2
The computer now hangs solid during boot, at this point:

usb 1-1: configuration #1 chosen from 1 choice
drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 2 if 0
alt 0 proto 2 vid 0x04B8 pid 0x0007
usb 1-3: new high speed USB device using ehci_hcd and address 3
pc87360: PC8736x not detected, module not inserted.
md: raid1 personality registered for level 1
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
EDAC MC: Ver: 2.0.1 Sep  2 2006
sdhci: Secure Digital Host Controller Interface driver, 0.12
sdhci: Copyright(c) Pierre Ossman
wbsd: Winbond W83L51xD SD/MMC card interface driver, 1.6
wbsd: Copyright(c) Pierre Ossman
Advanced Linux Sound Architecture Driver Version 1.0.12rc1 (Thu Jun 22
13:55:50 2006 UTC).
ACPI: PCI Interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 17


Here it stops with a dead keyboard.  No sysrq, it is time for the power
button.
A 2.6.18-rc5-mm1 boot continues like this:

gameport: Trident 4DWave is pci:00:06.0/gameport0, speed 1884kHz
ALSA device list:
  #0: Trident TRID4DWAVENX PCI Audio at 0x9400, irq 17
oprofile: using NMI interrupt.
Netfilter messages via NETLINK v0.30.
IPv4 over IPv4 tunneling driver
GRE over IPv4 tunneling driver
ip_conntrack version 2.4 (2043 buckets, 16344 max) - 288 bytes per conntrack
ip_tables: (C) 2000-2006 Netfilter Core Team
joydump: ,-- START .


I'll be trying 2.6.20 next, unless advised otherwise.

Helge Hafting

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 12:07:18PM +, Alistair John Strachan wrote:
> On Monday 12 March 2007 11:24, Frank van Maarseveen wrote:
> > On Mon, Mar 12, 2007 at 09:54:47AM +0100, Frank van Maarseveen wrote:
> > > 2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
> > > -d 1 /dev/hda):
> > >
> > >   hda: dma_timer_expiry: dma status == 0x20
> > >   hda: DMA timeout retry
> > >   hda: timeout waiting for DMA
> > >   hda: status error: status=0x58 {
> > >   DriveReady
> > >   SeekComplete
> > >   DataRequest
> > >   }
> [snip]
> > This system has SATA but there's only one PATA disk
> 
> Not a solution, unfortunately, but try disabling CONFIG_IDE and using Alan's 
> new PATA drivers. For your Intel systems, this should mean you need only:
> 
> CONFIG_ATA_PIIX
> 
> For both SATA and PATA support. You'll need the appropriate SCSI modules 
> built 
> in (if you say =y), i.e. SCSI disk and SCSI CDROM should be built in.

yes, that worked... after booting with root=/dev/sda2 and s/hda/sda/
/etc/fstab /etc/lilo.conf + lilo. didn't mount a /dev/sr0 for a loong
time.

So, are /dev/hd* going to disappear in a few years? iow, does it make
sense to _slowly_ start to migrate to /dev/sd*?

The problem is there's no plan B in case of any troubles except rename
everything back again to boot an old kernel.

-- 
Frank

Re: [PATCH 1/7] revoke: special mmap handling

2007-03-12 Thread Jan Kara

  Hi,

> This adds special handling for revoked memory mappings.  We want to
> raise SIGBUS when accessing revoked mappings and return ENODEV when
> trying to remap with mmap(2).
> 
> Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
> ---
>  include/linux/mm.h |2 ++
>  mm/memory.c|3 +++
>  mm/mmap.c  |   12 
>  3 files changed, 13 insertions(+), 4 deletions(-)
> 
> Index: uml-2.6/include/linux/mm.h
> ===
> --- uml-2.6.orig/include/linux/mm.h   2007-03-08 10:24:24.0 +0200
> +++ uml-2.6/include/linux/mm.h2007-03-08 10:24:25.0 +0200
> @@ -170,6 +170,8 @@
>  #define VM_INSERTPAGE0x0200  /* The vma has had 
> "vm_insert_page()" done on it */
>  #define VM_ALWAYSDUMP0x0400  /* Always include in core dumps 
> */
>  
> +#define VM_REVOKED   0x0400  /* Mapping has been revoked */
> +
  Is it intended to conflict with VM_ALWAYSDUMP? I'd guess not and if
yes, it definitely deserves a comment...


Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs

RE: RSDL-mm 0.28

2007-03-12 Thread David Schwartz


> > There's a substantial performance hit for not yield, so we probably
> > want to investigate alternate semantics for it. It seems reasonable
> > for apps to say "let me not hog the CPU" without completely expiring
> > them. Imagine you're in the front of the line (aka queue) and you
> > spend a moment fumbling for your wallet. The polite thing to do is to
> > let the next guy in front. But with the current sched_yield, you go
> > all the way to the back of the line.

> Well... are you advocating we change sched_yield semantics to a
> gentler form? This is a cinch to implement but I know how Ingo feels
> about this. It will only encourage more lax coding using sched_yield
> instead of proper blocking (see huge arguments with the ldap people on
> this one who insist it's impossible not to use yield).

The basic point of sched_yield is to allow every other process at the same
static priority level a chance to use the CPU before you get it back. It is
generally an error to use sched_yield to be nice. It's nice to get your work
done when the scheduler gives you the CPU, that's why it gave it to you.

It is proper to use sched_yield as an optimization when it is more efficient to
allow another process/thread to run than you, for example, when you
encounter a task you cannot do efficiently at that time because another
thread holds a lock.

It's also useful prior to doing something that can most efficiently be done
without interruption. So a thread that returns from 'sched_yield' should
ideally be given a full timeslice if possible. This may not be sensible if
the 'sched_yield' didn't actually yield, but then again, if nothing else
wants to run, why not give the only task that does a full slice?

In no case is much of anything guaranteed, of course. (What can you do if
there's no other process to yield to?)

Note that processes that call sched_yield should be rewarded for doing so
just as processes that block on I/O are, assuming they do in fact wind up
giving up the CPU when they would otherwise have had it.
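
The "another thread holds a lock" case could look roughly like the
sketch below, which yields instead of spinning while the lock is taken.
The yield_lock type and its two functions are hypothetical names built
on C11 atomic_flag. It is a minimal illustration, not a recommendation
to prefer this over proper blocking.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type for illustration; not a kernel or glibc API. */
typedef struct {
    atomic_flag held;
} yield_lock;

/*
 * Try to take the lock. While another thread holds it, give up the
 * CPU with sched_yield() rather than burning the timeslice. After
 * max_yields yields the function gives up and returns false, so the
 * caller can fall back to a blocking primitive.
 */
static bool yield_lock_acquire(yield_lock *l, int max_yields)
{
    int yields = 0;

    assert(l != NULL);
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (yields >= max_yields)
            return false;
        yields++;
        sched_yield();
    }
    return true;
}

/* Release the lock so that a yielding waiter can take it. */
static void yield_lock_release(yield_lock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The bound on yields matters: as noted above, nothing is guaranteed
about what sched_yield() achieves if there is nobody to yield to.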

DS



Re: [discuss] [PATCH 4/4 TRY#3] optimize and simplify get_cycles_sync()

2007-03-12 Thread Joerg Roedel

On Mon, Mar 12, 2007 at 02:09:18PM +0100, Andi Kleen wrote:
> On Monday 12 March 2007 14:02, Joerg Roedel wrote:
> > On Fri, Mar 09, 2007 at 08:10:03PM +0200, Avi Kivity wrote:
> > > Joerg Roedel wrote:
> > > >From: Joerg Roedel <[EMAIL PROTECTED]>
> > > >
> > > >This patch simplifies the get_cycles_sync() function by removing
> > > >the #ifdefs from it. Further it introduces an optimization for AMD
> > > >processors. There the RDTSCP instruction is used instead of CPUID;RDTSC
> > > >which is helpfull if the kernel runs as a KVM guest. Running as a guest
> > > >makes CPUID very expensive because it causes an intercept of the guest.
> > > >
> > > >  +#define RDTSCP ".byte 0x0f, 0x01, 0xf9"
> > > >+alternative_io_two("cpuid\nrdtsc",
> > > >+   "rdtsc", X86_FEATURE_SYNC_RDTSC,
> > > >+   ".byte 0x0f, 0x01, 0xf9", X86_FEATURE_RDTSCP,
> > > >  
> > > 
> > > why not use the RDTSCP macro here?
> > 
> > Does this macro exist? I couldn't found it in the current git tree. And
> > the rdtscp macros in msr.h use the plain opcode too.
> 
> It doesn't exist. The rdtscp macros are also not used currently, that
> is why nobody's binutils complained.
> 
> Doing the .bytes is ok
> 
> I still don't like the alternative() record complications though.

Can you think of another way to make use of RDTSCP in the get_cycles_sync
function? Using CPUID in a function called so often is bad when
running Linux as a virtualization guest...
So using RDTSCP there might be a good idea.

Regards,
Joerg

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG



Re: RSDL for 2.6.21-rc3- 0.29

2007-03-12 Thread Douglas McNaught

Gene Heskett <[EMAIL PROTECTED]> writes:

> If, and I have previously, I revert to a 2.6.20-ck1 patching, this does 
> not occur.  So my contention is that someplace in this recent progression 
> from 2.6.20 to 2.6.21-rc3, there is a patch which acts to change how 
> c-time is being reported to tar.  Or there is a spillage into c-times 
> when tar does its estimate scans where the output goes to /dev/null.
> Or possibly even this version of tar is doing it differently.  I just 
> looked up how to get the c-times out of ls, and they, as far as ls is 
> concerned, look sane.  But tars actions while running a 2.6.21-rcX kernel 
> certainly are not.  I do have a plain -rc2 I can try, so that will be the 
> next test.  If that also fails in this manner, I'll build a later 
> 2.6.20-2 or whatever to verify that it doesn't so suffer.

You may find 'strace' useful to track down this sort of thing (though
the output can be voluminous).

-Doug

Re: [discuss] [PATCH 4/4 TRY#3] optimize and simplify get_cycles_sync()

2007-03-12 Thread Andi Kleen

On Monday 12 March 2007 14:02, Joerg Roedel wrote:
> On Fri, Mar 09, 2007 at 08:10:03PM +0200, Avi Kivity wrote:
> > Joerg Roedel wrote:
> > >From: Joerg Roedel <[EMAIL PROTECTED]>
> > >
> > >This patch simplifies the get_cycles_sync() function by removing
> > >the #ifdefs from it. Further it introduces an optimization for AMD
> > >processors. There the RDTSCP instruction is used instead of CPUID;RDTSC
> > >which is helpfull if the kernel runs as a KVM guest. Running as a guest
> > >makes CPUID very expensive because it causes an intercept of the guest.
> > >
> > >  +#define RDTSCP ".byte 0x0f, 0x01, 0xf9"
> > >+  alternative_io_two("cpuid\nrdtsc",
> > >+ "rdtsc", X86_FEATURE_SYNC_RDTSC,
> > >+ ".byte 0x0f, 0x01, 0xf9", X86_FEATURE_RDTSCP,
> > >  
> > 
> > why not use the RDTSCP macro here?
> 
> Does this macro exist? I couldn't found it in the current git tree. And
> the rdtscp macros in msr.h use the plain opcode too.

It doesn't exist. The rdtscp macros are also not used currently, that
is why nobody's binutils complained.

Doing the .bytes is ok

I still don't like the alternative() record complications though.

-Andi

RE: RSDL v0.30 cpu scheduler for ... 2.6.18.8 kernel

2007-03-12 Thread Fortier,Vincent [Montreal]

> 
> Hello Vincent,
> 
> As it seems the 5s net gain is during the early phase of the 
> boot process, could you please post longer dmesgs, to let us 
> know where the gain occurs ?

Hi Paul,

Here is an sdiff between the two dmesg:
[0.00] Linux version 2.6.18.8-amd64-envcan-003 (root@ | [
0.00] Linux version 2.6.18.8-rsdl-0.30-amd64-envcan
[0.00] time.c: Detected 2210.220 MHz processor.   | [
0.00] time.c: Detected 2210.215 MHz processor.
[   37.396210] Console: colour dummy device 80x25 | [
32.320335] Console: colour dummy device 80x25
[   37.397484] Dentry cache hash table entries: 262144 (order | [
32.321611] Dentry cache hash table entries: 262144 (order
[   37.399120] Inode-cache hash table entries: 131072 (order: | [
32.323318] Inode-cache hash table entries: 131072 (order:
[   37.399633] Checking aperture...   | [
32.323869] Checking aperture...
[   37.399641] CPU 0: aperture @ 84 size 32 MB| [
32.323877] CPU 0: aperture @ 84 size 32 MB
[   37.399645] Aperture too small (32 MB) | [
32.323881] Aperture too small (32 MB)
[   37.404577] No AGP bridge found| [
32.328813] No AGP bridge found
[   37.421192] Memory: 2054704k/2096000k available (1934k ker | [
32.345833] Memory: 2054704k/2096000k available (1935k ker
[   37.498110] Calibrating delay using timer specific routine | [
32.422230] Calibrating delay using timer specific routine

Here is a bit more detailed info:

Vanilla Version:
[0.00] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[0.00] Built 1 zonelists.  Total pages: 515848
[0.00] Kernel command line: root=LABEL=DebianEtch_64 ro
vga=0x305
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)
[0.00] Disabling vsyscall due to use of PM timer
[0.00] time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
[0.00] time.c: Detected 2210.220 MHz processor.
[   37.396210] Console: colour dummy device 80x25
[   37.397484] Dentry cache hash table entries: 262144 (order: 9,
2097152 bytes)
[   37.399120] Inode-cache hash table entries: 131072 (order: 8, 1048576
bytes)
[   37.399633] Checking aperture...
[   37.399641] CPU 0: aperture @ 84 size 32 MB
[   37.399645] Aperture too small (32 MB)
[   37.404577] No AGP bridge found
[   37.421192] Memory: 2054704k/2096000k available (1934k kernel code,
40908k reserved, 868k data, 188k init)


RSDL Version:
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)
[0.00] Disabling vsyscall due to use of PM timer
[0.00] time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
[0.00] time.c: Detected 2210.215 MHz processor.
[   32.320335] Console: colour dummy device 80x25
[   32.321611] Dentry cache hash table entries: 262144 (order: 9,
2097152 bytes)
[   32.323318] Inode-cache hash table entries: 131072 (order: 8, 1048576
bytes)
[   32.323869] Checking aperture...
[   32.323877] CPU 0: aperture @ 84 size 32 MB
[   32.323881] Aperture too small (32 MB)
[   32.328813] No AGP bridge found

There seems to be some sort of lag happening after the CPU
detection... It might simply be the call to vga=0x305 which takes more
or less time depending on the kernel?  I have attached a gzipped full
dmesg.
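
For reference, the size of the gap can be read straight off the printk
timestamps: 37.396210 - 32.320335 = 5.075875 seconds at the first
"Console:" line. A small sketch of that arithmetic follows. The helper
names dmesg_stamp() and stamp_delta() are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the leading "[ seconds ]" timestamp of a dmesg line.
 * Returns 0 on success, -1 if the line has no such timestamp.
 */
static int dmesg_stamp(const char *line, double *secs)
{
    assert(line != NULL && secs != NULL);
    return sscanf(line, "[%lf]", secs) == 1 ? 0 : -1;
}

/* Store (timestamp of a) - (timestamp of b) in *delta; -1 on parse error. */
static int stamp_delta(const char *a, const char *b, double *delta)
{
    double ta, tb;

    assert(delta != NULL);
    if (dmesg_stamp(a, &ta) != 0 || dmesg_stamp(b, &tb) != 0)
        return -1;
    *delta = ta - tb;
    return 0;
}
```

Feeding it the two "Console: colour dummy device 80x25" lines above
gives a delta of about 5.08 s, and the later lines show the same
offset, so the difference is already present before the console comes
up.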

> 
> Regards,
> Paul
> 

- vin 


dmesg.2.6.18.8-rsdl-0.30-amd64-envcan.gz
Description: dmesg.2.6.18.8-rsdl-0.30-amd64-envcan.gz

Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-03-12 Thread Andi Kleen


> Andi, have you had a look at this? I'm a bit surprised at the lack of 
> reaction to this find..


FYI the problem is still being analysed behind the scenes. Chip's patch
didn't fix it in all cases unfortunately -- it just changed the timing
enough to make it happen less often. The latest evidence points to a DMA
mapping management problem in Linux. Apparently in some cases sata_nv
does DMA on an already freed and then reused mapping.

-Andi

Re: [PATCH 4/4 TRY#3] optimize and simplify get_cycles_sync()

2007-03-12 Thread Joerg Roedel

On Fri, Mar 09, 2007 at 08:10:03PM +0200, Avi Kivity wrote:
> Joerg Roedel wrote:
> >From: Joerg Roedel <[EMAIL PROTECTED]>
> >
> >This patch simplifies the get_cycles_sync() function by removing
> >the #ifdefs from it. Further it introduces an optimization for AMD
> >processors. There the RDTSCP instruction is used instead of CPUID;RDTSC
> >which is helpfull if the kernel runs as a KVM guest. Running as a guest
> >makes CPUID very expensive because it causes an intercept of the guest.
> >
> >  +#define RDTSCP ".byte 0x0f, 0x01, 0xf9"
> >+alternative_io_two("cpuid\nrdtsc",
> >+   "rdtsc", X86_FEATURE_SYNC_RDTSC,
> >+   ".byte 0x0f, 0x01, 0xf9", X86_FEATURE_RDTSCP,
> >  
> 
> why not use the RDTSCP macro here?

Does this macro exist? I couldn't find it in the current git tree. And
the rdtscp macros in msr.h use the plain opcode too.

Regards,
Joerg

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG



Re: 2.6.21-rc3-mm2 hangs my opteron during bootup, ACPI?

2007-03-12 Thread Helge Hafting


I went from 2.6.18-rc5-mm1 to 2.6.21-rc3-mm2
The computer now hangs solid during boot, at this point:

usb 1-1: configuration #1 chosen from 1 choice
drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 2 if 0 
alt 0 proto 2 vid 0x04B8 pid 0x0007

usb 1-3: new high speed USB device using ehci_hcd and address 3
pc87360: PC8736x not detected, module not inserted.
md: raid1 personality registered for level 1
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
EDAC MC: Ver: 2.0.1 Sep  2 2006
sdhci: Secure Digital Host Controller Interface driver, 0.12
sdhci: Copyright(c) Pierre Ossman
wbsd: Winbond W83L51xD SD/MMC card interface driver, 1.6
wbsd: Copyright(c) Pierre Ossman
Advanced Linux Sound Architecture Driver Version 1.0.12rc1 (Thu Jun 22 
13:55:50 2006 UTC).

ACPI: PCI Interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 17


Here it stops with a dead keyboard.  No sysrq, it is time for the power 
button.

A 2.6.18-rc5-mm1 boot continues like this:

gameport: Trident 4DWave is pci:00:06.0/gameport0, speed 1884kHz
ALSA device list:
 #0: Trident TRID4DWAVENX PCI Audio at 0x9400, irq 17
oprofile: using NMI interrupt.
Netfilter messages via NETLINK v0.30.
IPv4 over IPv4 tunneling driver
GRE over IPv4 tunneling driver
ip_conntrack version 2.4 (2043 buckets, 16344 max) - 288 bytes per conntrack
ip_tables: (C) 2000-2006 Netfilter Core Team
joydump: ,-- START .


I'll be trying 2.6.20 next, unless advised otherwise.

Helge Hafting

Re: should RTS init in serial core be tied to CRTSCTS

2007-03-12 Thread Krzysztof Halasa

"Tosoni" <[EMAIL PROTECTED]> writes:

> It has always been the standard for all modems.

Look, I've been using various modems for many years, starting with
self-made 300 bps one and there were basically 3 options:
- no flow control at all (no buffering etc), RTS/CTS disabled/missing,
- XON/XOFF flow control, RTS/CTS disabled/missing,
- RTS/CTS handshaking with RTS = modem can send to computer (with
  option to ignore RTS) and CTS = computer can send to modem.

> The mistake comes from the
> fact that the serial ports has been used extensively to drive things which
> are *not* modems (say, printers and VT100 consoles on Unix systems). Such
> devices did not need the standard-specified RTS function.

VT100 and printers are DTE, connections between DTE (without help
of DCE) are obviously non-standard.

> CCITT V24 says about RTS: "...this signal drives the DCE and sets it to
> transmit data..." (translated from french)
> CCITT V24 does not constraint the DCE to being half or full duplex.
> CCITT V24 says nothing about using RTS to handle flow control.

Circuit 105 - Request to send
Direction: To DCE
Signals on this circuit control the data channel transmit function
of the DCE. The ON condition causes the DCE to assume the data channel
transmit mode. The OFF condition causes the DCE to assume the data
channel non-transmit mode, when all data transferred on circuit 103
have been transmitted.

What do you think are "data channel transmit mode" and "non-transmit"
mode? Obviously the standard doesn't know if it's a radiomodem, RS-485
style multipoint bus or something else but it's clearly half-duplex -
full-duplex devices are always in "transmit" and "receive" mode
simultaneously.

Does your modem drop carrier when RTS goes?

V.24 assumes DTE is always able to receive data. While it's probably
the case with PC and Linux, it may not be true with all DTE. With
strict V.24 DTE has no way to say "I can't take data, stop
transmitting". Now imagine connecting a serial printer to a PC
with a pair of DCE.

>> I've seen such devices quite recently, perhaps ~ 10 years ago.
>> OTOH I think even "current" PC BIOSes use such signaling.
>
> Even Windows implements the CCITT view of RTS, via a flag named "RTS_TOGGLE"

Great, meanwhile we don't have it here, but that only means nobody
is really interested in it.

>> For such signaling, it would perhaps be better to invent another flag,
>> similar to CRTSCTS. The driver would, of course, need some real code
>> for that.
>
> Another flag would help to drive modems, yes.

Which modems, exactly? Normal modems work perfectly fine with current
CRTSCTS and this RTS toggling could only confuse them.
A flag alone is no help for half-duplex devices, they would need
a complete handshaking code.
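
To make the difference concrete, here is a user-space sketch of both
styles using the standard termios and modem-control ioctl interfaces.
The helper names (set_crtscts, set_rts, send_keyed) are made up for
this example, and the keyed send is only an approximation: tcdrain()
may return slightly before the last bit has left the UART on some
hardware, which is one reason real half-duplex support would want
driver code.

```c
#define _DEFAULT_SOURCE 1   /* for CRTSCTS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <termios.h>
#include <unistd.h>

/* Turn RTS/CTS flow control (the current CRTSCTS meaning) on or off. */
static void set_crtscts(struct termios *tio, int on)
{
    assert(tio != NULL);
    if (on)
        tio->c_cflag |= CRTSCTS;
    else
        tio->c_cflag &= ~CRTSCTS;
}

/* Raise or drop RTS by hand. Returns 0 on success, -1 on error. */
static int set_rts(int fd, int on)
{
    int bits = TIOCM_RTS;

    return ioctl(fd, on ? TIOCMBIS : TIOCMBIC, &bits);
}

/*
 * Half-duplex style send: raise RTS to key the transmitter, write the
 * data, wait for the output to drain, then drop RTS again.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t send_keyed(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL || len == 0);
    if (set_rts(fd, 1) < 0)
        return -1;
    n = write(fd, buf, len);
    if (n >= 0 && tcdrain(fd) < 0)
        n = -1;
    set_rts(fd, 0);
    return n;
}
```

A caller would apply set_crtscts() between tcgetattr() and tcsetattr()
for ordinary modems, and would leave CRTSCTS off when using
send_keyed(), since the two uses of RTS conflict.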
-- 
Krzysztof Halasa

Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-12 Thread Con Kolivas

On Monday 12 March 2007 22:26, Al Boldi wrote:
> Con Kolivas wrote:
> > On Monday 12 March 2007 15:42, Al Boldi wrote:
> > > Con Kolivas wrote:
> > > > On Monday 12 March 2007 08:52, Con Kolivas wrote:
> > > > > And thank you! I think I know what's going on now. I think each
> > > > > rotation is followed by another rotation before the higher priority
> > > > > task is getting a look in in schedule() to even get quota and add
> > > > > it to the runqueue quota. I'll try a simple change to see if that
> > > > > helps. Patch coming up shortly.
> > > >
> > > > Can you try the following patch and see if it helps. There's also one
> > > > minor preemption logic fix in there that I'm planning on including.
> > > > Thanks!
> > >
> > > Applied on top of v0.28 mainline, and there is no difference.
> > >
> > > What's it look like on your machine?
> >
> > The higher priority one always get 6-7ms whereas the lower priority one
> > runs 6-7ms and then one larger perfectly bound expiration amount.
> > Basically exactly as I'd expect. The higher priority task gets precisely
> > RR_INTERVAL maximum latency whereas the lower priority task gets
> > RR_INTERVAL min and full expiration (according to the virtual deadline)
> > as a maximum. That's exactly how I intend it to work. Yes I realise that
> > the max latency ends up being longer intermittently on the niced task but
> > that's -in my opinion- perfectly fine as a compromise to ensure the nice
> > 0 one always gets low latency.
>
> I think, it should be possible to spread this max expiration latency across
> the rotation, should it not?

There is a way that I toyed with of creating maps of slots to use for each 
different priority, but it broke the O(1) nature of the virtual deadline 
management. Minimising algorithmic complexity seemed more important to 
maintain than getting slightly better latency spreads for niced tasks. It 
also appeared to be less cache friendly in design. I could certainly try and 
implement it but how much importance are we to place on latency of niced 
tasks? Are you aware of any usage scenario where latency sensitive tasks are 
ever significantly niced in the real world?

-- 
-ck

Message ("Your message dated Mon, 12 Mar 2007 07:37:15...")

2007-03-12 Thread L-Soft list server at Binghamton University (1.8e)

Your message dated Mon, 12 Mar  2007 07:37:15 -0500 with subject "Message
could not be delivered" has been submitted to the moderator of the SPORTS
list: John Hartrick <[EMAIL PROTECTED]>.

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 01:21:18PM +0100, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> Could you check if this is the same problem as this one:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=8169

Looks like it except that I don't see "lost interrupt" messages here. So,
it might be something different (I don't know).

-- 
Frank
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin

On Mon, Mar 12, 2007 at 01:21:03PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > > > > the issue is this: your fix reduces the effects of the bug but 
> > > > > it is still fundamentally incomplete because of the use of 
> > > > > timer_list. So
> > > > 
> > > > But using schedule_timeout is not a bug. Userspace timeouts are 
> > > > always defined to be "at least".
> > > 
> > > but what you are adding isnt a plain schedule_timeout(), it is a 
> > > restart block handling loop. And for those restart blocks that 
> > > relate to timeouts, we only use hrtimers. I am not making this up to 
> > > annoy you: take a look at all the current restart block handlers - 
> > > they are hrtimer based, for exactly this reason.
> > 
> > So why do you say it is fundamentally incomplete?
> 
> because i misread your last patch :-) I thought it still has a window 
> for inaccuracy, but you are right: it should be at most 1 jiffy 
> inaccurate, no matter how many times we restart.

OK, no problem.

> still ... the hrtimers patch has been submitted to lkml before yours, 
> and has been tested extensively, so why go the extra side-jump 
> prolonging the jiffies sleep method? The LTP failure has been there 
> since the inception of the futex code i suspect. Going this way also 
> enables the addressing of a more pressing need: the elimination of 
> glibc's forced use of relative futex timeouts.

I guess my arguments are that my patch fixes a bug, which gives it a
higher priority (being a userspace API bug, perhaps even 2.6.21); and
that it will want to be backported while the hrtimer patch will not, so
including the hrtimer patch first means 2 different patches to fix the
same bug.

I'm not trying to make life harder for the hrtimer patch. I will even
volunteer to forward port it on top of the restart fix, if that is an
issue.


Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.

2007-03-12 Thread Pekka Enberg


On 3/12/07, Jarek Poplawski <[EMAIL PROTECTED]> wrote:

So, maybe it's less evil to check those NULLs where possible and add
some WARN_ONs here and there...


No, it's much better to oops rather than paper over a bug.

Summary of resource management discussion

2007-03-12 Thread Srivatsa Vaddagiri

I happened to read the entire thread (@ http://lkml.org/lkml/2007/3/1/159)
all over again and felt it may be useful to summarize the discussions so far.

If I have missed any important points or falsely represented someone's view
(unintentionally of course!), then I would be glad to be corrected.

1. Which task-grouping mechanism?

[ This question is the most vital one that needs a consensus ]

Resource management generally works by applying resource controls over a -group- 
of tasks (tasks of a user, tasks in a vserver/container etc).

What mechanism do we use to group tasks for res mgmt purposes?

Options:

a. Paul Menage's container(/uh-a-diff-name-pls?) patches

The patches introduce a new pointer in task_struct, struct
container_group *containers, and a new structure 'struct container'.

Tasks pointing to the same 'struct container' object (via their
tsk->containers->container[] pointer) are considered to form
a group associated with that container. The attributes associated
with a container (ex: cpu_limit, rss_limit, cpus/mems_allowed) are 
decided by the options passed to mount command (which binds 
one/more/all resource controllers to a hierarchy).

+ For workload management, where it is desirable to manage resource 
  consumption of a run-time defined (potentially arbitrary) group of 
  tasks, then this patch is handy, as no existing pointers in 
  task_struct can be used to form such a run-time decided group.

- (subjective!) If there is an existing grouping mechanism already (say 
  tsk->nsproxy[->pid_ns]) over which res control needs to be applied, 
  then the new grouping mechanism can be considered redundant (it can 
  eat up unnecessary space in task_struct)

  What may help avoid this redundancy is to re-build existing 
  grouping mechanism (say tsk->nsproxy) using the container patches.
  Serge however expressed some doubts on such an implementation
  (for ex: how will one build hierarchical cpusets and non-hierarchical
  namespaces using that single 'grouping' pointer in task_struct) and 
  also felt it may slow down things a bit from namespaces pov (more 
  dereferences reqd to get to a task's namespace).

b. Reuse existing pointers in task_struct, tsk->nsproxy or better perhaps 
   tsk->nsproxy->pid_ns, as the means to group tasks (rcfs patches)

This is based on the observation that the group of tasks whose resource 
consumption needs to be managed is already defined in the kernel by 
existing pointers (either tsk->nsproxy or tsk->nsproxy->pid_ns)

+ reuses existing grouping mechanism in kernel

- mixes resource and name spaces (?)

c. Introduce yet-another new structure ('struct res_ctl?') which houses 
   resource control (& possibly pid_ns?) parameters and a new pointer to this 
   structure in task_struct (Herbert Poetzl).

Tasks that have a pointer to the same 'struct res_ctl' are considered 
to form a group for res mgmt purpose

+ Accessing res ctl information in scheduler fast path is
  optimized (only two-dereferences required)

- If all resource control parameters (cpu, memory, io etc) are
  lumped together in same structure, it makes it hard to
  have resource classes (cpu, mem etc) that are independent of
  each other.

- If we introduce several pointers in task_struct to allow
  separation of resource classes, then it will increase storage space 
  in task_struct and also fork time (we have to take ref count
  on more than one object now). Herbert thinks this is worthy
  tradeoff for the benefit gained in scheduler fast paths.


2. Where do we put resource control parameters for a group?

This depends on 1. So the options are:

a. Paul Menage's patches:

(tsk->containers->container[cpu_ctlr.subsys_id] - X)->cpu_limit

   An optimized version of the above is:
(tsk->containers->subsys[cpu_ctlr.subsys_id] - X)->cpu_limit


b. rcfs
tsk->nsproxy->ctlr_data[cpu_ctlr.subsys_id]->cpu_limit

c. Herbert's proposal
tsk->res_ctl->cpu_limit
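
To make the three lookup paths in section 2 concrete, here is a minimal
userspace sketch in C. The struct layouts, MAX_SUBSYS and CPU_SUBSYS_ID are
hypothetical stand-ins, not the definitions from any of the posted patches;
only the shape of each pointer chain is taken from the summary. The "- X"
offset adjustment in option (a) is left out for brevity.

```c
#include <stddef.h>

#define MAX_SUBSYS    4   /* hypothetical number of controllers */
#define CPU_SUBSYS_ID 0   /* hypothetical stand-in for cpu_ctlr.subsys_id */

/* Per-group CPU controller state (hypothetical layout). */
struct cpu_ctl {
	int cpu_limit;
};

/* (a) container-style: task -> container_group -> subsys[] -> state */
struct container_group {
	struct cpu_ctl *subsys[MAX_SUBSYS];
};

/* (b) rcfs-style: controller data hangs off the existing nsproxy */
struct nsproxy {
	struct cpu_ctl *ctlr_data[MAX_SUBSYS];
};

/* (c) res_ctl-style: one structure holds the parameters directly */
struct res_ctl {
	int cpu_limit;
};

/* Cut-down task with one pointer per grouping option. */
struct task {
	struct container_group *containers;
	struct nsproxy *nsproxy;
	struct res_ctl *res_ctl;
};

/* (a): three dereferences (task, group, per-subsystem state). */
static int cpu_limit_container(const struct task *tsk)
{
	return tsk->containers->subsys[CPU_SUBSYS_ID]->cpu_limit;
}

/* (b): same depth, but reuses a pointer task_struct already has. */
static int cpu_limit_rcfs(const struct task *tsk)
{
	return tsk->nsproxy->ctlr_data[CPU_SUBSYS_ID]->cpu_limit;
}

/* (c): two dereferences, at the cost of lumping all controllers together. */
static int cpu_limit_res_ctl(const struct task *tsk)
{
	return tsk->res_ctl->cpu_limit;
}
```

In this model the trade-off listed under option 1c is visible directly: (c)
saves one pointer hop on the fast path, while (a) and (b) keep each
controller's state in its own object.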


3. How are cpusets related to vserver/containers?

Should it be possible to, lets say, create exclusive cpusets and
attach containers to different cpusets?

4. Interface
Filesystem vs system call 

Filesystem:
+ natural way to represent hierarchical data
+ File permission model convenient to delegate
  management of part of a tree to one user
+ Ease of use with scripts

(from Herbert Poetzl):

- performance of filesystem interfaces is quite bad
- you need to do a lot to make the fs consistent for
  e.g. find and friends

Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)

2007-03-12 Thread Oleg Nesterov

On 03/12, Rafael J. Wysocki wrote:
> 
> On Monday, 12 March 2007 09:14, Pavel Machek wrote:
> > 
> > Can we get better name for this function?
> 
> Well, I took the name from the Oleg's message.  Can you please suggest
> something?

Well, kthread_should_stop_check_freeze() is really awful, I agree :)
We need something better, but I can't suggest anything.

Oleg.


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > > > the issue is this: your fix reduces the effects of the bug but 
> > > > it is still fundamentally incomplete because of the use of 
> > > > timer_list. So
> > > 
> > > But using schedule_timeout is not a bug. Userspace timeouts are 
> > > always defined to be "at least".
> > 
> > but what you are adding isnt a plain schedule_timeout(), it is a 
> > restart block handling loop. And for those restart blocks that 
> > relate to timeouts, we only use hrtimers. I am not making this up to 
> > annoy you: take a look at all the current restart block handlers - 
> > they are hrtimer based, for exactly this reason.
> 
> So why do you say it is fundamentally incomplete?

because i misread your last patch :-) I thought it still has a window 
for inaccuracy, but you are right: it should be at most 1 jiffy 
inaccurate, no matter how many times we restart.

still ... the hrtimers patch has been submitted to lkml before yours, 
and has been tested extensively, so why go the extra side-jump 
prolonging the jiffies sleep method? The LTP failure has been there 
since the inception of the futex code i suspect. Going this way also 
enables the addressing of a more pressing need: the elimination of 
glibc's forced use of relative futex timeouts.

Ingo

Re: [PATCH]Replace 0 with NULL when returning a pointer

2007-03-12 Thread Avi Kivity


Cong WANG wrote:

Use NULL to indicate we are returning a pointer rather than an integer
and to eliminate some sparse warnings.

Signed-off-by: Cong WANG <[EMAIL PROTECTED]>


These are already fixed in my repo.


--
error compiling committee.c: too many arguments to function


Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.

2007-03-12 Thread Pekka J Enberg

On 3/9/07, David Miller <[EMAIL PROTECTED]> wrote:
> The whole cache-multipath subsystem has to have its guts revamped for
> proper error handling.

(Untested patch follows.)

From: Amit Choudhary <[EMAIL PROTECTED]>

Check the return value of kmalloc() in function wrandom_set_nhinfo(),
in file net/ipv4/multipath_wrandom.c.

[EMAIL PROTECTED]: return error status to caller.]
Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 include/net/ip_mp_alg.h  |8 +---
 net/ipv4/multipath_wrandom.c |   19 +++
 net/ipv4/route.c |9 +++--
 3 files changed, 27 insertions(+), 9 deletions(-)

Index: 2.6/include/net/ip_mp_alg.h
===
--- 2.6.orig/include/net/ip_mp_alg.h2007-03-12 14:00:13.0 +0200
+++ 2.6/include/net/ip_mp_alg.h 2007-03-12 14:03:10.0 +0200
@@ -17,7 +17,7 @@ struct ip_mp_alg_ops {
void(*mp_alg_select_route)(const struct flowi *flp,
   struct rtable *rth, struct rtable **rp);
void(*mp_alg_flush)(void);
-   void(*mp_alg_set_nhinfo)(__be32 network, __be32 netmask,
+   int (*mp_alg_set_nhinfo)(__be32 network, __be32 netmask,
 unsigned char prefixlen,
 const struct fib_nh *nh);
void(*mp_alg_remove)(struct rtable *rth);
@@ -58,17 +58,19 @@ static inline void multipath_flush(void)
 #endif
 }
 
-static inline void multipath_set_nhinfo(struct rtable *rth,
+static inline int multipath_set_nhinfo(struct rtable *rth,
__be32 network, __be32 netmask,
unsigned char prefixlen,
const struct fib_nh *nh)
 {
+   int err = 0;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
struct ip_mp_alg_ops *ops = ip_mp_alg_table[rth->rt_multipath_alg];
 
if (ops && ops->mp_alg_set_nhinfo)
-   ops->mp_alg_set_nhinfo(network, netmask, prefixlen, nh);
+   err = ops->mp_alg_set_nhinfo(network, netmask, prefixlen, nh);
 #endif
+   return err;
 }
 
 static inline void multipath_remove(struct rtable *rth)
Index: 2.6/net/ipv4/multipath_wrandom.c
===
--- 2.6.orig/net/ipv4/multipath_wrandom.c   2007-03-12 14:00:33.0 +0200
+++ 2.6/net/ipv4/multipath_wrandom.c2007-03-12 14:02:17.0 +0200
@@ -216,14 +216,15 @@   last_power = 0;
*rp = decision;
 }
 
-static void wrandom_set_nhinfo(__be32 network,
-  __be32 netmask,
-  unsigned char prefixlen,
-  const struct fib_nh *nh)
+static int wrandom_set_nhinfo(__be32 network,
+ __be32 netmask,
+ unsigned char prefixlen,
+ const struct fib_nh *nh)
 {
const int state_idx = nh->nh_oif % MULTIPATH_STATE_SIZE;
struct multipath_route *r, *target_route = NULL;
struct multipath_dest *d, *target_dest = NULL;
+   int err = 0;
 
/* store the weight information for a certain route */
spin_lock_bh(&state[state_idx].lock);
@@ -240,6 +241,10 @@ static void wrandom_set_nhinfo(__be32 ne
const size_t size_rt = sizeof(struct multipath_route);
target_route = (struct multipath_route *)
kmalloc(size_rt, GFP_ATOMIC);
+   if (!target_route) {
+   err = -ENOMEM;
+   goto error;
+   }
 
target_route->gw = nh->nh_gw;
target_route->oif = nh->nh_oif;
@@ -262,6 +267,10 @@ memset(&target_route->rcu, 0, sizeof(s
target_dest = (struct multipath_dest*)
kmalloc(size_dst, GFP_ATOMIC);
 
+   if (!target_dest) {
+   err = -ENOMEM;
+   goto error;
+   }
target_dest->nh_info = nh;
target_dest->network = network;
target_dest->netmask = netmask;
@@ -275,6 +284,8 @@ memset(&target_dest->rcu, 0, sizeof(st
 */
 
spin_unlock_bh(&state[state_idx].lock);
+  error:
+   return err;
 }
 
 static void __multipath_free(struct rcu_head *head)
Index: 2.6/net/ipv4/route.c
===
--- 2.6.orig/net/ipv4/route.c   2007-03-12 14:03:24.0 +0200
+++ 2.6/net/ipv4/route.c2007-03-12 14:04:16.0 +0200
@@ -1880,11 +1880,13 @@ for (hop = 0; hop < hopcount; hop++) {
return err;
 
/* forward hop information to multipath impl. */
-   multipath_set_nhinfo(rth,
+   err = multipath_set_nhinfo(rth,

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Bartlomiej Zolnierkiewicz


Hi,

Could you check if this is the same problem as this one:

http://bugzilla.kernel.org/show_bug.cgi?id=8169

Thanks,
Bart

On Monday 12 March 2007, Frank van Maarseveen wrote:
> On Mon, Mar 12, 2007 at 09:54:47AM +0100, Frank van Maarseveen wrote:
> > 
> > 2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
> > -d 1 /dev/hda):
> > 
> > hda: dma_timer_expiry: dma status == 0x20
> > hda: DMA timeout retry
> > hda: timeout waiting for DMA
> > hda: status error: status=0x58 {
> > DriveReady
> > SeekComplete
> > DataRequest
> > }
> 
> I have a totally different PATA based system (P4 HT) with similar symptoms
> except that it seems to recover by switching DMA off during boot after
> 5 errors:
> 
> hda: dma_timer_expiry: dma status == 0x20
> hda: DMA timeout retry
> hda: timeout waiting for DMA
> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: drive not ready for command
> hda: dma_timer_expiry: dma status == 0x20
> hda: DMA timeout retry
> hda: timeout waiting for DMA
> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: drive not ready for command
> hda: dma_timer_expiry: dma status == 0x20
> hda: DMA timeout retry
> hda: timeout waiting for DMA
> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: drive not ready for command
> hda: dma_timer_expiry: dma status == 0x20
> hda: DMA timeout retry
> hda: timeout waiting for DMA
> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: drive not ready for command
> hda: dma_timer_expiry: dma status == 0x20
> hda: DMA timeout retry
> hda: timeout waiting for DMA
> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: drive not ready for command
> 
> So in this case it doesn't hang but is not really usable either.
> 
> lspci:
> 00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
> 00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
> 00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
> 00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
> 00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
> 00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
> 00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
> 00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
> 00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
> 00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
> 00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
> 00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
> 01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
> 02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
> 
> This system has SATA but there's only one PATA disk

Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)

2007-03-12 Thread Nick Piggin

On Mon, Mar 12, 2007 at 12:23:00PM +0300, Dmitriy Monakhov wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> 
> > On Mon, Mar 12, 2007 at 11:55:30AM +0300, Dmitriy Monakhov wrote:
> >> Nick Piggin <[EMAIL PROTECTED]> writes:
> >> 
> >> > On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote:
> >
> >> >> @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
> >> >> mutex_lock(&inode->i_mutex);
> >> >> ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
> >> >> &iocb->ki_pos);
> >> >> +   /* 
> >> >> +* If __generic_file_aio_write_nolock has failed.
> >> >> +* This may happen because of:
> >> >> +* 1) Bad segment found (failed before actual write attempt)
> >> >> +* 2) Segments are good, but actual write operation failed
> >> >> +*and may have instantiated a few blocks outside i_size.
> >> >> +*   a) in case of buffered write these blocks was already
> >> >> +*  trimmed by generic_file_buffered_write()
> >> >> +*   b) in case of O_DIRECT these blocks weren't trimmed yet.
> >> >> +*
> >> >> +* In case of (2b) these blocks have to be trimmed off again.
> >> >> +*/
> >> >> +   if (unlikely( ret < 0 && file->f_flags & O_DIRECT)) {
> >> >> +   unsigned long nr_segs_avail = nr_segs;
> >> >> +   size_t count = 0;
> >> >> +   if (!generic_segment_checks(iov, &nr_segs_avail, &count,
> >> >> +   VERIFY_READ)) {
> >> >> +   /*It is (2b) case, because segments are good*/
> >> >> +   loff_t isize = i_size_read(inode);
> >> >> +   if (pos + count > isize)
> >> >> +   vmtruncate(inode, isize);
> >> >> +   }
> >> >> +   }
> >> >
> >> > OK, but wouldn't this be better to be done in the actual direct IO
> >> > functions themselves? Thus you could be sure that you have the 2b case,
> >> > and the code would be less fragile to something changing?
> >> Ohh, we can't just call vmtruncate() after a generic_file_direct_write()
> >> failure inside __generic_file_aio_write_nolock(), because there is no
> >> guarantee that i_mutex is held. In fact all existing filesystems always
> >> invoke __generic_file_aio_write_nolock() with i_mutex held for S_ISREG
> >> files, but this wasn't explicitly demanded or documented. I proposed
> >> doing so in previous versions of this patch, because it would just
> >> document the current state of affairs, but David Chinner didn't agree
> >> with it.
> >
> > It seemed like it was documented in the comments that you altered in this
> > patch...
> >
> > How would such a filesystem that did not hold i_mutex propose to fix the
> > problem?
> >
> > The burden should be on those filesystems that might not want to hold
> > i_mutex here, to solve the problem nicely, rather than generic code to take
> > this ugly code.
> Ok, then what do you think about this version
> http://lkml.org/lkml/2006/12/18/103
> which was posted almost a month ago :)

That seems better, but people might take issue with the fact that it has
to make the check for S_ISREG files. I don't know... people with more
knowledge of the vfs+fs side of things might have better input.


Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Alistair John Strachan

On Monday 12 March 2007 11:24, Frank van Maarseveen wrote:
> On Mon, Mar 12, 2007 at 09:54:47AM +0100, Frank van Maarseveen wrote:
> > 2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
> > -d 1 /dev/hda):
> >
> > hda: dma_timer_expiry: dma status == 0x20
> > hda: DMA timeout retry
> > hda: timeout waiting for DMA
> > hda: status error: status=0x58 {
> > DriveReady
> > SeekComplete
> > DataRequest
> > }
[snip]
> This system has SATA but there's only one PATA disk

Not a solution, unfortunately, but try disabling CONFIG_IDE and using Alan's 
new PATA drivers. For your Intel systems, this should mean you need only:

CONFIG_ATA_PIIX

For both SATA and PATA support. You'll need the appropriate SCSI modules built 
in (if you say =y), i.e. SCSI disk and SCSI CDROM should be built in.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Con Kolivas

On 12/03/07, Gene Heskett <[EMAIL PROTECTED]> wrote:

On Monday 12 March 2007, Gene Heskett wrote:
>To Con, I knew 2.6.20 worked with your earlier patches, so rather than
>revert all the way, I just rebooted to 2.6.20.2-rdsl-0.30 and I'm going
>to fire off another backup.  I suspect it will work, but will advise the
>next time I wake up.

After posting the above, I thought maybe I'd hit a target in the middle
and build a 2.6.20.2, with your -0.30 patch, but...

I'm going to have to build a 2.6.20.2, because with the rdsl-0.30 patch,
its going to do a level 2 on my /usr/movies directory, which hasn't been
touched in 90 days and has about 8.1GB in it according to du, and its
going to do nearly all of it.  It shouldn't be anything but a directory
listing file. But this is what amstatus is reporting:
coyote:/usr/movies2 7271m dumping  793m ( 10.91%)
(7:26:00)

And its also reporting far more data than exists it seems. As is du,
for /var, which might have 2 gigs, its claiming 3.7!

Honest folks, I'm not smoking anything, I quit 18 years ago.  Back to bed
while this one bombs out too.

--
Cheers, Gene

Gene your last good kernel you said was 2.6.20 based. I don't see a
good reason even to use 2.6.20.2 as a base given that information.

--
-ck

Re: [PATCH] BUILD_BUG_ON_ZERO -> BUILD_BUG_OR_ZERO

2007-03-12 Thread Robert P. J. Day

On Mon, 12 Mar 2007, Stefan Richter wrote:

> Rusty Russell wrote:
> > On Mon, 2007-03-12 at 08:23 +, Jan Beulich wrote:
> >> I have to admit that I don't see the point here - I can't seem to make
> >> any sense of the OR... Jan
> >
> > At least one other person thought that:
> >
> > #define BUILD_BUG_ON_ZERO(e) BUILD_BUG_ON((e) == 0)
> >
> > OTOH, BUILD_BUG_OR_ZERO says what happens: either it's a build bug, or
> > it's zero.
>
> What about ZERO_UNLESS_BUILD_BUG_ON(e)? It's long though...

how often is this going to be used?  it's not like the tree is
currently awash in calls to BUILD_BUG_ON_ZERO as it is.

rday

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin

On Mon, Mar 12, 2007 at 12:38:29PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > > the issue is this: your fix reduces the effects of the bug but it is 
> > > still fundamentally incomplete because of the use of timer_list. So
> > 
> > But using schedule_timeout is not a bug. Userspace timeouts are always 
> > defined to be "at least".
> 
> but what you are adding isnt a plain schedule_timeout(), it is a restart 
> block handling loop. And for those restart blocks that relate to 
> timeouts, we only use hrtimers. I am not making this up to annoy you: 
> take a look at all the current restart block handlers - they are hrtimer 
> based, for exactly this reason.

So why do you say it is fundamentally incomplete?

> > > instead of trying to fix the bug the wrong way, please try to fix it 
> > > the right way, ontop of an already existing and tested patch, ok? 
> > > That also enables the other neat stuff Thomas talked about.
> > 
> > Well that's nice, but I have a bugfix here which probably needs to get 
> > backported to stable kernels and distro kernels.
> 
> yes but your patch already exists for them which they can pick up.
> 
> really, this is a common Linux principle: fix it completely and fix it 
> the right way. You are applying it yourself on a daily basis when having 
> the maintainer hat on =B-)

I still didn't get anything wrong pointed out with the patch, though.

I'm not arguing against using hrtimers here to fix it the "right way".

> > It should be just as easy to rebase the hrtimer patch on top of my 
> > fix. Considering that you've had it for a year, I don't think it needs 
> > to be added right before my fix.
> 
> your latest patch looks quite kludgy, exactly due to the issues that 
> were mentioned.

I don't see what is kludgy, unless you consider converting to and from
absolute timeouts. But the userspace API is relative time based, so
hrtimers doesn't change that.

> > > hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
> > > behavior has been there for like forever.
> > 
> > People ignore LTP test failures, and programs probably try to avoid 
> > exercising the nuances of the unix signal API, I guess.
> 
> then there's no rush and lets do this the right way, ok?

There is no rush to use hrtimers. I would have thought it fairly important
to actually reach correctness, though. We're not talking about completely
changing the design of something such that it will take a lot of work to
"fix it properly". If that were the issue, then I would consider the
hrtimer conversion as part of the fix.

And if you talk about doing it the right way, then I don't think it is
strictly the right way to reimplement the function, including known bugs,
to be slightly more efficient, and *then* fixing those bugs. I'd actually
consider it better to fix the bugs first, not only because of the backport
issue, but because it generally makes it easier to track the injection and
removal points of bugs in the history.
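
The accuracy point in this thread (a restarted wait should not stretch the
total timeout) can be illustrated with a small userspace model in C. This is
not the futex code or either patch: the tick-based clock, struct wait_restart
and all helper functions are hypothetical. It only contrasts re-arming the
original relative timeout on every restart with converting it once to an
absolute deadline and sleeping for whatever is left.

```c
#include <stdbool.h>

/* Hypothetical restart state: the deadline is fixed on first entry. */
struct wait_restart {
	unsigned long deadline;   /* absolute expiry, in ticks */
	bool armed;               /* false until the first call */
};

/* Ticks left to sleep when entering or re-entering the wait at `now`. */
static unsigned long remaining_ticks(struct wait_restart *rs,
				     unsigned long now,
				     unsigned long rel_timeout)
{
	if (!rs->armed) {
		rs->deadline = now + rel_timeout;
		rs->armed = true;
	}
	/* Signed difference keeps the comparison safe across wraparound. */
	long left = (long)(rs->deadline - now);
	return left > 0 ? (unsigned long)left : 0;
}

/*
 * Total elapsed ticks when the wait is interrupted `interrupts` times,
 * each time after at most `slept` ticks, and then left to expire.
 * The deadline is absolute, so restarts do not extend it.
 */
static unsigned long total_wait_absolute(unsigned long rel_timeout,
					 unsigned interrupts,
					 unsigned long slept)
{
	struct wait_restart rs = { 0, false };
	unsigned long now = 0;

	for (unsigned i = 0; i < interrupts; i++) {
		unsigned long left = remaining_ticks(&rs, now, rel_timeout);
		now += left < slept ? left : slept;
	}
	now += remaining_ticks(&rs, now, rel_timeout);
	return now;
}

/* Same scenario, but every restart re-arms the full relative timeout. */
static unsigned long total_wait_relative(unsigned long rel_timeout,
					 unsigned interrupts,
					 unsigned long slept)
{
	unsigned long now = 0;

	for (unsigned i = 0; i < interrupts; i++)
		now += slept < rel_timeout ? slept : rel_timeout;
	now += rel_timeout;
	return now;
}
```

With a 100-tick timeout interrupted three times after 30 ticks each, the
absolute variant still finishes at tick 100, while the relative variant
finishes at tick 190. Whether the deadline is then tracked in jiffies or by
an hrtimer only changes the granularity of the final sleep, which is the
point both sides of the thread end up agreeing on.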

Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Gene Heskett

On Monday 12 March 2007, Gene Heskett wrote:
>On Monday 12 March 2007, Radoslaw Szkodzinski wrote:
>>On 3/11/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
>>> On Sunday 11 March 2007, Mike Galbraith wrote:
>>>
>>> Just to comment, I've been running one of the patches between 20-ck1
>>> and this latest one, which is building as I type, but I also run
>>> gkrellm here, version 2.2.9.
>>>
>>> Since I have been running this middle of this series patch, something
>>> is killing gkrellm about once a day, and there is nothing in the logs
>>> to indicate a problem.  I see a blink out of the corner of my eye,
>>> and its gone.  And it always starts right back up from a kmenu click.
>>>
>>> No idea if anyone else is experiencing this or not.
>>>
>>> --
>>> Cheers, Gene
>>
>>I've had such an issue with 0.20 or something. Sometimes, the
>>xfce4-panel would disappear (die) when I displayed its menu.
>>Very rare issue.
>>
>>Doesn't happen with 0.28 anyway. :-) Which looks really good, though
>>I'll update to 0.30.
>
>And I didn't see it for the few hours I was booted to 21-rc3-rsdl-0.29,
>but tar sure went berzackers.
>
>To Con, I knew 2.6.20 worked with your earlier patches, so rather than
>revert all the way, I just rebooted to 2.6.20.2-rdsl-0.30 and I'm going
>to fire off another backup.  I suspect it will work, but will advise the
>next time I wake up.

After posting the above, I thought maybe I'd hit a target in the middle 
and build a 2.6.20.2, with your -0.30 patch, but...

I'm going to have to build a 2.6.20.2, because with the rdsl-0.30 patch, 
its going to do a level 2 on my /usr/movies directory, which hasn't been 
touched in 90 days and has about 8.1GB in it according to du, and its 
going to do nearly all of it.  It shouldn't be anything but a directory 
listing file. But this is what amstatus is reporting:
coyote:/usr/movies2 7271m dumping  793m ( 10.91%) 
(7:26:00)

And its also reporting far more data than exists it seems. As is du, 
for /var, which might have 2 gigs, its claiming 3.7!

Honest folks, I'm not smoking anything, I quit 18 years ago.  Back to bed 
while this one bombs out too.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Operative: "I'm a monster.  What I do is evil, I've no illusions about that.
But it must be done."
--"Serenity"

Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.

2007-03-12 Thread Jarek Poplawski

On 09-03-2007 08:29, David Miller wrote:
> From: Amit Choudhary <[EMAIL PROTECTED]>
> Date: Thu, 8 Mar 2007 23:22:15 -0800
> 
>> Description: Check the return value of kmalloc() in function 
>> wrandom_set_nhinfo(), in file net/ipv4/multipath_wrandom.c.
>>
>> Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>
> 
> This kind of patch has been submitted several times before and it's
> never accepted because you have to do much more than this to recover
> from the allocation error.
> 
> There is no error status returned to the caller, so the callers assume
> the operation succeeded, and will either OOPS or crash in some other
> way.
> 
> Therefore, just adding some NULL pointer checks and returning is not
> going to fix this bug.
> 
> The whole cache-multipath subsystem has to have its guts revamped for
> proper error handling.

But until then it'll unnecessarily spoil linux opinion as regards
stability and waste time of developers to check error messages.
So, maybe it's less evil to check those NULLs where possible and add
some WARN_ONs here and there...
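
David Miller's objection quoted above is that a NULL check alone is not
enough: the failure has to travel back to the caller. A minimal userspace
sketch of that shape in C follows. It is not the kernel code: struct
route_entry, set_nhinfo(), alloc_fn and the lock counter are hypothetical,
malloc() stands in for kmalloc(), and a plain integer stands in for the
spinlock so that the unwinding can be checked.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical cut-down route record. */
struct route_entry {
	int gw;
	int oif;
};

/* Stand-in for a spinlock: nonzero while "held". */
static int lock_depth;
static void fake_lock(void)   { lock_depth++; }
static void fake_unlock(void) { lock_depth--; }

/* Allocator hook so a caller can force an allocation failure. */
static void *(*alloc_fn)(size_t) = malloc;

/* Always-failing allocator, useful for exercising the error path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Allocate and publish a route entry under the lock.
 * Returns 0 on success or -ENOMEM; the lock is released on both paths
 * because the error label sits before the unlock.
 */
static int set_nhinfo(struct route_entry **slot, int gw, int oif)
{
	struct route_entry *r;
	int err = 0;

	fake_lock();
	r = alloc_fn(sizeof(*r));
	if (!r) {
		err = -ENOMEM;
		goto out_unlock;
	}
	r->gw = gw;
	r->oif = oif;
	*slot = r;
out_unlock:
	fake_unlock();
	return err;
}
```

The design choice worth noting is the position of the label: placing it
ahead of the unlock means an early exit cannot leave the lock held, and the
caller sees a nonzero return instead of silently continuing.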

Regards,
Jarek P.

Re: [patch 3/8] per backing_dev dirty and writeback page accounting

2007-03-12 Thread Miklos Szeredi

> > I have no idea how serious the scalability problems with this are.  If
> > they are serious, different solutions can probably be found for the
> > above, but this is certainly the simplest.
> 
> Atomic operations to a single per-backing device from all CPUs at once?
> That's a pretty serious scalability issue and it will cause a major
> performance regression for XFS.

OK.  How about just accounting writeback pages?  That should be much
less of a problem, since normally writeback is started from
pdflush/kupdate in large batches without any concurrency.

Or is it possible to export the state of the device queue to mm?
E.g. could balance_dirty_pages() query the backing dev if there are
any outstanding write requests?

> I'd call this a showstopper right now - maybe you need to look at
> something like the ZVC code that Christoph Lameter wrote, perhaps?

That's rather a heavyweight approach for this I think.

The only info balance_dirty_pages() really needs is whether there are
any dirty+writeback bound for the backing dev or not.

It knows about the dirty pages, since it calls writeback_inodes() which
scans the dirty pages for this backing dev looking for ones to write
out.  If after returning from writeback_inodes() wbc->nr_to_write
didn't decrease and wbc->pages_skipped is zero then we know that there
are no more dirty pages for the device.  Or at least there are no
dirty pages which aren't already under writeback.
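
The check described above can be sketched in userspace C. This is only a
model under stated assumptions: `struct wbc_sketch` and
`no_new_dirty_pages()` are hypothetical names, and the struct carries just
the two fields named in the text, not the kernel's full
struct writeback_control.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two writeback_control fields mentioned
 * above; this is not the kernel's definition. */
struct wbc_sketch {
    long nr_to_write;   /* remaining write budget, reduced per page queued */
    long pages_skipped; /* pages the scan could not write and skipped */
};

/* Returns true when a writeback pass found nothing new to write:
 * the budget is untouched and no page was skipped. */
bool no_new_dirty_pages(long budget_before, const struct wbc_sketch *wbc)
{
    assert(wbc->nr_to_write <= budget_before); /* a pass only consumes budget */
    return wbc->nr_to_write == budget_before && wbc->pages_skipped == 0;
}
```

A caller would record the budget before the pass and evaluate the
predicate afterwards; a true result only says that no pages remain which
are not already under writeback.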

Thanks,
Miklos

Re: [PATCH] BUILD_BUG_ON_ZERO -> BUILD_BUG_OR_ZERO

2007-03-12 Thread Stefan Richter

Rusty Russell wrote:
> On Mon, 2007-03-12 at 08:23 +0000, Jan Beulich wrote:
>> I have to admit that I don't see the point here - I can't seem to make
>> any sense of the OR... Jan
> 
> At least one other person thought that:
> 
>   #define BUILD_BUG_ON_ZERO(e) BUILD_BUG_ON((e) == 0)
> 
> OTOH, BUILD_BUG_OR_ZERO says what happens: either it's a build bug, or
> it's zero.

What about ZERO_UNLESS_BUILD_BUG_ON(e)? It's long though...
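
For readers following the naming debate, here is a minimal sketch of what
"either it's a build bug, or it's zero" means in practice. The macro name
`SKETCH_BUILD_BUG_OR_ZERO` and the `table` example are made up for
illustration; the negative-array-size trick is one common way to write
such a macro and is not claimed to be the exact kernel definition.

```c
#include <stddef.h>

/* Illustrative macro with a made-up name: evaluates to (size_t)0 when
 * cond is false and fails to compile (negative array size) when cond is
 * true. cond must be an integer constant expression. */
#define SKETCH_BUILD_BUG_OR_ZERO(cond) (sizeof(char[1 - 2 * !!(cond)]) - 1)

/* Because the macro is an expression, it can sit inside a declaration,
 * where a statement-style check could not be used. */
enum { TABLE_ENTRIES = 4 };
int table[TABLE_ENTRIES + SKETCH_BUILD_BUG_OR_ZERO(TABLE_ENTRIES > 16)];

/* Number of elements in table; the macro contributed zero to the size. */
size_t table_len(void)
{
    return sizeof(table) / sizeof(table[0]);
}
```

Changing TABLE_ENTRIES to 17 would make the array size negative and stop
the build, which is the "build bug" half of the name.
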
-- 
Stefan Richter
-=-=-=== --== -==--
http://arcgraph.de/sr/

Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > the issue is this: your fix reduces the effects of the bug but it is 
> > still fundamentally incomplete because of the use of timer_list. So
> 
> But using schedule_timeout is not a bug. Userspace timeouts are always 
> defined to be "at least".

but what you are adding isn't a plain schedule_timeout(), it is a restart 
block handling loop. And for those restart blocks that relate to 
timeouts, we only use hrtimers. I am not making this up to annoy you: 
take a look at all the current restart block handlers - they are hrtimer 
based, for exactly this reason.
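
To make the restart concern concrete, here is a small userspace model (not
kernel code) of a timed wait that is interrupted by signals and restarted.
It assumes the goal is that a restart must not extend the total wait; the
function names and the nanosecond time base are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Time left until an absolute deadline, clamped at zero. */
int64_t remaining_ns(int64_t deadline_ns, int64_t now_ns)
{
    return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}

/* Model a wait that starts at t=0 with the given timeout and is
 * interrupted by signals at the ascending times in sig_ns[]. Returns the
 * time at which the wait finally times out.
 *   absolute != 0: each restart sleeps only until the original deadline.
 *   absolute == 0: each restart re-arms the full relative timeout. */
int64_t wait_end_ns(int64_t timeout_ns, const int64_t *sig_ns, size_t nsig,
                    int absolute)
{
    int64_t deadline = timeout_ns; /* expiry of the currently armed timer */

    for (size_t i = 0; i < nsig; i++) {
        if (sig_ns[i] >= deadline)
            break; /* the timer fired before this signal arrived */
        if (absolute)
            deadline = sig_ns[i] + remaining_ns(timeout_ns, sig_ns[i]);
        else
            deadline = sig_ns[i] + timeout_ns;
    }
    return deadline;
}
```

With a timeout of 100 and signals at 40, 90 and 130, the absolute variant
still ends at 100, while the relative variant drifts out to 230.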

> > instead of trying to fix the bug the wrong way, please try to fix it 
> > the right way, on top of an already existing and tested patch, ok? 
> > That also enables the other neat stuff Thomas talked about.
> 
> Well that's nice, but I have a bugfix here which probably needs to get 
> backported to stable kernels and distro kernels.

yes but your patch already exists for them which they can pick up.

really, this is a common Linux principle: fix it completely and fix it 
the right way. You are applying it yourself on a daily basis when having 
the maintainer hat on =B-)

> It should be just as easy to rebase the hrtimer patch on top of my 
> fix. Considering that you've had it for a year, I don't think it needs 
> to be added right before my fix.

your latest patch looks quite kludgy, exactly due to the issues that 
were mentioned.

> > hm. I'm wondering how this wasn't noticed sooner - this futex_wait 
> > behavior has been there for like forever.
> 
> People ignore LTP test failures, and programs probably try to avoid 
> exercising the nuances of the unix signal API, I guess.

then there's no rush and let's do this the right way, ok?

Ingo

Re: [3/6] 2.6.21-rc2: known regressions

2007-03-12 Thread Tejun Heo

Mathieu Bérard wrote:
> Jeff Garzik a écrit :
>> Adrian Bunk wrote:
>>> Subject: NCQ problem with ahci and Hitachi drive
>>> References : http://lkml.org/lkml/2007/3/4/178
>>> Submitter  : Mathieu Bérard <[EMAIL PROTECTED]>
>>> Status : unknown
>> according to the last message in that thread, it sounds like ACPI and
>> interrupt problems
>>
> Hi,
> after more testing with 2.6.21-rc3, it appears that after several ata
> errors the boot process somehow continued as normal, after a
> "NCQ disabled due to excessive errors" message.
> The "pci=noacpi" or "noacpi" parameters work around the problem; "irqpoll"
> does nothing.

I was mistaken.  It can't be an IRQ routing problem.  I somehow thought the
port was an ata_piix one.  Considering the reported broken NCQ feature on
the device, GTF might be messing with the drive to disable NCQ or
something.  Does giving "libata.noacpi=1" make any difference?

Thanks.

-- 
tejun

Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > if HIGH_RES_TIMERS is disabled then that is what happens. But 
> > frankly,
> 
> disabled? I would expect it (= more wakeups) when hrtimers are 
> enabled.

i mean the grouping of timer expiries happens automatically when 
high-res is disabled. When high-res is asked for then, duh, it's enabled 
and you get precise timeouts ;-)
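
The grouping effect can be illustrated with a small model, assuming a
tick-driven timer can only fire on a tick boundary. The function names are
hypothetical, and the 4 ms tick used in the note below is chosen purely
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Round an expiry time up to the next tick boundary: in this model a
 * tick-driven timer can only fire when a tick arrives. */
int64_t tick_expiry_ns(int64_t expiry_ns, int64_t tick_ns)
{
    return (expiry_ns + tick_ns - 1) / tick_ns * tick_ns;
}

/* Count the distinct wakeups needed to serve n expiries given in
 * ascending order. tick_ns == 0 models high-resolution mode, where each
 * timer fires at its exact expiry time. */
size_t count_wakeups(const int64_t *expiry_ns, size_t n, int64_t tick_ns)
{
    size_t wakeups = 0;
    int64_t prev = 0;

    for (size_t i = 0; i < n; i++) {
        int64_t fire = tick_ns > 0 ? tick_expiry_ns(expiry_ns[i], tick_ns)
                                   : expiry_ns[i];
        if (i == 0 || fire != prev)
            wakeups++; /* a new fire time means one more wakeup */
        prev = fire;
    }
    return wakeups;
}
```

For expiries at 1.0, 1.2, 3.9 and 4.1 ms, a 4 ms tick serves them with two
wakeups (at 4 ms and 8 ms), while the exact mode needs four. That
difference is the power-saving argument made in the quoted text.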

> > most futex waits are without timeouts - if an application cares 
> > about micro-effects like that then you are much better off not using 
> > a per-futex timeout anyway
> 
> That sounds like you're arguing for not using hrtimers here because 
> the applications shouldn't depend on precise timeouts here anyways?!?

I was talking about the "micro-effect" of grouping timer expiries.

> Anyways when you convert more kernel timeouts to hrtimers you should 
> probably add some kind of batching facility that can be globally 
> configured with a sysctl or similar. Then at least laptops (and 
> possibly servers) can opt for more power saving again. For the futexes 
> alone it probably won't matter too much agreed, but I see a trend to 
> more hr.

yeah, we had that in earlier versions, it's trivial - nobody used it. So 
i'll wait for the actual measurements and a patch (or i can do the patch 
too, if someone comes up with the measurements). I've added 
/proc/timer_stats and /proc/timer_info for exactly such reasons.

( note that we've added the facility for even more imprecise sleeps to 
  the timer_list APIs - but for hrtimers it's a lot less clear-cut case. )

Ingo

Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin

On Mon, Mar 12, 2007 at 12:19:58PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > > even if this means more work for you (i'm sorry about that!) i'm 
> > > quite sure we should take Sebastien's hrtimers based implementation 
> > > of futex_wait(), and use the nanosleep method to restart it. There's 
> > > no point in further tweaking the imprecise approach: whenever some 
> > > timeout needs to be restarted, it's a candidate for hrtimers.
> > 
> > Absolute timeout is needed, sure. But once that is done, hrtimers does 
> > not fix a bug, does it?
> 
> the issue is this: your fix reduces the effects of the bug but it is 
> still fundamentally incomplete because of the use of timer_list. So 

But using schedule_timeout is not a bug. Userspace timeouts are always
defined to be "at least".

> instead of trying to fix the bug the wrong way, please try to fix it the 
> right way, on top of an already existing and tested patch, ok? That also 
> enables the other neat stuff Thomas talked about.

Well that's nice, but I have a bugfix here which probably needs to
get backported to stable kernels and distro kernels.

It should be just as easy to rebase the hrtimer patch on top of my
fix. Considering that you've had it for a year, I don't think it 
needs to be added right before my fix.

> > > until then, glibc already handles timeouts and restarts it manually.
> > 
> > It isn't timeout handling that is buggy, but EINTR behaviour. And 
> > glibc does not handle that here.
> 
> hm. I'm wondering how this wasn't noticed sooner - this futex_wait 
> behavior has been there for like forever.

People ignore LTP test failures, and programs probably try to avoid
exercising the nuances of the unix signal API, I guess.

Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Mike Galbraith

On Mon, 2007-03-12 at 12:08 +0100, Ingo Molnar wrote:
> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> 
> > The test scenario was one any desktop user might do with every 
> > expectation responsiveness of the interactive application remain 
> > intact. I understand the concepts here Con, and I'm not knocking your 
> > scheduler. I find it to be a step forward on the one hand, but a step 
> > backward on the other.
> 
> ok, then that step backward needs to be fixed.

btw, this scenario wasn't invented by me, it came from the _every single
day_ usage of my best friend since his conversion to linux (he's in love
now;) a month ago.  After I un-crippled all of the multimedia apps that
came with our distribution, this is the thing he has been doing most.

-Mike


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-12 Thread Al Boldi

Con Kolivas wrote:
> On Monday 12 March 2007 15:42, Al Boldi wrote:
> > Con Kolivas wrote:
> > > On Monday 12 March 2007 08:52, Con Kolivas wrote:
> > > > And thank you! I think I know what's going on now. I think each
> > > > rotation is followed by another rotation before the higher priority
> > > > task is getting a look in in schedule() to even get quota and add it
> > > > to the runqueue quota. I'll try a simple change to see if that
> > > > helps. Patch coming up shortly.
> > >
> > > Can you try the following patch and see if it helps. There's also one
> > > minor preemption logic fix in there that I'm planning on including.
> > > Thanks!
> >
> > Applied on top of v0.28 mainline, and there is no difference.
> >
> > What's it look like on your machine?
>
> The higher priority one always gets 6-7ms whereas the lower priority one
> runs 6-7ms and then one larger perfectly bound expiration amount.
> Basically exactly as I'd expect. The higher priority task gets precisely
> RR_INTERVAL maximum latency whereas the lower priority task gets
> RR_INTERVAL min and full expiration (according to the virtual deadline) as
> a maximum. That's exactly how I intend it to work. Yes I realise that the
> max latency ends up being longer intermittently on the niced task but
> that's -in my opinion- perfectly fine as a compromise to ensure the nice 0
> one always gets low latency.

I think it should be possible to spread this max expiration latency across
the rotation, should it not?


Thanks!

--
Al


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Con Kolivas

On Monday 12 March 2007 22:08, Ingo Molnar wrote:
> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > The test scenario was one any desktop user might do with every
> > expectation responsiveness of the interactive application remain
> > intact. I understand the concepts here Con, and I'm not knocking your
> > scheduler. I find it to be a step forward on the one hand, but a step
> > backward on the other.
>
> ok, then that step backward needs to be fixed.
>
> > > We are getting good interactive response with a fair scheduler yet
> > > you seem intent on overloading it to find fault with it.
> >
> > I'm not trying to find fault, I'm TESTING AND REPORTING.  Was.
>
> Con, could you please take Mike's report of this regression seriously
> and address it? Thanks,

Sure. 

Mike, the cpu is being proportioned out perfectly according to fairness, as I
mentioned in the prior email, yet X is getting the lower latency scheduling.
Within the bounds of fairness, I'm not sure what more you would have happen
with this test case?

-- 
-ck

2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 09:54:47AM +0100, Frank van Maarseveen wrote:
> 
> 2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
> -d 1 /dev/hda):
> 
>   hda: dma_timer_expiry: dma status == 0x20
>   hda: DMA timeout retry
>   hda: timeout waiting for DMA
>   hda: status error: status=0x58 {
>   DriveReady
>   SeekComplete
>   DataRequest
>   }

I have a totally different PATA based system (P4 HT) with similar symptoms
except that it seems to recover by switching DMA off during boot after
5 errors:

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command

So in this case it doesn't hang but is not really usable either.

lspci:
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

This system has SATA but there's only one PATA disk

-- 
Frank

Re: [QUICKLIST 0/6] Arch independent quicklists V1

2007-03-12 Thread David Miller

From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Mon, 12 Mar 2007 04:12:32 -0700 (PDT)

> On Sun, 11 Mar 2007, David Miller wrote:
> 
> > I'm going to make the radical declaration that it be perhaps often
> > better to always initialize page table chunks to all zeros on
> > allocation.
> 
> That is the case if most of the page is going to be used soon. If we have 
> sparse access patterns then not zeroing can avoid uselessly bringing 
> cachelines in.

Good point.
