Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Davide Libenzi wrote:
> I know, we agree there. But that did not fit my Pirates of the Caribbean
> quote :)

Ahh, I'm clearly not cultured enough, I didn't catch that reference.

		Linus "yes, I've seen the movie, but it apparently left
		more of a mark in other people" Torvalds
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [Patch -mm 3/3] RFC: Introduce kobject-owner for refcounting.
On Wed, 2007-04-18 at 11:20 -0400, Alan Stern wrote:
> On Wed, 18 Apr 2007, Rusty Russell wrote:
> > Hi Alan,
> >
> > 	Your assertion is correct. I haven't studied the driver core, so
> > I might be off-base here, but you'll note that if the module references
> > the core kmalloc'ed object rather than the other way around it can be
> > done safely. The core can also reference the module, but it must be
> > able to live without it once it's gone (eg. by returning -ENOENT).
>
> "Live without it once it's gone"... Do you mean once the object is gone
> or once the module is gone? The core in general has no way to know when
> the module is gone; all it knows about is the object. The trouble arises
> when the module is gone (whether the core knows it or not) but the
> object is still present.

Hi Alan,

	I meant that the module is gone: it has told the object (via
unregister_xxx) that it's gone. A really poor example is below: ...

> The example is fine as far as it goes, but it assumes that all
> interactions with the underlying r-foo object can be done under a
> spinlock. Of course this isn't true in general.

There are certainly other ways of doing it, such as a mutex, a refcnt and
completion (for function pointers), or disabling preemption across the
access and using stop_machine(). Of course, these add complexity.

This is the reason that I've always disliked module removal. We have a lot
of code to deal with it, and it has awkward semantics (unless --wait is
used). OTOH, I'm not a fan of the network approach, either: I feel that
bringing up an interface should bump the refcnt of the module which
implements that interface. Currently taking out e1000 will just kill my
eth0.

Cheers,
Rusty.
Re: Upgraded to 2.6.20.7 - positives
Chuck Ebbert wrote:
> Denis Vlasenko wrote:
> > * From make menuconfig questions it looks like the SATA/PATA rewrite
> >   (in the form of libata) is almost finished. Hehe, untangling the IDE
> >   mess was quite a feat, and Jeff did it. Kudos.
>
> ADMA mode on nvidia chipsets still seems broken despite the massive
> amount of SATA fixes backported from 2.6.21...

News to me.. please post details.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: [ck] Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Chris Friesen wrote:
> Mark Glines wrote:
> > One minor question: is it even possible to be completely fair on SMP?
> > For instance, if you have a 2-way SMP box running 3 applications, one
> > of which has 2 threads, will the threaded app have an advantage here?
> > (The current system seems to try to keep each thread on a specific
> > CPU, to reduce cache thrashing, which means threads and processes
> > alike each get 50% of the CPU.)
>
> I think the ideal in this case would be to have both threads on one cpu,
> with the other app on the other cpu. This gives inter-process fairness
> while minimizing the amount of task migration required.

Solving this sort of issue was one of the reasons for the smpnice patches.

> More interesting is the case of three processes on a 2-cpu system. Do
> we constantly migrate one of them back and forth to ensure that each of
> them gets 66% of a cpu?

Depends how keen you are on fairness. Unless the processes are long term
continuously active tasks that never sleep, it's probably not an issue,
as they'll probably move around enough for each of them to get 66% over
the long term.

Exact load balancing for real work loads (where tasks are coming and
going, sleeping and waking semi randomly, over relatively brief periods)
is probably unattainable, because by the time you've worked out the ideal
placement of the currently runnable tasks on the available CPUs it's all
changed and the solution is invalid. The best you can hope is that the
change isn't so great as to completely invalidate the solution, and that
the changes you make as a result are an improvement on the current
allocation of processes to CPUs.

The above probably doesn't hold for some systems, such as those large
super computer jobs that run for several days, but they're probably best
served by explicit allocation of processes to CPUs using the process
affinity mechanism.

Peter
-- 
Peter Williams                                   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
Re: problem with asm/semaphore.h
liangbowen wrote:
> Hi,
> I compiled the following code with gcc under FC2:
>
> #include <asm/semaphore.h>
> main()
> {
>     struct semaphore sem;
> }
>
> It doesn't compile, saying "storage size of `sem' isn't known". And I
> looked inside asm/semaphore.h, where I saw:
>
> #ifndef _I386_SEMAPHORE_H
> #define _I386_SEMAPHORE_H
> #include <linux/linkage.h>
> #endif
>
> Did I miss something? Please guide me how to fix it. Sincerely

You're trying to use a kernel data structure in a user-space program.
Don't. The definitions in that header are inside #ifdef __KERNEL__, and so
the provided userspace headers remove that part.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: [PATCH] CONFIG_PACKET_MMAP should depend on MMU
On 4/18/07, David Howells [EMAIL PROTECTED] wrote:
> Aubrey Li [EMAIL PROTECTED] wrote:
> > Here, in the attachment I wrote a small test app. Please correct it if
> > there is anything wrong, and feel free to improve it.
>
> Okay... I have that working... probably. I don't know what output it's
> supposed to produce, but I see this:
>
> # /packet-mmap/sample_packet_mmap
> 00-00-00-01-00-00-00-8a-00-00-00-8a-00-42-00-50-
> 38-43-13-a0-00-07-ff-3c-00-00-00-00-00-00-00-00-
> 00-11-08-00-00-00-00-01-00-01-00-06-00-d0-b7-de-
> 32-7b-00-00-00-00-00-00-00-00-00-00-00-00-00-00-
> 00-00-00-90-cc-a2-75-6b-00-d0-b7-de-32-7b-08-00-
> 45-00-00-7c-00-00-40-00-40-11-b4-13-c0-a8-02-80-
> c0-a8-02-8d-08-01-03-20-00-68-8e-65-7f-5b-7e-03-
> 00-00-00-01-00-00-00-00-00-00-00-00-00-00-00-00-
> 00-00-00-00-00-00-00-00-00-00-00-01-00-00-81-a4-
> 00-00-00-01-00-00-00-00-00-00-00-00-00-1d-b8-86-
> 00-00-10-00-ff-ff-ff-ff-00-00-0e-f0-00-00-09-02-
> 01-cb-03-16-46-26-38-0d-00-00-00-00-46-26-38-1e-
> 00-00-00-00-46-26-38-1e-00-00-00-00-00-00-00-00-
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-
> [repeated]
>
> Does that look reasonable?

Yes, it's reasonable to me, as long as your host IP is 192.168.2.128 and
the target IP is 192.168.2.141. See below:

00-90-cc-a2-75-6b- |__ MAC addresses
00-d0-b7-de-32-7b- |
08-00              Type: IP
45-00              Ver, IHL, TOS
00-7c              IP total length
00-00-
40-00-
40                 TTL
11                 UDP protocol
b4-13              Checksum
c0-a8-02-80        Source IP: 192.168.2.128
c0-a8-02-8d        Dest IP: 192.168.2.141

--snip--

> I've attached the preliminary patch.

Thanks, I'll take a look and try to see if I can give some feedback.

-Aubrey

> Note four things about it:
>
> (1) I've had to add the get_unmapped_area() op to the proto_ops struct,
>     but I've only done it for CONFIG_MMU=n as making it available for
>     CONFIG_MMU=y could cause problems.
>
> (2) There's a race between packet_get_unmapped_area() being called and
>     packet_mmap() being called.
>
> (3) I've added an extra check into packet_set_ring() to make sure the
>     caller isn't asking for a combination of buffer size and count that
>     will exceed ULONG_MAX.
>     This protects a multiply done elsewhere.
>
> (4) The entire data buffer is allocated as one contiguous lump in
>     NOMMU-mode.
>
> David
> ---
> [PATCH] NOMMU: Support mmap() on AF_PACKET sockets
>
> From: David Howells [EMAIL PROTECTED]
>
> Support mmap() on AF_PACKET sockets in NOMMU-mode kernels.
>
> Signed-Off-By: David Howells [EMAIL PROTECTED]
> ---
>
>  include/linux/net.h    |    7 +++
>  include/net/sock.h     |    8 +++
>  net/core/sock.c        |   10 ++++
>  net/packet/af_packet.c |  118 ++++++++++++++++++++++++++++++++++++++++
>  net/socket.c           |   77 +++++++++++++++++++++++++++
>  5 files changed, 219 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/net.h b/include/linux/net.h
> index 4db21e6..9e77cf6 100644
> --- a/include/linux/net.h
> +++ b/include/linux/net.h
> @@ -161,6 +161,11 @@ struct proto_ops {
>  	int	(*recvmsg)   (struct kiocb *iocb, struct socket *sock,
>  			      struct msghdr *m, size_t total_len,
>  			      int flags);
> +#ifndef CONFIG_MMU
> +	unsigned long (*get_unmapped_area)(struct file *file, struct socket *sock,
> +					   unsigned long addr, unsigned long len,
> +					   unsigned long pgoff, unsigned long flags);
> +#endif
>  	int	(*mmap)      (struct file *file, struct socket *sock,
>  			      struct vm_area_struct * vma);
>  	ssize_t	(*sendpage)  (struct socket *sock, struct page *page,
> @@ -191,6 +196,8 @@ extern int sock_sendmsg(struct socket *sock, struct msghdr *msg,
>  extern int sock_recvmsg(struct socket *sock, struct msghdr *msg,
>  			size_t size, int flags);
>  extern int sock_map_fd(struct socket *sock);
> +extern void sock_make_mappable(struct socket *sock,
> +			       unsigned long prot);
>  extern struct socket *sockfd_lookup(int fd, int *err);
>  #define	sockfd_put(sock) fput(sock->file)
>  extern int net_ratelimit(void);
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 2c7d60c..d91edea 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -841,6 +841,14 @@ extern int sock_no_sendmsg(struct kiocb *, struct socket *,
> 					   struct msghdr *, size_t);
>  extern int sock_no_recvmsg(struct kiocb *, struct socket *,
Re: CPU_IDLE prevents resuming from STR [was: Re: 2.6.21-rc6-mm1]
On Wed, 2007-04-18 at 19:00 -0400, Joshua Wise wrote:
> On Tue, 17 Apr 2007, Shaohua Li wrote:
> > Looks like there is an init order issue with the sysfs files. The new
> > refreshed patch should fix your bug.
>
> Yes, that did fix the hang on resume from STR -- that now works fine.
> However:
>
> [EMAIL PROTECTED]:/sys/devices/system/cpu/cpuidle$ cat available_drivers current_driver
> NULL
> [EMAIL PROTECTED]:/sys/devices/system/cpu/cpuidle$ cat available_governors current_governor
> ladder
> ladder

That's correct, and it looks like you didn't compile in the ACPI processor
module.

Thanks,
Shaohua
PCI Express MMCONFIG and BIOS Bug messages..
I've seen a lot of systems (including brand new Xeon-based servers from
IBM and HP) that output messages on boot like:

PCI: BIOS Bug: MCFG area at f000 is not E820-reserved
PCI: Not using MMCONFIG.

As I understand it, this is sort of a sanity-check mechanism to make sure
the MCFG address reported is remotely reasonable and intended to be used
as such. Problem is, I doubt the BIOS authors would agree that this
constitutes a bug. Microsoft is providing a lot of the direction for BIOS
writers; have a look at this presentation, "PCI Express, Windows, And The
Legacy Transition", from back in 2004:

http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/TW04047_WINHEC2004.ppt

On page 14, "Existing Windows - Reserve MMCONFIG":

- Existing Windows versions won't understand MCFG table
  * Backwards-compatible range reservation must be used
- Report range in ACPI Motherboard Resources
  * _CRS of PNP0C02 node
  * PNP0C02 must be at \_SB scope
  * Range must be marked as consumed
- Do not include range in _CRS of PCI root bus
  * If included, OS will assume that this range can be allocated to devices
- E820 table/EFI memory map
  * Not necessary to describe MMConfig here
  * For Windows, these are used to describe RAM
  * No harm in including range as reserved either

So Microsoft is explicitly telling the BIOS developers that there is no
need to reserve the MMCONFIG space in the E820 table because Windows
doesn't care. On that basis it doesn't seem like a valid check to require
it to be so reserved. Really, I think we should be basing this check on
whether the corresponding memory range is reserved in the ACPI resources,
like Windows expects. This does require putting more fingers into ACPI
from this early boot stage, though..
-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: 2.6.21-rc6-mm1 ATA HPT37x regression
>>>>> "John" == John Stoffel [EMAIL PROTECTED] writes:

> > Ok, so do I need to do anything special with the next -mm release and
> > the next version?
>
> Well, let Alan decide that (2Alan: and I said that HPT code is bogus :-).

Alan> Try drivers/ide/pci/hpt366 - if that works grab a dmesg and let
Alan> me know. It means that Sergei's DPLL sync code seems to work
Alan> better than the vendor code and it's time to swap it over.

John> Ok, I'll give that a whirl under 2.6.21-rc7 tonight. I'll build them
John> in modular so I can switch around more easily. I hope. :]

Ok, here's the dmesg output using the hpt366 old IDE driver, 2.6.21-rc7,
SMP:

[  160.926355] HPT302: IDE controller at PCI slot :03:06.0
[  160.928030] ACPI: PCI Interrupt :03:06.0[A] -> GSI 18 (level, low) -> IRQ 18
[  160.931212] HPT302: chipset revision 1
[  160.932801] HPT302: DPLL base: 66 MHz, f_CNT: 100, assuming 33 MHz PCI
[  160.941157] HPT302: using 66 MHz DPLL clock
[  160.942646] HPT302: 100% native mode on irq 18
[  160.943918]     ide2: BM-DMA at 0xe800-0xe807, BIOS settings: hde:DMA, hdf:pio
[  160.946636]     ide3: BM-DMA at 0xe808-0xe80f, BIOS settings: hdg:DMA, hdh:pio
[  160.949439] Probing IDE interface ide2...
[  161.213560] hde: WDC WD1200JB-00CRA1, ATA DISK drive
[  161.828020] ide2 at 0xecf8-0xecff,0xecf2 on irq 18
[  161.829616] Probing IDE interface ide3...
[  162.094086] hdg: WDC WD1200JB-00EVA0, ATA DISK drive
[  162.709002] ide3 at 0xece0-0xece7,0xecda on irq 18

Which looks ok to me I guess. It found my MD disks on there and assembled
them, eventually. *grin*

I'll reboot and send out the corresponding ATA HPT37x driver dmesg...

John
Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy
Hi.

On Thu, 2007-04-19 at 00:22 +0200, Christian Hesse wrote:
> On Thursday 19 April 2007, Ingo Molnar wrote:
> > * Christian Hesse [EMAIL PROTECTED] wrote:
> > > > although probably your suspend2 problem is still not fixed, it's
> > > > worth a try nevertheless. Which suspend2 patch did you apply, and
> > > > was it against -rc6 or -rc7?
> > >
> > > You are right again. ;-)
> > > Linux 2.6.21-rc7
> > > Suspend2 2.2.9.11 (applies cleanly to -rc7)
> > > CFS v3 (without any additional patches)
> > > And it still hangs on suspend.
> >
> > what's the easiest way for me to try suspend2? Apply the patch, reboot
> > into the kernel, then execute what command to suspend? (there's a
> > confusing mishmash of initiators of all the suspend variants. Can i
> > drive this by echoing to /sys/power/state?)
>
> Perhaps you have to install suspend2-userui as well for the output (I'm
> not sure whether it works without). Then you can trigger the suspend by
> echoing to /sys/power/suspend2/do_suspend. Useful information can be
> found in the Howto: http://www.suspend2.net/HOWTO
>
> I dropped some ccs to not abuse Linus and friends.

You can suspend and resume without it.

Regards,

Nigel
PCI: Unable to handle 64-bit address space for
Hi all,

Does anyone have an idea about this? Why is it displayed on boot? How can
I fix it, or at least stop this message from being displayed? I'm using
2.6.9-42.ELsmp.

PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller :00:1f.1
PCI: Unable to handle 64-bit address space for
PCI: Unable to handle 64-bit address space for
PCI: Unable to handle 64-bit address space for
PCI: Unable to handle 64-bit address space for
PCI: Unable to handle 64-bit address space for
PCI: Unable to handle 64-bit address space for
PCI: Unable to handle 64-bit address space for
PCI: Unable to handle 64-bit address space for
PCI: Unable to handle 64-bit address space for

Thanks for the help,
Michael
Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy
Hi.

On Wed, 2007-04-18 at 18:56 -0400, Bob Picco wrote:
> Ingo Molnar wrote: [Wed Apr 18 2007, 06:02:28PM EDT]
> > * Christian Hesse [EMAIL PROTECTED] wrote:
> > > > although probably your suspend2 problem is still not fixed, it's
> > > > worth a try nevertheless. Which suspend2 patch did you apply, and
> > > > was it against -rc6 or -rc7?
> > >
> > > You are right again. ;-)
> > > Linux 2.6.21-rc7
> > > Suspend2 2.2.9.11 (applies cleanly to -rc7)
> > > CFS v3 (without any additional patches)
> > > And it still hangs on suspend.
> >
> > what's the easiest way for me to try suspend2? Apply the patch, reboot
> > into the kernel, then execute what command to suspend? (there's a
> > confusing mishmash of initiators of all the suspend variants. Can i
> > drive this by echoing to /sys/power/state?)
> >
> > 	Ingo
>
> I had hoped to collect more data with CFS V2. It crashes in
> scale_nice_down for s2ram when attempting to disable_nonboot_cpus. So
> part of the traceback looks like (typed by hand with obvious omissions):
>
> scale_nice_down
> update_stats_wait_end - not shown in traceback because inlined
> pick_next_task_fair
> migration_call
> task_rq_lock
> notifier_call_chain
> _cpu_down
> disable_nonboot_cpus
> ...
>
> This is standard -rc7 with V2 CFS applied. It could be a completely
> unrelated issue. I'll attempt to debug further tomorrow.

That - and Christian's other reply with the jpg - look to me more like an
interaction between CFS and cpu hotplugging than Suspend2 itself. Can you
also reproduce this with swsusp?

Regards,

Nigel
Re: 2.6.21-rc6-mm1 ATA HPT37x regression
>>>>> "John" == John Stoffel [EMAIL PROTECTED] writes:
>>>>> "John" == John Stoffel [EMAIL PROTECTED] writes:

> > Ok, so do I need to do anything special with the next -mm release and
> > the next version?
>
> Well, let Alan decide that (2Alan: and I said that HPT code is bogus :-).

Alan> Try drivers/ide/pci/hpt366 - if that works grab a dmesg and let
Alan> me know. It means that Sergei's DPLL sync code seems to work
Alan> better than the vendor code and it's time to swap it over.

John> Ok, I'll give that a whirl under 2.6.21-rc7 tonight. I'll build them
John> in modular so I can switch around more easily. I hope. :]

John> Ok, here's the dmesg output using the hpt366 old IDE driver,
John> 2.6.21-rc7, SMP:

John> [  160.926355] HPT302: IDE controller at PCI slot :03:06.0
John> [  160.928030] ACPI: PCI Interrupt :03:06.0[A] -> GSI 18 (level, low) -> IRQ 18
John> [  160.931212] HPT302: chipset revision 1
John> [  160.932801] HPT302: DPLL base: 66 MHz, f_CNT: 100, assuming 33 MHz PCI
John> [  160.941157] HPT302: using 66 MHz DPLL clock
John> [  160.942646] HPT302: 100% native mode on irq 18
John> [  160.943918]     ide2: BM-DMA at 0xe800-0xe807, BIOS settings: hde:DMA, hdf:pio
John> [  160.946636]     ide3: BM-DMA at 0xe808-0xe80f, BIOS settings: hdg:DMA, hdh:pio
John> [  160.949439] Probing IDE interface ide2...
John> [  161.213560] hde: WDC WD1200JB-00CRA1, ATA DISK drive
John> [  161.828020] ide2 at 0xecf8-0xecff,0xecf2 on irq 18
John> [  161.829616] Probing IDE interface ide3...
John> [  162.094086] hdg: WDC WD1200JB-00EVA0, ATA DISK drive
John> [  162.709002] ide3 at 0xece0-0xece7,0xecda on irq 18

John> Which looks ok to me I guess. It found my MD disks on there and
John> assembled them, eventually. *grin*

John> I'll reboot and send out the corresponding ATA HPT37x driver dmesg...

And here's the output (much more verbose!) from the hpt37x ATA driver:

[  158.712007] hpt37x: HPT302: Bus clock 33MHz.
[  158.713390] ACPI: PCI Interrupt :03:06.0[A] -> GSI 18 (level, low) -> IRQ 18
[  158.716254] ata5: PATA max UDMA/133 cmd 0x0001ecf8 ctl 0x0001ecf2 bmdma 0x0001e800 irq 18
[  158.719019] ata6: PATA max UDMA/133 cmd 0x0001ece0 ctl 0x0001ecda bmdma 0x0001e808 irq 18
[  158.722257] scsi7 : pata_hpt37x
[  158.878133] ata5.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
[  158.879576] ata5.00: 234441648 sectors, multi 16: LBA
[  158.880934] Find mode for 12 reports C829C62
[  158.882240] Find mode for DMA 69 reports 1C6DDC62
[  158.888152] ata5.00: configured for UDMA/100
[  158.889437] scsi8 : pata_hpt37x
[  158.900338] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[  158.901660] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[  159.047026] ata6.00: ATA-6: WDC WD1200JB-00EVA0, 15.05R15, max UDMA/100
[  159.048412] ata6.00: 234441648 sectors, multi 16: LBA48
[  159.050008] Find mode for 12 reports C829C62
[  159.051371] Find mode for DMA 69 reports 1C6DDC62
[  159.057079] ata6.00: configured for UDMA/100
[  159.063655] scsi 7:0:0:0: Direct-Access     ATA      WDC WD1200JB-00C 17.0 PQ: 0 ANSI: 5
[  159.067506] SCSI device sdi: 234441648 512-byte hdwr sectors (120034 MB)
[  159.069004] sdi: Write Protect is off
[  159.070412] sdi: Mode Sense: 00 3a 00 00
[  159.070487] SCSI device sdi: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  159.073427] SCSI device sdi: 234441648 512-byte hdwr sectors (120034 MB)
[  159.074882] sdi: Write Protect is off
[  159.076262] sdi: Mode Sense: 00 3a 00 00
[  159.076339] SCSI device sdi: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  159.079097]  sdi: sdi1
[  159.097634] sd 7:0:0:0: Attached scsi disk sdi
[  159.099212] sd 7:0:0:0: Attached scsi generic sg9 type 0
[  159.102344] scsi 8:0:0:0: Direct-Access     ATA      WDC WD1200JB-00E 15.0 PQ: 0 ANSI: 5
[  159.106197] SCSI device sdj: 234441648 512-byte hdwr sectors (120034 MB)
[  159.107722] sdj: Write Protect is off
[  159.109188] sdj: Mode Sense: 00 3a 00 00
[  159.109271] SCSI device sdj: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  159.112455] SCSI device sdj: 234441648 512-byte hdwr sectors (120034 MB)
[  159.114094] sdj: Write Protect is off
[  159.115870] sdj: Mode Sense: 00 3a 00 00
[  159.115943] SCSI device sdj: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  159.118965]  sdj: sdj1
[  159.138036] sd 8:0:0:0: Attached scsi disk sdj
[  159.139682] sd 8:0:0:0: Attached scsi generic sg10 type 0

In both cases, my RAID1 disks are found and come up cleanly, which is
good. Thanks for all the work you guys have done on the IDE stuff, as
well as the new libATA stuff.

Let me know if you need more testing done here, I've only got a scratch
volume on this raid set.

John
Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> Do you have a copy of wireshark or ethereal on hand? If so, could you
> take a look at whether or not any NFS traffic is going between the
> client and server once the hang happens?

I used the following command

	tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs

to capture

	http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2

I started the capture before starting the copy and left it to run for a
few minutes after the traffic slowed to a crawl.

The iostat and vmstat logs are at:

	http://iucha.net/nfs/21-rc7-nfs4/iostat
	http://iucha.net/nfs/21-rc7-nfs4/vmstat

It seems that my original problem report had a big mistake! There is no
hang, but at some point the write slows down to a trickle (from 40,000
blocks/s to 22 blocks/s), as can be seen from the iostat log.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163
Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy
Hi.

On Thu, 2007-04-19 at 00:02 +0200, Ingo Molnar wrote:
> * Christian Hesse [EMAIL PROTECTED] wrote:
> > > although probably your suspend2 problem is still not fixed, it's
> > > worth a try nevertheless. Which suspend2 patch did you apply, and
> > > was it against -rc6 or -rc7?
> >
> > You are right again. ;-)
> > Linux 2.6.21-rc7
> > Suspend2 2.2.9.11 (applies cleanly to -rc7)
> > CFS v3 (without any additional patches)
> > And it still hangs on suspend.
>
> what's the easiest way for me to try suspend2? Apply the patch, reboot
> into the kernel, then execute what command to suspend? (there's a
> confusing mishmash of initiators of all the suspend variants. Can i
> drive this by echoing to /sys/power/state?)

From subsequent emails, I think you already got your answer, but just in
case... Yes, if you enabled "Replace swsusp by default" and you already
had it set up for getting swsusp to resume. If not, and you're using an
initrd/initramfs, you'll need to modify it to echo > /sys/power/suspend2/do_resume
after /sys and /proc are mounted but prior to mounting / and so on.

Regards,

Nigel
Re: [PATCH] sched: implement staircase deadline scheduler further improvements-1
On Thursday 19 April 2007 09:48, Con Kolivas wrote:
> While the Staircase Deadline scheduler has not been completely killed
> off and is still in -mm I would like to fix some outstanding issues
> that I've found, since it still serves for comparison with all the
> upcoming schedulers.
>
> While still in -mm can we queue this on top please?
>
> A set of staircase-deadline v 0.41 patches will make their way into the
> usual place for those willing to test it.
>
> http://ck.kolivas.org/patches/staircase-deadline/

Oops! Minor thinko! Here is a respin. Please apply this one instead. I
better make a 0.42, heh.

---
The prio_level was being inappropriately decreased if a higher priority
task was still using its previous timeslice. Fix that.

Task expiration of higher priority tasks was not being taken into account
when allocating priority slots. Check the expired best_static_prio level
to facilitate that.

Explicitly check all better static priority prio_levels when deciding on
allocating slots for niced tasks.

These changes improve behaviour in many ways.

Signed-off-by: Con Kolivas [EMAIL PROTECTED]

---
 kernel/sched.c |   64 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 21 deletions(-)

Index: linux-2.6.21-rc7-sd/kernel/sched.c
===================================================================
--- linux-2.6.21-rc7-sd.orig/kernel/sched.c	2007-04-19 08:51:54.000000000 +1000
+++ linux-2.6.21-rc7-sd/kernel/sched.c	2007-04-19 12:03:29.000000000 +1000
@@ -145,6 +145,12 @@ struct prio_array {
 	 */
 	DECLARE_BITMAP(prio_bitmap, MAX_PRIO + 1);
 
+	/*
+	 * The best static priority (of the dynamic priority tasks) queued
+	 * this array.
+	 */
+	int best_static_prio;
+
 #ifdef CONFIG_SMP
 	/* For convenience looks back at rq */
 	struct rq *rq;
@@ -191,9 +197,9 @@ struct rq {
 
 	/*
 	 * The current dynamic priority level this runqueue is at per static
-	 * priority level, and the best static priority queued this rotation.
+	 * priority level.
 	 */
-	int prio_level[PRIO_RANGE], best_static_prio;
+	int prio_level[PRIO_RANGE];
 
 	/* How many times we have rotated the priority queue */
 	unsigned long prio_rotation;
@@ -669,7 +675,7 @@ static void task_new_array(struct task_s
 }
 
 /* Find the first slot from the relevant prio_matrix entry */
-static inline int first_prio_slot(struct task_struct *p)
+static int first_prio_slot(struct task_struct *p)
 {
 	if (unlikely(p->policy == SCHED_BATCH))
 		return p->static_prio;
@@ -682,11 +688,18 @@ static inline int first_prio_slot(struct
  * level. SCHED_BATCH tasks do not use the priority matrix. They only take
  * priority slots from their static_prio and above.
  */
-static inline int next_entitled_slot(struct task_struct *p, struct rq *rq)
+static int next_entitled_slot(struct task_struct *p, struct rq *rq)
 {
+	int search_prio = MAX_RT_PRIO, uprio = USER_PRIO(p->static_prio);
+	struct prio_array *array = rq->active;
 	DECLARE_BITMAP(tmp, PRIO_RANGE);
-	int search_prio, uprio = USER_PRIO(p->static_prio);
 
+	/*
+	 * Go straight to expiration if there are higher priority tasks
+	 * already expired.
+	 */
+	if (p->static_prio > rq->expired->best_static_prio)
+		return MAX_PRIO;
 	if (!rq->prio_level[uprio])
 		rq->prio_level[uprio] = MAX_RT_PRIO;
 	/*
@@ -694,15 +707,22 @@ static inline int next_entitled_slot(str
 	 * static_prio are acceptable, and only if it's not better than
 	 * a queued better static_prio's prio_level.
 	 */
-	if (p->static_prio < rq->best_static_prio) {
-		search_prio = MAX_RT_PRIO;
+	if (p->static_prio < array->best_static_prio) {
 		if (likely(p->policy != SCHED_BATCH))
-			rq->best_static_prio = p->static_prio;
-	} else if (p->static_prio == rq->best_static_prio)
+			array->best_static_prio = p->static_prio;
+	} else if (p->static_prio == array->best_static_prio) {
 		search_prio = rq->prio_level[uprio];
-	else {
-		search_prio = max(rq->prio_level[uprio],
-			rq->prio_level[USER_PRIO(rq->best_static_prio)]);
+	} else {
+		int i;
+
+		search_prio = rq->prio_level[uprio];
+		/* A bound O(n) function, worst case n is 40 */
+		for (i = array->best_static_prio; i <= p->static_prio ; i++) {
+			if (!rq->prio_level[USER_PRIO(i)])
+				rq->prio_level[USER_PRIO(i)] = MAX_RT_PRIO;
+			search_prio = max(search_prio,
+					  rq->prio_level[USER_PRIO(i)]);
+		}
 	}
 	if (unlikely(p->policy == SCHED_BATCH)) {
 		search_prio = max(search_prio, p->static_prio);
@@ -718,6 +738,8 @@ static void
Re: [RFC 0/8] Cpuset aware writeback
Christoph Lameter wrote:
> On Wed, 21 Mar 2007, Ethan Solomita wrote:
> > Christoph Lameter wrote:
> > > On Thu, 1 Feb 2007, Ethan Solomita wrote:
> > > > Hi Christoph -- has anything come of resolving the NFS / OOM
> > > > concerns that Andrew Morton expressed concerning the patch? I'd
> > > > be happy to see some progress on getting this patch (i.e. the one
> > > > you posted on 1/23) through.
> > >
> > > Peter Zijlstra addressed the NFS issue. I will submit the patch
> > > again as soon as the writeback code stabilizes a bit.
> >
> > I'm pinging to see if this has gotten anywhere. Are you ready to
> > resubmit? Do we have the evidence to convince Andrew that the NFS
> > issues are resolved and so this patch won't obscure anything?
>
> The NFS patch went into Linus' tree a couple of days ago and I have a
> new version ready with additional support to set dirty ratios per cpu.
> There is some interest in adding more VM controls to this patch. I hope
> I can post the next rev by tomorrow.

Any new ETA? I'm trying to decide whether to go back to your original
patches or wait for the new set. Adding new knobs isn't as important to
me as having something that fixes the core problem, so hopefully this
isn't waiting on them. They could always be patches on top of your core
patches.

-- Ethan
Announce - Staircase Deadline cpu scheduler v0.42
On Thursday 19 April 2007 10:41, Con Kolivas wrote:
> On Thursday 19 April 2007 09:59, Con Kolivas wrote:
> > Since there is so much work currently ongoing with alternative cpu
> > schedulers, as a standard for comparison with the alternative virtual
> > deadline fair designs I've addressed a few issues in the Staircase
> > Deadline cpu scheduler which improve behaviour likely in a noticeable
> > fashion, and released version 0.41.
> >
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.7-sd-0.41.patch
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc7-sd-0.41.patch
> >
> > and an incremental for those on 0.40:
> >
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc7/sched-implement-staircase-deadline-scheduler-further-improvements.patch
> >
> > Remember to renice X to -10 for nicest desktop behaviour :)
> >
> > Have fun.
>
> Oops, forgot to cc a few people. Nick, you said I should still have
> something to offer, so here it is. Peter, you said you never saw this
> design (it's a dual array affair, sorry). Gene and Willy, you were some
> of the early testers that noticed the advantages of the earlier
> designs; Matt, you did lots of great earlier testing. WLI, you inspired
> a lot of design ideas. Mike, you were the stick. And a few others I've
> forgotten to mention and include.

Version 0.42:

http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc7-sd-0.42.patch

-- 
-ck
Re: [NETLINK] Don't attach callback to a going-away netlink socket
David Miller [EMAIL PROTECTED] wrote: As discussed in this thread there might be other ways to approach this, but this fix is good for now. Patch applied, thank you. Actually I was going to suggest something like this:

[NETLINK]: Kill CB only when socket is unused

Since we can still receive packets until all references to the socket are gone, we don't need to kill the CB until that happens. This also aligns ourselves with the receive queue purging which happens at that point.

Original patch by Pavel Emelianov who noticed this race condition.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 0be19b7..914884c 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -139,6 +139,15 @@ static struct hlist_head *nl_pid_hashfn(struct nl_pid_hash *hash, u32 pid)

 static void netlink_sock_destruct(struct sock *sk)
 {
+	struct netlink_sock *nlk = nlk_sk(sk);
+
+	WARN_ON(mutex_is_locked(nlk_sk(sk)->cb_mutex));
+	if (nlk->cb) {
+		if (nlk->cb->done)
+			nlk->cb->done(nlk->cb);
+		netlink_destroy_callback(nlk->cb);
+	}
+
 	skb_queue_purge(&sk->sk_receive_queue);

 	if (!sock_flag(sk, SOCK_DEAD)) {
@@ -147,7 +156,6 @@ static void netlink_sock_destruct(struct sock *sk)
 	}
 	BUG_TRAP(!atomic_read(&sk->sk_rmem_alloc));
 	BUG_TRAP(!atomic_read(&sk->sk_wmem_alloc));
-	BUG_TRAP(!nlk_sk(sk)->cb);
 	BUG_TRAP(!nlk_sk(sk)->groups);
 }

@@ -450,17 +458,7 @@ static int netlink_release(struct socket *sock)
 	netlink_remove(sk);
 	nlk = nlk_sk(sk);

-	mutex_lock(nlk->cb_mutex);
-	if (nlk->cb) {
-		if (nlk->cb->done)
-			nlk->cb->done(nlk->cb);
-		netlink_destroy_callback(nlk->cb);
-		nlk->cb = NULL;
-	}
-	mutex_unlock(nlk->cb_mutex);
-
-	/* OK. Socket is unlinked, and, therefore,
-	   no new packets will arrive */
+	/* OK. Socket is unlinked. */

 	sock_orphan(sk);
 	sock->sk = NULL;
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Ingo Molnar wrote: * Peter Williams [EMAIL PROTECTED] wrote: And my scheduler for example cuts down the amount of policy code and code size significantly. Yours is one of the smaller patches mainly because you perpetuate (or you did in the last one I looked at) the (horrible to my eyes) dual array (active/expired) mechanism. That this idea was bad should have been apparent to all as soon as the decision was made to excuse some tasks from being moved from the active array to the expired array. This essentially meant that there would be circumstances where extreme unfairness (to the extent of starvation in some cases) could occur -- the very thing that the mechanism was originally designed to prevent (as far as I can gather). Right about then in the development of the O(1) scheduler, alternative solutions should have been sought. in hindsight i'd agree. Hindsight's a wonderful place isn't it :-) and, of course, it's where I was making my comments from. But back then we were clearly not ready for fine-grained accurate statistics + trees (cpus are a lot faster at more complex arithmetics today, plus people still believed that low-res can be done well enough), and taking out any of these two concepts from CFS would result in a similarly complex runqueue implementation. I disagree. The single priority array with a promotion mechanism that I use in the SPA schedulers can do the job of avoiding starvation with no measurable increase in the overhead. Fairness, nice, and good interactive responsiveness can then be managed by how you determine tasks' dynamic priorities. Also, the array switch was just thought to be another piece of 'if the heuristics go wrong, we fall back to an array switch' logic, right in line with the other heuristics. And you have to accept it, mainline's ability to auto-renice make -j jobs (and other CPU hogs) was quite a plus for developers, so it had (and probably still has) quite some inertia. I agree, it wasn't totally useless, especially for the average user. My main problem with it was that the effect of nice wasn't consistent or predictable enough for reliable resource allocation. I also agree with the aims of the various heuristics, i.e. you have to be unfair and give some tasks preferential treatment in order to give the users the type of responsiveness that they want. It's just a shame that it got broken in the process, but as you say, it's easier to see these things in hindsight than in the middle of the melee. Peter -- Peter Williams [EMAIL PROTECTED] Learning, n. The kind of ignorance distinguishing the studious. -- Ambrose Bierce
Re: is there any generic GPIO chip framework like IRQ chips?
So, talking about what an (optional) implementation framework might look like (and which could handle the SOC, FPGA, I2C, and MFD cases I've looked at): See patches in following messages ... a preliminary gpio_chip core for such a framework, plus example support for one SOC family's GPIOs, and then updating one board's handling of GPIOs, including over I2C. Just to compare, diffstats for GPIODEV: Now, if they were functionally equivalent, such a comparison would be less of an apples/oranges thing! The most useful comparison would focus on technical aspects of the gpio_chip abstraction itself (i.e. $SUBJECT). it needs work - it doesn't adhere to your own optimization scheme by using a lookup table instead of a list. I thought it was more important to address the $SUBJECT first: get a working gpio_chip abstraction which covers all the needed functionality. The patch had a hook for implementing such tweaks, but it wasn't used. The next version you'll see lets the platform code use its own existing lookup code, as part of slimming things down a bit. I also decided to take out the debugfs support. you speak about constructor parts which anyone can use to construct whatever GPIO API they like, whereas I'm speaking about an exact API implementation which can be used right away. I most certainly did not speak about whatever GPIO API they like!! Quite the contrary, in fact. Please don't put words in my mouth. (You've been doing it quite extensively in this thread; it's rude.) And that core patch I posted was clearly usable right away; otherwise the two examples _using_ it couldn't have worked. Well, besides gpio_keys we here have asic3_keys, samcop_keys, etc. - all that duplication just because the current GPIO API doesn't allow extensibility to more chips. When I get tired of repeating myself, just remember: the current programming interface *DOES* allow such extensibility. That's what it means to be an interface, rather than an implementation: it defines inputs and outputs, allowing any implementation that conforms to both. In fact, the patches I sent demonstrated exactly that extensibility. Same interface, additional chips; different implementation inside. So you're agreeing that, at a technical level, what I described could be augmented by a caching facility ... giving a programming interface with all the characteristics of your GPIODEV thingie. All you're really disagreeing with is bootstrapping issues, and whether there is in fact a need for such a layer. The only argument I could possibly buy is that it avoids the lookup of (b) ... but that doesn't seem to matter in most cases I've looked at. So, now the most important question is what we all would get with your approach in the end. So, if you could make sure gpiolib.c doesn't contain an inefficient implementation, I can make it comparable to existing implementations that work the same way ... e.g. AT91 and OMAP code. Of course, it's not possible to get away from the cost of function indirection with a generic gpio_chip abstraction. Or those lookup costs; but as you agreed, those costs don't seem to matter much. And if they ever do matter, caching support would be easy to add. and make such an extensible implementation available by default for ARM PXA/S3Cxxx/OMAP, then it would for sure cover Handhelds.org's, and many other people's, use cases, and that would be highly appreciated. If you could do it for the 2.6.22 merge window, that would be simply ideal. I think having an optional gpio_chip, not unlike what was in that one patch, should be reasonable; also, making it work on some platforms that I use. But I don't think there's much overlap between those platforms and what hh.org uses. - Dave
Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote: On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote: Do you have a copy of wireshark or ethereal on hand? If so, could you take a look at whether or not any NFS traffic is going between the client and server once the hang happens? I used the following command tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs to capture http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2 I started the capture before starting the copy and left it to run for a few minutes after the traffic slowed to a crawl. The iostat and vmstat are at: http://iucha.net/nfs/21-rc7-nfs4/iostat http://iucha.net/nfs/21-rc7-nfs4/vmstat It seems that my original problem report had a big mistake! There is no hang, but at some point the write slows down to a trickle (from 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log. Yeah. You only captured the outgoing traffic to the server, but already it looks as if there were 'interesting' things going on. In frames 29346 to 29350, the traffic stops altogether for 5 seconds (I only see keepalives), then it starts up again. Ditto for frames 40477-40482 (another 5 seconds). ... Then at around frame 92072, the client starts to send a bunch of RSTs. Aha. I'll bet that reverting the appended patch fixes the problem. The assumption Chuck makes -- that if _no_ request bytes have been sent, yet the request is on the 'receive list', then it must be a resend -- is patently false in the case where the send queue just happens to be full. A better solution would probably be to disconnect the socket following the ETIMEDOUT handling in call_status().

Cheers
  Trond

---
commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
Author: Chuck Lever [EMAIL PROTECTED]
Date: Tue Feb 6 18:26:11 2007 -0500

    NFS: disconnect before retrying NFSv4 requests over TCP

    RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
    twice on the same connection unless it is the NULL procedure. Section
    3.1.1 suggests that the client should disconnect and reconnect if it
    wants to retry a request.

    Implement this by adding an rpc_clnt flag that an ULP can use to
    specify that the underlying transport should be disconnected on a
    major timeout. The NFSv4 client asserts this new flag, and requests
    no retries after a minor retransmit timeout.

    Note that disconnecting on a retransmit is in general not safe to do
    if the RPC client does not reuse the TCP port number when
    reconnecting. See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6

    Signed-off-by: Chuck Lever [EMAIL PROTECTED]
    Signed-off-by: Trond Myklebust [EMAIL PROTECTED]

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a3191f0..c46e94f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
 static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
 				unsigned int timeo,
 				unsigned int retrans,
-				rpc_authflavor_t flavor)
+				rpc_authflavor_t flavor,
+				int flags)
 {
 	struct rpc_timeout	timeparms;
 	struct rpc_clnt		*clnt = NULL;
@@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
 		.program	= &nfs_program,
 		.version	= clp->rpc_ops->version,
 		.authflavor	= flavor,
+		.flags		= flags,
 	};

 	if (!IS_ERR(clp->cl_rpcclient))
@@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const struct nfs_mount_data *
 	 *   - RFC 2623, sec 2.3.2
 	 */
 	error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
-					RPC_AUTH_UNIX);
+					RPC_AUTH_UNIX, 0);
 	if (error < 0)
 		goto error;
 	nfs_mark_client_ready(clp, NFS_CS_READY);
@@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
 	/* Check NFS protocol revision and initialize RPC op vector */
 	clp->rpc_ops = &nfs_v4_clientops;

-	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
+	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
+					RPC_CLNT_CREATE_DISCRTRY);
 	if (error < 0)
 		goto error;
 	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index a1be89d..c7a78ee 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -40,6 +40,7 @@ struct rpc_clnt {
Re: [RFC 0/8] Cpuset aware writeback
On Wed, 18 Apr 2007, Ethan Solomita wrote: Any new ETA? I'm trying to decide whether to go back to your original patches or wait for the new set. Adding new knobs isn't as important to me as having something that fixes the core problem, so hopefully this isn't waiting on them. They could always be patches on top of your core patches. -- Ethan Sorry. I got distracted and I have sent them to Kame-san, who was interested in working on them. I have placed the most recent version at http://ftp.kernel.org/pub/linux/kernel/people/christoph/cpuset_dirty
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote: On Wed, 18 Apr 2007, Matt Mackall wrote: Why is X special? Because it does work on behalf of other processes? Lots of things do this. Perhaps a scheduler should focus entirely on the implicit and directed wakeup matrix and optimizing that instead[1]. I 100% agree - the perfect scheduler would indeed take into account where the wakeups come from, and try to weigh processes that help other processes make progress more. That would naturally give server processes more CPU power, because they help others. I don't believe for a second that fairness means give everybody the same amount of CPU. That's a totally illogical measure of fairness. All processes are _not_ created equal. I believe that unless the kernel is told of these inequalities, then it must schedule fairly. And yes, by fairly, I mean fairly among all threads as a base resource class, because that's what Linux has always done (and if you aggregate into higher classes, you still need that per-thread scheduling). So I'm not excluding extra scheduling classes like per-process or per-user, but among any class of equal schedulable entities, fair scheduling is the only option, because the alternative of unfairness is just insane. That said, even trying to do fairness by effective user ID would probably already do a lot. In a desktop environment, X would get as much CPU time as the user processes, simply because it's in a different protection domain (and that's really what effective user ID means: it's not about users, it's really about protection domains). And fairness by euid is probably a hell of a lot easier to do than trying to figure out the wakeup matrix. Well, my X server has an euid of root, which would mean my X clients can cause X to do work and eat into root's resources. Or, as Ingo said, X may not be running as root. Seems like just another hack to try to implicitly solve the X problem, and probably create a lot of others along the way. All fairness issues aside, in the context of keeping a very heavily loaded desktop interactive, X is special. That you are trying to think up funny rules that would implicitly give X better priority is kind of indicative of that.
Re: Announce - Staircase Deadline cpu scheduler v0.42
On Thu, Apr 19, 2007 at 12:12:14PM +1000, Con Kolivas wrote: On Thursday 19 April 2007 10:41, Con Kolivas wrote: On Thursday 19 April 2007 09:59, Con Kolivas wrote: Since there is so much work currently ongoing with alternative cpu schedulers, as a standard for comparison with the alternative virtual deadline fair designs I've addressed a few issues in the Staircase Deadline cpu scheduler which improve behaviour, likely in a noticeable fashion, and released version 0.41. http://ck.kolivas.org/patches/staircase-deadline/2.6.20.7-sd-0.41.patch http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc7-sd-0.41.patch and an incremental for those on 0.40: http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc7/sched-implement-staircase-deadline-scheduler-further-improvements.patch Remember to renice X to -10 for nicest desktop behaviour :) Have fun. Oops, forgot to cc a few people. Nick, you said I should still have something to offer, so here it is. Peter, you said you never saw this design (it's a dual array affair, sorry). Gene and Willy, you were some of the early testers that noticed the advantages of the earlier designs; Matt, you did lots of great earlier testing. WLI, you inspired a lot of design ideas. Mike, you were the stick. And a few others I've forgotten to mention and include. Version 0.42 http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc7-sd-0.42.patch OK, I'll run some tests later today...
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, Apr 18, 2007 at 10:49:45PM +1000, Con Kolivas wrote: On Wednesday 18 April 2007 22:13, Nick Piggin wrote: The kernel compile (make -j8 on 4 thread system) is doing 1800 total context switches per second (450/s per runqueue) for cfs, and 670 for mainline. Going up to 20ms granularity for cfs brings the context switch numbers similar, but user time is still a % or so higher. I'd be more worried about compute heavy threads which naturally don't do much context switching. While kernel compiles are nice and easy to do, I've seen enough criticism of them in the past to wonder about their usefulness as a standard benchmark on their own. Actually it is a real workload for most kernel developers, including you no doubt :) The criticisms of kernbench for the kernel are probably fair in that kernel compiles don't exercise a lot of kernel functionality (page allocator and fault paths mostly, IIRC). However, as far as I'm concerned, they're great for testing the CPU scheduler, because it doesn't actually matter whether you're running in userspace or kernel space for a context switch to blow your caches. The results are quite stable. You could actually make up a benchmark that hurts a whole lot more from context switching, but I figure that kernbench is a real world thing that shows it up quite well. Some other numbers on the same system:

Hackbench:        2.6.21-rc7  cfs-v2 1ms[*]  nicksched
 10 groups: Time: 1.332      0.743          0.607
 20 groups: Time: 1.197      1.100          1.241
 30 groups: Time: 1.754      2.376          1.834
 40 groups: Time: 3.451      2.227          2.503
 50 groups: Time: 3.726      3.399          3.220
 60 groups: Time: 3.548      4.567          3.668
 70 groups: Time: 4.206      4.905          4.314
 80 groups: Time: 4.551      6.324          4.879
 90 groups: Time: 7.904      6.962          5.335
100 groups: Time: 7.293      7.799          5.857
110 groups: Time: 10.595     8.728          6.517
120 groups: Time: 7.543      9.304          7.082
130 groups: Time: 8.269      10.639         8.007
140 groups: Time: 11.867     8.250          8.302
150 groups: Time: 14.852     8.656          8.662
160 groups: Time: 9.648      9.313          9.541

Hackbench even more so. In a prolonged discussion with Rusty Russell on this issue, he suggested hackbench was more a pass/fail benchmark to ensure there was no starvation scenario that never ended, and that very little value should be placed on the actual results returned from it. Yeah, cfs seems to do a little worse than nicksched here, but I include the numbers not because I think that is significant, but to show mainline's poor characteristics.
Re: [PATCH][BUG] Fix possible NULL pointer access in 8250 serial driver
Russell King wrote: NAK. This means that you change the list of ports available on the machine to be limited to only those which are currently open. Utterly useless for debugging, where you normally want people to dump the contents of /proc/tty/driver/*. The original patch was better. Is the original patch sufficient? Or is there anything we should correct? Taku Izumi [EMAIL PROTECTED]
[KJ][PATCH] i2c: SPIN_LOCK_UNLOCKED cleanup
SPIN_LOCK_UNLOCKED cleanup: use __SPIN_LOCK_UNLOCKED instead.

Signed-off-by: Milind Arun Choudhary [EMAIL PROTECTED]
---
 i2c-pxa.c     | 2 +-
 i2c-s3c2410.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-pxa.c b/drivers/i2c/busses/i2c-pxa.c
index 14e83d0..d5d44ed 100644
--- a/drivers/i2c/busses/i2c-pxa.c
+++ b/drivers/i2c/busses/i2c-pxa.c
@@ -825,7 +825,7 @@ static const struct i2c_algorithm i2c_pxa_algorithm = {
 };

 static struct pxa_i2c i2c_pxa = {
-	.lock	= SPIN_LOCK_UNLOCKED,
+	.lock	= __SPIN_LOCK_UNLOCKED(i2c_pxa.lock),
 	.adap	= {
 		.owner		= THIS_MODULE,
 		.algo		= &i2c_pxa_algorithm,
diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c
index 556f244..3eb5958 100644
--- a/drivers/i2c/busses/i2c-s3c2410.c
+++ b/drivers/i2c/busses/i2c-s3c2410.c
@@ -570,7 +570,7 @@ static const struct i2c_algorithm s3c24xx_i2c_algorithm = {
 };

 static struct s3c24xx_i2c s3c24xx_i2c = {
-	.lock	= SPIN_LOCK_UNLOCKED,
+	.lock	= __SPIN_LOCK_UNLOCKED(s3c24xx_i2c.lock),
 	.wait	= __WAIT_QUEUE_HEAD_INITIALIZER(s3c24xx_i2c.wait),
 	.adap	= {
 		.name	= "s3c2410-i2c",
--
Milind Arun Choudhary
[KJ][PATCH]SPIN_LOCK_UNLOCKED cleanup in drivers/s390
SPIN_LOCK_UNLOCKED cleanup: use __SPIN_LOCK_UNLOCKED instead.

Signed-off-by: Milind Arun Choudhary [EMAIL PROTECTED]
---
 char/vmlogrdr.c | 6 +++---
 cio/cmf.c       | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/char/vmlogrdr.c b/drivers/s390/char/vmlogrdr.c
index b87d3b0..75d61a4 100644
--- a/drivers/s390/char/vmlogrdr.c
+++ b/drivers/s390/char/vmlogrdr.c
@@ -125,7 +125,7 @@ static struct vmlogrdr_priv_t sys_ser[] = {
 		.recording_name = "EREP",
 		.minor_num	= 0,
 		.buffer_free	= 1,
-		.priv_lock	= SPIN_LOCK_UNLOCKED,
+		.priv_lock	= __SPIN_LOCK_UNLOCKED(sys_ser[0].priv_lock),
 		.autorecording	= 1,
 		.autopurge	= 1,
 	},
@@ -134,7 +134,7 @@ static struct vmlogrdr_priv_t sys_ser[] = {
 		.recording_name = "ACCOUNT",
 		.minor_num	= 1,
 		.buffer_free	= 1,
-		.priv_lock	= SPIN_LOCK_UNLOCKED,
+		.priv_lock	= __SPIN_LOCK_UNLOCKED(sys_ser[1].priv_lock),
 		.autorecording	= 1,
 		.autopurge	= 1,
 	},
@@ -143,7 +143,7 @@ static struct vmlogrdr_priv_t sys_ser[] = {
 		.recording_name = "SYMPTOM",
 		.minor_num	= 2,
 		.buffer_free	= 1,
-		.priv_lock	= SPIN_LOCK_UNLOCKED,
+		.priv_lock	= __SPIN_LOCK_UNLOCKED(sys_ser[2].priv_lock),
 		.autorecording	= 1,
 		.autopurge	= 1,
 	}
diff --git a/drivers/s390/cio/cmf.c b/drivers/s390/cio/cmf.c
index 90b22fa..28abd69 100644
--- a/drivers/s390/cio/cmf.c
+++ b/drivers/s390/cio/cmf.c
@@ -476,7 +476,7 @@ struct cmb_area {
 };

 static struct cmb_area cmb_area = {
-	.lock = SPIN_LOCK_UNLOCKED,
+	.lock = __SPIN_LOCK_UNLOCKED(cmb_area.lock),
 	.list = LIST_HEAD_INIT(cmb_area.list),
 	.num_channels = 1024,
 };
--
Milind Arun Choudhary
Re: [RFC 1/2] Input: ff, add FF_RAW effect
On 4/18/07, Jiri Slaby [EMAIL PROTECTED] wrote: johann deneux napsal(a): Jiri, Which solution did you choose to implement? From what I remember, we last discussed Dmitry's idea of specifying an axis for an effect, then combining several effects to achieve complex effects. I think you mean motor instead of axis, because I don't push real axes to the devices, but motors' torques... Yes, sorry, I meant motor. -- Johann
Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
On Wed, Apr 18, 2007 at 10:45:13PM -0400, Trond Myklebust wrote: On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote: It seems that my original problem report had a big mistake! There is no hang, but at some point the write slows down to a trickle (from 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log. Yeah. You only captured the outgoing traffic to the server, but already it looks as if there were 'interesting' things going on. In frames 29346 to 29350, the traffic stops altogether for 5 seconds (I only see keepalives), then it starts up again. Ditto for frames 40477-40482 (another 5 seconds). ... Then at around frame 92072, the client starts to send a bunch of RSTs. Aha. I'll bet that reverting the appended patch fixes the problem. You win! Reverting this patch (on top of your previous 5) allowed the big copy to complete (70GB) as well as a successful log-in to gnome!

Acked-By: Florin Iucha [EMAIL PROTECTED]

Thanks so much for the patience with this elusive bug and stubborn bug reporter!

Regards,
florin

---
commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
Author: Chuck Lever [EMAIL PROTECTED]
Date: Tue Feb 6 18:26:11 2007 -0500

    NFS: disconnect before retrying NFSv4 requests over TCP

    RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
    twice on the same connection unless it is the NULL procedure. Section
    3.1.1 suggests that the client should disconnect and reconnect if it
    wants to retry a request.

    Implement this by adding an rpc_clnt flag that an ULP can use to
    specify that the underlying transport should be disconnected on a
    major timeout. The NFSv4 client asserts this new flag, and requests
    no retries after a minor retransmit timeout.

    Note that disconnecting on a retransmit is in general not safe to do
    if the RPC client does not reuse the TCP port number when
    reconnecting. See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6

    Signed-off-by: Chuck Lever [EMAIL PROTECTED]
    Signed-off-by: Trond Myklebust [EMAIL PROTECTED]

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a3191f0..c46e94f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
 static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
 				unsigned int timeo,
 				unsigned int retrans,
-				rpc_authflavor_t flavor)
+				rpc_authflavor_t flavor,
+				int flags)
 {
 	struct rpc_timeout	timeparms;
 	struct rpc_clnt		*clnt = NULL;
@@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
 		.program	= &nfs_program,
 		.version	= clp->rpc_ops->version,
 		.authflavor	= flavor,
+		.flags		= flags,
 	};

 	if (!IS_ERR(clp->cl_rpcclient))
@@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const struct nfs_mount_data *
 	 *   - RFC 2623, sec 2.3.2
 	 */
 	error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
-					RPC_AUTH_UNIX);
+					RPC_AUTH_UNIX, 0);
 	if (error < 0)
 		goto error;
 	nfs_mark_client_ready(clp, NFS_CS_READY);
@@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
 	/* Check NFS protocol revision and initialize RPC op vector */
 	clp->rpc_ops = &nfs_v4_clientops;

-	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
+	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
+					RPC_CLNT_CREATE_DISCRTRY);
 	if (error < 0)
 		goto error;
 	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index a1be89d..c7a78ee 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -40,6 +40,7 @@ struct rpc_clnt {
 	unsigned int		cl_softrtry : 1,/* soft timeouts */
 				cl_intr     : 1,/* interruptible */
+				cl_discrtry : 1,/* disconnect before retry */
 				cl_autobind : 1,/* use getport() */
 				cl_oneshot  : 1,/* dispose after use */
 				cl_dead     : 1;/* abandoned */
@@ -111,6 +112,7 @@ struct rpc_create_args {
 #define RPC_CLNT_CREATE_ONESHOT		(1UL << 3)
 #define RPC_CLNT_CREATE_NONPRIVPORT	(1UL << 4)
 #define RPC_CLNT_CREATE_NOPING		(1UL << 5)
+#define
Re: [PATCH] [RFC] Throttle swappiness for interactive tasks
Abhijit Bhopatkar wrote: I just wanted to know whether it's worth going forward or we have better reasons to discount any such direction? The reason that the wrong pages get swapped out sometimes could be due to a side effect of the way the swappiness policy is implemented. While the VM only reclaims page cache pages, it will still rotate through the anonymous pages on the LRU list, which effectively randomizes the order of those pages on the list. I need to get back to benchmarking my patch to split the lists - anonymous and other swap backed pages on one set of pageout lists, filesystem backed pages on another list. One report I got was that the system is more interactive under very heavy load, and my desktop system at the office seems to behave better than it used to when I get back to it after a few days. Unfortunately my main desktop system at home depends on Xen, so it's not as easy to use that patch there :( -- Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic.
Re: [RFC 1/2] Input: ff, add FF_RAW effect
Hi, On Thursday 19 April 2007 00:25, johann deneux wrote: On 4/18/07, Jiri Slaby [EMAIL PROTECTED] wrote: johann deneux napsal(a): Jiri, Which solution did you choose to implement? From what I remember, we last discussed Dmitry's idea of specifying an axis for an effect, then combining several effects to achieve complex effects. I think you mean motor instead of axis, because I don't push real axes to the devices, but motors' torques... Yes, sorry, I meant motor. I have been thinking about this and I don't think that exporting motor data is a good idea, at least not in the case of the Phantom driver. The fact that there are 3 motors is a hardware implementation detail and it is not interesting for a general application. My understanding is that the end result of controlling these 3 motors is a force vector (I don't know if there is such an English term; this is a literal translation from Russian) applied to the user's hand. If we are interested in using the FF API we need to come up with a way to express this effect without exposing implementation details of one particular device. -- Dmitry
Re: [PATCH][BUG] Fix possible NULL pointer access in 8250 serial driver
On Thu, 19 Apr 2007 11:28:37 +0900 izumi [EMAIL PROTECTED] wrote:
> Russell King wrote:
> > NAK. This means that you change the list of ports available on the machine to be limited to only those which are currently open. Utterly useless for debugging, where you normally want people to dump the contents of /proc/tty/driver/*. The original patch was better.
>
> Is the original patch sufficient, or is there anything we should correct?

Would it be better to do something like

--- a/drivers/serial/serial_core.c~a
+++ a/drivers/serial/serial_core.c
@@ -1686,9 +1686,12 @@ static int uart_line_info(char *buf, str
 	pm_state = state->pm_state;
 	if (pm_state)
 		uart_change_pm(state, 0);
-	spin_lock_irq(&port->lock);
-	status = port->ops->get_mctrl(port);
-	spin_unlock_irq(&port->lock);
+	status = 0;
+	if (port->info) {
+		spin_lock_irq(&port->lock);
+		status = port->ops->get_mctrl(port);
+		spin_unlock_irq(&port->lock);
+	}
 	if (pm_state)
 		uart_change_pm(state, pm_state);
 	mutex_unlock(&state->mutex);
_

so that a) we treat all uart types in the same way and b) the same problem doesn't occur later with some other driver which is assuming an opened device in its ->get_mctrl() handler?
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 19 Apr 2007 05:18:07 +0200 Nick Piggin [EMAIL PROTECTED] wrote:
> And yes, by fairly, I mean fairly among all threads as a base resource class, because that's what Linux has always done

Yes, there are potential compatibility problems. Example: a machine with 100 busy httpd processes and suddenly a big gzip starts up from console or cron.

Under current kernels, that gzip will take ages and the httpds will take a 1% slowdown, which may well be exactly the behaviour which is desired.

If we were to schedule by UID then the gzip suddenly gets 50% of the CPU and those httpds all take a 50% hit, which could be quite serious.

That's simple to fix via nicing, but people have to know to do that, and there will be a transition period where some disruption is possible.
Re: [PATCH] [RFC] Throttle swappiness for interactive tasks
>> I just wanted to know whether it's worth going forward or we have better reasons to discount any such direction?
>
> The reason that the wrong pages get swapped out sometimes could be due to a side effect of the way the swappiness policy is implemented. While the VM only reclaims page cache pages, it will still rotate through the anonymous pages on the LRU list, which effectively randomizes the order of those pages on the list.

I find it fundamentally wrong to separate anon pages from page cache. It should rather depend a lot more on which task accessed them last. Although it seems that, due to some twisted relationship between anon pages and interactive tasks, separating them improves things. Am I missing something here?

> I need to get back to benchmarking my patch to split the lists - anonymous and other swap backed pages on one set of pageout lists, filesystem backed pages on another list.

[snip]

> Unfortunately my main desktop system at home depends on Xen, so it's not as easy to use that patch there :(

Can you send me those patches please, or point me to where I can find them?

Abhijit
Re: [PATCH 0/8] RSS controller based on process containers (v2)
Pavel Emelianov wrote:
> Peter Zijlstra wrote:
> > *ugh* /me no like.
> >
> > The basic premise seems to be that we can track page owners perfectly (although this patch set does not yet do so), through get/release
>
> It looks like you have not examined the patches very carefully before concluding this. These patches DO track page owners. I know that a page may be shared among several containers and thus have many owners, so we should track all of them. This is exactly what we decided not to do half a year ago. Page sharing accounting is performed in OpenVZ beancounters, and this functionality will be pushed to mainline after this simple container.
>
> > operations (on _mapcount). This is simply not true for unmapped pagecache pages. Those receive no 'release' event (the usage by find_get_page() could be seen as 'get').
>
> These patches concern the mapped pagecache only. Unmapped pagecache control is out of its scope, since we do not want one container to track all the resources.

Unmapped pagecache control and swapcache control are part of an independent pagecache controller that is being developed. The initial version was posted at http://lkml.org/lkml/2007/3/06/51 and I plan to post a new version based on this patchset in a couple of days.

--Vaidy

> > Also, you don't seem to balance the active/inactive scanning on a per-container basis. This skews the per-container working-set logic.
>
> This is not true. Balbir sent a patch to the first version of this container that added active/inactive balancing to the container. I have included this (a bit reworked) patch into this version and noted this fact in the zeroth letter.

[snip]
[PATCH -mm] workqueue: debug possible lockups in flush_workqueue
Hi,

Here is my patch proposal for detecting possible lockups when a flush_workqueue() caller holds a lock (e.g. rtnl_lock) that is also used in work functions.

Regards,
Jarek P.

Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
---

diff -Nurp 2.6.21-rc6-mm1-/kernel/workqueue.c 2.6.21-rc6-mm1/kernel/workqueue.c
--- 2.6.21-rc6-mm1-/kernel/workqueue.c	2007-04-18 20:07:45.000000000 +0200
+++ 2.6.21-rc6-mm1/kernel/workqueue.c	2007-04-18 21:29:50.000000000 +0200
@@ -67,6 +67,12 @@ struct workqueue_struct {
 /* All the per-cpu workqueues on the system, for hotplug cpu to
    add/remove threads to each one as cpus come/go. */
 static DEFINE_MUTEX(workqueue_mutex);
+
+#ifdef CONFIG_PROVE_LOCKING
+/* Detect possible flush_workqueue() lockup with circular dependency check. */
+static struct lockdep_map flush_dep_map = { .name = "flush_dep_map" };
+#endif
+
 static LIST_HEAD(workqueues);

 static int singlethread_cpu __read_mostly;
@@ -247,8 +253,15 @@ static void run_workqueue(struct cpu_wor
 		BUG_ON(get_wq_data(work) != cwq);
 		work_clear_pending(work);
+#ifdef CONFIG_PROVE_LOCKING
+		/* lockdep dependency: flush_dep_map (read) before any lock: */
+		lock_acquire(&flush_dep_map, 0, 0, 1, 2, _THIS_IP_);
+#endif
 		f(work);
+#ifdef CONFIG_PROVE_LOCKING
+		lock_release(&flush_dep_map, 1, _THIS_IP_);
+#endif

 		if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
 			printk(KERN_ERR "BUG: workqueue leaked lock or atomic: "
			       "%s/0x%08x/%d\n",
@@ -389,6 +402,14 @@ void fastcall flush_workqueue(struct wor
 	int cpu;

 	might_sleep();
+#ifdef CONFIG_PROVE_LOCKING
+	/*
+	 * Add lockdep dependency: flush_dep_map (exclusive)
+	 * after any held mutex or rwsem.
+	 */
+	lock_acquire(&flush_dep_map, 0, 0, 0, 2, _THIS_IP_);
+	lock_release(&flush_dep_map, 1, _THIS_IP_);
+#endif
 	for_each_cpu_mask(cpu, *cpu_map)
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
 }