Re: [PATCH] slab: deal with NULL pointers passed to kmem_cache_free
Pekka J Enberg wrote:
> Thanks for the profile. I still wonder where exactly those super-hot
> call-sites are...

In this case, it's a typical network server.

Each time a packet is sent to or received from the network, the network stack
has to allocate/free an skb (kmem_cache_alloc()/kmem_cache_free()) and its
data (kmalloc()/kfree()).

Other paths are, for example, dentry allocations, file allocations, ...
really many spots for some workloads.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc4-mm1
Andrew Morton wrote:
> Temporarily at
>
> http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
>

Some new details about
http://www.ussg.iu.edu/hypermail/linux/kernel/0703.2/1367.html

I can reproduce it by running this on AutoTest:

for profiler in ('oprofile', ):
    try:
        print "Testing profiler %s ..." % profiler
        job.profilers.add(profiler)
        job.run_test('aiostress',)
        job.profilers.delete(profiler)
    except:
        print "Test of profiler %s failed" % profiler
        raise

I guess that oprofile triggers it.

BUG: using smp_processor_id() in preemptible [0001] code: mount/4934
caller is avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] debug_smp_processor_id+0xb3/0xc8
 [] avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
 [] nmi_create_files+0x2a/0x10e [oprofile]
 [] oprofile_create_files+0xe6/0xec [oprofile]
 [] oprofilefs_fill_super+0x78/0x7e [oprofile]
 [] get_sb_single+0x59/0x9f
 [] oprofilefs_get_sb+0x1c/0x1e [oprofile]
 [] vfs_kern_mount+0x81/0xf1
 [] do_kern_mount+0x38/0xde
 [] do_mount+0x605/0x693
 [] sys_mount+0x80/0xb5
 [] syscall_call+0x7/0xb
 ===

l *avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
0xc01169fb is in avail_to_resrv_perfctr_nmi_bit (/mnt/md0/devel/linux-mm/arch/i386/kernel/nmi.c:124).
119             return 0;
120     }
121
122     /* checks for a bit availability (hack for oprofile) */
123     int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
124     {
125             BUG_ON(counter > NMI_MAX_COUNTER_BITS);
126
127             return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
128     }

(The same BUG trace, for mount/4934 with the same caller, is printed three
more times.)

SELinux: initialized (dev oprofilefs, type oprofilefs), uses genfs_contexts

=================================
[ INFO: inconsistent lock state ]
2.6.21-rc4-mm1 #5
---------------------------------
inconsistent {hardirq-on-W} -> {in-hardirq-W} usage.
init/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
 (oprofilefs_lock){+-..}, at: [] nmi_cpu_setup+0x15/0x4f [oprofile]
{hardirq-on-W} state was registered at:
  [] __lock_acquire+0x4e8/0xceb
  [] lock_acquire+0x79/0x93
  [] _spin_lock+0x35/0x42
  [] oprofilefs_ulong_from_user+0x4e/0x74 [oprofile]
  [] depth_write+0x27/0x43 [oprofile]
  [] vfs_write+0xd1/0x15a
  [] sys_write+0x3d/0x72
  [] syscall_call+0x7/0xb
  [] 0x
irq event stamp: 1022800
hardirqs last enabled at (1022799): [] restore_nocheck+0x12/0x15
hardirqs last disabled at (1022800): [] call_function_interrupt+0x29/0x38
softirqs last enabled at (1022784): [] __do_softirq+0xe4/0xea
softirqs last disabled at (1022779): [] do_softirq+0x39/0x55

l *0xc01042b8
0xc01042b8 is at include2/asm/bitops.h:246.
241     static int test_bit(int nr, const volatile void * addr);
242     #endif
243
244     static __always_inline int constant_test_bit(int nr, const volatile unsigned long *addr)
245     {
246             return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
247     }
248
249
Re: [PATCH] slab: deal with NULL pointers passed to kmem_cache_free
On 3/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > This is a super-hot path.

At some point in time, I wrote:
> > Super-hot exactly where?

On Tue, 20 Mar 2007, Eric Dumazet wrote:
> Don't be silly Pekka ... We have plenty oprofiles results if you dont trust
> Andrew.

Oh, don't get me wrong, this has certainly nothing to do with "not
trusting" Andrew. It's just that "this is a super-hot path" doesn't really
help me understand where kmem_cache_free() is so performance sensitive at
all.

On Tue, 20 Mar 2007, Eric Dumazet wrote:
> CPU: AMD64 processors, speed 1992.52 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
> mask of 0x00 (No unit mask) count 10
> samples  %        symbol name
> 1861563   4.7882  tg3_start_xmit_dma_bug
> 1375727   3.5386  memcpy_c
> 1166438   3.0002  tcp_v4_rcv
> 1157334   2.9768  kmem_cache_free
>
> In this workload (real server), you can see kmem_cache_free() is number four.

Thanks for the profile. I still wonder where exactly those super-hot
call-sites are...

On Tue, 20 Mar 2007, Eric Dumazet wrote:
> Adding one test and conditional branch in this super-hot function just to
> correct a bug in a SCSI driver (or whatever) is not *SANE*.

Agreed. Unless we can get kmem_cache_free() out of those hot paths, of
course =).

			Pekka
Re: [RFC][PATCH 0/6] per device dirty throttling
On Mon, Mar 19, 2007 at 04:57:37PM +0100, Peter Zijlstra wrote:
> This patch-set implements per device dirty page throttling. Which should solve
> the problem we currently have with one device hogging the dirty limit.
>
> Preliminary testing shows good results:

I just ran some higher throughput numbers on this patchset.

Identical 4-disk dm stripes, XFS, 4p x86_64, 16GB RAM, dirty_ratio = 5:

One dm stripe:     320MB/s
Two dm stripes:    310+315MB/s
Three dm stripes:  254+253+253MB/s (pci-x bus bound)

The three stripe test was for 100GB of data to each filesystem - all the
writes finished within 1s of each other at 7m4s. Interestingly, the amount
of memory in cache for each of these devices was almost exactly the same -
about 5.2GB each. Looks good so far

Hmmm - small problem - root disk (XFS) got stuck in
balance_dirty_pages_ratelimited_nr() after the above write test
attempting to unmount the filesystems (i.e. umount trying to modify
/etc/mtab got stuck and the root fs locked up)

(reboot)

Non-identical dm stripes, XFS, run alone:

Single disk:       80MB/s
2 disk dm stripe:  155MB/s
4 disk dm stripe:  310MB/s

Combined, after some runtime:

# ls -sh /mnt/dm*/test
10G /mnt/dm0/test
19G /mnt/dm1/test
41G /mnt/dm2/test

15G /mnt/dm0/test
27G /mnt/dm1/test
52G /mnt/dm2/test

18G /mnt/dm0/test
32G /mnt/dm1/test
64G /mnt/dm2/test

24G /mnt/dm0/test
45G /mnt/dm1/test
86G /mnt/dm2/test

27G /mnt/dm0/test
51G /mnt/dm1/test
95G /mnt/dm2/test

29G /mnt/dm0/test
52G /mnt/dm1/test
97G /mnt/dm2/test

29G /mnt/dm0/test
54G /mnt/dm1/test
101G /mnt/dm2/test [done]

35G /mnt/dm0/test
65G /mnt/dm1/test
101G /mnt/dm2/test

38G /mnt/dm0/test
70G /mnt/dm1/test
101G /mnt/dm2/test

And so on. Final numbers:

Single disk:       70MB/s
2 disk dm stripe:  130MB/s
4 disk dm stripe:  260MB/s

So overall we've lost about 15-20% of the theoretical aggregate
performance, but we haven't starved any of the devices over a long
period of time.
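The 15-20% estimate can be checked against the quoted throughput numbers. What follows is a minimal sketch in C, not part of the original test run. It assumes the theoretical aggregate is simply the sum of the run-alone rates (80 + 155 + 310 = 545MB/s) and the combined aggregate is the sum of the final rates (70 + 130 + 260 = 460MB/s); the helper name loss_percent() is hypothetical.

```c
/*
 * Percentage of throughput lost when a device (or a set of devices)
 * drops from `alone` MB/s to `combined` MB/s.
 * Hypothetical helper for illustration only.
 */
double loss_percent(double alone, double combined)
{
    return 100.0 * (alone - combined) / alone;
}

/*
 * Aggregate loss for the three devices in the mail:
 * run alone 80 + 155 + 310 MB/s, combined 70 + 130 + 260 MB/s.
 */
double aggregate_loss_percent(void)
{
    double alone = 80.0 + 155.0 + 310.0;
    double combined = 70.0 + 130.0 + 260.0;

    return loss_percent(alone, combined);
}
```

With these inputs the aggregate loss comes out at roughly 15.6%, and the per-device losses at 12.5% for the single disk and about 16.1% for each of the two stripes, which sits at the lower end of the "about 15-20%" estimate.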
However, looking at vmstat for total throughput, there are periods of
time where it appears that the fastest disk goes idle. That is, we drop
from an aggregate of about 550MB/s to below 300MB/s for several seconds
at a time. You can sort of see this from the file size output above -
long term the ratios remain the same, but in the short term we see quite
a bit of variability.

When the fast disk completed, I saw almost the same thing, but this time
it seems like the slow disk (i.e. ~230MB/s to ~150MB/s) stopped for
several seconds.

I haven't really digested what the patches do, but it's almost like it
is throttling a device completely while it allows another to finish
writing its quota (underestimating bandwidth?).

(umount after writes hung again. Same root disk thing as before)

This is looking promising, Peter. When it is more stable I'll run some
more tests

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [2.6.21 patch] net/sunrpc/svcsock.c: fix a check
On Mon, 19 Mar 2007 10:33:42 +0100 Adrian Bunk <[EMAIL PROTECTED]> wrote:

> The return value of kernel_recvmsg() should be assigned to "err", not
> compared with the random value of a never initialized "err"
> (and the "< 0" check wrongly always returned false since == comparisons
> never have a result < 0).
>
> Spotted by the Coverity checker.
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
>
> ---
> --- linux-2.6.21-rc3-mm2/net/sunrpc/svcsock.c.old 2007-03-19 09:44:40.0 +0100
> +++ linux-2.6.21-rc3-mm2/net/sunrpc/svcsock.c 2007-03-19 09:45:18.0 +0100
> @@ -779,8 +779,8 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
>          }
>
>          clear_bit(SK_DATA, &svsk->sk_flags);
> -        while ((err == kernel_recvmsg(svsk->sk_sock, &msg, NULL,
> -                                0, 0, MSG_PEEK | MSG_DONTWAIT)) < 0 ||
> +        while ((err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
> +                                0, 0, MSG_PEEK | MSG_DONTWAIT)) < 0 ||
>              (skb = skb_recv_datagram(svsk->sk_sk, 0, 1, &err)) == NULL) {
>                  if (err == -EAGAIN) {
>                          svc_sock_received(svsk);

Cute. The compiler must have decided to apply the "(a==b) can never be
less than zero" optimisation before performing uninitialised variable
analysis.

Neil, this one needs runtime testing before we can apply it to 2.6.21, I
think.
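Why the old check could never fire can be seen in a small user-space sketch. It assumes a stand-in function, fake_recvmsg(), that always fails with -EAGAIN; that name and the two check helpers are hypothetical and are not part of svcsock.c.

```c
#include <errno.h>

/* Hypothetical stand-in for kernel_recvmsg(): always fails with -EAGAIN. */
int fake_recvmsg(void)
{
    return -EAGAIN;
}

/*
 * Buggy form: "==" compares instead of assigning. The comparison result
 * is always 0 or 1, so testing it for "< 0" is false for every input,
 * and err never receives the return value.
 */
int buggy_check(int err)
{
    int cmp = (err == fake_recvmsg());  /* 0 or 1, never negative */

    return cmp < 0;
}

/*
 * Fixed form: "=" stores the return value in *err first, and the "< 0"
 * test is then applied to the stored value.
 */
int fixed_check(int *err)
{
    return (*err = fake_recvmsg()) < 0;
}
```

buggy_check() returns 0 whatever err holds, so in a loop condition of the shape quoted above the first operand of "||" would never be true and the error from the receive call would go unnoticed. fixed_check() reports the failure and leaves -EAGAIN in err for the code that follows.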
Re: [PATCH] Complain about missing system calls.
On Mon, 2007-03-19 at 16:42 -0700, Andrew Morton wrote:
> hm, did you try running this on x86_64?

I don't have any. I only tested it on PowerPC and i386. Others then
provided more exclusions for SPARC and maybe ARM, although I'm not sure
you have the latter yet. It's not hard to add extra exclusions.

-- 
dwmw2
Re: [PATCH] slab: deal with NULL pointers passed to kmem_cache_free
Pekka Enberg wrote:
> On 3/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
>> This is a super-hot path.
>
> Super-hot exactly where?

Don't be silly Pekka ... We have plenty of oprofile results if you don't
trust Andrew.

CPU: AMD64 processors, speed 1992.52 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 10
samples  %        symbol name
1861563   4.7882  tg3_start_xmit_dma_bug
1375727   3.5386  memcpy_c
1166438   3.0002  tcp_v4_rcv
1157334   2.9768  kmem_cache_free

In this workload (real server), you can see kmem_cache_free() is number
four.

Adding one test and conditional branch in this super-hot function just to
correct a bug in a SCSI driver (or whatever) is not *SANE*.

Numbers talk.
Re: Kernel Oops in pl2303_shutdown()
Hello Adrian,

reverting d9a7ecacac5f8274d2afce09aadcf37bdb42b93a does help.

thanks !

On Mon, 2007-03-19 at 17:34 +0100, Adrian Bunk wrote:
> On Mon, Mar 19, 2007 at 05:27:58PM +0200, Zilvinas Valinskas wrote:
> > Hello,
> >
> > Before 2.6.21-rc4 (vanilla) serial was oopsing if I pull usb-serial
> > cable while minicom was running. Now it doesn't matter if minicom is
> > running or minicom closed, pulling serial cable results in such oops.
> >...
>
> Does the patch from [1] fix it?
>
> cu
> Adrian
>
> [1] http://lkml.org/lkml/2007/3/13/217
>
Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
On Sun, 18 Mar 2007 21:50:46 +0100 Stefan Priebe <[EMAIL PROTECTED]> wrote:

> Hello!
>
> We've a very strange Problem with Kernel 2.6.20.x
>
> If i try to access a SCSI or SATA Disk (tested with Adaptec U320
> ASC-29320, ICP Vortex 9024, Promise TX300) the whole server hangs - no
> output - no error on the screen - but it hangs completely. But it does
> not happen on all our systems affected are only old 604pin xeons and
> socket 940 Opterons. Socket F Opteron or 771 Xeons does work fine.
>
> I've also testet apci=off pci=routeirq but both does not help. The
> systems work fine with 2.6.19.x and before.

Well that's a bit sad.

Could you please set up netconsole
(Documentation/networking/netconsole.txt) and add initcall_debug to the
kernel boot command line and then send us the full bootup logs? (Even
better: serial console with earlyprintk).

If that doesn't shed any light, we might have to ask you to perform a
git-bisect search to find the buggy commit, I'm afraid.
Re: mm snapshot broken-out-2007-03-18-02-44.tar.gz uploaded
[EMAIL PROTECTED] wrote:
> The mm snapshot broken-out-2007-03-18-02-44.tar.gz has been uploaded to
>
>    ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-03-18-02-44.tar.gz
>
> It contains the following patches against 2.6.21-rc4:

PM: Adding info for No Bus:vcsa7
BUG: at kernel/lockdep.c:2430 check_flags()
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] check_flags+0xb7/0x187
 [] lock_acquire+0x3a/0x93
 [] down_write+0x3a/0x54
 [] sys_munmap+0x23/0x3f
 [] syscall_call+0x7/0xb
 ===
irq event stamp: 302470
hardirqs last enabled at (302469): [] syscall_exit_work+0x11/0x26
hardirqs last disabled at (302470): [] ret_from_exception+0x9/0xc
softirqs last enabled at (301928): [] __do_softirq+0xe4/0xea
softirqs last disabled at (301921): [] do_softirq+0x39/0x55
oprofile: using NMI interrupt.
printk: 6 messages suppressed.
BUG: using smp_processor_id() in preemptible [0001] code: mount/27913
caller is avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] debug_smp_processor_id+0xb3/0xc8
 [] avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
 [] nmi_create_files+0x2a/0x10e [oprofile]
 [] oprofile_create_files+0xe6/0xec [oprofile]
 [] oprofilefs_fill_super+0x78/0x7e [oprofile]
 [] get_sb_single+0x59/0x9f
 [] oprofilefs_get_sb+0x1c/0x1e [oprofile]
 [] vfs_kern_mount+0x81/0xf1
 [] do_kern_mount+0x38/0xde
 [] do_mount+0x605/0x693
 [] sys_mount+0x80/0xb5
 [] syscall_call+0x7/0xb
 ===

(The same BUG trace, for mount/27913 with the same caller, is printed three
more times.)

SELinux: initialized (dev oprofilefs, type oprofilefs), uses genfs_contexts

l *avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
0xc01169fb is in avail_to_resrv_perfctr_nmi_bit (arch/i386/kernel/nmi.c:124).
119             return 0;
120     }
121
122     /* checks for a bit availability (hack for oprofile) */
123     int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
124     {
125             BUG_ON(counter > NMI_MAX_COUNTER_BITS);
126
127             return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
128     }

quilt patches arch/i386/kernel/nmi.c
x86_64-mm-i386-make-nmi-use-perfctr1-for-architectural-perfmon-take-2.patch

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-03-18-02-44/mm-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
Re: [PATCH] Apple SMC driver (hardware monitoring and control)
Hello,

Andrew Morton wrote:
> On Mon, 19 Mar 2007 13:19:00 +0800 Nicolas Boichat <[EMAIL PROTECTED]> wrote:
>
>> This driver provides support for the Apple System Management Controller, which
>> provides an accelerometer (Apple Sudden Motion Sensor), light sensors,
>> temperature sensors, keyboard backlight control and fan control. Only
>> Intel-based Apple's computers are supported (MacBook Pro, MacBook, MacMini).
>>
>
> It's trivia time:
>

[snip, syntax fixed (C++-style comments replaced)]

>> +/* Temperature sensors keys. First set for Macbook(Pro), second for Macmini */
>> +static const char* temperature_sensors_sets[][8] = {
>> +        { "TB0T", "TC0D", "TC0P", "Th0H", "Ts0P", "Th1H", "Ts1P", NULL },
>> +        { "TC0D", "TC0P", NULL }
>> +};
>>
>
> The NULLs here are harmless, but unneeded.
>

Actually, I think it's safer to keep them. I use these NULL values as an
end-of-list marker in applesmc_init:

for (i = 0;
     temperature_sensors_sets[applesmc_temperature_set][i] != NULL;
     i++) {
...

If you remove these NULLs and later add a temperature sensor to the
first set without remembering to increase the array size, you won't get
any warning and the code will not work. If you keep them, you will get a
warning (drivers/hwmon/applesmc.c:73: warning: excess elements in array
initializer).

>

[snip, removed unneeded "= 0" in global variables]

>
>> +static DECLARE_MUTEX(applesmc_sem);
>>
>
> Semaphores should be used only when their counting feature is required. I
> think this can be switched to `struct mutex'.
>

Fixed. Note: this code comes from hdaps, which was, and still is, using
semaphores; it should probably be fixed too.

>

[snip, "if" and "else" syntax fixed]

>>
>> +/*
>> + * Macro defining helper functions and DEVICE_ATTR for a fan sysfs entries.
>> + * - show actual speed
>> + * - show/store minimum speed
>> + * - show maximum speed
>> + * - show safe speed
>> + * - show/store target speed
>> + * - show/store manual mode
>> + */
>> +#define sysfs_fan_speeds_offset(offset) \
>> +static ssize_t show_fan_actual_speed_##offset (struct device *dev, \
>> +                struct device_attribute *attr, char *buf) \
>> +{ \
>> +        return applesmc_show_fan_speed(dev, buf, FAN_ACTUAL_SPEED, offset); \
>> +} \
>> +static DEVICE_ATTR(fan##offset##_actual_speed, S_IRUGO, \
>> +                show_fan_actual_speed_##offset, NULL); \
>> +\
>> +static ssize_t show_fan_minimum_speed_##offset (struct device *dev, \
>> +                struct device_attribute *attr, char *buf) \
>> +{ \
>> +        return applesmc_show_fan_speed(dev, buf, FAN_MIN_SPEED, offset); \
>> +} \
>> +static ssize_t store_fan_minimum_speed_##offset (struct device *dev, \
>> +                struct device_attribute *attr, const char *buf, size_t count) \
>> +{ \
>> +        return applesmc_store_fan_speed(dev, buf, count, FAN_MIN_SPEED, offset); \
>> +} \
>> +static DEVICE_ATTR(fan##offset##_minimum_speed, S_IRUGO | S_IWUSR, \
>> +                show_fan_minimum_speed_##offset, store_fan_minimum_speed_##offset); \
>> +\
>> +static ssize_t show_fan_maximum_speed_##offset (struct device *dev, \
>> +                struct device_attribute *attr, char *buf) \
>> +{ \
>> +        return applesmc_show_fan_speed(dev, buf, FAN_MAX_SPEED, offset); \
>> +} \
>> +static DEVICE_ATTR(fan##offset##_maximum_speed, S_IRUGO, \
>> +                show_fan_maximum_speed_##offset, NULL); \
>> +\
>> +static ssize_t show_fan_safe_speed_##offset (struct device *dev, \
>> +                struct device_attribute *attr, char *buf) \
>> +{ \
>> +        return applesmc_show_fan_speed(dev, buf, FAN_SAFE_SPEED, offset); \
>> +} \
>> +static DEVICE_ATTR(fan##offset##_safe_speed, S_IRUGO, \
>> +                show_fan_safe_speed_##offset, NULL); \
>> +\
>> +static ssize_t show_fan_target_speed_##offset (struct device *dev, \
>> +                struct device_attribute *attr, char *buf) \
>> +{ \
>> +        return applesmc_show_fan_speed(dev, buf, FAN_TARGET_SPEED, offset); \
>> +} \
>> +static ssize_t store_fan_target_speed_##offset (struct device *dev, \
>> +                struct device_attribute *attr, const char *buf, size_t count) \
>> +{ \
>> +        return applesmc_store_fan_speed(dev, buf, count, FAN_TARGET_SPEED, offset); \
>> +} \
>> +static DEVICE_ATTR(fan##offset##_target_speed, S_IRUGO | S_IWUSR, \
>> +                show_fan_target_speed_##offset, store_fan_target_speed_##offset); \
>> +static ssize_t show_fan_manual_##offset (struct device *dev, \
>> +                struct device_attribute *attr, char *buf) \
>> +{ \
>> +        return applesmc_show_fan_manual(dev, buf, offset); \
>> +} \
>> +static ssize_t store_fan_manual_##offset (struct device *dev, \
>> +                struct device_attribute *attr, const char *buf, size_t count) \
>> +{ \
>> +        return applesmc_store_fan_manual(dev, buf, coun
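On the NULL end-of-list markers discussed earlier in this message: a small user-space sketch of the sentinel walk, reusing the two sensor sets from the quoted patch. The helper count_sensors() is hypothetical and only illustrates the loop shape described for applesmc_init; it is not taken from the driver.

```c
#include <stddef.h>

/* Sensor key sets as quoted in the patch; NULL marks the end of each set. */
const char *temperature_sensors_sets[][8] = {
    { "TB0T", "TC0D", "TC0P", "Th0H", "Ts0P", "Th1H", "Ts1P", NULL },
    { "TC0D", "TC0P", NULL }
};

/*
 * Hypothetical helper: count the keys in one set by walking the row
 * until the NULL end-of-list marker is reached.
 */
size_t count_sensors(size_t set)
{
    size_t i = 0;

    while (temperature_sensors_sets[set][i] != NULL)
        i++;
    return i;
}
```

Each row has room for exactly eight pointers. Adding an eighth key to the first set while keeping the explicit NULL gives nine initializers, so the compiler complains about excess elements, which is the warning the explicit markers are meant to provoke. Without the explicit NULL, eight keys would fit silently and the walk would run past the end of the row.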
Re: [BUG] no boot with 2.6.21-rc3 and later
On Mon, 19 Mar 2007, Bob Tracy wrote:

> I applied all of the 2.6.21-rc2-rc3 incremental patch except for the
> portion applicable to "drivers/ide" files. The problem seems to be
> elsewhere: 2.6.21-rc3 minus the drivers/ide changes still hangs at the
> same spot during the boot process. Any ideas where to look next?
> Thanks!

This really is an excellent opportunity for git-bisect, as you are
trivially and immediately able to reproduce the problem.

There are not that many patches between rc2 and rc3, so it would require
only a few reboots to identify the offending patch, and git-bisect is
really trivial to use - see the manpage for an example of the bisecting
process.

-- 
Jiri Kosina
Re: [PATCH] slab: deal with NULL pointers passed to kmem_cache_free
On 3/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> This is a super-hot path.

Super-hot exactly where?
Re: [PATCH] slab: deal with NULL pointers passed to kmem_cache_free
On 3/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> The BUG_ON (at least) should probably be moved into CONFIG_DEBUG_SLAB.

No it shouldn't. Letting non-slab pages pass through causes nasty and
hard to debug problems which is why we have the BUG_ONs in the first
place:

http://lkml.org/lkml/2006/5/8/101

                                Pekka
Re: [RFC][PATCH] Apple SMC driver (hardware monitoring and control)
Hello,

Bob Copeland wrote:
> On 3/14/07, Nicolas Boichat <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I developed, a while ago, a driver the Apple System Management
>> Controller, which provides an accelerometer (Apple Sudden Motion
>> Sensor), light sensors, temperature sensors, keyboard backlight control
>> and fan control on Intel-based Apple's computers (MacBook Pro, MacBook,
>> MacMini).
>
> Hi Nicolas,
>
> I tried out an earlier version of this patch several months ago just to play
> around with the joystick part of the accelerometer driver on my MacBook, and
> found that it was backwards in the y-direction compared to what Neverball
> seemed to want (of course, NB has no way to invert the joystick). I think
> I just did something like this in my own copy:
>
> +       y = -y;
>         input_report_abs(applesmc_idev, ABS_X, x - rest_x);
>         input_report_abs(applesmc_idev, ABS_Y, y - rest_y);
>
> I don't claim you necessarily want to change it, but thought I'd pass it
> along.

I tried neverball on my first-generation Macbook Pro (Core Duo, not Core
2 Duo), and the x axis is inverted, not the y axis. Could you confirm
which axis is inverted on your Macbook?

Also, have you tried the modified hdaps-gl, available here:
http://mactel-linux.svn.sourceforge.net/viewvc/mactel-linux/trunk/tools/hdaps-gl/
? Is it working correctly?

Thanks,

Best regards,

Nicolas
new warning in ata_sg_clean
In today's 2.6.21-rc4+git the following new warning has appeared on my
ppc computer:

  CC [M]  drivers/ata/libata-core.o
drivers/ata/libata-core.c: In function 'ata_sg_clean':
drivers/ata/libata-core.c:3558: warning: unused variable 'dir'

-- 
Meelis Roos ([EMAIL PROTECTED])
Re: XFS internal error xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c. Caller 0xc01b00bd
On Mon, Mar 19, 2007 at 11:32:27AM +0100, Marco Berizzi wrote:
> Marco Berizzi wrote:
> > David Chinner wrote:
> >
> >> Ok, so an ipsec change. And I see from the history below it
> >> really has nothing to do with this problem. it seems the problem
> >> has something to do with changes between 2.6.19.1 and 2.6.19.2.
> >
> > indeed. Yesterday at 13:00 I have switched from 2.6.19.1 to 2.6.19.2
> > (without the ipsec fix) and at about 17:30 linux has crashed again.
> > I have recompiled 2.6.19.2 with all kernel debugging options enabled
> > and rebooted. Now I'm waiting for the crash...
>
> Linux has not been crashed. However here is dmesg output
> with all debugging option enabled: (search for 'INFO:
> possible recursive locking detected'). Is that normal?

.

> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.19.2 #1
> ---------------------------------------------
> rm/470 is trying to acquire lock:
>  (&(&ip->i_lock)->mr_lock){}, at: [] xfs_ilock+0x5b/0xa1
>
> but task is already holding lock:
>  (&(&ip->i_lock)->mr_lock){}, at: [] xfs_ilock+0x5b/0xa1
>
> other info that might help us debug this:
> 3 locks held by rm/470:
>  #0:  (&inode->i_mutex/1){--..}, at: [] do_unlinkat+0x70/0x115
>  #1:  (&inode->i_mutex){--..}, at: [] mutex_lock+0x1c/0x1f
>  #2:  (&(&ip->i_lock)->mr_lock){}, at: [] xfs_ilock+0x5b/0xa1
>
> stack backtrace:
>  [] dump_trace+0x215/0x21a
>  [] show_trace_log_lvl+0x1a/0x30
>  [] show_trace+0x12/0x14
>  [] dump_stack+0x19/0x1b
>  [] print_deadlock_bug+0xc0/0xcf
>  [] check_deadlock+0x6a/0x79
>  [] __lock_acquire+0x350/0x970
>  [] lock_acquire+0x75/0x97
>  [] down_write+0x3a/0x54
>  [] xfs_ilock+0x5b/0xa1
>  [] xfs_lock_dir_and_entry+0x105/0x11b
>  [] xfs_remove+0x180/0x47f
>  [] xfs_vn_unlink+0x22/0x4f
>  [] vfs_unlink+0x9e/0xa2
>  [] do_unlinkat+0xa8/0x115
>  [] sys_unlink+0x10/0x12
>  [] syscall_call+0x7/0xb
>  [] 0xb7efaa7d
>  ===

That's no problem - lockdep just doesn't know that we can nest
i_lock (we've got to get the annotations for this sorted out).

> Here is the relevant results:
>
> Phase 2 - found root inode chunk
> Phase 3 - ...
> agno = 0
> ...
> agno = 12
> LEAFN node level is 1 inode 1610612918 bno = 8388608

Hmmm - single bit error in the bno - that reminds me of this:

http://oss.sgi.com/projects/xfs/faq.html#dir2

So I'd definitely make sure that is repaired

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
PNPACPI probes serial twice, messes up serial console
Dell SC1425 x86_64 running in i386 mode (the problem also occurs in
x86_64 mode). Kernel 2.6.21-rc4, gcc 4.1.0. Config extract at end.
Booting with 'console=tty console=ttyS0,9600'.

The serial console on ttyS0 (0x3f8, irq 4) is probed twice, once from
serial8250_init() and again from serial_pnp_probe(). The serial console
output is correct until the second probe (from PNP) gets to these lines
in serial8250_config_port()

        if (flags & UART_CONFIG_TYPE)
                autoconfig(up, probeflags);

After the call to autoconfig(), the serial console starts printing NUL
characters instead of the console output. The number of NUL characters
corresponds closely with the number of characters written to the VT
console, IOW it outputs each serial character as NUL instead of the
correct character. When the kernel boots /sbin/init, the console resets
to printing normal characters.

AFAICT, the second probe of the UART is doing something nasty to the
hardware. This is not a recent problem, I can reproduce the problem on
2.6.16.

Booting with pnpacpi=off removes the problem, but that suppresses all
the PNPACPI code, not just the second probe of the serial devices.

Should pnpacpi probe and set up the serial devices even when they have
already been set up? Or is this something strange about the UART in this
particular box?

FWIW, the serial console is plugged into a serial to USB converter
(pl2303), my laptop has no serial ports. That should not make a
difference, but just in case it does ...
Config extract: X86_32=y GENERIC_TIME=y CLOCKSOURCE_WATCHDOG=y GENERIC_CLOCKEVENTS=y GENERIC_CLOCKEVENTS_BROADCAST=y LOCKDEP_SUPPORT=y STACKTRACE_SUPPORT=y SEMAPHORE_SLEEPERS=y X86=y MMU=y ZONE_DMA=y GENERIC_ISA_DMA=y GENERIC_IOMAP=y GENERIC_BUG=y GENERIC_HWEIGHT=y ARCH_MAY_HAVE_PC_FDC=y DMI=y DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" EXPERIMENTAL=y LOCK_KERNEL=y INIT_ENV_ARG_LIMIT=32 LOCALVERSION="-i386-kaos" LOCALVERSION_AUTO=y SWAP=y SYSVIPC=y SYSVIPC_SYSCTL=y POSIX_MQUEUE=y IKCONFIG=y IKCONFIG_PROC=y SYSFS_DEPRECATED=y CC_OPTIMIZE_FOR_SIZE=y SYSCTL=y EMBEDDED=y SYSCTL_SYSCALL=y KALLSYMS=y KALLSYMS_ALL=y HOTPLUG=y PRINTK=y BUG=y ELF_CORE=y BASE_FULL=y FUTEX=y EPOLL=y SHMEM=y SLAB=y VM_EVENT_COUNTERS=y RT_MUTEXES=y BASE_SMALL=0 MODULES=y MODULE_UNLOAD=y KMOD=y STOP_MACHINE=y BLOCK=y LBD=y LSF=y IOSCHED_NOOP=y IOSCHED_AS=y IOSCHED_DEADLINE=y IOSCHED_CFQ=y DEFAULT_DEADLINE=y DEFAULT_IOSCHED="deadline" TICK_ONESHOT=y HIGH_RES_TIMERS=y SMP=y X86_PC=y MPENTIUM4=y X86_CMPXCHG=y X86_L1_CACHE_SHIFT=7 RWSEM_XCHGADD_ALGORITHM=y GENERIC_CALIBRATE_DELAY=y X86_WP_WORKS_OK=y X86_INVLPG=y X86_BSWAP=y X86_POPAD_OK=y X86_CMPXCHG64=y X86_GOOD_APIC=y X86_INTEL_USERCOPY=y X86_USE_PPRO_CHECKSUM=y X86_TSC=y HPET_TIMER=y HPET_EMULATE_RTC=y NR_CPUS=8 SCHED_SMT=y PREEMPT_NONE=y X86_LOCAL_APIC=y X86_IO_APIC=y X86_MCE=y X86_MCE_NONFATAL=y X86_MCE_P4THERMAL=y MICROCODE=m MICROCODE_OLD_INTERFACE=y X86_MSR=m X86_CPUID=m HIGHMEM4G=y VMSPLIT_3G=y PAGE_OFFSET=0xC000 HIGHMEM=y ARCH_FLATMEM_ENABLE=y ARCH_SPARSEMEM_ENABLE=y ARCH_SELECT_MEMORY_MODEL=y ARCH_POPULATES_NODE_MAP=y SELECT_MEMORY_MODEL=y FLATMEM_MANUAL=y FLATMEM=y FLAT_NODE_MEM_MAP=y SPARSEMEM_STATIC=y SPLIT_PTLOCK_CPUS=4 ZONE_DMA_FLAG=1 MTRR=y IRQBALANCE=y HZ_250=y HZ=250 PHYSICAL_START=0x10 PHYSICAL_ALIGN=0x10 COMPAT_VDSO=y ARCH_ENABLE_MEMORY_HOTPLUG=y PM=y ACPI=y ACPI_PROCFS=y ACPI_BUTTON=m ACPI_FAN=m ACPI_PROCESSOR=m ACPI_BLACKLIST_YEAR=0 ACPI_EC=y ACPI_POWER=y ACPI_SYSTEM=y PCI=y PCI_GOANY=y PCI_BIOS=y PCI_DIRECT=y 
PCI_MMCONFIG=y PCIEPORTBUS=y PCIEAER=y PCI_MSI=y HT_IRQ=y ISA_DMA_API=y BINFMT_ELF=y BINFMT_MISC=m NET=y PACKET=y PACKET_MMAP=y UNIX=y XFRM=y INET=y IP_MULTICAST=y IP_ADVANCED_ROUTER=y ASK_IP_FIB_HASH=y IP_FIB_HASH=y IP_ROUTE_MULTIPATH=y IP_ROUTE_VERBOSE=y SYN_COOKIES=y INET_XFRM_MODE_BEET=y INET_DIAG=y INET_TCP_DIAG=y TCP_CONG_CUBIC=y DEFAULT_TCP_CONG="cubic" NETFILTER=y NETFILTER_NETLINK=m NETFILTER_NETLINK_LOG=m NETFILTER_XTABLES=y NETFILTER_XT_TARGET_CLASSIFY=m NETFILTER_XT_TARGET_MARK=m NETFILTER_XT_MATCH_COMMENT=m NETFILTER_XT_MATCH_DCCP=m NETFILTER_XT_MATCH_ESP=m NETFILTER_XT_MATCH_LENGTH=m NETFILTER_XT_MATCH_LIMIT=m NETFILTER_XT_MATCH_MAC=m NETFILTER_XT_MATCH_MARK=m NETFILTER_XT_MATCH_MULTIPORT=m NETFILTER_XT_MATCH_PKTTYPE=m NETFILTER_XT_MATCH_QUOTA=m NETFILTER_XT_MATCH_REALM=m NETFILTER_XT_MATCH_SCTP=m NETFILTER_XT_MATCH_STATISTIC=m NETFILTER_XT_MATCH_TCPMSS=m IP_NF_IPTABLES=y IP_NF_MATCH_IPRANGE=m IP_NF_MATCH_TOS=m IP_NF_MATCH_RECENT=m IP_NF_MATCH_ECN=m IP_NF_MATCH_AH=m IP_NF_MATCH_TTL=m IP_NF_MATCH_OWNER=m IP_NF_MATCH_ADDRTYPE=m IP_NF_FILTER=y IP_NF_TARGET_REJECT=y IP_NF_TARGET_ULOG=y VLAN_8021Q=y NET_CLS_ROUTE=y STANDALONE=y PREVENT_FIRMWARE_BUILD=y FW_LOADER=m CONNECTOR=m PNP=y PNP_DEBUG=y PNPACPI=y BLK_DEV_FD=m BLK_DEV_LOOP=m IDE=m IDE_MAX_HWIFS=4 BLK_DEV_IDE=m BLK_DEV_IDEDISK=m IDEDISK_MULTI_MODE=y BLK_DEV_IDECD=m IDE_TASK_IOCTL=y BLK_DEV_IDEPCI=y IDEPCI_SHARE_IRQ=y BLK_DEV_IDEDMA_PCI=y IDEDMA_PCI_AUTO=y BLK_DEV_PIIX=m BLK_DEV_IDEDMA=y I
Re: [PATCH] vt: fix a potential race in the VT_WAITACTIVE handler
On Thu, 15 Mar 2007 15:10:23 +0100 Michal Januszewski <[EMAIL PROTECTED]> wrote:

> On a multiprocessor machine the VT_WAITACTIVE ioctl call may return 0
> if fg_console has already been updated in redraw_screen(), but the
> console switch itself hasn't been completed. Fix this by checking
> fg_console in vt_waitactive() with the console sem held.
>
> Signed-off-by: Michal Januszewski <[EMAIL PROTECTED]>
>
> ---
> diff --git a/drivers/char/vt_ioctl.c b/drivers/char/vt_ioctl.c
> index 3a5d301..00b5b34 100644
> --- a/drivers/char/vt_ioctl.c
> +++ b/drivers/char/vt_ioctl.c
> @@ -1041,8 +1041,12 @@ int vt_waitactive(int vt)
>  	for (;;) {
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		retval = 0;
> -		if (vt == fg_console)
> +		acquire_console_sem();
> +		if (vt == fg_console) {
> +			release_console_sem();
>  			break;
> +		}
> +		release_console_sem();
>  		retval = -EINTR;
>  		if (signal_pending(current))
>  			break;
>

OK. I think. It's hard to tell. I assume that the acquire_console_sem() in here is to synchronise against some other function which also takes acquire_console_sem(), but it is not clear which.

So could you please redo this with a comment which tells the reader exactly what's being protected against what, and why?

Also, I always feel a bit worried by:

	set_current_state(TASK_INTERRUPTIBLE);
	down(...);

because if it hits contention, the down() will undo the set_current_state().

Now that's normally OK because we loop, and because the semaphore won't normally be 100% contended all the time. Unless someone reimplements down() so it happens to return in state TASK_RUNNING all the time, which they could legitimately do (although this would probably break stuff such as the above).

But still, it is nicer to do

	down(...);
	set_current_state(TASK_INTERRUPTIBLE);

if possible, and I think it is possible here.

Thanks.
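The ordering concern raised in this review can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` / `MODEL_*` name is a hypothetical stand-in and not a kernel API, and the contended down() is assumed to always come back with the task in the running state, which is the worst case described in the message.

```c
#include <assert.h>

/* Userspace model only; none of these are the kernel's real definitions. */
enum model_state { MODEL_TASK_RUNNING, MODEL_TASK_INTERRUPTIBLE };

static enum model_state model_current_state = MODEL_TASK_RUNNING;

void model_set_current_state(enum model_state s)
{
	model_current_state = s;
}

/*
 * Assumption: a contended down() sleeps and is woken up again, so the
 * task returns in the running state whatever state it had before.
 */
void model_down_contended(void)
{
	model_current_state = MODEL_TASK_RUNNING;
}

/* Ordering in the patch: mark the task sleepy, then take the semaphore. */
enum model_state state_then_down(void)
{
	model_set_current_state(MODEL_TASK_INTERRUPTIBLE);
	model_down_contended();
	return model_current_state;
}

/* Suggested ordering: take the semaphore first, then mark the task. */
enum model_state down_then_state(void)
{
	model_down_contended();
	model_set_current_state(MODEL_TASK_INTERRUPTIBLE);
	return model_current_state;
}

/* Usage: returns 0 after checking what each ordering leaves behind. */
int model_check_orderings(void)
{
	/* The sleepy state was silently lost under contention. */
	assert(state_then_down() == MODEL_TASK_RUNNING);
	/* The sleepy state survives because it is set last. */
	assert(down_then_state() == MODEL_TASK_INTERRUPTIBLE);
	return 0;
}
```

In the first ordering a following schedule() would not sleep at all, so the loop degrades into a retry; the second ordering does not depend on how down() is implemented.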
Re: [PATCH] Sanitize filesystem NLS handling
OGAWA Hirofumi wrote: "Alexander E. Patrakov" <[EMAIL PROTECTED]> writes:

But, anyway, this is a separate issue that my patch doesn't attempt to correct. The conclusion so far is that we disagree, and that there are situations where using utf8 iocharset is the least of all evils, so the warning is not justified enough. Reproducible testcase:

Again, I don't care about read at all. And why don't you use the "utf8" option, instead of "iocharset=utf8". "iocharset=utf8" is warned about until it is fixed. The "utf8" also doesn't work correctly in some cases though.

Would it be OK for you if I add the mount-time check for iocharset=utf8 to the fat filesystem and silently replace this with the "utf8" option, instead of overly actively warning the users? This way, the sysfs option and the nls_base.iocharset module parameter will still work as I want.

I'm talking about two filesystems on a system here, not two encodings on one filesystem.

I am also talking about this. Mounting two filesystems with different iocharsets is insane, because this will result in one of the following outcomes:

1) "ls" will show wrong characters in filenames on one of the filesystems

2) one of the two filesystems will contain wrong on-disk data for filenames, that, when misinterpreted by mounting with wrong iocharset, results in seemingly-correct output, but is misunderstood by the properly set up reference implementation (that's what is likely to happen with jfs in your example).

Because you didn't change the locale. And it is your policy, right?

Yes. This is because I have some files with non-ASCII names in my home directory. Changing the locale would make these filenames look wrong until I change it back.

-- 
Alexander E. Patrakov
Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far
On Mon, 2007-03-19 at 21:27 -0700, Greg KH wrote:
> On Sat, Mar 17, 2007 at 02:26:57PM +0100, Andi Kleen wrote:
> > Arjan van de Ven <[EMAIL PROTECTED]> writes:
> > >
> > > well we can do the handshake to take ownership like we do much later in
> > > boot, but that requires PCI to be there and fully discovered, which we
> > > don't have this early.
> >
> > That's not true - we do early pci discovery. Doing USB handsoff
> > there would be quite possible.
>
> What, we don't do USB "handoff" early enough in the boot process? It's
> happening at PCI quirk time now, which I think should be early enough
> for everyone (and too early for some who rely on USB keyboards and
> initramfs shells...)

It happens way after the CPUs are brought up. At this point both the delay loop calibration and the local APIC calibration are already done.

	tglx
Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images
On Mon, 2007-03-19 at 20:05 -0500, Matt Mackall wrote:
> On Tue, Mar 20, 2007 at 01:42:46AM +0100, Thomas Gleixner wrote:
> > On Mon, 2007-03-19 at 17:32 -0500, Matt Mackall wrote:
> > > This is exactly the same problem as booting on a desktop PC. But
> > > somehow LILO manages. My first Linux box had a hell of a lot less disk
> > > than the platform I bootstrapped (and wrote NAND drivers for) last
> > > month had in NAND.
> >
> > No, it is not. You get the absolute sector address of your second stage
> > and this is a complete no-brainer. The translation is done in the DISK
> > device.
>
> LILO and friends manage to boot systems that use software RAID and
> LVM. There are multiple methods. Some use block lists, some use tiny
> boot partitions, etc. All of them are applicable to controllerless NAND.

Yes, by using fixed addresses, which is not what I want.

> > You simply ignore the fact, that inside each disk, USB Stick, CF-CARD,
> > whatever - there is a more or less intelligent controller device, which
> > does the mapping to the physical storage location. There is _NO_ such
> > thing on a bare FLASH chip.
>
> How many times do I have to tell you that I wrote a driver for
> controllerless NAND just last month?

Wow. I'm impressed because I'm pulling my opinion out of thin air.

> > How exactly does device mapper:
> >
> > A) across device wear levelling ?
>
> The same way UBI does, but encapsulated in a device mapper layer.

Does the device mapper do that ?

> > B) dynamic partitioning for FLASH aware file systems ?
>
> See above.

Does the device mapper do that ?

> > C) across device wear levelling for FLASH aware file systems ?
>
> See above.

Look at your own drawing.

> > D) background bit-flip corrections (copying affected blocks and recycle
> > the old one) ?
>
> See above.

Repeating patterns do not impress me. Your drawing tells otherwise.

> > E) allow position independent placement of the second stage bootloader ?
>
> See way above to my LILO response.

Neither LILO nor GRUB have search capabilities for randomly located second stage loaders.

> > > > You need to implement a clever journalling block device
> > > > emulator in order to keep the data alive and the FLASH not worn out
> > > > within no time. You need the wear levelling, otherwise you can throw
> > > > away your FLASH in no time.
> > >
> > > And that's why it's in my picture.
> >
> > Yes, it is in your picture, but:
> >
> > 1) it excludes FLASH aware file systems and UBI does not.
> > 2) your picture does still not explain how it does achieve the above A),
> > B), C), D) and E)
> >
> > Your extra path for partitioning(4) and JFFS2 is just a weird hack,
> > which makes your proposal completely absurd.
>
> No, it's just there to show the flexibility of device mapper. But I have
> the sneaking suspicion you have no idea how device mapper works.

Sigh. Layering violation == flexibility.

> In brief: device mapper takes one or more devices, applies a mapping
> to them, and returns a new device. For example, take various spans of
> /dev/hda1 and /dev/sda3 and present them as new-device1. Take
> new-device1 and transform it with dm-crypt to get new-device2. The
> kernel doesn't decide how to do this, any more than it decides where
> to mount your filesystems. Userspace does.

I know how it works. But your blurb does not answer any of my questions.

> > > > > 5. We don't reimplement higher pieces of the stack (dm-crypt,
> > > > >    snapshot, etc.).
> > > >
> > > > Why should we reimplement that ?
> > >
> > > So that you can get encryption and snapshot, etc.?
> >
> > 1. On top of a clever block device.
> >
> > 2. UBI can do snapshots by design.
>
> Oh, so you HAVE reimplemented it.

No, it already works.

> > 3. Encryption should be done on the VFS layer and not below the
> > filesystem layer. Doing it inside the block layer or the device mapper
> > is broken by design.
>
> That's highly debatable and not a topic for this thread.

I see, you define, what has to be discussed.

	tglx
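The quoted description of device mapper (spans of several devices presented as one new device) can be sketched as a plain userspace lookup. This is a model only, under stated assumptions: the `model_*` types and functions are hypothetical, the sector numbers are invented for illustration, and nothing here touches the real device mapper interfaces or addresses the wear-levelling questions argued about in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One contiguous span of an underlying device (hypothetical model type). */
struct model_span {
	const char *dev;           /* underlying device name */
	unsigned long long start;  /* first sector used on that device */
	unsigned long long len;    /* number of sectors in the span */
};

/* Where a logical sector of the new device actually lives. */
struct model_target {
	const char *dev;
	unsigned long long sector;
};

/*
 * Translate a logical sector of the concatenated device into a
 * (device, sector) pair. Returns 0 on success, -1 if out of range.
 */
int model_map_sector(const struct model_span *spans, size_t nspans,
		     unsigned long long logical, struct model_target *out)
{
	size_t i;

	for (i = 0; i < nspans; i++) {
		if (logical < spans[i].len) {
			out->dev = spans[i].dev;
			out->sector = spans[i].start + logical;
			return 0;
		}
		/* Skip past this span and keep looking in the next one. */
		logical -= spans[i].len;
	}
	return -1;
}

/* Usage: a "new-device1" built from spans of /dev/hda1 and /dev/sda3. */
int model_map_example(void)
{
	static const struct model_span new_device1[] = {
		{ "/dev/hda1", 100, 50 },  /* logical sectors 0..49 */
		{ "/dev/sda3",   0, 20 },  /* logical sectors 50..69 */
	};
	struct model_target t;

	assert(model_map_sector(new_device1, 2, 10, &t) == 0);
	assert(strcmp(t.dev, "/dev/hda1") == 0 && t.sector == 110);

	assert(model_map_sector(new_device1, 2, 55, &t) == 0);
	assert(strcmp(t.dev, "/dev/sda3") == 0 && t.sector == 5);

	/* Past the end of the last span. */
	assert(model_map_sector(new_device1, 2, 70, &t) == -1);
	return 0;
}
```

The table is static in this model, which is the point tglx presses on: a fixed translation of this kind says nothing about moving blocks around for wear levelling.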
Re: [RFC][PATCH] split file and anonymous page queues #2
Rik van Riel wrote:

Split the anonymous and file backed pages out onto their own pageout queues. This way we do not unnecessarily churn through lots of anonymous pages when we do not want to swap them out anyway.

This should (with additional tuning) be a great step forward in scalability, allowing Linux to run well on very large systems where scanning through the anonymous memory (on our way to the page cache memory we do want to evict) is slowing systems down significantly.

This patch has been stress tested and seems to work, but has not been fine tuned or benchmarked yet. For now the swappiness parameter can be used to tweak swap aggressiveness up and down as desired, but in the long run we may want to simply measure IO cost of page cache and anonymous memory and auto-adjust.

We apply pressure to each of the sets of pageout queues based on:
- the size of each queue
- the fraction of recently referenced pages in each queue, not counting used-once file pages
- swappiness (file IO is more efficient than swap IO)

Please take this patch for a spin and let me know what goes well and what goes wrong.

This ignores whether a file page is mapped, doesn't it? Even so, it could be a good approach anyway.

There are a couple of little nice improvements you have there, such as treating shmem pages in the same class as anon pages. We found that we needed something similar, so some of those things should go upstream on their own.

-- 
SUSE Labs, Novell Inc.
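One way to picture how the three listed factors could combine is the sketch below. It is an assumption for illustration, not the formula used by the patch: the weighting, the `200 - swappiness` split for the file side, and all `model_*` names are hypothetical, and the "not counting used-once file pages" refinement is left out.

```c
#include <assert.h>

/* Hypothetical per-queue statistics for the model. */
struct model_queue {
	unsigned long long size;        /* pages on the queue */
	unsigned long long scanned;     /* pages recently scanned */
	unsigned long long referenced;  /* of those, recently referenced */
};

/*
 * Higher weight means "scan this queue harder". The weight grows with
 * queue size and with priority, and shrinks as a larger fraction of the
 * scanned pages turns out to be recently referenced.
 */
unsigned long long model_weight(const struct model_queue *q,
				unsigned long long prio)
{
	return prio * q->size * (q->scanned + 1) / (q->referenced + 1);
}

/*
 * Share of scan pressure (0..100) given to the anonymous queue; the file
 * queue gets the rest. swappiness is assumed to be in the range 0..100.
 */
unsigned int model_anon_percent(const struct model_queue *anon,
				const struct model_queue *file,
				unsigned int swappiness)
{
	unsigned long long ap = model_weight(anon, swappiness);
	unsigned long long fp = model_weight(file, 200 - swappiness);

	return (unsigned int)(100 * ap / (ap + fp + 1));
}

/* Usage: returns 0 after checking the qualitative behaviour. */
int model_pressure_example(void)
{
	struct model_queue busy = { 1000, 100, 99 };  /* almost all referenced */
	struct model_queue idle = { 1000, 100, 9 };   /* few referenced */

	/* Busy anon, idle file: most pressure goes to the file queue. */
	assert(model_anon_percent(&busy, &idle, 60) < 50);
	/* Idle anon, busy file: most pressure goes to the anon queue. */
	assert(model_anon_percent(&idle, &busy, 60) > 50);
	/* Raising swappiness never lowers the anon share in this model. */
	assert(model_anon_percent(&busy, &idle, 100) >=
	       model_anon_percent(&busy, &idle, 0));
	return 0;
}
```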
Re: RSDL v0.31
On Mon, Mar 19, 2007 at 08:11:55PM -0700, Linus Torvalds wrote:
> Quite frankly, I was *planning* on merging RSDL very early after 2.6.21,
> but there is one thing that has turned me completely off the whole thing:
>
>  - the people involved seem to be totally unwilling to even admit there
>    might be a problem.
>
> This is like alcoholism. If you cannot admit that you might have a
> problem, you'll never get anywhere. And quite frankly, the RSDL proponents
> seem to be in denial ("we're always better", "it's your problem if the old
> scheduler works better", "just one report of old scheduler being better").
>
> And the thing is, if people aren't even _willing_ to admit that there may
> be issues, there's *no*way*in*hell* I will merge it even for testing.
> Because the whole and only point of merging RSDL was to see if it could
> replace the old scheduler, and the most important feature in that case is
> not whether it is perfect, BUT WHETHER ANYBODY IS INTERESTED IN TRYING TO
> FIX THE INEVITABLE PROBLEMS!

Linus, you're being unfair to Con. He initially held that position, and lately worked with Mike by proposing changes to try to improve his X responsiveness. But he's ill right now and cannot touch the keyboard, so only his supporters speak for him, and as you know, speech is not code and does not fix problems.

Leave him a week or so to recover and let's see what he can propose. Hopefully a week away from the keyboard will help him think with a more general approach. Also, Mike has already modified the code a bit to get better experience.

Also, while I don't agree with starting to renice X to get something usable, it seems real that there's something funny on Mike's system which makes it behave particularly strangely when combined with RSDL, because other people in comparable tests (including me) have found X perfectly smooth even with loads in the tens or even hundreds. I really suspect that we will find a bug in RSDL which triggers the problem and that this fix will help discover another problem on Mike's hardware which was not triggered by mainline.

Regards,
Willy
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Mon, Mar 19, 2007 at 09:44:28PM +0100, Blaisorblade wrote:
> On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> > > >
> > > > Yes, I believe that is the case, however I wonder if that is going to
> > > > be a problem for you to distinguish between write faults for clean
> > > > writable ptes, and write faults for readonly ptes?
> > >
> > > I wouldn't be able to distinguish them, but am I going to get write
> > > faults for clean ptes when vma_wants_writenotify() is false (as seems to
> > > be for tmpfs)? I guess not.
> > >
> > > For tmpfs pages, clean writable PTEs are mapped as writable so they won't
> > > give any problem, since vma_wants_writenotify() is false for tmpfs.
> > > Correct?
> >
> > Yes, that should be the case. So would this mean that nonlinear protections
> > don't work on regular files?
>
> They still work in most cases (including for UML), but if the initial mmap()
> specified PROT_WRITE, that is ignored, for pages which are not remapped via
> remap_file_pages(). UML uses PROT_NONE for the initial mmap, so that's no
> problem.

But how are you going to distinguish a write fault on a readonly pte for dirty page accounting vs a read-only nonlinear protection?

You can't store any more data in a present pte AFAIK, so you'd have to have some out of band data. At which point, you may as well just forget about vma_wants_writenotify vmas, considering that everybody is using shmem/ramfs.
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
Zachary Amsden wrote:
> For VMI, the default clobber was "cc", and you need a way to allow at
> least that, because saving and restoring flags is too expensive on x86.

According to lore (Andi, I think), asm() always clobbers cc.

> I still don't think this was a good trade. The primary motivation for
> clobbering %eax was that Xen wanted a free register to use for
> computing the offset into the shared data in the case of SMP
> preemptible kernels. Xen no longer needs such a register, they can
> use the PDA offset instead. And it does hurt native performance by
> unconditionally stealing a register in the four most commonly invoked
> paravirt-ops code sequences.

Actually, it still does need a temp register. The sequence for cli is:

	mov %fs:xen_vcpu, %eax
	movb $1,1(%eax)

At some point I hope to move the vcpu structure directly into the pda/percpu variables, at which point it will need no temps.

	J
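The two-instruction cli sequence above can be read as "load a per-cpu pointer, then store 1 into the byte at offset 1 of what it points to". A userspace C model of that reading follows. It is a sketch only: the `model_*` names and the two-byte struct are hypothetical stand-ins (only the byte at offset 1 matters here), and the per-cpu `%fs` lookup is replaced by an ordinary global pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of the per-vcpu shared structure. */
struct model_vcpu_info {
	uint8_t upcall_pending;  /* offset 0 */
	uint8_t upcall_mask;     /* offset 1: non-zero means events are masked */
};

static struct model_vcpu_info model_vcpu_storage;

/* Plays the role of the per-cpu xen_vcpu pointer that is read via %fs. */
static struct model_vcpu_info *model_xen_vcpu = &model_vcpu_storage;

/* Model of the cli sequence: the local pointer is the "temp register". */
void model_cli(void)
{
	struct model_vcpu_info *v = model_xen_vcpu;  /* mov %fs:xen_vcpu, %eax */

	v->upcall_mask = 1;                          /* movb $1,1(%eax) */
}

/* Usage: returns 0 after checking the layout and the effect of model_cli(). */
int model_cli_example(void)
{
	assert(offsetof(struct model_vcpu_info, upcall_mask) == 1);
	model_vcpu_storage.upcall_mask = 0;
	model_cli();
	assert(model_vcpu_storage.upcall_mask == 1);
	return 0;
}
```

The pointer indirection is what costs the temporary: if the structure itself lived in the per-cpu area, as the message hopes for, the store could address the mask byte directly.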
Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far
On Tue, 2007-20-03 at 01:04 -0400, Lee Revell wrote:
> I think CONFIG_TRY_TO_DISABLE_SMI would be excellent for debugging,
> not to mention people trying to spec out hardware for RT
> applications...

There is an SMI disabling module in RTAI, check the smi-module.c in this:

https://www.rtai.org/RTAI/rtai-3.5.tar.bz2

More info:

http://www.captain.at/rtai-smi-high-latency.php
http://www.captain.at/xenomai-smi-high-latency.php

It might make sense to merge this code, at least in the -rt tree.

- Eric
Re: [PATCH 1 of 2] block_page_mkwrite() Implementation V2
Christoph Hellwig wrote: On Mon, Mar 19, 2007 at 09:11:31PM +1100, Nick Piggin wrote:

I've got the patches in -mm now. I hope they will get merged when the next window opens. I didn't submit the ->page_mkwrite conversion yet, because I didn't have any callers to look at. It is slightly less trivial than for nopage and nopfn, so having David's block_page_mkwrite is helpful.

Yes. I was just wondering whether it makes more sense to do this functionality directly on top of ->fault instead of converting it over real soon.

I would personally prefer that, but I don't want to block David's patch from being merged if the ->fault patches do not get in next cycle. If the fault patches do make it in first, then yes we should do the page_mkwrite conversion before merging David's patch. I'll keep an eye on it, and try to do the right thing.

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.
Re: mm snapshot broken-out-2007-03-18-02-44.tar.gz uploaded
Nick Piggin wrote: Andrew Morton wrote: On Tue, 20 Mar 2007 13:47:53 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: Andrew Morton wrote:

Hang on a sec... I'll try fixing the thing before you next make a release.

Too late. hot-fixes/ awaits thee.

Awww... well thanks very much Michal for reporting the bug, I reproduced it easily and it turns out to be a typo. In my testing I never had a lot of writeout going on, so most of the pages will have been truncated in the first loop...

Also, noticed another problem in the same general area. Andrew you were indeed right to question the removal of that unmap_mapping_range call, but I think even that alone wasn't enough...

-- 
SUSE Labs, Novell Inc.

The nopage vs invalidate race fix patch did not take care of truncating private COW pages. Mind you, I'm pretty sure this was previously racy even for regular truncate, not to mention vmtruncate_range. Anyway, fix that omission.

Index: linux-2.6/mm/memory.c
===
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1905,7 +1905,18 @@ int vmtruncate(struct inode * inode, lof
 	if (IS_SWAPFILE(inode))
 		goto out_busy;
 	i_size_write(inode, offset);
+
+	/*
+	 * unmap_mapping_range is called twice, first simply for efficiency
+	 * so that truncate_inode_pages does fewer single-page unmaps. However
+	 * after this first call, and before truncate_inode_pages finishes,
+	 * it is possible for private pages to be COWed, which remain after
+	 * truncate_inode_pages finishes, hence the second unmap_mapping_range
+	 * call must be made for correctness.
+	 */
+	unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
 	truncate_inode_pages(mapping, offset);
+	unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
 	goto out_truncate;
 
 do_expand:
@@ -1943,7 +1954,9 @@ int vmtruncate_range(struct inode *inode
 
 	mutex_lock(&inode->i_mutex);
 	down_write(&inode->i_alloc_sem);
+	unmap_mapping_range(mapping, offset, (end - offset), 1);
 	truncate_inode_pages_range(mapping, offset, end);
+	unmap_mapping_range(mapping, offset, (end - offset), 1);
 	inode->i_op->truncate_range(inode, offset, end);
 	up_write(&inode->i_alloc_sem);
 	mutex_unlock(&inode->i_mutex);
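The reasoning in the patch comment (unmap, truncate, unmap again) can be shown with a deliberately tiny userspace model. Everything here is a hypothetical stand-in: the `model_*` functions only track whether a private COW copy of a page is still mapped, and the racing fault is injected by hand at the point the comment says it can happen, rather than arising from real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NPAGES 4

/* true = a private COW copy of this page is still mapped (model state). */
static bool model_cow_mapped[MODEL_NPAGES];

/* Stand-in for unmap_mapping_range(): drops every mapping in the range. */
void model_unmap_range(void)
{
	int i;

	for (i = 0; i < MODEL_NPAGES; i++)
		model_cow_mapped[i] = false;
}

/*
 * Stand-in for truncate_inode_pages(): it removes pagecache pages but does
 * not touch private copies. A racing COW fault on page racing_fault_page
 * (or none if negative) is injected while truncation is still in progress.
 */
void model_truncate_pages(int racing_fault_page)
{
	if (racing_fault_page >= 0 && racing_fault_page < MODEL_NPAGES)
		model_cow_mapped[racing_fault_page] = true;
}

/* Returns how many stale private mappings are left after truncation. */
int model_truncate(bool second_unmap, int racing_fault_page)
{
	int i, stale = 0;

	model_unmap_range();                  /* first call: efficiency */
	model_truncate_pages(racing_fault_page);
	if (second_unmap)
		model_unmap_range();          /* second call: correctness */

	for (i = 0; i < MODEL_NPAGES; i++)
		if (model_cow_mapped[i])
			stale++;
	return stale;
}

/* Usage: returns 0 after checking both variants. */
int model_truncate_example(void)
{
	assert(model_truncate(false, 2) == 1);  /* one stale COW page remains */
	assert(model_truncate(true, 2) == 0);   /* second unmap removes it */
	return 0;
}
```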
Re: mm snapshot broken-out-2007-03-18-02-44.tar.gz uploaded
Andrew Morton wrote: On Tue, 20 Mar 2007 13:47:53 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: Andrew Morton wrote: Hang on a sec... I'll try fixing the thing before you next make a release. Too late. hot-fixes/ awaits thee. Awww... well thanks very much Michal for reporting the bug, I reproduced it easily and it turns out to be a typo. In my testing I never had a lot of writeout going on, so most of the pages will have been truncated in the first loop... -- SUSE Labs, Novell Inc. Fix typo in do_no_page vs invalidate race fix patch. Index: linux-2.6/mm/truncate.c === --- linux-2.6.orig/mm/truncate.c +++ linux-2.6/mm/truncate.c @@ -235,7 +235,7 @@ void truncate_inode_pages_range(struct a wait_on_page_writeback(page); if (page_mapped(page)) { unmap_mapping_range(mapping, - (loff_t)page_indexnext)
Re: [patch 00/31] 2.6.20-stable review
On Monday 19 March 2007, Greg KH wrote:
>This is the start of the stable review cycle for the 2.6.20.4 release.
>There are 31 patches in this series, all will be posted as a response
>to this one. If anyone has any issues with these being applied, please
>let us know. If anyone is a maintainer of the proper subsystem, and
>wants to add a Signed-off-by: line to the patch, please respond with it.
>
>These patches are sent out with a number of different people on the
>Cc: line. If you wish to be a reviewer, please email [EMAIL PROTECTED]
>to add your name to the list. If you want to be off the reviewer list,
>also email us.
>
>Responses should be made by Thursday March, 22, 15:00:00 UTC.
>Anything received after that time might be too late.

BINGO! One of these 31 patches may be the guilty party that's playing tricks with tar's mind. I'm running 2.6.20.4-rc1 on an older athlon xp2800 with a gig of ram.

Amanda has gotten through the estimate phase and is now doing the backup. It will fail, out of tape. Here is an amstatus output as it's running right now.
coyote:/GenesAmandaHelper-0.5 3 planner: [dumps way too big, 350850 KB, must skip incremental dumps] coyote:/GenesAmandaHelper-0.6 1 planner: [dumps way too big, 184977 KB, must skip incremental dumps] coyote:/bin 1 planner: [dumps way too big, 1110 KB, must skip incremental dumps] coyote:/boot 13m wait for dumping coyote:/dev 1 planner: [dumps way too big, 290 KB, must skip incremental dumps] coyote:/etc 1 planner: [dumps way too big, 18291 KB, must skip incremental dumps] coyote:/home 0 1018m wait for dumping coyote:/lib 3 planner: [dumps way too big, 11705 KB, must skip incremental dumps] coyote:/opt 15m wait for dumping coyote:/root 3 planner: [dumps way too big, 785963 KB, must skip incremental dumps] coyote:/sbin 1 planner: [dumps way too big, 10 KB, must skip incremental dumps] coyote:/tmp 4 32m wait for dumping coyote:/usr/X11R6 12m wait for dumping coyote:/usr/bin 1 planner: [dumps way too big, 339170 KB, must skip incremental dumps] coyote:/usr/dlds 1 planner: [dumps way too big, 2140 KB, must skip incremental dumps] coyote:/usr/dlds-misc 30m wait for dumping coyote:/usr/dlds-rpms 1 planner: [dumps way too big, 3130 KB, must skip incremental dumps] coyote:/usr/dlds-tgzs 1 planner: [dumps way too big, 10 KB, must skip incremental dumps] coyote:/usr/games 00m wait for dumping coyote:/usr/include 1 planner: [dumps way too big, 10557 KB, must skip incremental dumps] coyote:/usr/kerberos 10m wait for dumping coyote:/usr/lib 1 planner: [dumps way too big, 474409 KB, must skip incremental dumps] coyote:/usr/libexec 2 planner: [dumps way too big, 11285 KB, must skip incremental dumps] coyote:/usr/local 2 279m wait for dumping coyote:/usr/man 10m wait for dumping coyote:/usr/movies2 7271m dumping 5485m ( 75.44%) (0:12:47) coyote:/usr/music 1 planner: [dumps way too big, 2448290 KB, must skip incremental dumps] coyote:/usr/pix 2 17m wait for dumping coyote:/usr/sbin 1 planner: [dumps way too big, 3254 KB, must skip incremental dumps] coyote:/usr/share 3 planner: 
[dumps way too big, 40514 KB, must skip incremental dumps] coyote:/usr/src 3 6822m wait for dumping coyote:/var 1 366m wait for dumping SUMMARY part real estimated size size partition : 32 estimated : 3231973m flush : 0 0m failed : 1816155m ( 50.53%) wait for dumping: 13 8547m ( 26.73%) dumping to tape : 00m ( 0.00%) dumping : 1 5485m 7271m ( 75.44%) ( 17.16%) dumped : 0 0m 0m ( 0.00%) ( 0.00%) wait for writing: 0 0m 0m ( 0.00%) ( 0.00%) wait to flush : 0 0m 0m (100.00%) ( 0.00%) writing to tape : 0 0m 0m ( 0.00%) ( 0.00%) failed to tape : 0 0m 0m ( 0.00%) ( 0.00%) taped : 0 0m 0m ( 0.00%) ( 0.00%) tape 1: 0 0m 0m ( 0.00%) Dailys-19 8 dumpers idle : not-idle taper idle network free kps: 6800 holding space : 71118m (100.00%) dumper0 busy : 0:00:00 ( 0.00%) 0 dumpers busy : 0:00:00 ( 0.00%) 1 dumper busy : 0:00:00 ( 0.00%) The directory shown on line one of this report actually has: [EMAIL PROTECTED] /]# du -h /GenesAmandaHelper-0.5/ 1.6G/GenesAmanda
Re: [PATCH] powerpc minor pagefault optimization with kprobes enabled
> I've attached a patch below that optimizes this code path for powerpc,
> but the scheme applies to all architectures as well. It just rips out all
> the callchain madness, and does as good as it gets in the pagefault
> handler:

NAK, patch on the way to get rid of all the debugger() crap by using this very hook.

Anton
Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far
On 3/16/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:

Yes, this is probably caused by SMM code trying to emulate a PS/2 keyboard from a (maybe connected or not) USB keyboard. Unfortunately we have no way to disable this BIOS misfeature in the early boot process.

https://mail.rtai.org/pipermail/rtai/2003-March/002949.html
http://www.embeddedrelated.com/usenet/embedded/show/50333-1.php

I think CONFIG_TRY_TO_DISABLE_SMI would be excellent for debugging, not to mention people trying to spec out hardware for RT applications...

Lee
2.6.21-rc4-mm1
Temporarily at

  http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/

Will appear later at

  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/

- Restored the RSDL CPU scheduler (a new version thereof)

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the mm-commits
  mailing list.

        echo "subscribe mm-commits" | mail [EMAIL PROTECTED]

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug. Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug. Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.
Changes since 2.6.21-rc3-mm1: origin.patch git-acpi.patch git-alsa.patch git-arm-master.patch git-arm.patch git-avr32.patch git-cifs.patch git-cpufreq.patch git-powerpc.patch git-drm.patch git-dvb.patch git-gfs2-nmw.patch git-hid.patch git-ia64.patch git-ieee1394.patch git-infiniband.patch git-input.patch git-kbuild.patch git-kvm.patch git-leds.patch git-libata-all.patch git-md-accel.patch git-mmc.patch git-mtd.patch git-ubi.patch git-netdev-all.patch git-ioat.patch git-ocfs2.patch git-parisc.patch git-selinux.patch git-pciseg.patch git-s390.patch git-sh.patch git-scsi-misc.patch git-scsi-rc-fixes.patch git-unionfs.patch git-wireless.patch git-ipwireless_cs.patch git-gccbug.patch git trees -uml-hostfs-fix-double-free.patch -uml-hostfs-make-hostfs=-option-work-as-a-jail-as-intended.patch -uml-fix-a-memory-leak-in-the-multicast-driver.patch -uml-remove-dead-code-about-os_usr1_signal-and-os_usr1_process.patch -uml-mark-both-consoles-as-con_anytime.patch -uml-fix-confusion-irq-early-reenabling.patch -uml-activate_fd-return-enomem-only-when-appropriate.patch -uml-fix-errno-usage.patch -x86_64-fix-2618-regression-ptrace_oldsetoptions-should-be-accepted.patch -bluetooth-fix-socket-locking-in-hci_sock_dev_event.patch -add-epoll-compat_-code-to-fs-compatc.patch -check_partition-fix-error-check.patch -uml-arch_prctl-should-set-thread-fs.patch -connector-bugfix-for-cn_call_callback.patch -26-altix-console-fix-for-config_debug_shirq-usage.patch -ecryptfs-nested-locking-annotation.patch -swsusp-disable-nonboot-cpus-before-entering-platform-suspend.patch -paravirt-build-fixes.patch -acpi-disabled-due-to-dmi-failure-or-blacklisted-year-should-be-noted-as-is-done-with-other-acpi-blacklisting.patch -git-alsa-oops-fix.patch -avr32-dma-mappingh.patch -gregkh-driver-device-symlink.patch -gregkh-driver-platform-reorder-platform_device_del.patch -gregkh-driver-remove-devfs-from-maintainers.patch -gregkh-driver-driver-core-export-device_rename.patch -gregkh-driver-uio-irq.patch 
-scheduled-removal-of-sa_xxx-interrupt-flags-fixups-4.patch -make-drivers-char-drm-drm_vmcdrm_io_prot-static.patch -fix-saa7146_clipping_mem-size.patch -drivers-media-video-cpia_ppc-dont-use-_work_nar.patch -dvb-core-fix-several-locking-related-problems.patch -saa7134-fix-modules=n-compilation.patch -ivtv-warning-fix.patch -jdelvare-i2c-i2c-03-use-i2c_adapterdevparent-for-messages.patch -jdelvare-i2c-i2c-i801-restore-initial-state.patch -jdelvare-i2c-ds1374-check-for-workqueue-creation.patch -crash-on-evdev-disconnect.patch -expose-set_mode-method-so-it-can-be-wrapped.patch -ata_piix-remove-ugly-layering-violation.patch -pata_cmd640-multiple-updates.patch -ide-cmd64x-fix-recovery-time-calculation-take2.patch -mtd-maps-ck804xromc-pci_module_init-to-pci_register_driver.patch -mtd-chips-oops-in-cfi_amdstd_sync.patch -mtd-esb2-check-for-closed-rom-window.patch -dilnetpc-fix-warning.patch -mtd-correct-misspelled-preprocessor-variable.patch -git-netdev-all-ipw2200-fix.patch -mv643xx-ethernet-driver-irq-registration-fix.patch -via-rhine-set-avoid_d3-for-broken-bioses.patch -netxen-fix-warnings.patch -e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq.patch -e1000-fix-firmware-handover-bits.patch -e1000-fix-stop-raw-interrupts-disabled-nag-from-rt.patch -tulip-fix
Re: mm snapshot broken-out-2007-03-18-02-44.tar.gz uploaded
On Tue, 20 Mar 2007 13:47:53 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > On Mon, 19 Mar 2007 17:58:52 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > >>The kernel without Nick's patchset but with the assert runs OK too. Under > >>the principle of mm-has-been-too-flakey-lately, I'll drop the patches: > >> > >>mm-debug-check-for-the-fault-vs-invalidate-race.patch > >>mm-simplify-filemap_nopage.patch > >>mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch > >>mm-merge-populate-and-nopage-into-fault-fixes-nonlinear.patch > >>mm-merge-populate-and-nopage-into-fault-fixes-nonlinear-tidy.patch > >>mm-merge-nopfn-into-fault.patch > >>mm-merge-nopfn-into-fault-fix.patch > >>mm-remove-legacy-cruft.patch > > > > > > ug, too many rejects. I'll leave them in, minus > > mm-debug-check-for-the-fault-vs-invalidate-race.patch > > > > Hang on a sec... I'll try fixing the thing before you next make a > release. > Too late. hot-fixes/ awaits thee.
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
Christoph Lameter writes:

> +static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
> +{
...
> +	p = (void *)__get_free_page(flags | __GFP_ZERO);

This will cause problems on 64-bit powerpc, at least with 4k pages, since the pmd and pgd levels only use 1/4 of a page.

Paul.
Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far
On Sat, Mar 17, 2007 at 02:26:57PM +0100, Andi Kleen wrote: > Arjan van de Ven <[EMAIL PROTECTED]> writes: > > > > well we can do the handshake to take ownership like we do much later in > > boot, but that requires PCI to be there and fully discovered, which we > > don't have this early. > > That's not true - we do early pci discovery. Doing USB handsoff > there would be quite possible. What, we don't do USB "handoff" early enough in the boot process? It's happening at PCI quirk time now, which I think should be early enough for everyone (and too early for some who rely on USB keyboards and initramfs shells...) thanks, greg k-h
Re: [PATCH] sysctl: vfs_cache_divisor
Randy Dunlap wrote:

Then we duplicate all the relevant /proc knobs:

cat /proc/sys/vm/dirty_ratio
30
cat /proc/sys/vm/hires-dirty_ratio/
30

Or we do something else ;)

Sounds better. I wasn't very keen on the userspace interface that this exposed. Will look at those.

Okay... maybe I could throw a spanner in the machinery and suggest another option: perhaps we should add a way to do sysctl which can handle fractional (fixed-point) values... more coherent/detailed message tomorrow.

-hpa
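For readers following the fixed-point suggestion above, here is a minimal userspace sketch of what parsing and printing a fractional knob value could look like. It is not taken from any posted patch: the scale of three decimal places and the names FIXED_SCALE, parse_fixed() and format_fixed() are assumptions made purely for illustration, and overflow of very large inputs is not handled.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical scale: three decimal places, so 1.000 is stored as 1000. */
#define FIXED_SCALE 1000

/*
 * Parse a non-negative decimal string such as "0.5" or "30" into an
 * integer scaled by FIXED_SCALE. Returns 0 on success, -1 on malformed
 * input. Digits past the third decimal place are truncated, not rounded.
 */
static int parse_fixed(const char *s, int64_t *out)
{
    int64_t whole = 0, frac = 0, frac_scale = FIXED_SCALE;
    int digits = 0;

    for (; *s >= '0' && *s <= '9'; s++, digits++)
        whole = whole * 10 + (*s - '0');

    if (*s == '.') {
        s++;
        for (; *s >= '0' && *s <= '9'; s++, digits++) {
            if (frac_scale > 1) {
                frac_scale /= 10;
                frac += (*s - '0') * frac_scale;
            }
        }
    }

    /* Accept one trailing newline, as "echo 0.5 > file" would supply. */
    if (*s == '\n')
        s++;
    if (*s != '\0' || digits == 0)
        return -1;

    *out = whole * FIXED_SCALE + frac;
    return 0;
}

/* Format a non-negative fixed-point value back into "W.FFF" form. */
static void format_fixed(int64_t v, char *buf, size_t len)
{
    snprintf(buf, len, "%lld.%03lld",
             (long long)(v / FIXED_SCALE), (long long)(v % FIXED_SCALE));
}
```

With this scale, the existing integer value 30 would round-trip as 30.000, so such a knob could stay readable by tools that expect a plain number only if the fractional part were made optional on output; that trade-off is left open here.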
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
On Mon, 2007-03-19 at 18:00 -0800, Zachary Amsden wrote: > Rusty Russell wrote: > > *This* was the reason that the current hand-coded calls only clobber % > > eax. It was a compromise between native (no clobbers) and others (might > > need a reg). > > I still don't think this was a good trade. ... > Xen no longer needs such a register Hmm, well, if VMI is happy, Xen is happy, and lguest is happy, then perhaps we're better off with a cc-only clobber rule? Certainly makes life simpler. > > Now, since we decided to allow paravirt_ops operations to be normal C > > (ie. the patching is optional and done late), we actually push and pop % > > ecx and %edx. This makes the call site 10 bytes long, which is a nice > > size for patching anyway (enough for a movl $0, , a-la lguest's > > cli, or movw $0, %gs: if we supported SMP). > > You can do it in 11 bytes with no clobbers and normal C semantics by > linking to a direct address instead of calling to an indirect, but then > you need some gross fixup technology in paravirt_patch: > > if (call_addr == (void*)native_sti) { > ... > } Well, I don't think we need such hacks: since we have to use handcoded asm and mark the callsites anyway, marking what they're calling is trivial. The other idea from "btfixup" is that we can do the patching *much* earlier, so we don't need the initial code to be valid at all if we wanted to: we just need room to patch in a call insn. We could then generate trampolines which do the necessary pushes & pops automatically for backends which want to use C calling conventions. Perhaps it's time for code and benchmarks? Rusty.
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
David Miller <[EMAIL PROTECTED]> writes: > From: Linus Torvalds <[EMAIL PROTECTED]> > Date: Mon, 19 Mar 2007 20:18:14 -0700 (PDT) > >> > > Please don't subject us to another couple months of hair-pulling only >> > > to have Linus yank the thing out again, there are certainly more >> > > useful things to spend time on :-) >> >> Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it >> out, I simply won't merge it. It was more than just totally buggy code, it >> was an inability of the people to understand that even bugfree code >> isn't enough - you have to be able to also handle buggy data. > > Thank you. Hmm.. I know the feeling. I have had a similar rant about the kexec on panic code path. The code is still nowhere near as paranoid about normal kernel things not working as it could be, but by ranting about it periodically the people doing the work are gradually making it better. I'm conflicted about the dwarf unwinder. I was off doing other things at the time so I missed the pain, but I do have a distinct recollection of the back traces on x86_64 being distinctly worse than on i386. Lately I haven't seen that, so it may be that I was misinterpreting what I was seeing, and the compiler optimizations were what gave me such weird back traces. But if the quality of our backtraces has gone down and a dwarf unwinder could give us better back traces, it is likely worth pursuing. Of course it would need to start with the assumption that its tables may be borked (the kernel is busted after all) and be much more careful than Andi's last attempt. Eric
Re: [stable] [patch 29/31] Input: i8042 - fix AUX IRQ delivery check
On Mon, Mar 19, 2007 at 05:48:55PM -0400, Dmitry Torokhov wrote: > On 3/19/07, Greg KH <[EMAIL PROTECTED]> wrote: > > -stable review patch. If anyone has any objections, please let us know. > > > > -- > > > > From: Dmitry Torokhov <[EMAIL PROTECTED]> > > > > Input: i8042 - fix AUX IRQ delivery check > > > > On boxes that do not implement AUX LOOP command we can not > > verify AUX IRQ delivery and must assume that it is wired > > properly. > > > > Greg, > > There is another piece missing in AUX delivery test, commit > > 3ca5de6dd4ec5a139b2b8f00dce3e4726ca91af1 > > Unfortunately I can't send you a patch at the moment but if you could > get it from the mainline that would be great. Thanks for letting me know, I've added it to the queue now. greg k-h
Re: [patch 03/31] Fix user copy length in ipv6_sockglue.c
On Mon, Mar 19, 2007 at 03:01:25PM -0700, Chris Wright wrote: > * Greg KH ([EMAIL PROTECTED]) wrote: > > From: Chris Wright <[EMAIL PROTECTED]> > > > > [IPV6] fix ipv6_getsockopt_sticky copy_to_user leak > > > > User supplied len < 0 can cause leak of kernel memory. > > Use unsigned compare instead. > > You can drop this one. It's dependent on a patch > that is not in 2.6.20. Ok, thanks for letting me know, it is now dropped. greg k-h
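The class of bug the quoted changelog describes (a negative user-supplied length slipping past a signed bounds check) can be illustrated with a small standalone sketch. This is not the code from net/ipv6/ipv6_sockglue.c; the two helper names below are hypothetical and exist only to contrast the two comparisons.

```c
/*
 * Buggy pattern: with a signed comparison, a negative len is "smaller"
 * than avail and is returned unchanged. If the caller later passes it to
 * a copy routine taking an unsigned size, it turns into a huge length.
 */
static int clamp_len_signed(int len, int avail)
{
    if (len > avail)
        len = avail;
    return len;
}

/*
 * Fixed pattern: comparing as unsigned makes a negative len look very
 * large, so it is clamped to avail like any other oversized request.
 */
static int clamp_len_unsigned(int len, int avail)
{
    if ((unsigned int)len > (unsigned int)avail)
        len = avail;
    return len;
}
```

For example, with avail set to 16, a len of -1 comes back as -1 from the signed version and as 16 from the unsigned one.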
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Linus Torvalds <[EMAIL PROTECTED]> Date: Mon, 19 Mar 2007 20:18:14 -0700 (PDT) > > > Please don't subject us to another couple months of hair-pulling only > > > to have Linus yank the thing out again, there are certainly more > > > useful things to spend time on :-) > > Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it > out, I simply won't merge it. It was more than just totally buggy code, it > was an inability of the people to understand that even bugfree code > isn't enough - you have to be able to also handle buggy data. Thank you.
Re: [PATCH 2.6.22 3/3] Add LED trigger to libata core
Tony Vroon wrote: The first user of ata_ac_issue_prot_with_ledtrigger, the ServerWorks Frodo/ Apple K2 driver. Used by the IDE LED trigger on G5 towers. Respin of an earlier patch, based on comments by Tejun Heo & Alan Cox. Just two comments. 1. IMHO, ata_qc_issue_prot_ledtrigger() without 'with' is good enough. This is just my personal preference. Feel free to ignore it. 2. Patch #1 and #2 should be merged. They're one logical change of adding ata_qc_issue_prot_with_ledtrigger(). Patch #3 is a logically separate change of using it, but unless it's a wide conversion, implementing something and using something can be merged. So, please merge #1 and #2 and possibly #3. Thanks. -- tejun
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
On Mon, 19 Mar 2007, Andi Kleen wrote: > > Initially we had some bugs that accounted for near all failures, but they > were all fixed in the latest version. No. The real bugs were that the people involved wouldn't even accept that unwinding information was inevitably buggy and/or incomplete. That much more fundamental bug never got fixed, as far as I know. I'm not going to merge anything that depends on unwind tables as things stand. The pain just isn't worth it. > > Please don't subject us to another couple months of hair-pulling only > > to have Linus yank the thing out again, there are certainly more > > useful things to spend time on :-) Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it out, I simply won't merge it. It was more than just totally buggy code, it was an inability of the people to understand that even bugfree code isn't enough - you have to be able to also handle buggy data. Linus
Re: RSDL v0.31
On Mon, 19 Mar 2007, Xavier Bestel wrote: > > >> Stock scheduler wins easily, no contest. > > > > > > What happens when you renice X ? > > > > Dunno -- not necessary with the stock scheduler. > > Could you try something like renice -10 $(pidof Xorg) ? Could you try something as simple and accepting that maybe this is a problem? Quite frankly, I was *planning* on merging RSDL very early after 2.6.21, but there is one thing that has turned me completely off the whole thing: - the people involved seem to be totally unwilling to even admit there might be a problem. This is like alcoholism. If you cannot admit that you might have a problem, you'll never get anywhere. And quite frankly, the RSDL proponents seem to be in denial ("we're always better", "it's your problem if the old scheduler works better", "just one report of old scheduler being better"). And the thing is, if people aren't even _willing_ to admit that there may be issues, there's *no*way*in*hell* I will merge it even for testing. Because the whole and only point of merging RSDL was to see if it could replace the old scheduler, and the most important feature in that case is not whether it is perfect, BUT WHETHER ANYBODY IS INTERESTED IN TRYING TO FIX THE INEVITABLE PROBLEMS! See? Can you people not see that the way you're doing that "RSDL is perfect" chorus in the face of people who report problems, you're just making it totally unrealistic that it will *ever* get merged. So unless somebody steps up to the plate and actually *talks* about the problem reports, and admits that maybe RSDL will need some tweaking, I'm not going to merge it. Because there is just _one_ thing that is more important than code - and that is the willingness to fix the code... Linus
Re: mm snapshot broken-out-2007-03-18-02-44.tar.gz uploaded
Andrew Morton wrote: On Mon, 19 Mar 2007 17:58:52 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: The kernel without Nick's patchset but with the assert runs OK too. Under the principle of mm-has-been-too-flakey-lately, I'll drop the patches: mm-debug-check-for-the-fault-vs-invalidate-race.patch mm-simplify-filemap_nopage.patch mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch mm-merge-populate-and-nopage-into-fault-fixes-nonlinear.patch mm-merge-populate-and-nopage-into-fault-fixes-nonlinear-tidy.patch mm-merge-nopfn-into-fault.patch mm-merge-nopfn-into-fault-fix.patch mm-remove-legacy-cruft.patch ug, too many rejects. I'll leave them in, minus mm-debug-check-for-the-fault-vs-invalidate-race.patch Hang on a sec... I'll try fixing the thing before you next make a release. -- SUSE Labs, Novell Inc.
ignore this posting
Just trying to generate an example bounce so Intel can fix their attachment email filters, ignore me.

#!/bin/sh
#
# Usage: git suck path-to-tree
#
# Pull all patches relative to 'origin' from the tree specified
# and apply them to the current directory tree, keeping all changelog
# and authorship information identical. It will update the dates
# of the changes of course.

(cd $1; git format-patch --suffix=.txt origin) || exit 1

for i in $1/*.txt
do
	sed 's/\[PATCH\] //' <$i >tmp.patch
	git-applymbox -k tmp.patch || exit 1
done
rm -f tmp.patch
RE: UDP packets scheduling
> can anyone suggest me a proper way how to schedule UDP packets to > transmit at > some given rate? > > E.g., I have two boxes both having 10 GE interfaces. One box is able to > transmit at 9.9Gbps, the other one is able to receive only at > about 5.5Gbps. > Flow control must be turned off for some other reason. UDP is not a very good choice of protocol for this purpose. UDP pushes the transmit timing job into user-space, where it cannot be done particularly well. > How can I put delay between subsequent msg sends to achieve desired > packet rate without loses, e.g., 3.5Gbps without bursts? Even nanosleep() > with the lowest possible delay seems to be too much delay. Busy loop with > clock_gettime(3) works OK on SMP boxes, but on UP it causes problems. Why do you want to avoid bursts? You're going to be bursting between 10Gb/s and 0 anyway. It sounds like you're deliberately putting impossible requirements on yourself choosing the worst possible protocol and demanding the pacing be perfect. I don't think the technology to do that is here yet, but why would you possibly need it? 10GE cards tend to have large buffers precisely because it's not possible to get the timing even. Why is burstiness a problem? DS
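To make the numbers in this exchange concrete, here is a rough userspace sketch of the busy-loop approach the original poster mentions, written against absolute deadlines so that a late wakeup does not accumulate drift. It is only an illustration of the arithmetic, not a recommendation over the objections above: the helper names are hypothetical, the 1500-byte packet size in the note below is an assumption, and the loop spins a whole CPU, which matches the report that it behaves badly on UP machines.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Nanoseconds between packet starts for a payload size and target bit rate. */
static int64_t pacing_interval_ns(int64_t packet_bytes, int64_t bits_per_sec)
{
    return packet_bytes * 8 * NSEC_PER_SEC / bits_per_sec;
}

/* Advance an absolute deadline by ns, keeping tv_nsec below one second. */
static void deadline_advance(struct timespec *t, int64_t ns)
{
    int64_t total = (int64_t)t->tv_nsec + ns;

    t->tv_sec += (time_t)(total / NSEC_PER_SEC);
    t->tv_nsec = (long)(total % NSEC_PER_SEC);
}

/* Nonzero once now has reached or passed deadline. */
static int deadline_reached(const struct timespec *now,
                            const struct timespec *deadline)
{
    if (now->tv_sec != deadline->tv_sec)
        return now->tv_sec > deadline->tv_sec;
    return now->tv_nsec >= deadline->tv_nsec;
}

/*
 * Spin until the deadline has passed, then move it forward by one
 * interval. The caller would send one packet after each return.
 */
static void pace_wait(struct timespec *deadline, int64_t interval_ns)
{
    struct timespec now;

    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (!deadline_reached(&now, deadline));
    deadline_advance(deadline, interval_ns);
}
```

At 3.5Gbps with 1500-byte packets the interval works out to about 3.4 microseconds, which gives a sense of why a sleep-based delay is too coarse and why the reply questions whether perfectly even spacing is achievable at all.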
Re: [6/6] 2.6.21-rc4: known regressions
From: Adrian Bunk <[EMAIL PROTECTED]> Date: Sun, 18 Mar 2007 19:49:38 +0100 > Subject: ipv6 crash > References : http://lkml.org/lkml/2007/3/10/2 > Submitter : Len Brown <[EMAIL PROTECTED]> > Status : unknown This is caused by some problem in the router round-robin code in net/ipv6/route.c:rt6_select() Somehow it NULLs out fn->leaf, and then fib6_add_1() crashes dereferencing that NULL pointer as is seen in the report. Deleting the router round-robin list mangling code in rt6_select() makes the crash go away, but such a change causes regressions in the ipv6 conformance tests. Thomas Graf discovered this bug some time ago, but we still haven't come up with a fix suitable for upstream :-/ This bug has been there for a very long time and is not a regression of 2.6.21 I'll see if I can come up with something to fix this properly.
Re: SMP performance degradation with sysbench
On Wed, 2007-03-14 at 16:33 -0700, Siddha, Suresh B wrote: > On Tue, Mar 13, 2007 at 05:08:59AM -0700, Nick Piggin wrote: > > I would agree that it points to MySQL scalability issues, however the > > fact that such large gains come from tcmalloc is still interesting. > > What glibc version are you, Anton and others are using? > > Does that version has this fix included? > > Dynamically size mmap treshold if the program frees mmaped blocks. > > http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/malloc/malloc.c.diff?r1=1.158&r2=1.159&cvsroot=glibc > Last week, I reproduced it on RHEL4U3 with glibc 2.3.4-2.19. Today, I installed RHEL5GA and reproduced it again. RHEL5GA uses glibc 2.5-12, which already includes the dynamically sized mmap threshold patch, so this patch doesn’t resolve the issue. The problem is really related to glibc's multi-threaded malloc/free. My paxville has 16 logical cpus (dual core+HT). I disabled HT by hot removing the last 8 logical processors. I captured the schedule status. When sysbench thread=8 (best performance), there are about 3.4% context switches caused by __down_read/__down_write_nested. When sysbench thread=10 (best performance), the percentage becomes 11.83%. I captured the thread status by gdb. When sysbench thread=10, usually 2 threads are calling mprotect/mmap. When sysbench thread=8, there are no threads calling mprotect/mmap. Such captures behave randomly, but I tried many times. I think the increased percentage of context switches related to __down_read/__down_write_nested is caused by mprotect/mmap. mprotect/mmap takes the vm semaphore, so there is contention on the semaphore which drags performance down. The strace shows mysqld often calls mprotect/mmap with the same data length 61440. That’s further evidence. Gdb showed such mprotect is called by init_io_malloc=>my_malloc=>malloc=>_int_malloc=>mprotect. Mmap is caused by _int_free=>mmap. 
I checked the source code of glibc and found the real call chains are malloc=>_int_malloc=>grow_heap=>mprotect and _int_free=>heap_trim=>mmap.

I guess the transaction processing of mysql/sysbench is: mysql accepts a connection and initiates a block for the connection. After processing a couple of transactions, sysbench closes the connection. Then the procedure restarts.

So why are there so many mprotect/mmap calls? Glibc uses arenas to speed up malloc/free in a multi-threaded environment. mp_.trim_threshold only controls main_arena. In function _int_free, FASTBIN_CONSOLIDATION_THRE might be helpful, but it’s a fixed value. The *ROOT CAUSE* is that the dynamic thresholds don’t apply to non-main arenas.

To verify my idea, I created a small patch. When freeing a block, always check mp_.trim_threshold even though it might not be in the main arena. The patch is just to verify my idea, not the final solution.

--- glibc-2.5-20061008T1257_bak/malloc/malloc.c 2006-09-08 00:06:02.0 +0800
+++ glibc-2.5-20061008T1257/malloc/malloc.c 2007-03-20 07:41:03.0 +0800
@@ -4607,10 +4607,13 @@ _int_free(mstate av, Void_t* mem)
       }
     } else {
       /* Always try heap_trim(), even if the top chunk is not
          large, because the corresponding heap might go away. */
+      if ((unsigned long)(chunksize(av->top)) >=
+          (unsigned long)(mp_.trim_threshold)) {
       heap_info *heap = heap_for_ptr(top(av));
       assert(heap->ar_ptr == av);
       heap_trim(heap, mp_.top_pad);
+      }
     }
   }

With the patch, I recompiled glibc and reran sysbench/mysql. The result is good. When the thread number is larger than 8, the tps and response time (avg) are smooth and don't drop severely.

Is anyone able to test it on an AMD machine?

Yanmin
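The effect of the guard added by the experimental patch can be modelled outside glibc with a toy counter. Nothing below is glibc code: struct toy_arena and both helpers are hypothetical, and the model deliberately assumes that every trim returns all free top space to the kernel, which is a simplification of what heap_trim() does.

```c
#include <stddef.h>

/* Hypothetical stand-in for a non-main arena; only the top chunk matters. */
struct toy_arena {
    size_t top_size;      /* bytes currently free at the top of the heap */
    unsigned long trims;  /* how many times the toy trim path ran */
};

/* Unguarded path as described in the message: every such free tries to trim. */
static void toy_free_always_trim(struct toy_arena *a, size_t freed)
{
    a->top_size += freed;
    a->trims++;
    a->top_size = 0;  /* simplification: all free top space is given back */
}

/* Guarded path: trim only once the top chunk reaches the threshold. */
static void toy_free_guarded(struct toy_arena *a, size_t freed,
                             size_t trim_threshold)
{
    a->top_size += freed;
    if (a->top_size >= trim_threshold) {
        a->trims++;
        a->top_size = 0;
    }
}
```

With ten frees of 61440 bytes (the size seen in the strace) and an assumed threshold of 128KB, the unguarded model trims ten times and the guarded one three times. Fewer trims means fewer trips into mmap/mprotect and so less contention on the semaphore, which is the mechanism the message proposes.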
RE: MPT Fusion LSI22320 , Domain validation loops .
On Saturday, March 17, 2007 2:33 PM, James W. Laferriere wrote: > Hello All , I am have been having this problem since I > purchased the > controller and after changing out the disks I thought were > the problem . > I am still getting the continous : > > mptscsih: ioc1: attempting task abort! (sc=f7a64500) > scsi 3:0:4:0: > command: Inquiry: 12 00 00 00 60 00 > mptbase: Initiating ioc1 recovery > mptscsih: ioc1: task abort: SUCCESS (sc=f7a64500) > target3:0:4: Domain Validation detected failure, dropping back > target3:0:4: Domain Validation skipping write tests > target3:0:4: Ending Domain Validation > target3:0:4: asynchronous > target3:0:5: Beginning Domain Validation > mptscsih: ioc0: attempting target reset! (sc=f7a64380) > > The acutual device id's change and the driver > continously resets the > busses & starts all over . > > The disks are in a HP DS-SL13R-BA 4354R 14drive ultra3 > racKmount cabinet > w/ dualbus & dualps , Which seems to present a ID6 , That > does not show up in > any of the bus scans . > > Now I have previously had the same cabinet with 18gb > disks which had the > same problem with this controller . BUT I also have a LSI > Logic / Symbios > Logic 53c1010 66MHz Ultra3 dual SCSI bus Adapter which works > flawlessly with the > 18gb disks in this very same cabinet . > The cables for connecting the adapter(s) to tha cabinet > are less than 24 > inches in length . > > Would anyone please shed some light on what it is I am > doing wrong or > need to do or ? Too have this controller recognise these > disk drives in > this cabinet . There is a separate mailing list for scsi related issues, e.g. [EMAIL PROTECTED] I've posted a patch to address your issue several times; however, it seems it has not been picked up by the scsi subsystem maintainer. 
The last time it was posted was here: http://marc.info/?l=linux-scsi&m=117089244809072&w=2 An alternative is you could obtain our latest drivers from the LSI download site, where these drivers should have this patch http://www.lsilogic.com/cm/DownloadSearch.do. Eric
Re: mm snapshot broken-out-2007-03-18-02-44.tar.gz uploaded
On Mon, 19 Mar 2007 17:58:52 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > The kernel without Nick's patchset but with the assert runs OK too. Under > the principle of mm-has-been-too-flakey-lately, I'll drop the patches: > > mm-debug-check-for-the-fault-vs-invalidate-race.patch > mm-simplify-filemap_nopage.patch > mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch > mm-merge-populate-and-nopage-into-fault-fixes-nonlinear.patch > mm-merge-populate-and-nopage-into-fault-fixes-nonlinear-tidy.patch > mm-merge-nopfn-into-fault.patch > mm-merge-nopfn-into-fault-fix.patch > mm-remove-legacy-cruft.patch ug, too many rejects. I'll leave them in, minus mm-debug-check-for-the-fault-vs-invalidate-race.patch
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
Rusty Russell wrote: On Mon, 2007-03-19 at 11:38 -0700, Linus Torvalds wrote: On Mon, 19 Mar 2007, Eric W. Biederman wrote: True. You can use all of the call clobbered registers. Quite often, the biggest single win of inlining is not so much the code size (although if done right, that will be smaller too), but the fact that inlining DOES NOT CLOBBER AS MANY REGISTERS! For VMI, the default clobber was "cc", and you need a way to allow at least that, because saving and restoring flags is too expensive on x86. Thanks Linus. *This* was the reason that the current hand-coded calls only clobber % eax. It was a compromise between native (no clobbers) and others (might need a reg). I still don't think this was a good trade. The primary motivation for clobbering %eax was that Xen wanted a free register to use for computing the offset into the shared data in the case of SMP preemptible kernels. Xen no longer needs such a register, they can use the PDA offset instead. And it does hurt native performance by unconditionally stealing a register in the four most commonly invoked paravirt-ops code sequences. Now, since we decided to allow paravirt_ops operations to be normal C (ie. the patching is optional and done late), we actually push and pop % ecx and %edx. This makes the call site 10 bytes long, which is a nice size for patching anyway (enough for a movl $0, , a-la lguest's cli, or movw $0, %gs: if we supported SMP). You can do it in 11 bytes with no clobbers and normal C semantics by linking to a direct address instead of calling to an indirect, but then you need some gross fixup technology in paravirt_patch: if (call_addr == (void*)native_sti) { ... } I think we should probably try to do it in 12 bytes. Freeing eax to the inline caller is likely to make up the 2 bytes of space more we have to nop. One thing I always tried to get in VMI was to encapsulate the actual code which went through the business of computing arguments that were not even used in the native case. 
Unfortunately, that seems impossible in the current design, but I don't think it is an issue because I don't think there is actually a way to express: SWITCHABLE_CODE_BLOCK_BEGIN { /* arbitrary C code for native */ } SWITCHABLE_CODE_BLOCK_ALTERNATIVE { /* arbitrary C code for something else */ } Dave's linker suggestion is probably the best for things like that. Zach
Re: mm snapshot broken-out-2007-03-18-02-44.tar.gz uploaded
On Mon, 19 Mar 2007 22:37:46 +0100 "Michal Piotrowski" <[EMAIL PROTECTED]> wrote: > On 19/03/07, Andrew Morton <[EMAIL PROTECTED]> wrote: > > On Mon, 19 Mar 2007 20:23:40 +0100 > > Michal Piotrowski <[EMAIL PROTECTED]> wrote: > > > > > [EMAIL PROTECTED] napisał(a): > > > > The mm snapshot broken-out-2007-03-18-02-44.tar.gz has been uploaded to > > > > > > > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-03-18-02-44.tar.gz > > > > > > > > It contains the following patches against 2.6.21-rc4: > > > > > > > > > > [ cut here ] > > > kernel BUG at mm/filemap.c:123! > > > invalid opcode: [#1] > > > PREEMPT SMP > > > last sysfs file: devices/platform/w83627hf.656/temp2_input > > > Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nfsd exportfs lockd > > > nfs_acl autofs4 sunrpc af_packet nf_conntrack_netbios_ns ipt_REJECT > > > nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter > > > ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 > > > binfmt_misc thermal processor fan container nvram snd_intel8x0 > > > snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event > > > snd_seq snd_seq_device snd_pcm_oss evdev snd_mixer_oss snd_pcm intel_agp > > > agpgart snd_timer snd soundcore i2c_i801 snd_page_alloc ide_cd cdrom rtc > > > unix > > > CPU:0 > > > EIP:0060:[]Not tainted VLI > > > EFLAGS: 00010002 (2.6.21-rc4-mm1 #13) > > > EIP is at __remove_from_page_cache+0x42/0x4a > > > eax: 0001 ebx: ca263a58 ecx: c043c968 edx: 0001 > > > esi: c6ad3480 edi: ebp: c968dde8 esp: c968dde0 > > > ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 > > > Process bash-shared-map (pid: 12273, ti=c968c000 task=c78bc030 > > > task.ti=c968c000) > > > Stack: ca263a68 c6ad3480 c968ddf8 c016161b c6ad3480 00da c968de04 > > > c016824d > > >c6ad3480 c968de88 c0168525 1000 d17dc000 > > > 0005a91a > > > ca263a58 005b 091a 0110 c54eb5e0 > > > 0004 > > > Call Trace: > > > [] show_trace_log_lvl+0x1a/0x2f > > > [] 
show_stack_log_lvl+0x9d/0xac > > > [] show_registers+0x1ed/0x34c > > > [] die+0x11d/0x234 > > > [] do_trap+0x8a/0xa3 > > > [] do_invalid_op+0x97/0xa1 > > > [] error_code+0x7c/0x84 > > > [] remove_from_page_cache+0x35/0x40 > > > [] truncate_complete_page+0x38/0x42 > > > [] truncate_inode_pages_range+0x2ce/0x2fe > > > [] truncate_inode_pages+0x1a/0x1c > > > [] vmtruncate+0x40/0xbb > > > [] inode_setattr+0x5c/0x137 > > > [] ext3_setattr+0x19c/0x1f8 > > > [] notify_change+0x139/0x2ec > > > [] do_truncate+0x53/0x6c > > > [] do_sys_ftruncate+0x135/0x150 > > > [] sys_ftruncate64+0x1b/0x1d > > > [] syscall_call+0x7/0xb > > > > Ugly - it's hard to determine which patch might have caused that, but I > > bet it was Nick ;) > > > > How hard is it to reproduce? > > I think that it's very easy - run bash_shared_mapping from AutoTest > for a few seconds. > Yeah, a simple `bash-shared-mapping foo 1' goes splat after a few seconds. Which indicates that the patchset just isn't working as intended, I think. Nick, did you ever run bash-shared-mapping on it? You should - it's kinda evil. I could just drop the BUG_ON, or I could drop the whole patch series. The kernel with Nick's patchset but without the assert seems to run OK. But presumably it's anonymising mapped pages, which is bad. The kernel without Nick's patchset but with the assert runs OK too. 
Under the principle of mm-has-been-too-flakey-lately, I'll drop the patches:

mm-debug-check-for-the-fault-vs-invalidate-race.patch
mm-simplify-filemap_nopage.patch
mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch
mm-merge-populate-and-nopage-into-fault-fixes-nonlinear.patch
mm-merge-populate-and-nopage-into-fault-fixes-nonlinear-tidy.patch
mm-merge-nopfn-into-fault.patch
mm-merge-nopfn-into-fault-fix.patch
mm-remove-legacy-cruft.patch

A rollup against rc4 which includes the above patches and which is
suitable for raising fixups against is at
http://userweb.kernel.org/~akpm/np.gz
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
Zachary Amsden wrote:
> Jeremy Fitzhardinge wrote:
>> If we then work out in each direction and see matched push/pops,
>> then we know what registers can be trashed in the call. This also
>> allows us to determine the callsite size, and therefore how much space
>> we need for inlining.
>>
>
> No, that is a very dangerous suggestion. You absolutely *cannot* do
> this safely without explicitly marking the start EIP of this code.
> You *must* use metadata to do that. It is never safe to disassemble
> backwards or "rewind" EIP for x86 code.

What do you mean the instruction before is "mov $0x52515000,%eax"?

Yeah, you're right. Oh well.

J
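A small C sketch of why that one instruction is enough to defeat backward scanning. On 32-bit x86, "mov $0x52515000,%eax" encodes as b8 followed by the little-endian immediate 00 50 51 52, and 50, 51 and 52 are also the one-byte opcodes for push %eax, push %ecx and push %edx. The names example_code, EXAMPLE_CALL_OFFSET and naive_pushes_before_call are hypothetical, chosen only for this illustration; nothing here comes from the paravirt_ops patches themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes for:   mov  $0x52515000,%eax    (b8 00 50 51 52)
 *              call <rel32>             (e8 00 00 00 00)
 * The immediate's bytes 50 51 52 are indistinguishable from
 * push %eax; push %ecx; push %edx when read one byte at a time.
 */
static const uint8_t example_code[] = {
    0xb8, 0x00, 0x50, 0x51, 0x52,   /* mov $0x52515000,%eax */
    0xe8, 0x00, 0x00, 0x00, 0x00,   /* call, placeholder displacement */
};

/* Offset of the call opcode (0xe8) inside example_code. */
enum { EXAMPLE_CALL_OFFSET = 5 };

/* 0x50..0x57 are the one-byte "push %reg" opcodes on 32-bit x86. */
static int looks_like_push(uint8_t byte)
{
    return byte >= 0x50 && byte <= 0x57;
}

/*
 * Hypothetical naive scanner: walk backwards from the call opcode and
 * count bytes that look like pushes. It has no way to tell whether a
 * byte starts an instruction or sits inside a longer one.
 */
static size_t naive_pushes_before_call(const uint8_t *code, size_t call_off)
{
    size_t n = 0;

    while (call_off > n && looks_like_push(code[call_off - 1 - n]))
        n++;
    return n;
}
```

Run against example_code, the scanner reports three pushes in front of the call even though the instruction stream contains none. Recording the start address of each call site as metadata avoids the ambiguity, which is Zach's point.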
Re: [BUG] no boot with 2.6.21-rc3 and later
Jiri wrote:
> Looks like it's related to some change in drivers/ide. As there have been
> only 13 patches in this area between rc2 and rc3, it should take only 3 or
> 4 reboots to figure the offending patch using git-bisect - could you
> please give it a try?

I applied all of the 2.6.21-rc2-rc3 incremental patch except for the
portion applicable to "drivers/ide" files. The problem seems to be
elsewhere: 2.6.21-rc3 minus the drivers/ide changes still hangs at the
same spot during the boot process.

Any ideas where to look next? Thanks!

--
---
Bob Tracy
WTO + WIPO = DMCA? http://www.anti-dmca.org
[EMAIL PROTECTED]
---
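The "3 or 4 reboots" estimate for 13 patches is just the depth of a binary search. A minimal sketch, assuming a linear history in which exactly one commit is bad; bisect_steps is a hypothetical helper name, and git-bisect on a history with merges can need a different number of steps.

```c
#include <assert.h>

/*
 * Worst-case number of test boots a binary search needs to single out
 * one bad commit among n candidates: the smallest k with 2^k >= n.
 * Intended for small n; span would overflow for very large inputs.
 */
static unsigned bisect_steps(unsigned n)
{
    unsigned steps = 0;
    unsigned span = 1;

    while (span < n) {
        span *= 2;
        steps++;
    }
    return steps;
}
```

For n = 13 this gives 4, since 8 < 13 <= 16, which matches the upper end of the estimate in the quoted message.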
Re: [patch 1/13] signal/timer/event fds v7 - anonymous inode source ...
On Tue, 20 Mar 2007, Thomas Gleixner wrote: > > + error = -ENFILE; > > + file = get_empty_filp(); > > + if (!file) > > + goto eexit_1; > > make this "return -ENFILE;" please Done > > + inode = aino_getinode(); > > + if (IS_ERR(inode)) { > > + error = PTR_ERR(inode); > > + goto eexit_2; > > Can you please use a bit more descriptive labels ? > > e.g: > goto out_filp; Done > > +static int ainofs_delete_dentry(struct dentry *dentry) > > +{ > > + /* > > +* We faked vfs to believe the dentry was hashed when we created it. > > +* Now we restore the flag so that dput() will work correctly. > > +*/ > > + dentry->d_flags |= DCACHE_UNHASHED; > > + return 1; > > +} > > Please put either "struct ainofs_dentry_operations ..." below the next > function or move ainofs_delete_dentry() above "struct > ainofs_dentry_operations ..." > > It's annoying to lookup the protoypes and implemenation back and forth. I prefer to have all data declarations at the beginning. but if you can manage to have that requirement in the Coding Style, I'll change it ;) > > +static struct inode *aino_getinode(void) > > +{ > > + return igrab(aino_inode); > > +} > > Please use "igrab(aino_inode);" directly in this one single place above. > That saves us a prototype and an useless static function with no value. Done > > +/* > > + * A single inode exist for all aino files. On the contrary of pipes, > > + * aino inodes has no per-instance data associated, so we can avoid > > + * the allocation of multiple of them. > > + */ > > +static struct inode *aino_mkinode(void) > > +{ > > + int error = -ENOMEM; > > + struct inode *inode = new_inode(aino_mnt->mnt_sb); > > + > > + if (!inode) > > + goto eexit_1; > > return ERR_PTR(-ENOMEM); Done > > + aino_mnt = kern_mount(&aino_fs_type); > > + if (IS_ERR(aino_mnt)) > > + goto epanic; > > + > > + aino_inode = aino_mkinode(); > > + if (IS_ERR(aino_inode)) > > + goto epanic; > > + > > + return 0; > > + > > +epanic: > > + panic("aino_init() failed\n"); > > Panic ? 
> It's not life critical - is it ?
>
> A printk(KERN_ERR...) and a return -Exx would be sufficient.

Done.

- Davide
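The ERR_PTR(-ENOMEM) / IS_ERR() / PTR_ERR() convention requested in the review packs a negative errno into a pointer value, so a function can return either a valid object or an error code through one return value. Below is a userspace re-creation of the idea, assuming (as the kernel does) that the top 4095 addresses are never valid object addresses. make_object is a hypothetical stand-in for something like aino_mkinode(), not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the last MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical constructor: returns an object or an encoded error. */
static void *make_object(int fail)
{
    static int object;

    if (fail)
        return ERR_PTR(-ENOMEM);
    return &object;
}
```

A caller checks IS_ERR(p) first and only then uses PTR_ERR(p) to get the error code back, which is the shape of the "if (IS_ERR(inode)) { error = PTR_ERR(inode); ... }" code in the quoted patch.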
Re: [patch 6/13] signal/timer/event fds v7 - timerfd core ...
On Tue, 20 Mar 2007, Eric Dumazet wrote:

> Davide Libenzi wrote:
>
> > +struct timerfd_ctx {
> > +	struct hrtimer tmr;
> > +	ktime_t tintv;
> > +	spinlock_t lock;
> > +	wait_queue_head_t wqh;
> > +	unsigned long ticks;
> > +};
>
> > +static struct kmem_cache *timerfd_ctx_cachep;
>
> > +	timerfd_ctx_cachep = kmem_cache_create("timerfd_ctx_cache",
> > +					sizeof(struct timerfd_ctx),
> > +					0, SLAB_PANIC, NULL, NULL);
>
>
> Do we really expect thousands of active timerfd_ctx ?
>
> If not, using kmalloc()/kfree() would be fine, because sizeof(struct
> timerfd_ctx) is so small.
>
> on SMP / NUMA platforms, each new kmem_cache is rather expensive. (memory
> allocated at kmem_cache_create(), but also memory used when cache is not
> empty, with slabs in freelist for each cpu/node)
>
> Using a general cache might be cheaper : No memory overhead for yet another
> kmem_cache.
>
> I know individual caches are good to spot memory leaks, but in timerfd case,
> you dont have mem leaks, do you ? :)

Silly you, of course not :)
Yes, I guess I can use kmalloc/kfree for those fds ...

- Davide
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007 18:03:54 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Mon, 19 Mar 2007, Andrew Morton wrote: > > > > See the patch. We are only touching 2 cachelines instead of 32. So even > > > without considering the page allocator overhead and the slab allocator > > > overhead (which will make the situation even better) its superior. > > > > That's not proof, it is handwaving. I could wave right back at you and > > claim that the benefit from returning a cache-hot pte page back to the page > > allocator for reuse exceeds the benefit which you waved at me above. > > No you cannot make that claim. That would mean that you have to touch > 32 pages which is inferior. For pte pages (which are far more common), more than a single cacheline will be in cache. Yes, a common quicklist implementation is good. But no quicklist implementation at all is better. You say that will be slower, and you may well be right, but I say let's demonstrate that (please) rather than speculating. Then we can look at the difference and decide whether it is worth the additional complexity of this special-purpose private allocator. > > You may well be right, but nothing is proven, afaict. > > Nothing can be proven except within a rigorously defined mathematical > system but even there we are limited by such things as Russel's paradox. > > Its obvious that this is right. And there has been significant work > invested into retaining page table pages on i386, sparc64 and ia64 for > exactly the specified. I believe that work predated per-cpu-pages. > This patch does not change that at all for these 3 > arches. There is no doubt about the correctness of the approach here. > > > > You do not think that our current way of handling ptes is okay? If we do > > > not zero the ptes then we need to separate munmap from process shutdown. > > > > Yep. It's possible that process shutdown is a sufficiently common and > > costly special-case for it to be worth special-casing. 
>
> Ok great idea but what does this have to do with this patch? This patch
> simply generalizes something that has been there for ages.

It has a lot to do with this patch. If we decide that it is useful to
optimise the full-mm teardown case then we will need to zero these pages
when we start to use them so we might as well get them straight from the
page allocator. Hence this patch goes into the bitbucket.

> > > The advantage of the quicklists is that it does not require a rework of
> > > the pte serialization.
> >
> > No, these are unrelated. We can get pte pages from the page allocator and
> > zero them without touching the munmap handling.
> >
> > But it's possible that if we _were_ to optimise the munmap handling as
> > suggested, the end result would be superior.
>
> Andrew, this is utter crap and unrelated to this work. The main thing here
> is to generalize something that various arches already do and to avoid the
> page struct handling collisions. You use pie-in-the-sky to argue against
> consolidating code and fixing up usage conflicts of the slab with arch
> code?

It is not pie-in-the-sky to ask "is this code still useful?".
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
Jeremy Fitzhardinge wrote:
> For example, say we wanted to put a general call for sti into entry.S,
> where its expected it won't touch any registers. In that case, we'd
> have a sequence like:
>
>     push %eax
>     push %ecx
>     push %edx
>     call paravirt_cli
>     pop %edx
>     pop %ecx
>     pop %eax
>
> If we parse the relocs, then we'd find the reference to paravirt_cli.
> If we look at the byte before and see 0xe8, then we can see if its a
> call. If we then work out in each direction and see matched push/pops,
> then we know what registers can be trashed in the call. This also
> allows us to determine the callsite size, and therefore how much space
> we need for inlining.

No, that is a very dangerous suggestion. You absolutely *cannot* do
this safely without explicitly marking the start EIP of this code.
You *must* use metadata to do that. It is never safe to disassemble
backwards or "rewind" EIP for x86 code.

Zach
Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images
On Tue, Mar 20, 2007 at 01:42:46AM +0100, Thomas Gleixner wrote: > On Mon, 2007-03-19 at 17:32 -0500, Matt Mackall wrote: > > > > If a static volume is simply a non-dynamic volume, then device mapper > > > > can do that too. And countless other things. Which is not an aside. > > > > UBI growing to do all the things that device mapper does is exactly > > > > the thing we should be seeking to avoid. > > > > > > No it can't and device mapper sits on top of block devices. FLASH is no > > > block device. Period. > > > > Which of the following two properties does it lack? > > > > - discrete blocks > > - non-sequential access to blocks > > > > When you do the obvious s/blocks/eraseblocks/, this appears to be > > true. > > It appears to be, but it is not. You enforce semantics on a device, > which it does not have. > > > Saying "but I can't do I/O smaller than the blocksize" doesn't change > > this any more than it would for disks. > > There is a huge difference. Disk block size is 512 byte and FLASH block > size is min 16KiB and up to 256KiB. > > Just do the math: > > Write sampling data streams in 2KiB chunks to your uber devicemapper on > a 1GiB device with 64KiB erase block size: > > Fine grained FLASH aware writes allow 32 chunks in a block without > erasing the block. > > Your method erases the block 32 times to write the same amount of data. Sigh. That's the current /dev/mtdblock method, not my method. You're too fixated on what you think I'm saying to hear what I'm saying. > > Saying "but I can do smaller I/O efficiently in some circumstances" > > also doesn't change it. > > We can do it under _any_ circumstances and that _does_ change it. > Implementing a clever block device layer on top of UBI is simple and > would provide FLASH page sized I/O, i.e. 2Kib in the above example. Yes. I know. I've written a complete (non-Linux) FTL. I know what's entailed. > > In historical UNIX, some tapes were block devices too. Because they > > supported seek(). > > I'm impressed. 
How exactly are "some tapes" comparable to FLASH chips ? > > Your next proposal is to throw away MTD-utils and use "mt" instead ? Don't be an ass. I'm pointing out that not all block devices are disks. > > > Device mapper can not provide a simple easy to decode scheme for boot > > > loaders. We need to be able to boot out of 512 - 2048 byte of NAND FLASH > > > and be able to find the kernel or second stage boot loader in this > > > unordered device. > > > > > > And no, fixed addresses do not work. Do you want to implement device > > > mapper into your Initialial Bootloader stage ? > > > > This is exactly the same problem as booting on a desktop PC. But > > somehow LILO manages. My first Linux box had a hell of a lot less disk > > than the platform I bootstrapped (and wrote NAND drivers for) last > > month had in NAND. > > No, it is not. You get the absolute sector address of your second stage > and this is a complete nobrainer. The translation is done in the DISK > device. LILO and friends manage to boot systems that use software RAID and LVM. There are multiple methods. Some use block lists, some use tiny boot partitions, etc. All of them are applicable to controllerless NAND. > You simply ignore the fact, that inside each disk, USB Stick, CF-CARD, > whatever - there is a more or less intellegent controller device, which > does the mapping to the physical storage location. There is _NO_ such > thing on a bare FLASH chip. How many times do I have to tell you that I wrote a driver for controllerless NAND just last month? > How exactly does device mapper: > > A) across device wear levelling ? The same way UBI does, but encapsulated in a device mapper layer. > B) dynamic partitioning for FLASH aware file systems ? See above. > C) across device wear levelling for FLASH aware file systems ? See above. > D) background bit-flip corrections (copying affected blocks and recylce > the old one) ? See above. 
> E) allow position independent placement of the second stage bootloader ? See way above to my LILO response. > > > You need to implement a clever journalling block device > > > emulator in order to keep the data alive and the FLASH not weared out > > > within no time. You need the wear levelling, otherwise you can throw > > > away your FLASH in no time. > > > > And that's why it's in my picture. > > Yes, it is in your picture, but: > > 1) it excludes FLASH aware file systems and UBI does not. > 2) your picture does still not explain how it does achive the above A), > B), C), D) and E) > > Your extra path for partitioning(4) and JFFS2 is just a weird hack, > which makes your proposal completely absurd. No, it's just there to show the flexibility of device mapper. But I have the sneaking suspicion you have no idea how device mapper works. In brief: device mapper takes one or more devices, applies a mapping to them, and returns a new device. For example, take various spans of /dev/hda1 and /dev/sda3 and presen
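The erase-count arithmetic quoted earlier in this message (2KiB chunks, 64KiB erase blocks, 32 chunks per block) can be checked with a small model. This is only a sketch under stated assumptions: a single erase block that starts out erased, a "rewrite whole block" policy that erases once per chunk written (the behaviour both sides attribute to the /dev/mtdblock style of emulation), and a page-aware policy that erases only when the block is full and has to be reused. The enum and function names are hypothetical.

```c
#include <assert.h>

enum write_policy {
    REWRITE_WHOLE_BLOCK,    /* read-erase-rewrite the block per chunk */
    PAGE_AWARE              /* program free pages, erase only when full */
};

/*
 * Count erases needed to write `chunks` chunks of chunk_bytes each
 * into one erase block of erase_block_bytes, starting from an
 * erased block.
 */
static unsigned erases_for_chunks(unsigned chunks,
                                  unsigned erase_block_bytes,
                                  unsigned chunk_bytes,
                                  enum write_policy policy)
{
    unsigned pages_per_block = erase_block_bytes / chunk_bytes;
    unsigned pages_used = 0;
    unsigned erases = 0;
    unsigned i;

    for (i = 0; i < chunks; i++) {
        if (policy == REWRITE_WHOLE_BLOCK) {
            erases++;               /* every write costs an erase */
        } else {
            if (pages_used == pages_per_block) {
                erases++;           /* block full: erase before reuse */
                pages_used = 0;
            }
            pages_used++;
        }
    }
    return erases;
}
```

With 64KiB blocks and 2KiB chunks, writing 32 chunks costs 32 erases under the first policy and none under the second, which is the 32:0 gap the quoted text describes. A real wear-levelling layer would move on to another block rather than reuse the same one.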
Re: BUG lapic: Can't boot on battery (2.6.21-rc{1,2,3,4})
On Mon, 2007-03-19 at 22:51 +0100, Stefan Prechtel wrote:
> 2007/3/19, Thomas Gleixner <[EMAIL PROTECTED]>:
> > On Mon, 2007-03-19 at 21:35 +0100, Stefan Prechtel wrote:
> > >            CPU0       CPU1
> > >   0:      28289          0    local-APIC-edge-fasteio  timer
> > > ...
> > > LOC:      28237      28236
> > >
> > > after a read: (I hope that is this what you want :-)
> > >            CPU0       CPU1
> > >   0:      30344          0    local-APIC-edge-fasteio  timer
> > > ...
> > > LOC:      30292      30291
> >
> > Is this with AC plugged in ? If yes, please provide the same numbers for
> > battery mode.
>
> Yes. And here is the output for battery mode (2.6.20):
>            CPU0       CPU1
>   0:     292153          0    local-APIC-edge-fasteio  timer
> LOC:     292114     292113
>
>            CPU0       CPU1
>   0:     293263          0    local-APIC-edge-fasteio  timer
> LOC:     293224     293223

Hmm. Can you please apply the following patch on top of 2.6.20 and
check, if the WARN_ON_ONCE triggers when you boot w/o AC plugged ?

Thanks,

	tglx

Index: linux-2.6.20/arch/i386/kernel/apic.c
===
--- linux-2.6.20.orig/arch/i386/kernel/apic.c
+++ linux-2.6.20/arch/i386/kernel/apic.c
@@ -1174,6 +1174,8 @@ void switch_APIC_timer_to_ipi(void *cpum
 	cpumask_t mask = *(cpumask_t *)cpumask;
 	int cpu = smp_processor_id();
 
+	WARN_ON_ONCE(1);
+
 	if (cpu_isset(cpu, mask) && !cpu_isset(cpu, timer_bcast_ipi)) {
 		disable_APIC_timer();
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007, Andrew Morton wrote: > > + > > +#ifdef CONFIG_QUICKLIST > > + > > +#ifndef CONFIG_NR_QUICK > > +#define CONFIG_NR_QUICK 1 > > +#endif > > No, please don't define config items like this. Do it in Kconfig. They can be set up in the arch specific Kconfig. Ok. I moved the #ifndef .. #endif into mm/Kconfig. > These guys seem to have multiple callsites for ia64 at least and probably > would benefit from being uninlined. Then they would no longer be optimizable. Right now one can compile out the constructor / destructor support and provide a constant list number as well as constant gfp masks. This can be very small and benefit tremendously from inlining. Many arches do not need some features and there are only a few call sites. > > +void quicklist_check(int nr, void (*dtor)(void *)); > > +unsigned long quicklist_total_size(void); > > + > > +#else > > +void quicklist_check(int nr, void (*dtor)(void *)) > > +{ > > +} > > + > > +unsigned long quicklist_total_size(void) > > +{ > > + return 0; > > +} > > +#endif > > That obviouslty won't link and wasn't tested. Making these static inline > will help. Hmmm... We could drop these conmpletely. If an arch does not use quicklists then they should not be calling these. > > +#include > > +#include > > + > > +DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK]; > > If we uninline those big inlines, this can perhaps be made static. Yeah but we want the inlines. > > > +#define MIN_PAGES 25 > > +#define MAX_FREES_PER_PASS 16 > > +#define FRACTION_OF_NODE_MEM 16 > > Are these constants optimal for all architectures? I added them as parameters to quicklist_trim so that an arch can specify their own settings. > > + return min(pages_to_free, (long)MAX_FREES_PER_PASS); > > +} > > min_t and max_t are the standard way of avoiding that warning. Or stick a > UL on the constants (which is probably better). We do not need those since the constants are now parameters. 
> > > +void quicklist_check(int nr, void (*dtor)(void *)) > > +{ > > + long pages_to_free; > > + struct quicklist *q; > > + > > + q = &get_cpu_var(quicklist)[nr]; > > + if (q->nr_pages > MIN_PAGES) { > > + pages_to_free = min_pages_to_free(q); > > + > > + while (pages_to_free > 0) { > > + void *p = quicklist_alloc(nr, 0, NULL); > > + > > + if (dtor) > > + dtor(p); > > + free_page((unsigned long)p); > > + pages_to_free--; > > + } > > + } > > + put_cpu_var(quicklist); > > +} > > The use of a literal 0 as a gfp_t is a bit ugly. I assume that we don't > care because we should never actually call into the page allocator for this > caller. But it's not terribly clear because there is no commentary > describing what this function is supposed to do. Right. Will add comments. > The name foo_check() is unfortunate: it implies that the function checks > something (ie: has no side-effects). But this function _does_ change > things and perhaps should be called quicklist_trim() or something like > that. Tradition. Dave initially named it check_pgt_cache it seems. > This function lacks any commentary, but I was able to work it out. I > think. Some nice comments would be, umm, nice. ok. 
Here is a fixup patch: Index: linux-2.6.21-rc3-mm2/include/linux/quicklist.h === --- linux-2.6.21-rc3-mm2.orig/include/linux/quicklist.h 2007-03-19 17:41:42.0 -0700 +++ linux-2.6.21-rc3-mm2/include/linux/quicklist.h 2007-03-19 17:47:34.0 -0700 @@ -13,10 +13,6 @@ #ifdef CONFIG_QUICKLIST -#ifndef CONFIG_NR_QUICK -#define CONFIG_NR_QUICK 1 -#endif - struct quicklist { void *page; int nr_pages; @@ -77,18 +73,11 @@ static inline void quicklist_free(int nr put_cpu_var(quicklist); } -void quicklist_check(int nr, void (*dtor)(void *)); -unsigned long quicklist_total_size(void); +void quicklist_trim(int nr, void (*dtor)(void *), + unsigned long min_pages, unsigned long max_free); -#else -void quicklist_check(int nr, void (*dtor)(void *)) -{ -} +unsigned long quicklist_total_size(void); -unsigned long quicklist_total_size(void) -{ - return 0; -} #endif #endif /* LINUX_QUICKLIST_H */ Index: linux-2.6.21-rc3-mm2/mm/Kconfig === --- linux-2.6.21-rc3-mm2.orig/mm/Kconfig2007-03-19 17:41:42.0 -0700 +++ linux-2.6.21-rc3-mm2/mm/Kconfig 2007-03-19 17:42:49.0 -0700 @@ -220,3 +220,7 @@ config DEBUG_READAHEAD Say N for production servers. +config NR_QUICK + depends on QUICKLIST + default 1 + Index: linux-2.6.21-rc3-mm2/mm/quicklist.c === --- linux-2.6.21-rc3-mm2.orig/mm/quicklist.c2007-03-19 17:41:42.0 -0700 +++ linux-2.6.
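For readers following the review, here is a userspace sketch of the data structure under discussion: a quicklist is a chain of free pages linked through the first word of each page, with a trim operation that hands surplus pages back to the allocator. The struct layout follows the one shown in the patch. Everything else is an assumption made for illustration: calloc/free stand in for the page allocator, there is no per-CPU handling, no constructor or destructor, and the signatures are simplified relative to the patch (which also takes a list number and a dtor callback).

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

struct quicklist {
    void *page;     /* head of the chain of free pages */
    int nr_pages;   /* number of pages currently on the chain */
};

/* Pop a page from the list, or fall back to the allocator if empty. */
static void *quicklist_alloc(struct quicklist *q)
{
    void **p = q->page;

    if (p) {
        q->page = p[0];     /* next page in the chain */
        p[0] = NULL;        /* clear the link so the page is all zero */
        q->nr_pages--;
        return p;
    }
    return calloc(1, PAGE_BYTES);
}

/* Push a (zeroed) page back onto the list; page must not be NULL. */
static void quicklist_free(struct quicklist *q, void *page)
{
    void **p = page;

    p[0] = q->page;
    q->page = p;
    q->nr_pages++;
}

/*
 * Release up to max_free pages while more than min_pages are cached.
 * Returns the number of pages handed back to the allocator.
 */
static int quicklist_trim(struct quicklist *q, int min_pages, int max_free)
{
    int freed = 0;

    while (q->nr_pages > min_pages && freed < max_free) {
        free(quicklist_alloc(q));
        freed++;
    }
    return freed;
}
```

The alloc and free paths each touch only the list head and the first word of one page, which is the locality argument made for quicklists elsewhere in this thread. Passing min_pages and max_free as parameters mirrors the change described in the reply above, where the constants became arguments to quicklist_trim.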
Re: [RFC][PATCH] split file and anonymous page queues #2
Rik van Riel wrote:

Split the anonymous and file backed pages out onto their own pageout
queues. This way we do not unnecessarily churn through lots of anonymous
pages when we do not want to swap them out anyway.

Please take this patch for a spin and let me know what goes well
and what goes wrong.

In order to make testing easier, I have put some kernel RPMs up on
http://people.redhat.com/riel/vmsplit/

Any benchmark results are welcome, especially bad ones. I want to make
sure this thing runs as well as the current VM in every situation, while
also fixing the problems described in my previous mail.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007, Andrew Morton wrote:

> > See the patch. We are only touching 2 cachelines instead of 32. So even
> > without considering the page allocator overhead and the slab allocator
> > overhead (which will make the situation even better) its superior.
>
> That's not proof, it is handwaving. I could wave right back at you and
> claim that the benefit from returning a cache-hot pte page back to the page
> allocator for reuse exceeds the benefit which you waved at me above.

No you cannot make that claim. That would mean that you have to touch
32 pages which is inferior.

> You may well be right, but nothing is proven, afaict.

Nothing can be proven except within a rigorously defined mathematical
system but even there we are limited by such things as Russell's paradox.

It's obvious that this is right. And there has been significant work
invested into retaining page table pages on i386, sparc64 and ia64 for
exactly the specified. This patch does not change that at all for these 3
arches. There is no doubt about the correctness of the approach here.

> > You do not think that our current way of handling ptes is okay? If we do
> > not zero the ptes then we need to separate munmap from process shutdown.
>
> Yep. It's possible that process shutdown is a sufficiently common and
> costly special-case for it to be worth special-casing.

Ok great idea but what does this have to do with this patch? This patch
simply generalizes something that has been there for ages.

> > The advantage of the quicklists is that it does not require a rework of
> > the pte serialization.
>
> No, these are unrelated. We can get pte pages from the page allocator and
> zero them without touching the munmap handling.
>
> But it's possible that if we _were_ to optimise the munmap handling as
> suggested, the end result would be superior.

Andrew, this is utter crap and unrelated to this work.
The main thing here is to generalize something that various arches already
do and to avoid the page struct handling collisions. You use pie-in-the-sky
to argue against consolidating code and fixing up usage conflicts of the
slab with arch code?
Re: chrdev_open lifetime question
On Wed, 7 Mar 2007 17:23:05 -0500, "Dmitry Torokhov" <[EMAIL PROTECTED]> wrote:

> It seems that if a process keeps a character device open then other
> processes will also be able to get into filp->f_op->open(inode,filp)
> in chrdev_open() even after a driver called cdev_del() as part of its
> unwind procedure. Is this correct or am I missing something?

I see no replies in the archives. Have you got any private ones?
Also, what's the context?

-- Pete
Re: [patch 1/13] signal/timer/event fds v7 - anonymous inode source ...
Davide, On Mon, 2007-03-19 at 16:47 -0700, Davide Libenzi wrote: > This patch add an anonymous inode source, to be used for files that need > and inode only in order to create a file*. We do not care of having an > inode for each file, and we do not even care of having different names in > the associated dentries (dentry names will be same for classes of file*). > This allow code reuse, and will be used by epoll, signalfd and timerfd > (and whatever else there'll be). > > +int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile, > +char const *name, const struct file_operations *fops, void *priv) > +{ > + struct qstr this; > + struct dentry *dentry; > + struct inode *inode; > + struct file *file; > + int error, fd; > + > + error = -ENFILE; > + file = get_empty_filp(); > + if (!file) > + goto eexit_1; make this "return -ENFILE;" please > + inode = aino_getinode(); > + if (IS_ERR(inode)) { > + error = PTR_ERR(inode); > + goto eexit_2; Can you please use a bit more descriptive labels ? e.g: goto out_filp; > + } > + > + error = get_unused_fd(); > + if (error < 0) > + goto eexit_3; e.g: goto out_inode; > + fd = error; > + > + /* > + * Link the inode to a directory entry by creating a unique name > + * using the inode sequence number. > + */ > + error = -ENOMEM; > + this.name = name; > + this.len = strlen(name); > + this.hash = 0; > + dentry = d_alloc(aino_mnt->mnt_sb->s_root, &this); > + if (!dentry) > + goto eexit_4; e.g: goto out_fd; > +static int ainofs_delete_dentry(struct dentry *dentry) > +{ > + /* > + * We faked vfs to believe the dentry was hashed when we created it. > + * Now we restore the flag so that dput() will work correctly. > + */ > + dentry->d_flags |= DCACHE_UNHASHED; > + return 1; > +} Please put either "struct ainofs_dentry_operations ..." below the next function or move ainofs_delete_dentry() above "struct ainofs_dentry_operations ..." It's annoying to lookup the protoypes and implemenation back and forth. 
> +static struct inode *aino_getinode(void) > +{ > + return igrab(aino_inode); > +} Please use "igrab(aino_inode);" directly in this one single place above. That saves us a prototype and an useless static function with no value. > +/* > + * A single inode exist for all aino files. On the contrary of pipes, > + * aino inodes has no per-instance data associated, so we can avoid > + * the allocation of multiple of them. > + */ > +static struct inode *aino_mkinode(void) > +{ > + int error = -ENOMEM; > + struct inode *inode = new_inode(aino_mnt->mnt_sb); > + > + if (!inode) > + goto eexit_1; return ERR_PTR(-ENOMEM); > + inode->i_fop = &aino_fops; > +} > + > +static int ainofs_get_sb(struct file_system_type *fs_type, int flags, > + const char *dev_name, void *data, struct vfsmount *mnt) > +{ > + return get_sb_pseudo(fs_type, "aino:", NULL, AINOFS_MAGIC, mnt); > +} Please put either "struct file_system_type aino_fs_typ ..." below this function or move ainofs_get_sb() above "struct file_system_type aino_fs_typ ..." > +static int __init aino_init(void) > +{ > + > + if (register_filesystem(&aino_fs_type)) > + goto epanic; > + > + aino_mnt = kern_mount(&aino_fs_type); > + if (IS_ERR(aino_mnt)) > + goto epanic; > + > + aino_inode = aino_mkinode(); > + if (IS_ERR(aino_inode)) > + goto epanic; > + > + return 0; > + > +epanic: > + panic("aino_init() failed\n"); Panic ? It's not life critical - is it ? A printk(KERN_ERR...) and a return -Exx would be sufficient. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
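The label naming requested in this review (out_filp, out_inode, out_fd instead of eexit_1..eexit_4) is the usual goto-unwind idiom: each label names the resource that is released at that point, and a failure jumps to the label for the most recently acquired resource. A userspace sketch of the shape follows. struct handle, handle_create and the fail_at parameter are hypothetical test scaffolding, and malloc/free stand in for the kernel calls in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct handle {
    void *filp;
    void *inode;
    void *name;
};

/*
 * Acquire three resources in order. fail_at forces step 1, 2 or 3 to
 * fail (0 = no forced failure) so the unwind paths can be exercised.
 */
static int handle_create(struct handle *h, int fail_at)
{
    int error;

    h->filp = (fail_at == 1) ? NULL : malloc(16);
    if (!h->filp)
        return -ENFILE;         /* nothing acquired yet, plain return */

    h->inode = (fail_at == 2) ? NULL : malloc(16);
    if (!h->inode) {
        error = -ENOMEM;
        goto out_filp;
    }

    h->name = (fail_at == 3) ? NULL : malloc(16);
    if (!h->name) {
        error = -ENOMEM;
        goto out_inode;
    }

    return 0;

out_inode:                      /* release the inode, then fall through */
    free(h->inode);
    h->inode = NULL;
out_filp:                       /* release the file */
    free(h->filp);
    h->filp = NULL;
    return error;
}
```

The first failure returns directly because there is nothing to undo, which is the "make this return -ENFILE; please" comment at the top of the review. Later failures fall through the labels in reverse order of acquisition.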
Re: Suspend to RAM doesn't work anymore in 2.6.21
On Monday 19 March 2007 22:43:20 you wrote:
> Hi,
>
> On Monday, 19 March 2007 13:50, Tobias Doerffel wrote:
> > Hi,
> >
> > Suspend to RAM used to work fine on my computer (Intel Core Duo, 1 GB
> > RAM, Intel 82801G (ICH7-chipset) mainboard, NVIDIA-gfx-card,
> > tg3-ethernet) up to 2.6.20.3. But no matter which rc of 2.6.21 I use,
> > suspend to RAM doesn't work anymore. Up to rc3 even suspending stopped at
> > "suspending console", which apparently seems to be fixed in rc4. I tried
> > rc4-git4 with minimal config (no dynticks, no HRT, no MSI, no sound, no
> > bluetooth, no PCMCIA, no WLAN, no USB, no cpufreq) but still I can't
> > resume properly. Caps works and I can login through SSH. Back to a more
> > complete config (sound, MMC, WLAN, PCMCIA - still no dynticks or HRT -
> > see attachment "config") I get exactly the same behaviour.
> >
> > When logged in through SSH after resume I saved output of dmesg (which
> > includes full power management debug messages), see
> > attachment "dmesg-resume". The system basically seems to be back but a lot
> > of things do not work such as loading/unloading e.g. my WLAN-driver
> > (ipw3945), running "top" or "dstat" etc. "uptime" always returns 0 min,
> > even with power management debug disabled.
> >
> > Kernel:
> > Linux version 2.6.21-rc4 (gcc version 4.1.2 20061115 (prerelease) (Debian
> > 4.1.1-21)) #23 SMP PREEMPT Mon Mar 19 12:27:56 CET 2007

I investigated this issue further. A complete bisect between 2.6.20 and
2.6.21-rc4-git4 stops at a commit (a4bbb810dedaecf74d54b16b6dd3c33e95e1024c)
where I am no longer able to compile the kernel because of ACPI-related
compile errors in arch/i386/kernel/setup.c. I stepped back some revisions
until it compiled again, but resume didn't work there either. So I started
all over again with a bisect restricted to arch/i386 and ended up at
ceb6c46839021d5c7c338d48deac616944660124 as the bad commit. But this commit
seems to be some kind of finalization of a series of patches ("ACPICA:
Remove duplicate table manager"), so I guess it's hard to debug this thing...

> Can you please do
>
> # echo test > /sys/power/disk
> # echo disk > /sys/power/state
>
> (the system should freeze tasks, suspend devices, disable nonboot CPUs,
> wait for 5 seconds, enable nonboot CPUs, resume devices, thaw tasks and
> return to your command prompt) and see if you can reproduce the problem?

Same problem here. It works fine in 2.6.20 as well as before
ceb6c46839021d5c7c338d48deac616944660124, and doesn't work on recent
2.6.21-rc4-git4.

Is there any more information I can give?

Tobias
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007 17:44:28 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Mon, 19 Mar 2007, Andrew Morton wrote: > > > Please provide proof that quicklists are superior to simply going direct to > > the page allocator for these pages. > > See the patch. We are only touching 2 cachelines instead of 32. So even > without considering the page allocator overhead and the slab allocator > overhead (which will make the situation even better) its superior. That's not proof, it is handwaving. I could wave right back at you and claim that the benefit from returning a cache-hot pte page back to the page allocator for reuse exceeds the benefit which you waved at me above. You may well be right, but nothing is proven, afaict. > > > I doubt it. The zeroing is a by product of our way of serializing pte > > > handling. Its going to be difficult to change that. > > > > Nick didn't think so, and I don't see the problem either. > > You do not think that our current way of handling ptes is okay? If we do > not zero the ptes then we need to separate munmap from process shutdown. Yep. It's possible that process shutdown is a sufficiently common and costly special-case for it to be worth special-casing. > > We'll save on some bus traffic by avoiding the writeback, but how much > > effect that will have we don't know. Presumably little. > > The advantage of the quicklists is that it does not require a rework of > the pte serialization. No, these are unrelated. We can get pte pages from the page allocator and zero them without touching the munmap handling. But it's possible that if we _were_ to optimise the munmap handling as suggested, the end result would be superior. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc3-mm2
On Mon, 19 Mar 2007 17:39:15 -0700 Andrew Morton wrote:

> On Mon, 19 Mar 2007 17:27:11 -0700
> Randy Dunlap <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 7 Mar 2007 20:19:15 -0800 Andrew Morton wrote:
> >
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm2/
> > >
> > > - This is the same as 2.6.21-rc3-mm1, except Con's CPU scheduler changes
> > >   were dropped.
> > >
> > >   This is for A/B comparison purposes, and because those changes crashed on
> > >   one test setup.
> >
> > I don't quite see why this error is happening. Looks like all
> > the nested #includes should handle it...
> >
> > CONFIG_KEXEC=y
> > CONFIG_CRASH_DUMP=y
> > CONFIG_UTRACE=y
> > # PTRACE=n
> > # PROC_FS=n
> >
> > In file included from arch/x86_64/kernel/crash.c:19:
> > include/linux/elfcore.h: In function 'elf_core_copy_regs':
> > include/linux/elfcore.h:103: error: dereferencing pointer to incomplete type
> > include/linux/elfcore.h:103: error: dereferencing pointer to incomplete type
> > make[1]: *** [arch/x86_64/kernel/crash.o] Error 1
> > make: *** [arch/x86_64/kernel] Error 2
>
> Perhaps it's complaining about undefined pt_regs. But it's there in asm/ptrace.h
> which is included by linux/ptrace.h. Perhaps there's an include snafu which is
> causing that inclusion to not work.
>
> Dunno. Please send full .config to Roland ;)

attached.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

config-ptrace-elfcore Description: Binary data
[RFC][PATCH] split file and anonymous page queues #2
Split the anonymous and file backed pages out onto their own pageout
queues. This way we do not unnecessarily churn through lots of anonymous
pages when we do not want to swap them out anyway.

This should (with additional tuning) be a great step forward in
scalability, allowing Linux to run well on very large systems where
scanning through the anonymous memory (on our way to the page cache
memory we do want to evict) is slowing systems down significantly.

This patch has been stress tested and seems to work, but has not been
fine tuned or benchmarked yet. For now the swappiness parameter can be
used to tweak swap aggressiveness up and down as desired, but in the
long run we may want to simply measure IO cost of page cache and
anonymous memory and auto-adjust.

We apply pressure to each set of pageout queues based on:
- the size of each queue
- the fraction of recently referenced pages in each queue,
  not counting used-once file pages
- swappiness (file IO is more efficient than swap IO)

Please take this patch for a spin and let me know what goes well
and what goes wrong.

More info on the patch can be found on:

	http://linux-mm.org/PageReplacementDesign

Signed-off-by: Rik van Riel <[EMAIL PROTECTED]>

Changelog:
- Fix page_anon() to put all the file pages really on the file list.
- Fix get_scan_ratio() to return more stable numbers, by properly
  keeping track of the scanned anon and file pages.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.

--- linux-2.6.20.x86_64/fs/proc/proc_misc.c.vmsplit 2007-03-19 12:00:11.0 -0400
+++ linux-2.6.20.x86_64/fs/proc/proc_misc.c 2007-03-19 12:00:23.0 -0400
@@ -147,43 +147,47 @@ static int meminfo_read_proc(char *page,
 	 * Tagged format, for easy grepping and expansion.
*/ len = sprintf(page, - "MemTotal: %8lu kB\n" - "MemFree: %8lu kB\n" - "Buffers: %8lu kB\n" - "Cached: %8lu kB\n" - "SwapCached: %8lu kB\n" - "Active: %8lu kB\n" - "Inactive: %8lu kB\n" + "MemTotal: %8lu kB\n" + "MemFree:%8lu kB\n" + "Buffers:%8lu kB\n" + "Cached: %8lu kB\n" + "SwapCached: %8lu kB\n" + "Active(anon): %8lu kB\n" + "Inactive(anon): %8lu kB\n" + "Active(file): %8lu kB\n" + "Inactive(file): %8lu kB\n" #ifdef CONFIG_HIGHMEM - "HighTotal:%8lu kB\n" - "HighFree: %8lu kB\n" - "LowTotal: %8lu kB\n" - "LowFree: %8lu kB\n" -#endif - "SwapTotal:%8lu kB\n" - "SwapFree: %8lu kB\n" - "Dirty:%8lu kB\n" - "Writeback:%8lu kB\n" - "AnonPages:%8lu kB\n" - "Mapped: %8lu kB\n" - "Slab: %8lu kB\n" - "SReclaimable: %8lu kB\n" - "SUnreclaim: %8lu kB\n" - "PageTables: %8lu kB\n" - "NFS_Unstable: %8lu kB\n" - "Bounce: %8lu kB\n" - "CommitLimit: %8lu kB\n" - "Committed_AS: %8lu kB\n" - "VmallocTotal: %8lu kB\n" - "VmallocUsed: %8lu kB\n" - "VmallocChunk: %8lu kB\n", + "HighTotal: %8lu kB\n" + "HighFree: %8lu kB\n" + "LowTotal: %8lu kB\n" + "LowFree:%8lu kB\n" +#endif + "SwapTotal: %8lu kB\n" + "SwapFree: %8lu kB\n" + "Dirty: %8lu kB\n" + "Writeback: %8lu kB\n" + "AnonPages: %8lu kB\n" + "Mapped: %8lu kB\n" + "Slab: %8lu kB\n" + "SReclaimable: %8lu kB\n" + "SUnreclaim: %8lu kB\n" + "PageTables: %8lu kB\n" + "NFS_Unstable: %8lu kB\n" + "Bounce: %8lu kB\n" + "CommitLimit:%8lu kB\n" + "Committed_AS: %8lu kB\n" + "VmallocTotal: %8lu kB\n" + "VmallocUsed:%8lu kB\n" + "VmallocChunk: %8lu kB\n", K(i.totalram), K(i.freeram), K(i.bufferram), K(cached), K(total_swapcache_pages), - K(global_page_state(NR_ACTIVE)), - K(global_page_state(NR_INACTIVE)), + K(global_page_state(NR_ACTIVE_ANON)), + K(global_page_state(NR_INACTIVE_ANON)), + K(global_page_state(NR_ACTIVE_FILE)), + K(global_page_state(NR_INACTIVE_FILE)), #ifdef CONFIG_HIGHMEM K(i.totalhigh), K(i.freehigh), --- linux-2.6.20.x86_64/fs/mpage.c.vmsplit 2007-02-04 13:44:54.0 -0500 +++ linux-2.6.20.x86_64/fs/mpage.c 2007-03-19 
12:00:23.0 -0400 @@ -408,12 +408,12 @@ mpage_readpages(struct address_space *ma &first_logical_block, get_block); if (!pagevec_add(&lru_pvec, page)) -__pagevec_lru_add(&lru_pvec); +__pagevec_lru_add_file(&lru_pvec); } else { page_cache_release(page); } } - pagevec_lru_add(&lru_pvec); + pagevec_lru_add_file(&lru_pvec); BUG_ON(!list_empty(pages)); if (bio) mpage_bio_submit(READ, bio); --- linux-2.6.20.x86_64/fs/cifs/file.c.vmsplit 2007-03-19 12:00:10.0 -0400 +++ linux-2.6.20.x86_64/fs/cifs/file.c 2007-03-19 12:00:23.0 -0400 @@ -1746,7 +1746,7 @@ static void cifs_copy_cache_pages(struct SetPageUptodate(page); unlock_page(page); if (!pagevec_add(plru_pvec, page)) - __pagevec_lru_add(plru_p
UDP packets scheduling
Hello,

can anyone suggest a proper way to schedule UDP packets so that they are
transmitted at a given rate? E.g., I have two boxes, both with 10 GE
interfaces. One box is able to transmit at 9.9 Gbps, the other one is able
to receive at only about 5.5 Gbps. Flow control must be turned off for some
other reason.

How can I put a delay between subsequent message sends to achieve the
desired packet rate without losses, e.g., 3.5 Gbps without bursts? Even
nanosleep() with the lowest possible delay seems to be too much delay. A
busy loop with clock_gettime(3) works OK on SMP boxes, but on UP it causes
problems.

-- 
Lukáš Hejtmánek
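For reference, the inter-packet gap that such pacing has to hit can be computed as below. This is a hedged sketch rather than an answer from the thread: pacing_interval_ns() and next_deadline() are hypothetical helpers, and the 1500-byte payload used in the note afterwards is an assumed packet size that the message does not state.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Nanoseconds between packet starts for a given payload size and bit rate. */
static uint64_t pacing_interval_ns(uint64_t packet_bytes, uint64_t rate_bps)
{
	assert(rate_bps > 0);
	return packet_bytes * 8ULL * NSEC_PER_SEC / rate_bps;
}

/*
 * Advance an absolute deadline by one interval. Pacing against absolute
 * deadlines (instead of waiting a relative amount after each send) keeps
 * timer overshoot from accumulating over many packets.
 */
static struct timespec next_deadline(struct timespec t, uint64_t interval_ns)
{
	uint64_t nsec = (uint64_t)t.tv_nsec + interval_ns;

	t.tv_sec += (time_t)(nsec / NSEC_PER_SEC);
	t.tv_nsec = (long)(nsec % NSEC_PER_SEC);
	return t;
}
```

With the assumed 1500-byte payload at 3.5 Gbps the gap comes out at roughly 3.4 microseconds, which is consistent with the observation above that even the shortest nanosleep() is too long. A sender would wait (by sleeping or spinning) until the time returned by next_deadline() before each send.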
Re: [PATCH RESEND 1/1] crypto API: RSA algorithm patch (kernel version 2.6.20.1)
Tasos Parisinos <[EMAIL PROTECTED]> :
[...]

RSA is slow. syscalls are fast.

Which part of the kernel is supposed to benefit from this code ?

-- 
Ueimor
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007, Andrew Morton wrote: > Please provide proof that quicklists are superior to simply going direct to > the page allocator for these pages. See the patch. We are only touching 2 cachelines instead of 32. So even without considering the page allocator overhead and the slab allocator overhead (which will make the situation even better) its superior. > > I doubt it. The zeroing is a by product of our way of serializing pte > > handling. Its going to be difficult to change that. > > Nick didn't think so, and I don't see the problem either. You do not think that our current way of handling ptes is okay? If we do not zero the ptes then we need to separate munmap from process shutdown. > We'll save on some bus traffic by avoiding the writeback, but how much > effect that will have we don't know. Presumably little. The advantage of the quicklists is that it does not require a rework of the pte serialization. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc3-mm2
On Mon, 19 Mar 2007 17:27:11 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Wed, 7 Mar 2007 20:19:15 -0800 Andrew Morton wrote:
>
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm2/
> >
> > - This is the same as 2.6.21-rc3-mm1, except Con's CPU scheduler changes
> >   were dropped.
> >
> >   This is for A/B comparison purposes, and because those changes crashed on
> >   one test setup.
>
> I don't quite see why this error is happening. Looks like all
> the nested #includes should handle it...
>
> CONFIG_KEXEC=y
> CONFIG_CRASH_DUMP=y
> CONFIG_UTRACE=y
> # PTRACE=n
> # PROC_FS=n
>
> In file included from arch/x86_64/kernel/crash.c:19:
> include/linux/elfcore.h: In function 'elf_core_copy_regs':
> include/linux/elfcore.h:103: error: dereferencing pointer to incomplete type
> include/linux/elfcore.h:103: error: dereferencing pointer to incomplete type
> make[1]: *** [arch/x86_64/kernel/crash.o] Error 1
> make: *** [arch/x86_64/kernel] Error 2

Perhaps it's complaining about undefined pt_regs. But it's there in
asm/ptrace.h which is included by linux/ptrace.h. Perhaps there's an
include snafu which is causing that inclusion to not work.

Dunno. Please send full .config to Roland ;)
Re: [patch 6/13] signal/timer/event fds v7 - timerfd core ...
Davide Libenzi wrote:
> +struct timerfd_ctx {
> +	struct hrtimer tmr;
> +	ktime_t tintv;
> +	spinlock_t lock;
> +	wait_queue_head_t wqh;
> +	unsigned long ticks;
> +};
>
> +static struct kmem_cache *timerfd_ctx_cachep;
>
> +	timerfd_ctx_cachep = kmem_cache_create("timerfd_ctx_cache",
> +					sizeof(struct timerfd_ctx),
> +					0, SLAB_PANIC, NULL, NULL);

Do we really expect thousands of active timerfd_ctx? If not, using
kmalloc()/kfree() would be fine, because sizeof(struct timerfd_ctx) is so
small.

On SMP / NUMA platforms, each new kmem_cache is rather expensive (memory
allocated at kmem_cache_create() time, but also memory used while the cache
is not empty, with slabs in the freelist for each cpu/node).

Using a general cache might be cheaper: no memory overhead for yet another
kmem_cache.

I know individual caches are good for spotting memory leaks, but in the
timerfd case, you don't have memory leaks, do you? :)
Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images
On Mon, 2007-03-19 at 16:36 -0500, Matt Mackall wrote: > On Mon, Mar 19, 2007 at 11:06:33PM +0200, Artem Bityutskiy wrote: > > On Mon, 2007-03-19 at 14:54 -0500, Matt Mackall wrote: > > > The issue is 14000 lines of patch to make a parallel subsystem. > > > > Parallel system exists since very long. One is > > flash->SW_or_HW_FTL->all_blkdev_stuff. The other is MTD->JFFS2. Think > > about _why_ there are 2 of them. Hint - reliability, performance. Your > > ranting basically says that only the first one makes sense. This is not > > true. > > A better way would be for MTD to deliver a block dev with a rich > enough interface for JFFS2 to use efficiently in the first place. Yes, > I know that can't be done with the current block dev layer. But that's > what the source is for. Why the hell would JFFS2 need a block device interface ? What's the gain ? > > We enhance the second branch, not the first, please, realize this. Both > > branches have their user base, and have always had. > > > > > iSCSI/nbd(6) > > > | > > > filesystem {swap | ext3ext3 jffs2 > > > \ | || / > > >/ \ | dm-crypt->snapshot(5) / > > > device mapper -|\ \ | / > > >| partitioning / > > >| | partitioning(4) > > >|wear leveling(3) / > > >| | / > > >| block concatenation > > >| ||| | > > >\ bad block remapping(2) > > >||| | > > > MTD raw block { raw block devices with no smarts(1) > > > / | \ \ > > > hardware { NANDNAND NAND NAND > > > > Matt, as I pointed in the first mail, flash != block device. > > And as I pointed out, you're wrong. It is both block oriented > (eraseBLOCK??) and random access. That's what a block device is. The > fact that it doesn't look like the other things that Linux currently > calls a block device and supports well is another matter. It does well matter, as it is not a block device. It is a FLASH device and you can do as much comparisons of eraseBLOCK as you want, you do not turn FLASH into a DISK. 
Again: Disks (including CF-Cards and USB-Sticks) have intellegent controllers, which abstract the hardware oddities away and present you a block device. > > In your picture I see NAND->MTD raw block. So am I right that you > > assume that we already have a decent FTL? The fact is that we do > > not. > > No. Look at the picture for more than two seconds, please. > > I can tell you didn't do this because you didn't manage to find (1) > which explicitly says "with no smarts". And you also cut out the footnote > where I explained what I meant by "with no smarts". > > Find the spots marked (2) and (3). These are your FTL. And where please are (2) and (3) inside of device mapper ? > > Please, bear in mind that decent FTL is difficult and an FS on top of > > FTL is slow, FTL hits performance considerably. > > ...and if you'd actually looked at the picture, you'd have seen JFFS2 > bypassing it. Along with another footnote explaining it. The (4) partitioning and JFFS2 on top is a step back from the current UBI functionality. Now we can have resizable partitioning even for JFFS2 and JFFS2 can utilize the UBI wear levelling, which is way better than the crude heuristics of JFFS2. You want to force FLASH into device mapper for some strange and no obvious reason. Just the coincidence of "eraseBLOCK" and "BLOCKdevice" is not really convincing. You impose the usage of eraseblock size on FLASH, which is simply wrong: DISK has a 1:1 relationship of "eraseblock" and minimal I/O. FLASH has not. I did the math in a different mail and I'm not buying your factor 32 FLASH life time reduction for the price of having a bunch of lines of code less in the kernel. If you really consider to run ext3, xfs or whatever on top of FLASH, please go and do the homework on CF-Cards and USB-Sticks. Run them into the fast wearout death. And device mapper does not help anything to avoid that. Running ext3 on top of FLASH with a minimal I/O size of erase block size is simply braindead. 
tglx
Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images
On Mon, 2007-03-19 at 17:32 -0500, Matt Mackall wrote: > > > If a static volume is simply a non-dynamic volume, then device mapper > > > can do that too. And countless other things. Which is not an aside. > > > UBI growing to do all the things that device mapper does is exactly > > > the thing we should be seeking to avoid. > > > > No it can't and device mapper sits on top of block devices. FLASH is no > > block device. Period. > > Which of the following two properties does it lack? > > - discrete blocks > - non-sequential access to blocks > > When you do the obvious s/blocks/eraseblocks/, this appears to be > true. It appears to be, but it is not. You enforce semantics on a device, which it does not have. > Saying "but I can't do I/O smaller than the blocksize" doesn't change > this any more than it would for disks. There is a huge difference. Disk block size is 512 byte and FLASH block size is min 16KiB and up to 256KiB. Just do the math: Write sampling data streams in 2KiB chunks to your uber devicemapper on a 1GiB device with 64KiB erase block size: Fine grained FLASH aware writes allow 32 chunks in a block without erasing the block. Your method erases the block 32 times to write the same amount of data. Result: You wear out the flash 32 times faster. Cool feature. > Saying "but I can do smaller I/O efficiently in some circumstances" > also doesn't change it. We can do it under _any_ circumstances and that _does_ change it. Implementing a clever block device layer on top of UBI is simple and would provide FLASH page sized I/O, i.e. 2Kib in the above example. > In historical UNIX, some tapes were block devices too. Because they > supported seek(). I'm impressed. How exactly are "some tapes" comparable to FLASH chips ? Your next proposal is to throw away MTD-utils and use "mt" instead ? > > Device mapper can not provide a simple easy to decode scheme for boot > > loaders. 
We need to be able to boot out of 512 - 2048 byte of NAND FLASH > > and be able to find the kernel or second stage boot loader in this > > unordered device. > > > > And no, fixed addresses do not work. Do you want to implement device > > mapper into your Initialial Bootloader stage ? > > This is exactly the same problem as booting on a desktop PC. But > somehow LILO manages. My first Linux box had a hell of a lot less disk > than the platform I bootstrapped (and wrote NAND drivers for) last > month had in NAND. No, it is not. You get the absolute sector address of your second stage and this is a complete nobrainer. The translation is done in the DISK device. You simply ignore the fact, that inside each disk, USB Stick, CF-CARD, whatever - there is a more or less intellegent controller device, which does the mapping to the physical storage location. There is _NO_ such thing on a bare FLASH chip. It does not matter, whether your embedded device had more NAND space than my old CP/M machines floppy. It simply matters, that even the old CP/M floppy device had some rudimentary intellence on board. Furthermore I want to be able to get the bitflip correction on my second stage loader / kernel in the same safe way as we do it for everything else and still be able to bootstrap that from an extremly small bootloader. > > > If the right way is instead to extend the block layer and device > > > mapper to encompass the quirks of NAND in a sensible fashion, then UBI > > > should not go in. > > > > No, block layer on top of FLASH needs 80% of the functionality of UBI in > > the first place. > > Incorrect. A block-based filesystem on top of flash needs this > functionality. But a block device suitable to device mapper layering > (which then provides the functionality) does not. How exactly does device mapper: A) across device wear levelling ? B) dynamic partitioning for FLASH aware file systems ? C) across device wear levelling for FLASH aware file systems ? 
D) background bit-flip corrections (copying affected blocks and recylce the old one) ? E) allow position independent placement of the second stage bootloader ? > > You need to implement a clever journalling block device > > emulator in order to keep the data alive and the FLASH not weared out > > within no time. You need the wear levelling, otherwise you can throw > > away your FLASH in no time. > > And that's why it's in my picture. Yes, it is in your picture, but: 1) it excludes FLASH aware file systems and UBI does not. 2) your picture does still not explain how it does achive the above A), B), C), D) and E) Your extra path for partitioning(4) and JFFS2 is just a weird hack, which makes your proposal completely absurd. > > > Let me draw a picture so we have something to argue about: > > > > > > iSCSI/nbd(6) > > > | > > > filesystem {swap | ext3ext3 jffs2 > > > \ | || / > > >/ \ | dm-crypt->snapshot(5) / > > > device mapper -|
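The erase-count arithmetic from the message above (2 KiB chunks written to a device with 64 KiB erase blocks) can be checked with a small model. This is a deliberately simplified sketch: it ignores wear levelling, caching and garbage collection, and both function names are made up for illustration.

```c
#include <stdint.h>

/* Round-up integer division. */
static uint64_t div_round_up(uint64_t a, uint64_t b)
{
	return (a + b - 1) / b;
}

/*
 * Minimal I/O size equals the erase block size: every chunk written
 * costs one erase of a whole block.
 */
static uint64_t erases_block_granular(uint64_t total_bytes, uint64_t chunk_bytes)
{
	return div_round_up(total_bytes, chunk_bytes);
}

/*
 * FLASH-aware writes: chunks are appended page by page, so a block is
 * erased only once per eraseblock_bytes of data written.
 */
static uint64_t erases_page_granular(uint64_t total_bytes, uint64_t eraseblock_bytes)
{
	return div_round_up(total_bytes, eraseblock_bytes);
}
```

For one 64 KiB block filled with 2 KiB chunks the model gives 32 erases against 1, which is the factor of 32 quoted in the message.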
Re: [PATCH 2/3] swsusp: Do not use page flags
On Mon, 12 Mar 2007 22:19:20 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> Make swsusp use memory bitmaps instead of page flags for marking 'nosave' and
> free pages. This allows us to 'recycle' two page flags that can be used for
> other purposes. Also, the memory needed to store the bitmaps is allocated when
> necessary (ie. before the suspend) and freed after the resume which is more
> reasonable.
>
> The patch is designed to minimize the amount of changes and there are some
> nice simplifications and optimizations possible on top of it. I am going to
> implement them separately in the future.

Blows up with ia64 allmodconfig due to CONFIG_PM=y, CONFIG_SOFTWARE_SUSPEND=n:

kernel/power/main.c:223: error: redefinition of 'software_suspend'
include/linux/suspend.h:46: error: previous definition of 'software_suspend' was here

I had a look at fixing it, but it's not obvious why we're compiling most of
kernel/power/main.c when CONFIG_SOFTWARE_SUSPEND=n, so I'll send this series
back for repair please.
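A "redefinition" error of this shape is the usual symptom of a header supplying an inline stub for a disabled option while a .c file still defines the real function. The sketch below shows that pattern under the assumption that this is what happens here; the thread does not show the code, MODEL_SOFTWARE_SUSPEND and software_suspend_model() are made-up names, and the -ENOSYS return value is an assumption.

```c
#include <errno.h>

/* Hypothetical switch standing in for CONFIG_SOFTWARE_SUSPEND (left off). */
/* #define MODEL_SOFTWARE_SUSPEND 1 */

/* What a header typically does: prototype when on, inline stub when off. */
#ifdef MODEL_SOFTWARE_SUSPEND
int software_suspend_model(void);
#else
static inline int software_suspend_model(void)
{
	return -ENOSYS;	/* feature compiled out */
}
#endif

/*
 * What the .c file must do: guard the real definition with the same
 * switch. Without this #ifdef, the stub above and the definition below
 * would collide and the compiler would report a redefinition.
 */
#ifdef MODEL_SOFTWARE_SUSPEND
int software_suspend_model(void)
{
	return 0;
}
#endif
```

With the switch off only the stub exists, so callers still compile and simply get an error code back.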
Re: 2.6.21-rc3-mm2
On Wed, 7 Mar 2007 20:19:15 -0800 Andrew Morton wrote:

>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm2/
>
> - This is the same as 2.6.21-rc3-mm1, except Con's CPU scheduler changes
>   were dropped.
>
>   This is for A/B comparison purposes, and because those changes crashed on
>   one test setup.

I don't quite see why this error is happening. Looks like all
the nested #includes should handle it...

CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_UTRACE=y
# PTRACE=n
# PROC_FS=n

In file included from arch/x86_64/kernel/crash.c:19:
include/linux/elfcore.h: In function 'elf_core_copy_regs':
include/linux/elfcore.h:103: error: dereferencing pointer to incomplete type
include/linux/elfcore.h:103: error: dereferencing pointer to incomplete type
make[1]: *** [arch/x86_64/kernel/crash.o] Error 1
make: *** [arch/x86_64/kernel] Error 2

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [patch 2/13] signal/timer/event fds v7 - signalfd core ...
On Tue, 20 Mar 2007, Oleg Nesterov wrote:

> On 03/19, Davide Libenzi wrote:
> >
> > +static void signalfd_unlock(struct signalfd_ctx *ctx,
> > +			    struct signalfd_lockctx *lk)
> > +{
> > +	unlock_task_sighand(lk->tsk, &lk->flags);
> > +}
>
> Again, this is a matter of taste. But I can't understand why signalfd_unlock()
> needs "signalfd_ctx *ctx" parameter. If we have "struct signalfd_lockctx *lk",
> signalfd_lock() can setup lk->ctx if it is ever needed.

With the new API, I agree. Removed.

- Davide
Re: [patch 2/13] signal/timer/event fds v7 - signalfd core ...
On 03/19, Davide Libenzi wrote:
>
> +static void signalfd_unlock(struct signalfd_ctx *ctx,
> +			    struct signalfd_lockctx *lk)
> +{
> +	unlock_task_sighand(lk->tsk, &lk->flags);
> +}

Again, this is a matter of taste. But I can't understand why signalfd_unlock()
needs "signalfd_ctx *ctx" parameter. If we have "struct signalfd_lockctx *lk",
signalfd_lock() can setup lk->ctx if it is ever needed.

Oleg.
Re: [patch 2/13] signal/timer/event fds v7 - signalfd core ...
On Tue, 20 Mar 2007, Oleg Nesterov wrote:

> On 03/19, Davide Libenzi wrote:
> >
> > +struct signalfd_lockctx {
> > +	struct task_struct *tsk;
> > +	struct sighand_struct *sighand;
> > +	unsigned long flags;
> > +};
>
> signalfd_lockctx is "private" to signalfd_lock/signalfd_unlock. But lk->sighand
> is used only by signalfd_lock(). I'd suggest to remove it.

Ack

> > +void signalfd_deliver(struct task_struct *tsk, int sig)
> > +{
> > +	struct sighand_struct *sighand = tsk->sighand;
> > +	struct signalfd_ctx *ctx, *tmp;
> > +
> > +	list_for_each_entry_safe(ctx, tmp, &sighand->sfdlist, lnk) {
> > +		/*
> > +		 * We use a negative signal value as a way to broadcast that the
> > +		 * sighand has been orphaned, so that we can notify all the
> > +		 * listeners about this. Remeber the ctx->sigmask is inverted,
> > +		 * so if the user is interested in a signal, that corresponding
> > +		 * bit will be zero.
> > +		 */
> > +		if (sig < 0) {
> > +			if (ctx->tsk == tsk) {
> > +				ctx->tsk = NULL;
> > +				list_del_init(&ctx->lnk);
> > +				wake_up(&ctx->wqh);
> > +			}
> > +		} else if (sig > 0) {
> > +			if (!sigismember(&ctx->sigmask, sig))
> > +				wake_up(&ctx->wqh);
> > +		}
> > +	}
> > +}
>
> I tried to avoid this comment, but can't help myself :)

Added BUG_ON() and using "else".

- Davide
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007 15:37:16 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote: > ... > > --- /dev/null 1970-01-01 00:00:00.0 + > +++ linux-2.6.21-rc3-mm2/include/linux/quicklist.h2007-03-16 > 02:19:15.0 -0700 > @@ -0,0 +1,95 @@ > +#ifndef LINUX_QUICKLIST_H > +#define LINUX_QUICKLIST_H > +/* > + * Fast allocations and disposal of pages. Pages must be in the condition > + * as needed after allocation when they are freed. Per cpu lists of pages > + * are kept that only contain node local pages. > + * > + * (C) 2007, SGI. Christoph Lameter <[EMAIL PROTECTED]> > + */ > +#include > +#include > +#include > + > +#ifdef CONFIG_QUICKLIST > + > +#ifndef CONFIG_NR_QUICK > +#define CONFIG_NR_QUICK 1 > +#endif No, please don't define config items like this. Do it in Kconfig. > +static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void > *)) > +{ > + struct quicklist *q; > + void **p = NULL; > + > + q =&get_cpu_var(quicklist)[nr]; > + p = q->page; > + if (likely(p)) { > + q->page = p[0]; > + p[0] = NULL; > + q->nr_pages--; > + } > + put_cpu_var(quicklist); > + if (likely(p)) > + return p; > + > + p = (void *)__get_free_page(flags | __GFP_ZERO); > + if (ctor && p) > + ctor(p); > + return p; > +} > + > +static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp) > +{ > + struct quicklist *q; > + void **p = pp; > + struct page *page = virt_to_page(p); > + int nid = page_to_nid(page); > + > + if (unlikely(nid != numa_node_id())) { > + if (dtor) > + dtor(p); > + free_page((unsigned long)p); > + return; > + } > + > + q = &get_cpu_var(quicklist)[nr]; > + p[0] = q->page; > + q->page = p; > + q->nr_pages++; > + put_cpu_var(quicklist); > +} These guys seem to have multiple callsites for ia64 at least and probably would benefit from being uninlined. 
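The core of the quoted quicklist_alloc()/quicklist_free() pair is a LIFO list whose link pointer lives in the first word of each free page. A minimal user-space model of just that part is sketched below. It leaves out the per-cpu variables, the NUMA node check and the constructor/destructor hooks, uses calloc() in place of __get_free_page(flags | __GFP_ZERO), and all names ending in _model or starting with ql_ are invented for the illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096	/* assumed page size for the model */

/* Simplified stand-in for struct quicklist: a LIFO of free, zeroed pages. */
struct quicklist_model {
	void *page;	/* head of the list, or NULL when empty */
	int nr_pages;	/* number of pages currently on the list */
};

/* Pop a page; fall back to a fresh zeroed allocation when the list is empty. */
static void *ql_alloc(struct quicklist_model *q)
{
	void **p = q->page;

	if (p) {
		q->page = p[0];	/* the next pointer is stored in the page itself */
		p[0] = NULL;	/* restore the all-zero state before handing it out */
		q->nr_pages--;
		return p;
	}
	return calloc(1, MODEL_PAGE_SIZE);
}

/* Push a page that the caller has already returned to the zeroed state. */
static void ql_free(struct quicklist_model *q, void *pp)
{
	void **p = pp;

	p[0] = q->page;
	q->page = p;
	q->nr_pages++;
}
```

Only the first word of a page is touched on either path, which is how such a list can avoid re-zeroing whole pages, provided callers really do free pages in the zeroed state.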
> +void quicklist_check(int nr, void (*dtor)(void *)); > +unsigned long quicklist_total_size(void); > + > +#else > +void quicklist_check(int nr, void (*dtor)(void *)) > +{ > +} > + > +unsigned long quicklist_total_size(void) > +{ > + return 0; > +} > +#endif That obviouslty won't link and wasn't tested. Making these static inline will help. > +/* > + * Quicklist support. > + * > + * Quicklists are light weight lists of pages that have a defined state > + * on alloc and free. Pages must be in the quicklist specific defined state > + * (zero by default) when the page is freed. It seems that the initial idea > + * for such lists first came from Dave Miller and then various other people > + * improved on it. > + * > + * Copyright (C) 2007 SGI, > + * Christoph Lameter <[EMAIL PROTECTED]> > + * Generalized, added support for multiple lists and > + * constructors / destructors. > + */ > +#include > + > +#include > +#include > +#include > +#include > + > +DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK]; If we uninline those big inlines, this can perhaps be made static. > +#define MIN_PAGES25 > +#define MAX_FREES_PER_PASS 16 > +#define FRACTION_OF_NODE_MEM 16 Are these constants optimal for all architectures? > +static unsigned long max_pages(void) > +{ > + unsigned long node_free_pages, max; > + > + node_free_pages = node_page_state(numa_node_id(), > + NR_FREE_PAGES); > + max = node_free_pages / FRACTION_OF_NODE_MEM; > + return max(max, (unsigned long)MIN_PAGES); > +} > + > +static long min_pages_to_free(struct quicklist *q) > +{ > + long pages_to_free; > + > + pages_to_free = q->nr_pages - max_pages(); > + > + return min(pages_to_free, (long)MAX_FREES_PER_PASS); > +} min_t and max_t are the standard way of avoiding that warning. Or stick a UL on the constants (which is probably better). 
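The trimming arithmetic in the quoted max_pages() and min_pages_to_free() can be tried out in isolation. The sketch below is a user-space model, not the patch: node_free_pages is passed in as a parameter instead of being read with node_page_state(), everything is a signed long, and the constants carry an L suffix in the spirit of the suggestion above so that no casts are needed.

```c
#define MIN_PAGES		25L
#define MAX_FREES_PER_PASS	16L
#define FRACTION_OF_NODE_MEM	16L

/* Upper bound on list length: 1/16 of the node's free pages, at least 25. */
static long max_pages_model(long node_free_pages)
{
	long max = node_free_pages / FRACTION_OF_NODE_MEM;

	return max > MIN_PAGES ? max : MIN_PAGES;
}

/*
 * Pages to release in one pass: the excess over the bound, capped at 16.
 * A zero or negative result means there is nothing to free.
 */
static long pages_to_free_model(long nr_pages, long node_free_pages)
{
	long excess = nr_pages - max_pages_model(node_free_pages);

	return excess < MAX_FREES_PER_PASS ? excess : MAX_FREES_PER_PASS;
}
```

For example, with 1600 free pages on the node the bound is 100 pages, so a list holding 130 pages gives up 16 in one pass and a list holding 105 gives up 5.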
> +void quicklist_check(int nr, void (*dtor)(void *))
> +{
> +	long pages_to_free;
> +	struct quicklist *q;
> +
> +	q = &get_cpu_var(quicklist)[nr];
> +	if (q->nr_pages > MIN_PAGES) {
> +		pages_to_free = min_pages_to_free(q);
> +
> +		while (pages_to_free > 0) {
> +			void *p = quicklist_alloc(nr, 0, NULL);
> +
> +			if (dtor)
> +				dtor(p);
> +			free_page((unsigned long)p);
> +			pages_to_free--;
> +		}
> +	}
> +	put_cpu_var(quicklist);
> +}

The use of a literal 0 as a gfp_t is a bit ugly. I assume that we don't care because we should never actually call into the page allocator for this caller. But it's not terribly clear because there is no commentary describing what this function is supposed to do.

The name foo_check() is unfortunate: it implies that the function checks something (ie: has no side-effects). But this function _does_ change things and perhaps should
Re: [patch 2/13] signal/timer/event fds v7 - signalfd core ...
On 03/19, Davide Libenzi wrote:
>
> +struct signalfd_lockctx {
> +	struct task_struct *tsk;
> +	struct sighand_struct *sighand;
> +	unsigned long flags;
> +};

signalfd_lockctx is "private" to signalfd_lock/signalfd_unlock. But lk->sighand is used only by signalfd_lock(). I'd suggest removing it.

> +void signalfd_deliver(struct task_struct *tsk, int sig)
> +{
> +	struct sighand_struct *sighand = tsk->sighand;
> +	struct signalfd_ctx *ctx, *tmp;
> +
> +	list_for_each_entry_safe(ctx, tmp, &sighand->sfdlist, lnk) {
> +		/*
> +		 * We use a negative signal value as a way to broadcast that the
> +		 * sighand has been orphaned, so that we can notify all the
> +		 * listeners about this. Remember the ctx->sigmask is inverted,
> +		 * so if the user is interested in a signal, that corresponding
> +		 * bit will be zero.
> +		 */
> +		if (sig < 0) {
> +			if (ctx->tsk == tsk) {
> +				ctx->tsk = NULL;
> +				list_del_init(&ctx->lnk);
> +				wake_up(&ctx->wqh);
> +			}
> +		} else if (sig > 0) {
> +			if (!sigismember(&ctx->sigmask, sig))
> +				wake_up(&ctx->wqh);
> +		}
> +	}
> +}

I tried to avoid this comment, but can't help myself :)

This is a matter of taste, of course, but imho this is a classic "hide the problem" example. Why "else if (sig > 0)"? sig can't be == 0. In my opinion, it is better to add BUG_ON(!sig), but use just "else".

Oleg.
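A small userspace sketch of the shape suggested above: assert the invariant once, then use a plain "else". It is only an illustration. BUG_ON_SKETCH is a stand-in built on assert(), the sigmask is modelled as a 64-bit word with the same inverted meaning as in the quoted comment, and the enum and function names are made up. The ctx->tsk == tsk test of the orphan branch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BUG_ON(); assumption for illustration. */
#define BUG_ON_SKETCH(cond) assert(!(cond))

enum deliver_action {
	DELIVER_ORPHANED,	/* sig < 0: sighand went away, detach and wake */
	DELIVER_WAKE,		/* listener is interested in this signal */
	DELIVER_IGNORE		/* listener is not interested */
};

/*
 * inverted_mask: a zero bit means "the user is interested in this signal".
 * Signals are numbered 1..64 in this sketch.
 */
static enum deliver_action deliver_sketch(uint64_t inverted_mask, int sig)
{
	BUG_ON_SKETCH(!sig);		/* sig == 0 is a caller bug, not a case */
	BUG_ON_SKETCH(sig > 64);

	if (sig < 0)
		return DELIVER_ORPHANED;
	else
		return (inverted_mask & (1ULL << (sig - 1))) ?
			DELIVER_IGNORE : DELIVER_WAKE;
}
```

Written this way, a caller that passes 0 trips the assertion immediately instead of silently falling through both branches.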
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007 16:57:55 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Mon, 19 Mar 2007, Andrew Morton wrote:
>
> > Has it been proven that quicklists are superior to simply going direct to the
> > page allocator for these pages?
>
> Yes.

Sigh. Please provide proof that quicklists are superior to simply going direct to the page allocator for these pages.

> > Would it provide a superior solution if we were to a) stop zeroing out the
> > pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> > for these pages?
>
> I doubt it. The zeroing is a byproduct of our way of serializing pte
> handling. It's going to be difficult to change that.

Nick didn't think so, and I don't see the problem either.

We'll save on some bus traffic by avoiding the writeback, but how much effect that will have we don't know. Presumably little.
unify encoding of files in Documentation
Hiyas,

at the moment, some files in Documentation are UTF-8 encoded and some are Latin-1 encoded. Therefore I propose to change the default encoding to UTF-8, because this is the encoding that many current Linux distributions use. I can send a patch, if required.

If you want to change the encoding of a file from latin1 to utf-8 you can use recode:

    recode latin1..utf-8 file.txt

This changes the encoding in place.

Regards,
Till
Re: [PATCH 0/2] wistron_btns: More keymaps
19.03.2007 22:28, Dmitry Torokhov wrote:
> On 3/15/07, Éric Piel <[EMAIL PROTECTED]> wrote:
> > Ok, so let me summarize:
> > There are two kinds of keys on those laptops (for which we are not sure
> > about the keycode that it should generate):
> > * Laptop screen on/off
> > * Display output selection (for instance: laptop/external/both)
> >
> > The possible keycodes that we could assign to them:
> > KEY_SCREEN
> > KEY_MEDIA
> > KEY_MODE
> > KEY_VIDEO
> > KEY_SWITCHVIDEOMODE
> > KEY_COMPUTER
> > KEY_PC
> >
> > From the discussion, I had the feeling this association would be the
> > least incorrect:
> > Screen on/off : KEY_SCREEN
>
> It looks like DVB folks chose to use KEY_SCREEN and KEY_WINDOW to switch
> applications between full screen and windowed modes

Just for info, I couldn't find any reference to KEY_WINDOW. Anyway, indeed, KEY_SCREEN is already used for "full screen" (although sometimes it's KEY_ZOOM :-/) so better not to use it if something else is possible.

> so we'll have to invent our own keycode. KEY_DISPLAYTOGGLE anyone?

What about KEY_DISPLAYONOFF ? :-) What should be its value? Would 239 be fine?

> > Display selection : KEY_SWITCHVIDEOMODE
>
> I agree here.

BTW, I'm thinking of implementing led support. However, there are two mechanisms for leds in the kernel: the "input layer" leds and the "full feature" leds. The laptops may have up to three leds: mail, wifi, bluetooth. The input layer has LED_MAIL but no wifi nor bluetooth. The led subsystem has the advantage of the very extensible "trigger" mechanism. Which of the subsystems would you recommend me to use?

See you,
Eric
Re: [2.6 patch] x25_forward_call(): fix NULL dereferences
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Mon, 19 Mar 2007 10:24:03 +0100

> This patch fixes two NULL dereferences spotted by the Coverity checker.
>
> For a better understanding, the "diff -uwp" output (that ignores the
> indentation changes) is:

I'll apply this, thanks Adrian.
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
On Mon, 2007-03-19 at 11:38 -0700, Linus Torvalds wrote:
>
> On Mon, 19 Mar 2007, Eric W. Biederman wrote:
> >
> > True. You can use all of the call clobbered registers.
>
> Quite often, the biggest single win of inlining is not so much the code
> size (although if done right, that will be smaller too), but the fact that
> inlining DOES NOT CLOBBER AS MANY REGISTERS!

Thanks Linus.

*This* was the reason that the current hand-coded calls only clobber %eax. It was a compromise between native (no clobbers) and others (might need a reg).

Now, since we decided to allow paravirt_ops operations to be normal C (ie. the patching is optional and done late), we actually push and pop %ecx and %edx. This makes the call site 10 bytes long, which is a nice size for patching anyway (enough for a movl $0, , a-la lguest's cli, or movw $0, %gs: if we supported SMP).

The current 6 paravirt ops which are patched cover the vast majority of calls (until the Xen patches, then we need ~4 more?). Jeremy chose to expand patching to cover *all* paravirt ops, rather than just the new hot ones, and that's where we tipped over the ugliness threshold.

Cheers,
Rusty.
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007, Andrew Morton wrote:

> Has it been proven that quicklists are superior to simply going direct to the
> page allocator for these pages?

Yes.

> Would it provide a superior solution if we were to a) stop zeroing out the
> pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> for these pages?

I doubt it. The zeroing is a byproduct of our way of serializing pte handling. It's going to be difficult to change that.
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Mon, 19 Mar 2007 16:53:29 -0700

> Would it provide a superior solution if we were to a) stop zeroing out the
> pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> for these pages?

While you could avoid zeroing them out, you certainly can't avoid reading them into the cpu caches.

And for the PGDs you have to initialize these things partially to non-zero values on x86{,_64} on every new PGD you allocate, which is a complete waste of cpu cache dirtying. Avoiding this overhead alone justifies the quicklists I think.

It's not just a "zero" thing, so GFP_ZERO cannot help you here.

The more I think about it the more I like the quicklists.
Re: [QUICKLIST 1/5] Quicklists for page table pages V3
On Mon, 19 Mar 2007 15:37:16 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> This patchset introduces an arch independent framework to handle lists
> of recently used page table pages to replace the existing (ab)use of the
> slab for that purpose.
>
> 1. Proven code from the IA64 arch.

Has it been proven that quicklists are superior to simply going direct to the page allocator for these pages?

Would it provide a superior solution if we were to a) stop zeroing out the pte's when doing a fullmm==1 teardown and b) go direct to the page allocator for these pages?
[patch 4/13] signal/timer/event fds v7 - signalfd wire up x86_64 arch ...
This patch wires the signalfd system call to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h 2007-03-19 16:03:26.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h 2007-03-19 16:41:30.0 -0700
@@ -619,8 +619,10 @@
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_signalfd		280
+__SYSCALL(__NR_signalfd, sys_signalfd)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_signalfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S 2007-03-19 16:03:26.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S 2007-03-19 16:41:30.0 -0700
@@ -714,9 +714,10 @@
 	.quad compat_sys_get_robust_list
 	.quad sys_splice
 	.quad sys_sync_file_range
-	.quad sys_tee
+	.quad sys_tee			/* 315 */
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
+	.quad sys_signalfd		/* 320 */
 ia32_syscall_end:
[patch 8/13] signal/timer/event fds v7 - timerfd wire up x86_64 arch ...
This patch wires the timerfd system call to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S 2007-03-19 16:41:30.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S 2007-03-19 16:41:37.0 -0700
@@ -720,4 +720,5 @@
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
 	.quad sys_signalfd		/* 320 */
+	.quad sys_timerfd
 ia32_syscall_end:
Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h 2007-03-19 16:41:30.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h 2007-03-19 16:41:37.0 -0700
@@ -621,8 +621,10 @@
 __SYSCALL(__NR_move_pages, sys_move_pages)
 #define __NR_signalfd		280
 __SYSCALL(__NR_signalfd, sys_signalfd)
+#define __NR_timerfd		281
+__SYSCALL(__NR_timerfd, sys_timerfd)
 
-#define __NR_syscall_max __NR_signalfd
+#define __NR_syscall_max __NR_timerfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
[patch 5/13] signal/timer/event fds v7 - signalfd compat code ...
This patch implements the necessary compat code for the signalfd system call.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/fs/compat.c
===
--- linux-2.6.21-rc3.quilt.orig/fs/compat.c 2007-03-19 16:03:26.0 -0700
+++ linux-2.6.21-rc3.quilt/fs/compat.c 2007-03-19 16:41:32.0 -0700
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -2235,3 +2236,24 @@
 	return sys_ni_syscall();
 }
 #endif
+
+asmlinkage long compat_sys_signalfd(int ufd,
+				    const compat_sigset_t __user *sigmask,
+				    compat_size_t sigsetsize)
+{
+	compat_sigset_t ss32;
+	sigset_t tmp;
+	sigset_t __user *ksigmask;
+
+	if (sigsetsize != sizeof(compat_sigset_t))
+		return -EINVAL;
+	if (copy_from_user(&ss32, sigmask, sizeof(ss32)))
+		return -EFAULT;
+	sigset_from_compat(&tmp, &ss32);
+	ksigmask = compat_alloc_user_space(sizeof(sigset_t));
+	if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t)))
+		return -EFAULT;
+
+	return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
+}
+
[patch 10/13] signal/timer/event fds v7 - eventfd core ...
This is a very simple and light file descriptor that can be used for event wait/dispatch by userspace (both wait and dispatch) and by the kernel (dispatch only). It can be used instead of pipe(2) in all cases where a pipe would simply be used to signal events. Its kernel overhead is much lower than that of a pipe, and it does not consume two fds. When used in the kernel, it can offer an fd-bridge to enable, for example, functionalities like KAIO or syslets/threadlets to signal to an fd the completion of certain operations. More generally, an eventfd can be used by the kernel to signal readiness, in a POSIX poll/select way, of interfaces that would otherwise be incompatible with it.

The API is:

int eventfd(unsigned int count);

The eventfd API accepts an initial "count" parameter, and returns an eventfd fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

The POLLIN flag is raised when the internal counter is greater than zero. The POLLOUT flag is raised when at least a value of "1" can be written to the internal counter. The POLLERR flag is raised when an overflow in the counter value is detected.

The write(2) operation can never overflow the counter, since it blocks (unless O_NONBLOCK is set, in which case -EAGAIN is returned). But the eventfd_signal() function can overflow it, since it is not supposed to sleep during its operation.

The read(2) function reads the __u64 counter value, and resets the internal value to zero. If the value read is equal to (__u64) -1, an overflow happened on the internal counter (due to 2^64 eventfd_signal() posts that have never been retired - unlikely, but possible). The write(2) call writes a __u64 count value, and adds it to the current counter. The eventfd fd supports O_NONBLOCK also.
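As an illustration of the userspace side described above, here is a minimal sketch of posting to and draining the counter. One loud assumption: it uses the eventfd() wrapper from <sys/eventfd.h> as found on current Linux/glibc systems, which takes an additional flags argument that the API in this v7 patch does not have. The helpers efd_post() and efd_drain() are made-up names for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Add n to the eventfd counter; returns 0 on success, -1 on error. */
static int efd_post(int fd, uint64_t n)
{
	return write(fd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/*
 * Read the current counter value into *out and reset the counter to zero.
 * Blocks while the counter is zero unless the fd is non-blocking.
 * Returns 0 on success, -1 on error.
 */
static int efd_drain(int fd, uint64_t *out)
{
	assert(out != NULL);
	return read(fd, out, sizeof(*out)) == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

A typical use would be to create the descriptor with eventfd(0, 0), let producers call efd_post(), and let a consumer poll for POLLIN and then call efd_drain() once to collect everything posted since the last read.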
On the kernel side, we have:

struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);

The eventfd_fget() should be called to get a struct file* from an eventfd fd (this is an fget() + check of f_op being an eventfd fops pointer). The kernel can then call eventfd_signal() every time it wants to post an event to userspace. The eventfd_signal() function can be called from any context.

A simple eventfd() test and benchmark is available here:

http://www.xmailserver.org/eventfd-bench.c

This is the eventfd-based version of pipetest-4 (pipe(2) based):

http://www.xmailserver.org/pipetest-4.c

Not that performance matters much in the eventfd case, but eventfd-bench shows almost double the performance of pipetest-4.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/fs/Makefile
===
--- linux-2.6.21-rc3.quilt.orig/fs/Makefile 2007-03-19 16:41:33.0 -0700
+++ linux-2.6.21-rc3.quilt/fs/Makefile 2007-03-19 16:41:40.0 -0700
@@ -11,7 +11,7 @@
 		attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o drop_caches.o splice.o sync.o utimes.o \
-		stack.o anon_inodes.o signalfd.o timerfd.o
+		stack.o anon_inodes.o signalfd.o timerfd.o eventfd.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
Index: linux-2.6.21-rc3.quilt/include/linux/syscalls.h
===
--- linux-2.6.21-rc3.quilt.orig/include/linux/syscalls.h 2007-03-19 16:41:33.0 -0700
+++ linux-2.6.21-rc3.quilt/include/linux/syscalls.h 2007-03-19 16:41:40.0 -0700
@@ -605,6 +605,7 @@
 asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask);
 asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
 			    const struct itimerspec __user *utmr);
+asmlinkage long sys_eventfd(unsigned int count);
 
 int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
Index: linux-2.6.21-rc3.quilt/fs/eventfd.c
===
--- /dev/null 1970-01-01 00:00:00.0 +
+++ linux-2.6.21-rc3.quilt/fs/eventfd.c 2007-03-19 16:41:40.0 -0700
@@ -0,0 +1,253 @@
+/*
+ *  fs/eventfd.c
+ *
+ *  Copyright (C) 2007  Davide Libenzi
+ *
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+
+
+struct eventfd_ctx {
+	spinlock_t lock;
+	wait_queue_head_t wqh;
+	__u64 count;
+};
+
+
+static void eventfd_cleanup(struct eventfd_ctx *ctx);
+static int eventfd_close(struct inode *inode, struct file *file);
+static unsigned int eventfd_poll(struct file *file, poll_table *wait);
+static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
+			    loff_t *ppos);
+static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
+
[patch 11/13] signal/timer/event fds v7 - eventfd wire up i386 arch ...
This patch wires the eventfd system call to the i386 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/i386/kernel/syscall_table.S 2007-03-19 16:41:35.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S 2007-03-19 16:41:42.0 -0700
@@ -321,3 +321,4 @@
 	.long sys_epoll_pwait
 	.long sys_signalfd		/* 320 */
 	.long sys_timerfd
+	.long sys_eventfd
Index: linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-i386/unistd.h 2007-03-19 16:41:35.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h 2007-03-19 16:41:42.0 -0700
@@ -327,10 +327,11 @@
 #define __NR_epoll_pwait	319
 #define __NR_signalfd		320
 #define __NR_timerfd		321
+#define __NR_eventfd		322
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 322
+#define NR_syscalls 323
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
[patch 12/13] signal/timer/event fds v7 - eventfd wire up x86_64 arch ...
This patch wires the eventfd system call to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S 2007-03-19 16:41:37.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S 2007-03-19 16:41:43.0 -0700
@@ -721,4 +721,5 @@
 	.quad sys_epoll_pwait
 	.quad sys_signalfd		/* 320 */
 	.quad sys_timerfd
+	.quad sys_eventfd
 ia32_syscall_end:
Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h 2007-03-19 16:41:37.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h 2007-03-19 16:41:43.0 -0700
@@ -623,8 +623,10 @@
 __SYSCALL(__NR_signalfd, sys_signalfd)
 #define __NR_timerfd		281
 __SYSCALL(__NR_timerfd, sys_timerfd)
+#define __NR_eventfd		282
+__SYSCALL(__NR_eventfd, sys_eventfd)
 
-#define __NR_syscall_max __NR_timerfd
+#define __NR_syscall_max __NR_eventfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR