Re: [RFC v2] Documentation about unaligned memory access
On Nov 29, 2007 5:15 PM, Daniel Drake <[EMAIL PROTECTED]> wrote:
[...]
> To avoid the unaligned memory access, you would rewrite it as follows:
>
>    void myfunc(u8 *data, u32 value)
>    {
>        [...]
>        value = cpu_to_le32(value);
>        put_unaligned(value, data);
>        [...]
>    }
>
> The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
> memory and you wish to avoid unaligned access, its usage is as follows:
>
>    u32 value = get_unaligned(data);
>
> These macros work for memory accesses of any length (not just 32 bits as
> in the examples above). Be aware that when compared to standard access of
> aligned memory, using these macros to access unaligned memory can be costly in
> terms of performance.
>

The get_unaligned() call above will not do what you intended, given the implied context of myfunc() (at least as I read it). Since data is a u8 *, it will only get one byte of data. To avoid misunderstandings, the code should probably read:

    u32 value = get_unaligned((u32 *)data);

/DM
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
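The point about the pointer type can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel macros: load_u32_unaligned() and store_u32_unaligned() are hypothetical helper names, and memcpy() stands in for whatever the architecture code really does. Because the width is fixed in the helper's signature, it does not depend on the type of the pointer the caller happens to hold, which is exactly what goes wrong when a type-generic macro is handed a u8 *.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers (not the kernel's get_unaligned/put_unaligned):
 * move exactly 32 bits to or from an address with no alignment guarantee.
 * memcpy() copies byte-wise, so no aligned 32-bit access is required.
 */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}

static void store_u32_unaligned(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof(v));
}
```

A caller would pass something like buf + 1 from a byte buffer. A macro that sizes the access from sizeof(*ptr) sees 1 for a u8 * and 4 for a u32 *, hence the suggested cast.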
Re: [PATCH] xfs: revert to double-buffering readdir
On Fri, Nov 30, 2007 at 12:45:05AM +0100, Christian Kujau wrote:
> On Sun, 25 Nov 2007, Christoph Hellwig wrote:
> >This patch does exactly that and reverts xfs_file_readdir to what's
> >basically the 2.6.23 version minus the uio and vnops junk.
>
> Thanks, works here too (without nordirplus as a mountoption).
> Am I supposed to close the bug[0] or do you guys want to leave this
> open to track the Real Fix (TM) for 2.6.25?

I've been giving the fix some QA - that change appears to have caused a different regression as well so I'm holding off for a little bit until we know what the cause of the other regression is before deciding whether to take this fix or back the entire change out.

Either way we'll include the fix in 2.6.24

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: + proc-fix-the-threaded-proc-self.patch added to -mm tree
On Nov 29, 2007 4:40 PM, Eric W. Biederman <[EMAIL PROTECTED]> wrote: > "Albert Cahalan" <[EMAIL PROTECTED]> writes: > > > On Nov 28, 2007 6:31 AM, Eric W. Biederman <[EMAIL PROTECTED]> wrote: > >> Ingo Molnar <[EMAIL PROTECTED]> writes: > >> > * Albert Cahalan <[EMAIL PROTECTED]> wrote: > >> >> On Nov 27, 2007 7:49 PM, Guillaume Chazarain <[EMAIL PROTECTED]> wrote: > Linux tasks when used in one particular way can fulfill the posix > requirements for single threaded processes. > > Linux task groups when used in one particular way can fulfill the > posix requirements for processes. Right. Once you leave this, weirdness happens. POSIX defines things in terms of processes and threads. POSIX defines many of our interfaces. That includes kernel behavior, the C library, and numerous programs. > As for where /proc/self points given that procps seems to read > files like /proc/self/stat. It looks to me like we have a clear > case of a user space application that cares about the current > behavior and would break if we changed things. I wasn't saying procps would break, though it would if /proc/self/task went away. I'm more concerned about multi-threaded things that look in their own /proc/self directory. The procps programs are single-threaded. In procps, the self link is used: a. to see if the wchan file exists b. to see if the task directory exists c. to find the tty number (that last one: there might not be a file descriptor for the tty, and anyway I need it with the bits in all the same places as what I get for the other processes) I'll bet that something reads /proc/self/stat to see CPU usage. > > Note that it was intended that non-legacy additions > > would normally be added to either the process directory > > or the thread directory, not both. I think somebody may > > have ripped out the ability to do this; at the very least > > there have been numerous illogical additions. 
> > The rationale was not conveyed and the policy you describe > seems like deprecating the /proc/ directory in favor > of the /proc//task//. Which was a pattern > never established and it doesn't seem to make anything better > so I don't see the point there. For the stuff that is logically per-task, yes. For the rest, no. Oh well... It does make things better because redundant info is a source of confusion. > >> I'm still trying to understand which will break user space more, > >> adding /proc/task or changing /proc/self. > > > > Changing /proc/self makes you get per-thread data > > when you asked for per-process data. That's bad. > > /proc/self used to ask for per task data. Which is why there > is some confusion. Heh. Well, /proc/self used to ask for per process data. It was all the same. I think it matters that /proc/self was always documented as being per-process. > >> >> This one is probably best: > >> >> /proc/task -> 123/task/456 > >> >> (with both numbers showing) > >> > > >> > this sounds good to me. If it's a symlink then there's not much other > >> > choice because the thread PIDs do not even show up under /proc anymore. > >> > >> The name sounds good to me. > > I will see about writing the patch for this in a bit and sending > it to Andrew. Nice. > Nope. /proc/mounts was a symlink to /proc/self/mounts long before > /proc/self was modified to stop pointing at the task directory and > changed it point at the new task group directory. Having the filesystem namespace be per-process is wild enough. We really don't need it to be per-thread. (and yes, I'm using the POSIX terms on purpose) > Frankly from what I have seen of the code the task-group work > seems to be a larger source of bugs, and complications, because > people have a darn hard time wrapping their head around how it > is supposed to behave, and all of the corner cases were not > resolved at the time it was developed. 
People look at me like I have two heads when I explain to them that the Linux kernel source uses "pid" to mean a thread. The bad terminology probably promotes bad thinking. It would be lovely if that could somehow get fixed. > My favorite ongoing issue is what is needed to allow a threaded > init to actually function properly. I think enough fixes have > gone in that it might even work. My "favorite" is the multi-threaded debugger. By this I mean the debugger itself wants to be multi-threaded, issuing ptrace commands from multiple threads.
circular locking dependency detected
===
[ INFO: possible circular locking dependency detected ]
2.6.24-rc3 #6
---
bash/2294 is trying to acquire lock:
 (>j_list_lock){--..}, at: [] journal_try_to_free_buffers+0x76/0x10c

but task is already holding lock:
 (inode_lock){--..}, at: [] drop_pagecache+0x48/0xd8

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (inode_lock){--..}:
       [] __lock_acquire+0xa31/0xc1a
       [] lock_acquire+0x7a/0x94
       [] _spin_lock+0x2e/0x58
       [] __mark_inode_dirty+0xd8/0x15e
       [] __set_page_dirty+0xfb/0x10a
       [] mark_buffer_dirty+0x80/0x86
       [] __journal_temp_unlink_buffer+0xc1/0xc5
       [] __journal_unfile_buffer+0xb/0x15
       [] __journal_refile_buffer+0x3b/0x85
       [] journal_commit_transaction+0xe7f/0x10ec
       [] kjournald+0x131/0x35f
       [] kthread+0x3b/0x62
       [] kernel_thread_helper+0x7/0x10
       [] 0x

-> #0 (>j_list_lock){--..}:
       [] __lock_acquire+0x921/0xc1a
       [] lock_acquire+0x7a/0x94
       [] _spin_lock+0x2e/0x58
       [] journal_try_to_free_buffers+0x76/0x10c
       [] ext3_releasepage+0x68/0x74
       [] try_to_release_page+0x33/0x44
       [] __invalidate_mapping_pages+0x74/0xe0
       [] drop_pagecache+0x70/0xd8
       [] drop_caches_sysctl_handler+0x36/0x4e
       [] proc_sys_write+0x6b/0x85
       [] vfs_write+0x90/0x119
       [] sys_write+0x3d/0x61
       [] sysenter_past_esp+0x5f/0xa5
       [] 0x

other info that might help us debug this:

2 locks held by bash/2294:
 #0:  (>s_umount_key#16){}, at: [] drop_pagecache+0x38/0xd8
 #1:  (inode_lock){--..}, at: [] drop_pagecache+0x48/0xd8

stack backtrace:
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] print_circular_bug_tail+0x5f/0x68
 [] __lock_acquire+0x921/0xc1a
 [] lock_acquire+0x7a/0x94
 [] _spin_lock+0x2e/0x58
 [] journal_try_to_free_buffers+0x76/0x10c
 [] ext3_releasepage+0x68/0x74
 [] try_to_release_page+0x33/0x44
 [] __invalidate_mapping_pages+0x74/0xe0
 [] drop_pagecache+0x70/0xd8
 [] drop_caches_sysctl_handler+0x36/0x4e
 [] proc_sys_write+0x6b/0x85
 [] vfs_write+0x90/0x119
 [] sys_write+0x3d/0x61
 [] sysenter_past_esp+0x5f/0xa5
===
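For readers unfamiliar with this kind of report: the warning fires when a new "B taken while A is held" ordering would close a loop with orderings seen earlier. The following is a toy model of that idea only, and is not how lockdep is implemented. The lock IDs, the dep[][] matrix and record_dependency() are all made up for illustration. It assumes, as the report's "which lock already depends on the new lock" suggests, that the journal commit path had already recorded inode_lock being taken under j_list_lock.

```c
#include <assert.h>

/* Hypothetical lock IDs for this sketch only. */
enum { LOCK_S_UMOUNT, LOCK_INODE, LOCK_J_LIST, MAX_LOCKS };

/* dep[a][b] != 0 means "b was acquired while a was held". */
static int dep[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static int reachable(int from, int to, int *seen)
{
    if (from == to)
        return 1;
    seen[from] = 1;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return 1;
    return 0;
}

/*
 * Record that 'acquired' is taken while 'held' is held.
 * Returns 1 if the new edge would close a cycle (and does not record it),
 * 0 if the ordering is consistent with everything seen so far.
 */
static int record_dependency(int held, int acquired)
{
    int seen[MAX_LOCKS] = {0};

    assert(held >= 0 && held < MAX_LOCKS);
    assert(acquired >= 0 && acquired < MAX_LOCKS);

    if (reachable(acquired, held, seen))
        return 1;   /* acquired -> ... -> held already exists */
    dep[held][acquired] = 1;
    return 0;
}
```

In this model, recording (LOCK_J_LIST, LOCK_INODE) first and then (LOCK_INODE, LOCK_J_LIST) makes the second call return 1, which mirrors the shape of the report above: drop_pagecache() holds inode_lock and then reaches for j_list_lock.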
Re: [BUG] 2.6.24-rc3-git2 softlockup detected
Andrew Morton wrote: > On Thu, 29 Nov 2007 23:00:47 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > >> On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <[EMAIL PROTECTED]> wrote: >> >>> On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote: ten million is close enough to infinity for me to assume that we broke the driver and that's never going to terminate. >>> how about this? doesn't break things on my pa8800: >>> >>> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c >>> b/drivers/scsi/sym53c8xx_2/sym_hipd.c >>> index 463f119..ef01cb1 100644 >>> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c >>> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c >>> @@ -1037,10 +1037,13 @@ restart_test: >>> /* >>> * Wait 'til done (with timeout) >>> */ >>> - for (i=0; i>> + do { >>> if (INB(np, nc_istat) & (INTF|SIP|DIP)) >>> break; >>> - if (i>=SYM_SNOOP_TIMEOUT) { >>> + msleep(10); >>> + } while (i++ < SYM_SNOOP_TIMEOUT); >>> + >>> + if (i >= SYM_SNOOP_TIMEOUT) { >>> printf ("CACHE TEST FAILED: timeout.\n"); >>> return (0x20); >>> } >>> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h >>> b/drivers/scsi/sym53c8xx_2/sym_hipd.h >>> index ad07880..85c483b 100644 >>> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h >>> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h >>> @@ -339,7 +339,7 @@ >>> /* >>> * Misc. >>> */ >>> -#define SYM_SNOOP_TIMEOUT (1000) >>> +#define SYM_SNOOP_TIMEOUT (1000) >>> #define BUS_8_BIT 0 >>> #define BUS_16_BIT 1 >>> >> That might be the fix, but do we know what we're actually fixing? afaik >> 2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we >> don't know why? >> > > > > > > So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not? Yes, the 2.6.24-rc3 was Ok and this is seen from 2.6.24-rc3-git2/3/4. -- Thanks & Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. 
Re: [PATCH] xfs: revert to double-buffering readdir
Christoph Hellwig wrote: The current readdir implementation deadlocks on a btree buffers locks because nfsd calls back into ->lookup from the filldir callback. The only short-term fix for this is to revert to the old inefficient double-buffering scheme. Probably why Steve did this: :) xfs_file.c revision 1.40 date: 2001/03/15 23:33:20; author: lord; state: Exp; lines: +54 -17 modid: 2.4.x-xfs:slinx:90125a Change linvfs_readdir to allocate a buffer, call xfs to fill it, and then call the filldir function on each entry. This is instead of doing the filldir deep in the bowels of xfs which causes locking problems. Yes it looks like it is done equivalently to before (minus the uio stuff etc). I don't know what the 7fff* masking is about but we did that previously. I hadn't come across the name[] struct field before, was used to name[0] (or name[1] in times gone by) but found that is a kosher way of doing things too for the variable len string at the end. Hmmm, don't see the point of "eof" local var now. Previously bhv_vop_readdir() returned eof. I presume if we don't move the offset (offset == startoffset) then we're done and break out? So we lost eof when going to the filldir in the getdents code etc... --Tim This patch does exactly that and reverts xfs_file_readdir to what's basically the 2.6.23 version minus the uio and vnops junk. I'll try to find something more optimal for 2.6.25 or at least find a way to use the proper version for local access. Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]> Index: linux-2.6/fs/xfs/linux-2.6/xfs_file.c === --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_file.c 2007-11-25 11:41:20.0 +0100 +++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c 2007-11-25 17:14:27.0 +0100 @@ -218,6 +218,15 @@ } #endif /* CONFIG_XFS_DMAPI */ +/* + * Unfortunately we can't just use the clean and simple readdir implementation + * below, because nfs might call back into ->lookup from the filldir callback + * and that will deadlock the low-level btree code. 
+ * + * Hopefully we'll find a better workaround that allows to use the optimal + * version at least for local readdirs for 2.6.25. + */ +#if 0 STATIC int xfs_file_readdir( struct file *filp, @@ -249,6 +258,121 @@ return -error; return 0; } +#else + +struct hack_dirent { + int namlen; + loff_t offset; + u64 ino; + unsigned intd_type; + charname[]; +}; + +struct hack_callback { + char*dirent; + size_t len; + size_t used; +}; + +STATIC int +xfs_hack_filldir( + void*__buf, + const char *name, + int namlen, + loff_t offset, + u64 ino, + unsigned intd_type) +{ + struct hack_callback *buf = __buf; + struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used); + + if (buf->used + sizeof(struct hack_dirent) + namlen > buf->len) + return -EINVAL; + + de->namlen = namlen; + de->offset = offset; + de->ino = ino; + de->d_type = d_type; + memcpy(de->name, name, namlen); + buf->used += sizeof(struct hack_dirent) + namlen; + return 0; +} + +STATIC int +xfs_file_readdir( + struct file *filp, + void*dirent, + filldir_t filldir) +{ + struct inode*inode = filp->f_path.dentry->d_inode; + xfs_inode_t *ip = XFS_I(inode); + struct hack_callback buf; + struct hack_dirent *de; + int error; + loff_t size; + int eof = 0; + xfs_off_t start_offset, curr_offset, offset; + + /* +* Try fairly hard to get memory +*/ + buf.len = PAGE_CACHE_SIZE; + do { + buf.dirent = kmalloc(buf.len, GFP_KERNEL); + if (buf.dirent) + break; + buf.len >>= 1; + } while (buf.len >= 1024); + + if (!buf.dirent) + return -ENOMEM; + + curr_offset = filp->f_pos; + if (curr_offset == 0x7fff) + offset = 0x; + else + offset = filp->f_pos; + + while (!eof) { + int reclen; + start_offset = offset; + + buf.used = 0; + error = -xfs_readdir(ip, , buf.len, , +xfs_hack_filldir); + if (error || offset == start_offset) { + size = 0; + break; + } + + size = buf.used; + de = (struct hack_dirent *)buf.dirent; + while (size > 0) { + if (filldir(dirent, de->name, de->namlen, +
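The double-buffering scheme in the patch boils down to two steps: a fill callback packs variable-length records into a flat buffer while the filesystem locks are held, and a second loop walks that buffer and calls the real filldir afterwards. Below is a minimal userspace sketch of that pattern, under stated assumptions: struct rec, rec_append(), rec_drain() and count_name_bytes() are hypothetical names, not the XFS code, and unlike the patch as posted this sketch rounds each record up to the struct's alignment so the next header never starts at a misaligned address. The buffer itself is assumed to come from malloc() or similar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One packed directory record; the name follows the fixed header. */
struct rec {
    int      namlen;
    int64_t  offset;
    uint64_t ino;
    unsigned d_type;
    char     name[];    /* flexible array member, not NUL-terminated */
};

struct rec_buf {
    char  *data;        /* assumed to be malloc()-aligned */
    size_t len;
    size_t used;
};

/* Space one record occupies, rounded up so the next header stays aligned. */
static size_t rec_size(int namlen)
{
    size_t raw = sizeof(struct rec) + (size_t)namlen;
    size_t align = _Alignof(struct rec);

    return (raw + align - 1) / align * align;
}

/* Phase 1: pack one entry. Returns 0 on success, -1 if the buffer is full. */
static int rec_append(struct rec_buf *buf, const char *name, int namlen,
                      int64_t offset, uint64_t ino, unsigned d_type)
{
    size_t need = rec_size(namlen);
    struct rec *r;

    assert(buf && buf->data && namlen >= 0);
    if (buf->used + need > buf->len)
        return -1;

    r = (struct rec *)(buf->data + buf->used);
    r->namlen = namlen;
    r->offset = offset;
    r->ino = ino;
    r->d_type = d_type;
    memcpy(r->name, name, (size_t)namlen);
    buf->used += need;
    return 0;
}

typedef int (*emit_fn)(void *ctx, const struct rec *r);

/* Phase 2: replay the packed entries; stops early if 'emit' returns nonzero. */
static int rec_drain(const struct rec_buf *buf, emit_fn emit, void *ctx)
{
    size_t pos = 0;
    int n = 0;

    while (pos < buf->used) {
        const struct rec *r = (const struct rec *)(buf->data + pos);

        if (emit(ctx, r))
            break;
        pos += rec_size(r->namlen);
        n++;
    }
    return n;
}

/* Example consumer for the sketch: sums the name lengths it is shown. */
static int count_name_bytes(void *ctx, const struct rec *r)
{
    *(size_t *)ctx += (size_t)r->namlen;
    return 0;
}
```

The design point is that rec_drain() runs with no filesystem lock held, so a consumer that calls back into lookup (as nfsd does) cannot deadlock against the fill phase.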
Re: [PATCH] Documentation/Changes -> Documentation/Requirements (resend without truncated comment text)
On 30-11-2007 04:32, H. Peter Anvin wrote:
...
> As far as I can tell, Documentation/Changes is the only thing we have
> that even attempts to document the basic requirements. This attempts
> to formalize that fact.
>
> Documentation/Changes | 396
> Documentation/Requirements | 394 +++

...But, there are a few more 'things', which mention Documentation/Changes.

Regards,
Jarek P.
Re: constant_tsc and TSC unstable
Paul Rolland (ポール・ロラン) wrote:
>> Note that once TSC is disabled (it's using "jiffies" as far
>> as I can see), ntpd constantly speeds up and slows down the
>> clock, it jumps +/- 0.5sec every several minutes or hours -
>> I guess that's when ntpd process gets moved from one core
>> to another for whatever reason. And an interesting thing
>> is that with 64bits kernel this TSC problem does not occur
>> on this very machine.
>
> H That could make it a problem related to kernel rather than CPU.
>
>> Something similar is reported on AMD X2 64 machines as well --
>> can't check right now.
>
> If I recall correctly, issues with AMD X2 were related to TSC being
> independent for each core and not constant (speed depending on C state).
> But the reason I raise the issue is that the Core2 reports constant TSC, so
> there is (IMHO) no reason for that.

Well, "constant" doesn't mean "synchronized", but it might very well be that the Core2 could really benefit from synchronizing the TSCs manually like we used to.

On the other hand, I notice that most of the TSC warp values are relatively close to 2^32, so this could be a specific bug.

-hpa
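The "close to 2^32" observation is easy to check with a few lines of arithmetic. The sketch below is illustrative only: warp_minus_2_32() and cycles_to_seconds() are made-up helper names, the sample values are the warp figures quoted elsewhere in this thread, and 1.73 GHz is taken from the CPU model string reported there. Whether the warp really is a small delta that got truncated or wrapped at 32 bits is a hypothesis, not something this arithmetic proves.

```c
#include <assert.h>
#include <stdint.h>

/* How far a reported warp value sits from 2^32 (negative means below it). */
static int64_t warp_minus_2_32(uint64_t warp_cycles)
{
    const int64_t two_32 = INT64_C(1) << 32;    /* 4294967296 */

    assert(warp_cycles <= (uint64_t)INT64_MAX);
    return (int64_t)warp_cycles - two_32;
}

/* Convert a cycle count to seconds at an assumed, fixed clock rate. */
static double cycles_to_seconds(int64_t cycles, double hz)
{
    assert(hz > 0.0);
    return (double)cycles / hz;
}
```

For the reported 3978592228 cycles, warp_minus_2_32() gives -316375068, which is roughly 0.18 s at 1.73 GHz; for 4096946373 it gives -198020923. Read as raw counts, the same values would be over two seconds.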
Re: [BUG] 2.6.24-rc3-git2 softlockup detected
On Thu, 29 Nov 2007 23:00:47 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <[EMAIL PROTECTED]> wrote:
>
> > On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> > > ten million is close enough to infinity for me to assume that we broke the
> > > driver and that's never going to terminate.
> > >
> >
> > how about this? doesn't break things on my pa8800:
> >
> > diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > index 463f119..ef01cb1 100644
> > --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > @@ -1037,10 +1037,13 @@ restart_test:
> > /*
> > * Wait 'til done (with timeout)
> > */
> > - for (i=0; i
> > + do {
> > if (INB(np, nc_istat) & (INTF|SIP|DIP))
> > break;
> > - if (i>=SYM_SNOOP_TIMEOUT) {
> > + msleep(10);
> > + } while (i++ < SYM_SNOOP_TIMEOUT);
> > +
> > + if (i >= SYM_SNOOP_TIMEOUT) {
> > printf ("CACHE TEST FAILED: timeout.\n");
> > return (0x20);
> > }
> > diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > index ad07880..85c483b 100644
> > --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > @@ -339,7 +339,7 @@
> > /*
> > * Misc.
> > */
> > -#define SYM_SNOOP_TIMEOUT (1000)
> > +#define SYM_SNOOP_TIMEOUT (1000)
> > #define BUS_8_BIT 0
> > #define BUS_16_BIT 1
> >
>
> That might be the fix, but do we know what we're actually fixing? afaik
> 2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we
> don't know why?
>

So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?
Re: [BUG] 2.6.24-rc3-git2 softlockup detected
On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <[EMAIL PROTECTED]> wrote:

> On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> > ten million is close enough to infinity for me to assume that we broke the
> > driver and that's never going to terminate.
> >
>
> how about this? doesn't break things on my pa8800:
>
> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> index 463f119..ef01cb1 100644
> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> @@ -1037,10 +1037,13 @@ restart_test:
> /*
> * Wait 'til done (with timeout)
> */
> - for (i=0; i
> + do {
> if (INB(np, nc_istat) & (INTF|SIP|DIP))
> break;
> - if (i>=SYM_SNOOP_TIMEOUT) {
> + msleep(10);
> + } while (i++ < SYM_SNOOP_TIMEOUT);
> +
> + if (i >= SYM_SNOOP_TIMEOUT) {
> printf ("CACHE TEST FAILED: timeout.\n");
> return (0x20);
> }
> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> index ad07880..85c483b 100644
> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> @@ -339,7 +339,7 @@
> /*
> * Misc.
> */
> -#define SYM_SNOOP_TIMEOUT (1000)
> +#define SYM_SNOOP_TIMEOUT (1000)
> #define BUS_8_BIT 0
> #define BUS_16_BIT 1
>

That might be the fix, but do we know what we're actually fixing? afaik 2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we don't know why?
Re: constant_tsc and TSC unstable
Hello,

On Fri, 30 Nov 2007 00:26:47 +0300 Michael Tokarev <[EMAIL PROTECTED]> wrote:

> H. Peter Anvin wrote:
> > Paul Rolland (ポール・ロラン) wrote:
> []
> >> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
> >> Marking TSC unstable due to: check_tsc_sync_source failed.
> []
> >> but I was wondering if this is a bug or a feature ;)
>
> > The problem you're having is that the TSCs of your two cores are
> > completely different, over a second apart. This is a bug, unrelated to
> > constant_tsc.
>
> A bug in where - in the CPU or in kernel?

Good question!

> The thing is that all our dual-core machines shows something like
> that.
>
> (not that huge difference as Paul reported, but still "unstable".
> The same happens with 2.6.23)

I've been checking my logs, and the difference is quite constant and huge:

[EMAIL PROTECTED] log]# grep 'cycles TSC warp' messages*
messages:Nov 26 08:27:56 tux kernel: Measured 4078687691 cycles TSC warp between CPUs, turning off TSC clock.
messages:Nov 26 17:21:21 tux kernel: Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
messages.1:Nov 18 22:52:23 tux kernel: Measured 4063102940 cycles TSC warp between CPUs, turning off TSC clock.
messages.1:Nov 19 07:19:02 tux kernel: Measured 4057192061 cycles TSC warp between CPUs, turning off TSC clock.
messages.1:Nov 23 20:50:12 tux kernel: Measured 4064589321 cycles TSC warp between CPUs, turning off TSC clock.
messages.2:Nov 12 08:06:44 tux kernel: Measured 4072130361 cycles TSC warp between CPUs, turning off TSC clock.
messages.2:Nov 13 19:42:47 tux kernel: Measured 4049899451 cycles TSC warp between CPUs, turning off TSC clock.
messages.2:Nov 17 09:27:22 tux kernel: Measured 4066629060 cycles TSC warp between CPUs, turning off TSC clock.
messages.3:Nov 5 08:25:08 tux kernel: Measured 4086386109 cycles TSC warp between CPUs, turning off TSC clock.
messages.3:Nov 8 13:07:08 tux kernel: Measured 4041945934 cycles TSC warp between CPUs, turning off TSC clock.
messages.3:Nov 9 23:31:24 tux kernel: Measured 4092303059 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Oct 29 07:28:23 tux kernel: Measured 4096946373 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Oct 31 17:07:21 tux kernel: Measured 4046765372 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Oct 31 17:15:09 tux kernel: Measured 4039328228 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Oct 31 23:19:00 tux kernel: Measured 4069714246 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov 1 20:33:02 tux kernel: Measured 4088199726 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov 2 11:53:17 tux kernel: Measured 4079927527 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov 3 09:37:16 tux kernel: Measured 4071112656 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov 3 10:51:29 tux kernel: Measured 3986266219 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov 4 18:14:56 tux kernel: Measured 4074214144 cycles TSC warp between CPUs, turning off TSC clock.

> Note that once TSC is disabled (it's using "jiffies" as far
> as I can see), ntpd constantly speeds up and slows down the
> clock, it jumps +/- 0.5sec every several minutes or hours -
> I guess that's when ntpd process gets moved from one core
> to another for whatever reason. And an interesting thing
> is that with 64bits kernel this TSC problem does not occur
> on this very machine.

H That could make it a problem related to kernel rather than CPU.

> Something similar is reported on AMD X2 64 machines as well --
> can't check right now.

If I recall correctly, issues with AMD X2 were related to TSC being independent for each core and not constant (speed depending on C state). But the reason I raise the issue is that the Core2 reports constant TSC, so there is (IMHO) no reason for that.

Paul
Re: constant_tsc and TSC unstable
Hello,

On Thu, 29 Nov 2007 15:29:49 -0800 "Pallipadi, Venkatesh" <[EMAIL PROTECTED]> wrote:

> TSCs on Core 2 Duo are supposed to be in sync unless CPU supports deep idle
> states like C2, C3. Can you send the full /proc/cpuinfo and full dmesg.
>

Sure I can...

[EMAIL PROTECTED] log]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
stepping        : 2
cpu MHz         : 800.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3461.13
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
stepping        : 2
cpu MHz         : 800.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3458.02
clflush size    : 64

Regards,
Paul

dmesg
Description: Binary data
[patch 3/3] x86_64: Make the x86_32 percpu operations usable on x86_64
Relocate the x86_64 percpu variables to begin at zero. Then we can directly use the x86_32 percpu operations. x86_32 offsets %fs by __per_cpu_start. x86_64 has %gs pointing directly to the pda and the per cpu area if they start at zero. Access to the pda with the x86_64 pda operations is still possible in addition to access to the per cpu variables using x86_32 percpu operations. Hopefully this is helpful for arch integration. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86/Kconfig |5 + arch/x86/kernel/setup64.c|4 ++-- arch/x86/kernel/vmlinux_64.lds.S |1 + include/asm-x86/percpu.h | 12 +++- 4 files changed, 19 insertions(+), 3 deletions(-) Index: linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h === --- linux-2.6.24-rc3-mm2.orig/include/asm-x86/percpu.h 2007-11-29 22:13:54.806575787 -0800 +++ linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h 2007-11-29 22:21:42.383571603 -0800 @@ -17,6 +17,12 @@ #define per_cpu_offset(x) (__per_cpu_offset(x)) +#define __percpu_seg "%%gs:" + +#else + +#define __percpu_seg "" + #endif #include @@ -81,6 +87,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda); /* We can use this directly for local CPU (faster). */ DECLARE_PER_CPU(unsigned long, this_cpu_off); +#endif /* __ASSEMBLY__ */ +#endif /* !CONFIG_X86_64 */ + +#ifndef __ASSEMBLY__ + /* For arch-specific code, we can use direct single-insn ops (they * don't give an lvalue though). 
*/ extern void __bad_percpu_size(void); @@ -138,5 +149,4 @@ extern void __bad_percpu_size(void); #define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val) #define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val) #endif /* !__ASSEMBLY__ */ -#endif /* !CONFIG_X86_64 */ #endif /* _ASM_X86_PERCPU_H_ */ Index: linux-2.6.24-rc3-mm2/arch/x86/Kconfig === --- linux-2.6.24-rc3-mm2.orig/arch/x86/Kconfig 2007-11-29 22:05:39.003576212 -0800 +++ linux-2.6.24-rc3-mm2/arch/x86/Kconfig 2007-11-29 22:12:53.942575452 -0800 @@ -123,6 +123,11 @@ config GENERIC_TIME_VSYSCALL config ARCH_SETS_UP_PER_CPU_AREA def_bool X86_64 +config PERCPU_ZERO_BASED + bool + depends on X86_64 && SMP + default y + config ZONE_DMA32 bool default X86_64 Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c === --- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c 2007-11-29 22:12:08.962826086 -0800 +++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c 2007-11-29 22:12:53.942575452 -0800 @@ -111,11 +111,11 @@ void __init setup_per_cpu_areas(void) } if (!ptr) panic("Cannot allocate cpu data for CPU %d\n", i); - memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start); + memcpy(ptr, __per_cpu_load, __per_cpu_size); /* Relocate the pda */ memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda)); cpu_pda(i) = (struct x8664_pda *)ptr; - cpu_pda(i)->data_offset = ptr - __per_cpu_start; + cpu_pda(i)->data_offset = (unsigned long)ptr; } /* Fix up pda for this processor */ pda_init(0); Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/vmlinux_64.lds.S === --- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/vmlinux_64.lds.S 2007-11-29 22:05:38.987576338 -0800 +++ linux-2.6.24-rc3-mm2/arch/x86/kernel/vmlinux_64.lds.S 2007-11-29 22:12:53.930825752 -0800 @@ -16,6 +16,7 @@ jiffies_64 = jiffies; _proxy_pda = 1; PHDRS { text PT_LOAD FLAGS(5); /* R_E */ + percpu PT_LOAD FLAGS(4);/* R__ */ data PT_LOAD FLAGS(7); /* RWE */ user PT_LOAD FLAGS(7); /* RWE */ data.init PT_LOAD FLAGS(7); /* RWE */
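The effect of the data_offset change in setup_per_cpu_areas() can be shown with plain integer arithmetic. This is only a model of the address calculation, with made-up helper names and example addresses, and it says nothing about how the %gs-relative instructions are actually emitted. In the classic layout a per-cpu symbol carries its link-time address inside the template section, so the per-CPU offset has to subtract the section start. With a zero-based section the symbol value already is the offset, so the per-CPU area address can be used directly.

```c
#include <assert.h>
#include <stdint.h>

/* Classic layout: data_offset = ptr - __per_cpu_start. */
static uint64_t data_offset_classic(uint64_t area, uint64_t per_cpu_start)
{
    return area - per_cpu_start;    /* unsigned wraparound is intended */
}

/* Zero-based layout: data_offset = ptr. */
static uint64_t data_offset_zero_based(uint64_t area)
{
    return area;
}

/* Address of one CPU's copy: symbol value plus that CPU's data_offset. */
static uint64_t percpu_address(uint64_t symbol, uint64_t data_offset)
{
    return symbol + data_offset;
}

/* Both layouts must resolve a variable at 'off' to the same location. */
static void check_layouts_agree(uint64_t area, uint64_t start, uint64_t off)
{
    uint64_t classic = percpu_address(start + off,
                                      data_offset_classic(area, start));
    uint64_t zero = percpu_address(off, data_offset_zero_based(area));

    assert(classic == zero);
    assert(zero == area + off);
}
```

Calling check_layouts_agree() with any area, section start and offset passes, which is the sense in which the two schemes are interchangeable once the linker script places the percpu section at zero.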
[patch 2/3] X86_64: Declare pda as per cpu data thereby moving it into the cpu area
Declare the pda as a per cpu variable. This will have the effect of moving
the pda data into the cpu area managed by cpu alloc.

The boot_pdas are only needed in head64.c so move the declaration
over there and make it static.

Remove the code that allocates special pda data structures.

The pda is moved to the beginning of the per cpu area. gs is pointing to the
pda. And therefore gs: is now pointing to the per cpu area of the current
processor. A per cpu variable can then be reached at

	%gs:[_cpu_ - __per_cpu_start]

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
 arch/x86/kernel/head64.c          |    6 ++
 arch/x86/kernel/setup64.c         |   13 ++---
 arch/x86/kernel/smpboot_64.c      |   16 
 include/asm-generic/vmlinux.lds.h |    1 +
 include/asm-x86/pda.h             |    1 -
 include/linux/percpu.h            |    4 
 6 files changed, 21 insertions(+), 20 deletions(-)

Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c	2007-11-28 20:59:13.124188194 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c	2007-11-28 21:08:50.473347382 -0800
@@ -30,7 +30,9 @@ cpumask_t cpu_initialized __cpuinitdata

 struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
+DEFINE_PER_CPU_FIRST(struct x8664_pda, pda);
+EXPORT_PER_CPU_SYMBOL(pda);

 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };

@@ -109,10 +111,15 @@ void __init setup_per_cpu_areas(void)
 		}
 		if (!ptr)
 			panic("Cannot allocate cpu data for CPU %d\n", i);
-		cpu_pda(i)->data_offset = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+		/* Relocate the pda */
+		memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
+		cpu_pda(i) = (struct x8664_pda *)ptr;
+		cpu_pda(i)->data_offset = ptr - __per_cpu_start;
 	}
-}
+	/* Fix up pda for this processor */
+	pda_init(0);
+}

 void pda_init(int cpu)
 {
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/smpboot_64.c	2007-11-28 20:59:13.136188167 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c	2007-11-28 20:59:35.399937395 -0800
@@ -556,22 +556,6 @@ static int __cpuinit do_boot_cpu(int cpu
 		return -1;
 	}

-	/* Allocate node local memory for AP pdas */
-	if (cpu_pda(cpu) == &boot_cpu_pda[cpu]) {
-		struct x8664_pda *newpda, *pda;
-		int node = cpu_to_node(cpu);
-		pda = cpu_pda(cpu);
-		newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
-				      node);
-		if (newpda) {
-			memcpy(newpda, pda, sizeof (struct x8664_pda));
-			cpu_pda(cpu) = newpda;
-		} else
-			printk(KERN_ERR
-		"Could not allocate node local PDA for CPU %d on node %d\n",
-				cpu, node);
-	}
-
 	alternatives_smp_switch(1);

 	c_idle.idle = get_idle_for_cpu(cpu);
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/head64.c	2007-11-28 20:59:13.152187359 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c	2007-11-28 20:59:35.403937534 -0800
@@ -22,6 +22,12 @@
 #include
 #include

+/*
+ * Only used before the per cpu areas are setup. The use for the non possible
+ * cpus continues after boot
+ */
+static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
Index: linux-2.6.24-rc3-mm2/include/asm-x86/pda.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/pda.h	2007-11-28 20:59:13.164187921 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/pda.h	2007-11-28 20:59:35.403937534 -0800
@@ -39,7 +39,6 @@ struct x8664_pda {
 } ____cacheline_aligned_in_smp;

 extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
 extern void pda_init(int);

 #define cpu_pda(i) (_cpu_pda[i])
Index: linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/vmlinux.lds.h	2007-11-28 20:59:13.176187886 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h	2007-11-28 20:59:35.403937534 -0800
@@ -259,6 +259,7 @@
 	. = ALIGN(align);
[patch 1/3] Percpu infrastructure to rebase the per cpu area to 0UL
Support an option

	CONFIG_PERCPU_ZERO_BASED

that makes offsets for per cpu variables start at zero.

If a percpu area starts at zero then

1. We do not need RELOC_HIDE anymore

2. Indexes off the per cpu area for each processor are small

3. The percpu area "addresses" are offsets and we can then have
   allocpercpu/cpu_alloc in the future also use these offsets so that
   percpu functions can take any type of percpu address if it is provided by
   a percpu variable or a pointer obtained via allocpercpu/cpu_alloc.

The linker area boundaries variables are different for zero based percpu
segments:

__per_cpu_load	-> The address at which the percpu area was loaded
__per_cpu_size	-> The length of the per cpu area

Removes the &__per_cpu_x in lockdep. AFAICT The __per_cpu_x are already
pointers.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
 include/asm-generic/percpu.h      |    7 ++-
 include/asm-generic/sections.h    |   10 ++
 include/asm-generic/vmlinux.lds.h |   15 +++
 init/main.c                       |   17 +
 kernel/lockdep.c                  |    4 ++--
 5 files changed, 42 insertions(+), 11 deletions(-)

Index: linux-2.6.24-rc3-mm2/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/percpu.h	2007-11-29 22:05:58.359576450 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/percpu.h	2007-11-29 22:06:22.750825804 -0800
@@ -42,8 +42,13 @@ extern unsigned long __per_cpu_offset[NR
  * Only S390 provides its own means of moving the pointer.
  */
 #ifndef SHIFT_PTR
+#ifdef CONFIG_PERCPU_ZERO_BASED
+#define SHIFT_PTR(__p, __offset) \
+	((__typeof(__p))(((void *)(__p)) + (__offset)))
+#else
 #define SHIFT_PTR(__p, __offset)	RELOC_HIDE((__p), (__offset))
-#endif
+#endif /* CONFIG_PER_CPU_ZERO_BASED */
+#endif /* SHIFT_PTR */

 /*
  * A percpu variable may point to a discarded reghions. The following are
Index: linux-2.6.24-rc3-mm2/include/asm-generic/sections.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/sections.h	2007-11-29 22:05:58.367576240 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/sections.h	2007-11-29 22:06:22.754826440 -0800
@@ -9,7 +9,17 @@ extern char __bss_start[], __bss_stop[];
 extern char __init_begin[], __init_end[];
 extern char _sinittext[], _einittext[];
 extern char _end[];
+#ifdef CONFIG_PERCPU_ZERO_BASED
+extern char __per_cpu_load[];
+extern char per_cpu_size[];
+#define __per_cpu_size ((unsigned long)&per_cpu_size)
+#define __per_cpu_start ((char *)0)
+#define __per_cpu_end ((char *)__per_cpu_size)
+#else
 extern char __per_cpu_start[], __per_cpu_end[];
+#define __per_cpu_load __per_cpu_start
+#define __per_cpu_size (__per_cpu_end - __per_cpu_start)
+#endif
 extern char __kprobes_text_start[], __kprobes_text_end[];
 extern char __initdata_begin[], __initdata_end[];
 extern char __start_rodata[], __end_rodata[];
Index: linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/vmlinux.lds.h	2007-11-29 22:06:03.486826118 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h	2007-11-29 22:06:22.754826440 -0800
@@ -255,6 +255,20 @@
 	*(.initcall7.init) \
 	*(.initcall7s.init)

+#ifdef CONFIG_PERCPU_ZERO_BASED
+#define PERCPU(align) \
+	. = ALIGN(align); \
+	percpu : { } :percpu \
+	__per_cpu_load = .; \
+	.data.percpu 0 : AT(__per_cpu_load - LOAD_OFFSET) { \
+		*(.data.percpu.first) \
+		*(.data.percpu) \
+		*(.data.percpu.shared_aligned) \
+		per_cpu_size = .; \
+	} \
+	. = __per_cpu_load + per_cpu_size; \
+	data : { } :data
+#else
 #define PERCPU(align) \
 	. = ALIGN(align); \
 	__per_cpu_start = .; \
@@ -263,3 +277,4 @@
 		*(.data.percpu.shared_aligned) \
 	} \
 	__per_cpu_end = .;
+#endif
Index: linux-2.6.24-rc3-mm2/init/main.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/init/main.c 2007-11-29
[patch 0/3] Per cpu relocation to ZERO and x86_32 percpu ops on x86_64
This patchset allows the use of x86_32 percpu ops on x86_64 while maintaining
%gs pointing to the pda. It does that by moving the x86_64 pda into the percpu
area (thereby pointing %gs at the per cpu area) and then relocating the x86_64
per cpu variables to start at 0.

Patch applies on top of the per cpu cleanup patches V2. See
http://marc.info/?l=linux-kernel&m=119628478316525&w=2

Ultimately I think we can make the per cpu accessors arch independent
(see the RFC at http://marc.info/?l=linux-kernel&m=119552126330405&w=2).
There is a performance benefit from using these in core code.
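
The idea described above (pda first in the per cpu area, per cpu "addresses"
treated as offsets from zero) can be modelled in userspace. The sketch below is
only an illustration and not the kernel code: every demo_* name, the two-field
pda and NR_CPUS_DEMO are made up for the example, and a plain array stands in
for the memory that %gs would point at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the pda; the real struct x8664_pda is larger. */
struct demo_pda {
	int cpunumber;
	unsigned long data_offset;
};

/* Hypothetical layout of one per cpu area: pda first, then other variables. */
struct demo_percpu_area {
	struct demo_pda pda;	/* offset 0, the role DEFINE_PER_CPU_FIRST plays */
	long counter;		/* an ordinary per cpu variable */
};

#define NR_CPUS_DEMO 4

/* One copy of the area per simulated cpu. */
struct demo_percpu_area demo_areas[NR_CPUS_DEMO];

/*
 * With a zero-based area a per cpu "address" is just an offset, so reaching
 * a variable is base + offset, with no start symbol to subtract.
 */
void *demo_shift_ptr(int cpu, size_t offset)
{
	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
	assert(offset < sizeof(struct demo_percpu_area));
	return (char *)&demo_areas[cpu] + offset;
}

/* Clear all areas and record each simulated cpu's number in its pda. */
void demo_setup(void)
{
	int cpu;

	memset(demo_areas, 0, sizeof(demo_areas));
	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		demo_areas[cpu].pda.cpunumber = cpu;
}

/* Access an ordinary per cpu variable through its offset. */
long *demo_counter(int cpu)
{
	return demo_shift_ptr(cpu, offsetof(struct demo_percpu_area, counter));
}

/* Offset 0: the base of the area is also the address of the pda. */
struct demo_pda *demo_pda_of(int cpu)
{
	return demo_shift_ptr(cpu, 0);
}
```

Because the pda sits at offset 0, one base pointer serves both as "the pda" and
as "the per cpu area", which is the property the patchset relies on for %gs.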
Re: [RFC] Sample kset/ktype/kobject implementation
On Thu, Nov 29, 2007 at 05:11:35PM -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Greg KH wrote:
>
> > > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > > where the name can be freed.
> > > >
> > > > No, kobject_register() should have handled that for us, right?
> > >
> > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> >
> > Crap. If I can't get this code right in an example, the API is messed
> > up. Time to take Kay seriously and start to revamp the basic kobject
> > api :)
>
> The rule is simple enough. After calling kobject_register() you should
> always use kobject_put() -- even if kobject_register() failed.

Yes.

> In fact, after calling kobject_init() you should use kobject_put().
> The first rule follows from this one, since kobject_register() calls
> kobject_init() internally.

Yes, that makes sense, time to write it all down :)

thanks,

greg k-h
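
The rule agreed on above ("after init, always release through put, even when
register failed") can be shown with a small userspace model. This is a sketch
under stated assumptions, not the kernel's kobject code: struct demo_kobj and
all demo_* functions are hypothetical, and the fail_add flag merely simulates
the add step failing inside register.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a kobject; not the kernel structure. */
struct demo_kobj {
	int refcount;
	char *name;
	void (*release)(struct demo_kobj *kobj);
};

/* Counts how many times a release callback ran, so the effect is visible. */
int demo_released;

void demo_kobj_init(struct demo_kobj *kobj,
		    void (*release)(struct demo_kobj *))
{
	kobj->refcount = 1;	/* init hands the caller one reference */
	kobj->name = NULL;
	kobj->release = release;
}

void demo_kobj_put(struct demo_kobj *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0) {
		free(kobj->name);	/* the name is freed on the cleanup path */
		kobj->name = NULL;
		kobj->release(kobj);
	}
}

/*
 * Models the behaviour discussed in the thread: register calls init
 * internally and, on failure, does NOT drop the reference for the caller.
 */
int demo_kobj_register(struct demo_kobj *kobj, const char *name,
		       void (*release)(struct demo_kobj *), int fail_add)
{
	size_t len = strlen(name) + 1;

	demo_kobj_init(kobj, release);
	kobj->name = malloc(len);
	if (!kobj->name)
		return -1;
	memcpy(kobj->name, name, len);
	return fail_add ? -1 : 0;	/* simulated failure of the add step */
}

void demo_release(struct demo_kobj *kobj)
{
	demo_released++;
	free(kobj);
}

/* Caller side: on error use put, not free, so the name is not leaked. */
struct demo_kobj *demo_create(const char *name, int fail_add)
{
	struct demo_kobj *kobj = malloc(sizeof(*kobj));

	if (!kobj)
		return NULL;
	if (demo_kobj_register(kobj, name, demo_release, fail_add) != 0) {
		demo_kobj_put(kobj);
		return NULL;
	}
	return kobj;
}
```

Calling free() directly in the error branch of demo_create() would leave the
duplicated name allocated, which is the kind of leak the thread is about.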
Re: pnpacpi : exceeded the max number of IO resources
On Fri, 30 Nov 2007 10:21:28 +0800, Zhao Yakui said:

> Thanks for the acpidump & dmesg.
> In the acpidump there are so many IO resource definitions in the device
> of mem2 and the number exceeds the predefined number(24).

On a semi-related note, I'm seeing 7 of these at each boot on a Dell Latitude
D820:

pnpacpi: exceeded the max number of mem resources: 12

2.6.24-rc3-mm2 does it, it didn't do it for 2.6.23-mm1.

pnp-increase-the-maximum-number-of-resources.patch raised it from 4 to 12,
but I don't understand why it didn't complain at 4 in 23-mm1, but it
does at 12 now.
Re: [BUG] 2.6.24-rc3-git2 softlockup detected
On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> ten million is close enough to infinity for me to assume that we broke the
> driver and that's never going to terminate.
>

how about this? doesn't break things on my pa8800:

diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
index 463f119..ef01cb1 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
@@ -1037,10 +1037,13 @@ restart_test:
 	/*
 	 * Wait 'til done (with timeout)
 	 */
-	for (i=0; i=SYM_SNOOP_TIMEOUT) {
+		msleep(10);
+	} while (i++ < SYM_SNOOP_TIMEOUT);
+
+	if (i >= SYM_SNOOP_TIMEOUT) {
 		printf ("CACHE TEST FAILED: timeout.\n");
 		return (0x20);
 	}
diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
index ad07880..85c483b 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
@@ -339,7 +339,7 @@
 /*
  * Misc.
  */
-#define SYM_SNOOP_TIMEOUT (10000000)
+#define SYM_SNOOP_TIMEOUT (1000)
 #define BUS_8_BIT	0
 #define BUS_16_BIT	1
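
The general shape of the change above, a bounded poll that sleeps between
attempts instead of spinning, can be sketched in userspace as follows. This is
an illustration only, not the driver code: DEMO_POLL_LIMIT, the demo_*
functions and the callback signature are hypothetical, and nanosleep() stands
in for the kernel's msleep(). The sketch returns the outcome directly rather
than inferring it from the loop counter afterwards, which sidesteps any
question about what value the counter holds on exit.

```c
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() under strict ISO C modes */

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical limit; the patch's own constant is SYM_SNOOP_TIMEOUT. */
#define DEMO_POLL_LIMIT 1000

/*
 * Poll ready() until it reports true or DEMO_POLL_LIMIT attempts have been
 * made, sleeping delay_ms between attempts. Returns true on success and
 * false on timeout.
 */
bool demo_poll_with_timeout(bool (*ready)(void *ctx), void *ctx, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int attempt;

	assert(ready != NULL);
	for (attempt = 0; attempt < DEMO_POLL_LIMIT; attempt++) {
		if (ready(ctx))
			return true;
		nanosleep(&delay, NULL);	/* stands in for msleep() */
	}
	return false;
}

/* Example condition: becomes ready once the counter in ctx reaches zero. */
bool demo_ready_after(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

With a 10 ms delay and a limit of 1000 attempts the worst case is roughly ten
seconds of waiting, during which the task sleeps instead of holding the CPU.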
Re: Out of tree module using LSM
On Thu, 29 Nov 2007 18:34:33 EST, Jon Masters said:
>
> On Thu, 2007-11-29 at 21:45 +0000, Alan Cox wrote:
> > > Jargon File in all its glory. And if you still think you could look for
> > > patterns, how about executable code that self-modifies in random ways
> > > but when executed as a whole actually has the functionality of fetchmail
> > > embedded within it? How would you guard against that?
> >
> > Thats a problem for whoever writes the ESR detection tool and to what
> > level it works. The question for the kernel is how do we provide a
> > mechanism to allow (to some extent at least) this kind of tool to run.
>
> Right. I'm just saying reading a single page out of context (no pun
> intended) is not going to be very useful.

Fortunately for all concerned, although Alan's self-modifying code is indeed
a possibility, it's much less of an issue than the sort of malware that can
be found with a simple "find this 27-byte sequence, which will be found in
either block 36 or 37 of the file".

And I'll make the prediction that we won't see anything doing the sorts of
things that Alan's program does, until that's the *easiest* way to get into
a system. Until that time, they're either going to be sending simpler stuff
that a scanner can easily template and find, or using other means of attacks
that are outside the scope of a scanner.

Remember guys - we want to think about *realistic* threat models. The e-mail
virus scanners we use catch hundreds to thousands of known viruses *every
day*. But I can count on the fingers of both hands the number of times I've
had to deal with a *real* "0-day" in a quarter century. The scanner doesn't
have to be perfect - it just has to make it hard enough to bypass to render
it economically infeasible. If you're targeted by a
military/govt/political/religious group that doesn't *care* if it's
economically viable, you have other, bigger problems to deal with...
Re: [PATCH] [RESEND] crypto test: use print_hex_dump from kernel.h instead
On Fri, Nov 30, 2007 at 09:20:34AM +0800, rae l wrote:
>
> Cc: Randy Dunlap <[EMAIL PROTECTED]>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Patch applied. Thanks a lot Denis!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC] Sample kset/ktype/kobject implementation
On Fri, Nov 30, 2007 at 01:07:37PM +0800, Dave Young wrote:
> On Nov 30, 2007 6:11 AM, Alan Stern <[EMAIL PROTECTED]> wrote:
> > On Thu, 29 Nov 2007, Greg KH wrote:
> >
> > > > > > kobject_put(foo) is needed since it gets you through
> > > > > > kobject_cleanup()
> > > > > > where the name can be freed.
> > > > >
> > > > > No, kobject_register() should have handled that for us, right?
> > > >
> > > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> > >
> > > Crap. If I can't get this code right in an example, the API is messed
> > > up. Time to take Kay seriously and start to revamp the basic kobject
> > > api :)
> >
> > The rule is simple enough. After calling kobject_register() you should
> > always use kobject_put() -- even if kobject_register() failed.
> >
> > In fact, after calling kobject_init() you should use kobject_put().
> > The first rule follows from this one, since kobject_register() calls
> > kobject_init() internally.
> >
> Hi,
> The behavior is not very clear here, the root problem is that :
>
> 1. Should we call kobject_put so cleanup work can be done by refcount
> touch zero or call kfree every time after kobject_register failed?
>
> 2. If kobject_put calling is true, should this be done in
> kobject_register error handling codes or by hand after
> kobject_register failed?
>

IMO, I'd rather use kobject_put, because the kobj name should also be
released. After searching for kobject_register callers, I found one leak of
this kind in pktcdvd.

Signed-off-by: Dave Young <[EMAIL PROTECTED]>

---
 drivers/block/pktcdvd.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff -upr linux/drivers/block/pktcdvd.c linux.new/drivers/block/pktcdvd.c
--- linux/drivers/block/pktcdvd.c	2007-11-30 13:13:44.0 +0800
+++ linux.new/drivers/block/pktcdvd.c	2007-11-30 13:24:08.0 +0800
@@ -117,8 +117,10 @@ static struct pktcdvd_kobj* pkt_kobj_cre
 	p->kobj.parent = parent;
 	p->kobj.ktype = ktype;
 	p->pd = pd;
-	if (kobject_register(&p->kobj) != 0)
+	if (kobject_register(&p->kobj) != 0) {
+		kobject_put(&p->kobj);
 		return NULL;
+	}
 	return p;
 }
 /*

> Regards
> dave
> > Alan Stern
Re: [PATCH] keyspan: init termios properly
On Mon, Nov 26, 2007 at 02:18:52PM -0800, Andrew Morton wrote:
> On Sun, 18 Nov 2007 14:11:30 +0100
> Borislav Petkov <[EMAIL PROTECTED]> wrote:
>
> > On Thu, Nov 15, 2007 at 01:10:16PM -0800, Lucy McCoy wrote:
...
> > yes, after testing this i can confirm that this one fixes the NULL ptr
> > problem here so you might want to submit a proper patch to Greg.
>
> I'll merge revert-keyspan-init-termios-properly.patch soon, but afaik we
> are still awaiting the real fix for this problem?

Hi Andrew,

sorry for the late reply - i was away from the country and couldn't read mail.
Yes, we are still awaiting the real fix afaik but the code fragment above
removes the NULL ptr deref so we should at least merge that. Will prepare a
patch for this later today...

-- 
Regards/Gruß,
    Boris.
Re: [BUG] 2.6.24-rc3-mm2 soft lockup while running tbench
Andrew Morton wrote:
> On Wed, 28 Nov 2007 20:03:22 +0530
> Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>
>> Hi Andrew,
>>
>> while running tbench on the powerpc with 2.6.24-rc3-mm2 softlock up occurs
>>
>> BUG: soft lockup - CPU#0 stuck for 11s! [tbench:12183]
>> NIP: c00ac978 LR: c00acff0 CTR: c005c648
>> REGS: C0076F0F3200 TRAP: 0901 Not tainted (2.6.24-rc3-mm2-autotest)
>> MSR: 80009032 CR: 44000482 XER:
>> TASK = C0076F4BC000[12183] 'tbench' THREAD: C0076F0F CPU: 0
>> NIP [c00ac978] .get_page_from_freelist+0x1cc/0x754
>> LR [c00acff0] .__alloc_pages+0xb0/0x3a8
>> Call Trace:
>> [c0076f0f3480] [c0076f0f3560] 0xc0076f0f3560 (unreliable)
>> [c0076f0f3590] [c00acff0] .__alloc_pages+0xb0/0x3a8
>> [c0076f0f3680] [c00ce2e4] .alloc_pages_current+0xa8/0xc8
>> [c0076f0f3710] [c00ac6ec] .__get_free_pages+0x20/0x70
>> [c0076f0f3790] [c00d75c8] .__kmalloc_node_track_caller+0x60/0x148
>> [c0076f0f3840] [c02c22b0] .__alloc_skb+0x98/0x184
>> [c0076f0f38f0] [c0306cd8] .tcp_sendmsg+0x1fc/0xe24
>> [c0076f0f3a10] [c02b963c] .sock_sendmsg+0xe4/0x128
>> [c0076f0f3c10] [c02ba4ec] .sys_sendto+0xd4/0x120
>> [c0076f0f3d90] [c02df2f8] .compat_sys_socketcall+0x148/0x214
>> [c0076f0f3e30] [c000872c] syscall_exit+0x0/0x40
>> Instruction dump:
>> 720b0001 eb97 40820070 7202 4182000c e8bc 4818 72080004
>> 4182000c e8bc0008 4808 e8bc0010 7f83e378 7de407b4 7e078378
>>
>
> hm. Beats me. Does the machine recover OK?

Hi Andrew,

In the set of test cases run serially, the softlockup is seen in tbench;
the remaining test cases then run successfully after the softlockup.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
Re: [RFC] Sample kset/ktype/kobject implementation
On Nov 30, 2007 6:11 AM, Alan Stern <[EMAIL PROTECTED]> wrote:
> On Thu, 29 Nov 2007, Greg KH wrote:
>
> > > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > > where the name can be freed.
> > > >
> > > > No, kobject_register() should have handled that for us, right?
> > >
> > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> >
> > Crap. If I can't get this code right in an example, the API is messed
> > up. Time to take Kay seriously and start to revamp the basic kobject
> > api :)
>
> The rule is simple enough. After calling kobject_register() you should
> always use kobject_put() -- even if kobject_register() failed.
>
> In fact, after calling kobject_init() you should use kobject_put().
> The first rule follows from this one, since kobject_register() calls
> kobject_init() internally.
>

Hi,
The behavior is not very clear here. The root questions are:

1. After kobject_register has failed, should we call kobject_put, so that
the cleanup work is done when the refcount reaches zero, or call kfree
every time?

2. If kobject_put is the right call, should it be made in the
kobject_register error handling code, or by hand by the caller after
kobject_register has failed?

Regards
dave

> Alan Stern
Re: [linux-usb-devel] [BUG] USB_PERSIST
On 11/29/07, Alan Stern <[EMAIL PROTECTED]> wrote:
> On Thu, 29 Nov 2007, Raymano Garibaldi wrote:
>
> > The feature does work as long as the device remains plugged in and
> > that is what I have said in my previous postings too. What I'm saying
> > that should work and worked under 2.6.21 and is not working currently
> > is the ability to unplug and plug back in the device while the
> > computer is suspended before resuming without losing the mount.
>
> Okay, guess I misunderstood what you wrote before.
>
> The patch below for 2.6.23 should do what you want (and more besides).
> It forces the USB Persist feature to apply to all persist-enabled
> devices, whether they were unplugged or not.
>
> There's no chance of this getting accepted into the official kernel in
> such a simple form, but at least it will allow you to do what you want.
>
> Alan Stern
>
>
> --- 2.6.23/drivers/usb/core/driver.c1 2007-11-29 10:57:36.0 -0500
> +++ 2.6.23/drivers/usb/core/driver.c 2007-11-29 11:01:44.0 -0500
> @@ -1550,6 +1550,9 @@
>  		if (!(udev->reset_resume && udev->do_remote_wakeup))
>  			return -EPERM;
>  	}
> +
> +	/* Force all system resumes to be reset-resumes */
> +	udev->reset_resume = 1;
>  	return usb_external_resume_device(udev);
>  }
>
>

Alan,

Thank you! Thank you! Thank you! Who'd have thought such a simple
patch could make someone so happy? That did the trick. I just tried it
and it works beautifully whether the device remains plugged in during
suspend or if it's unplugged and plugged back in during suspend and
before resume. Now if this could only become the default behavior ;-)

Thanks again,
Raymano G.
Re: sched_yield: delete sysctl_sched_compat_yield
On Fri, 2007-11-30 at 14:29 +1100, Nick Piggin wrote:
> On Friday 30 November 2007 14:15, Zhang, Yanmin wrote:
> > On Fri, 2007-11-30 at 13:46 +1100, Nick Piggin wrote:
> > > On Wednesday 28 November 2007 09:57, Arjan van de Ven wrote:
>
> > > > sounds like a bad idea; volanomark (well, technically the jvm behind
> > > > it) is abusing sched_yield() by assuming it does something it really
> > > > doesn't do, and as it happens some of the earlier 2.6 schedulers
> > > > accidentally happened to behave in a way that was nice for this
> > > > benchmark.
> > >
> > > OK, why is this still happening? Haven't we been asking JVMs to use
> > > futexes or posix locking for years and years now? Are there any sane
> > > jvms that _don't_ use yield?
> >
> > I think it's an issue of volanomark (a kind of java application) instead of
> > JVM.
>
> volanomark itself and not the jvm is calling sched_yield()? Do we have
> any non-toy threaded java apps? (what's JAVA in the kernel-perf tests?)

I run lots of well-known benchmarks, and volanoMark is the one most affected
by sched_yield. As for real applications which use sched_yield, most of them
are not open source. Yesterday I learned that someone was using sched_yield
in his network C programs, but he didn't want to share the sources with me.

>
> > > > Todays kernel has a different behavior somewhat (and before people
> > > > scream "regression"; sched_yield() behavior isn't really specified and
> > > > doesn't make any sense at all, whatever you get is what you get
> > > > it's pretty much an insane defacto behavior that is incredibly tied to
> > > > which decisions the scheduler makes how, and no app can depend on that
> > >
> > > It is a performance regression. Is there any reason *not* to use the
> > > "compat" yield by default?
> >
> > There is no, so I suggest to set sched_compat_yield=1 by default.
> > If sched_compat_yield=0, kernel almost does nothing but returns. When
> > sched_compat_yield=1, it is closer to the meaning of sched_yield man page.
>
> sched_yield() is really only defined for posix realtime scheduling
> AFAIK, which talks about priority lists.
>
> SCHED_OTHER is defined to be a single priority, below the rest of the
> realtime priorities. So at first you *might* say that the process
> should then be made to run only after all other SCHED_OTHER processes,
> however there is no such ordering requirement for SCHED_OTHER
> scheduling. The SCHED_OTHER scheduler can run any task at any time.
>
> That said, I think people would *expect* that call be much closer to
> the compat behaviour than the current default. And that's definitely
> what Linux has done in the past. So there really does need to be a
> good reason to change it like this IMO.

That's indeed what I am thinking. I am running many tests
(SPECjbb/SPECjbb2005/cpu2000/iozone/dbench/tbench...) to see if there is any
regression with sched_compat_yield=1. I think there is no regression; the
testing is just to double-check.

>
> > > As you say, for SCHED_OTHER tasks, yield
> > > can do almost anything. We may as well do something that isn't a
> > > regression...
> >
> > I just found SCHED_OTHER in man sched_setscheduler. Is it SCHED_NORMAL in
> > the latest kernel?
>
> Yes, SCHED_NORMAL is SCHED_OTHER. Don't know why it got renamed...

Thanks.
Re: [PATCH] Avoid overflows in kernel/time.c
Arjan van de Ven wrote:
>> Anyway, I don't think compiling bc is hard on anything which has a C
>> compiler.
>
> alternative is to just also ship the precomputed values ;-)

Oh, come on... it's not like bc is some obscure thing. It's a POSIX
utility.

	-hpa
Re: [PATCH] Avoid overflows in kernel/time.c
On Thu, 29 Nov 2007 19:04:36 -0800
"H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> Chris Snook wrote:
> > H. Peter Anvin wrote:
> >> NOTE: This patch uses a bc(1) script to compute the appropriate
> >> constants.
> >
> > Perhaps dc would be more appropriate? That's included in busybox.
> >
>
> Perhaps it would, but I think there is more variability between dc
> implementations -- consider if the busybox version is broken, for
> eample.
>
> Either way, how many people compile their kernels in a busybox
> environment?
>
> Anyway, I don't think compiling bc is hard on anything which has a C
> compiler.

alternative is to just also ship the precomputed values ;-)

>
> -hpa

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
RFC - organize include/linux/kernel.h, add include/linux/logging.h
2.6.25 material.

kernel.h has become a bit disorganized over a long time.
Here's an attempt to clean it up a bit.
Something for everyone to like or dislike...

Groups externs and functions by module/function
Creates a "logging.h" for printk, KERN_
Changes some macros to statement expressions
	DIV_ROUND_UP, roundup and __ALIGN_MASK
Removes the unused PTR_ALIGN
Conforms to coding style and 80 columns
Passes checkpatch but for coding style defects in checkpatch
	statement expressions don't need a space between "; and })"
	"do {} whiles" between "; and }"

 include/linux/kernel.h  |  458 +--
 include/linux/logging.h |  154 

These files used macros to declare array elements.
Statement expressions can't be used for that, so these now use
direct calculations instead.

 include/linux/bitops.h |    2 +-
 lib/radix-tree.c       |    5 +-

This one used the ALIGN macro, but I'm not inclined to figure out what
it actually does right now, so copy the old macro to this file and
renames it.

 include/net/neighbour.h |    5 +-

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 94bc996..2783ed9 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -1,403 +1,273 @@
 #ifndef _LINUX_KERNEL_H
 #define _LINUX_KERNEL_H

 /*
  * 'kernel.h' contains some often-used function prototypes etc
  */

 #ifdef __KERNEL__

 #include
 #include
 #include
 #include
 #include
 #include
 #include
-#include
+#include
 #include

-extern const char linux_banner[];
-extern const char linux_proc_banner[];
-
+/* could be in an include linux/limits.h */
 #define INT_MAX		((int)(~0U>>1))
 #define INT_MIN		(-INT_MAX - 1)
 #define UINT_MAX	(~0U)
 #define LONG_MAX	((long)(~0UL>>1))
 #define LONG_MIN	(-LONG_MAX - 1)
 #define ULONG_MAX	(~0UL)
 #define LLONG_MAX	((long long)(~0ULL>>1))
 #define LLONG_MIN	(-LLONG_MAX - 1)
 #define ULLONG_MAX	(~0ULL)

-#define STACK_MAGIC	0xdeadbeef
+/* useful macros */
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
+#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))

-#define ALIGN(x,a)		__ALIGN_MASK(x,(typeof(x))(a)-1)
-#define __ALIGN_MASK(x,mask)	(((x)+(mask))&~(mask))
-#define PTR_ALIGN(p, a)		((typeof(p))ALIGN((unsigned long)(p), (a)))
-#define IS_ALIGNED(x,a)		(((x) % ((typeof(x))(a))) == 0)
+/*
+ * Check at compile time that something is of a particular type.
+ * Always evaluates to 1 so you may use it easily in comparisons.
+ */
+#define typecheck(type, x) \
+	({type _dummy; typeof(x) _dummy2; (void)(&_dummy == &_dummy2); 1;})

-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
+/*
+ * Check at compile time that 'function' is a certain type, or is a pointer
+ * to that type (needs to use typedef for the function type.)
+ */
+#define typecheck_fn(type, function) \
+	({typeof(type) _x = function; (void)_x;})

-#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
-#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
-#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+/**
+ * container_of - cast a member of a structure out to the containing structure
+ * @ptr:	the pointer to the member.
+ * @type:	the type of the container struct this is embedded in.
+ * @member:	the name of the member within the struct.
+ *
+ */
+#define container_of(ptr, type, member) ({ \
+	const typeof(((type *)0)->member) *__mptr = (ptr); \
+	(type *)((char *)__mptr - offsetof(type, member));})

-#ifdef CONFIG_LBD
-# include
-# define sector_div(a, b) do_div(a, b)
-#else
-# define sector_div(n, b)( \
-{ \
-	int _res; \
-	_res = (n) % (b); \
-	(n) /= (b); \
-	_res; \
-} \
-)
-#endif
+/*
+ * min()/max() macros that also do strict type-checking..
+ * See the "unnecessary" pointer comparison.
+ */
+#define min(x, y) ({ \
+	typeof(x) _x = (x); \
+	typeof(y) _y = (y); \
+	(void)(&_x == &_y); \
+	_x < _y ? _x : _y;})
+
+#define max(x, y) ({ \
+	typeof(x) _x = (x); \
+	typeof(y) _y = (y); \
+	(void)(&_x == &_y); \
+	_x > _y ? _x : _y;})
+
+/*
+ * ..and if you can't take the strict
+ * types, you can specify one yourself.
+ *
+ * Or not use min/max at all, of course.
+ */
+#define min_t(type, x, y) \
+	({type _x = (x); type _y = (y); _x < _y ? _x: _y;})
+
+#define max_t(type, x, y) \
+	({type _x = (x); type _y = (y); _x > _y ? _x: _y;})
+
+#define abs(x) ({int _x = (x); (_x < 0) ? -_x : _x;})

 /**
  * upper_32_bits - return bits 32-63 of a number
  * @n: the number we're accessing
  *
  * A basic shift-right of a 64- or 32-bit quantity. Use this to suppress
  * the "right shift count >= width of type" warning when that
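
The trade-off mentioned in the description of this RFC (statement expressions
cannot be used where array elements are declared) can be seen in a small
standalone sketch. It assumes GCC or Clang, since statement expressions and
__typeof__ are GNU C extensions, and all DEMO_*/demo_* names are made up for
the illustration rather than taken from the patch.

```c
#include <assert.h>

/* Plain-expression form: usable where a constant expression is required. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Statement-expression form (GNU C): evaluates each argument exactly once. */
#define DEMO_DIV_ROUND_UP_ONCE(n, d) ({ \
	__typeof__(n) _n = (n); \
	__typeof__(d) _d = (d); \
	(_n + _d - 1) / _d; })

#define DEMO_BITS		70
#define DEMO_BITS_PER_WORD	32

/*
 * An array bound at file scope must be a constant expression, so only the
 * plain form can be used here; the statement-expression form is rejected
 * by the compiler in this position.
 */
unsigned int demo_bitmap[DEMO_DIV_ROUND_UP(DEMO_BITS, DEMO_BITS_PER_WORD)];

int demo_calls;

/* An argument with a side effect, to show single evaluation. */
int demo_next(void)
{
	return ++demo_calls * 10;
}

/* Inside a function the statement-expression form is fine. */
int demo_round_once(void)
{
	return DEMO_DIV_ROUND_UP_ONCE(demo_next(), 4);
}

/* 70 bits need three 32-bit words. */
void demo_check_bitmap(void)
{
	assert(sizeof(demo_bitmap) / sizeof(demo_bitmap[0]) == 3);
}
```

The plain form would evaluate a side-effecting argument such as demo_next()
more than once if it appeared several times in the expansion; the statement
expression avoids that, at the cost of not being a constant expression.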
[PATCH] Documentation/Changes -> Documentation/Requirements (resend without truncated comment text)
Change Documentation/Changes to Documentation/Requirements, and at least begin to separate the runtime requirements from the kernel compilation requirements.

There are definitely kernel compilation requirements that are not listed in this file. It would be good to get them uncovered.

This document is obviously woefully incomplete, for one thing it has absolutely no per-architecture information, except "may depend on the CPU in your system." Hopefully this will encourage people to document those per-architecture requirements.

Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
---
As far as I can tell, Documentation/Changes is the only thing we have that even attempts to document the basic requirements. This attempts to formalize that fact.

 Documentation/Changes      | 396
 Documentation/Requirements | 394 +++
 2 files changed, 394 insertions(+), 396 deletions(-)
 delete mode 100644 Documentation/Changes
 create mode 100644 Documentation/Requirements

diff --git a/Documentation/Changes b/Documentation/Changes
deleted file mode 100644
index cb2b141..000
--- a/Documentation/Changes
+++ /dev/null
@@ -1,396 +0,0 @@
-Intro
-=====
-
-This document is designed to provide a list of the minimum levels of
-software necessary to run the 2.6 kernels, as well as provide brief
-instructions regarding any other "Gotchas" users may encounter when
-trying life on the Bleeding Edge. If upgrading from a pre-2.4.x
-kernel, please consult the Changes file included with 2.4.x kernels for
-additional information; most of that information will not be repeated
-here. Basically, this document assumes that your system is already
-functional and running at least 2.4.x kernels.
-
-This document is originally based on my "Changes" file for 2.0.x kernels
-and therefore owes credit to the same people as that file (Jared Mauch,
-Axel Boldt, Alessandro Sigala, and countless other users all over the
-'net).
-
-Current Minimal Requirements
-============================
-
-Upgrade to at *least* these software revisions before thinking you've
-encountered a bug! If you're unsure what version you're currently
-running, the suggested command should tell you.
-
-Again, keep in mind that this list assumes you are already
-functionally running a Linux 2.4 kernel. Also, not all tools are
-necessary on all systems; obviously, if you don't have any ISDN
-hardware, for example, you probably needn't concern yourself with
-isdn4k-utils.
-
-o  Gnu C                  3.2                     # gcc --version
-o  Gnu make               3.79.1                  # make --version
-o  binutils               2.12                    # ld -v
-o  util-linux             2.10o                   # fdformat --version
-o  module-init-tools      0.9.10                  # depmod -V
-o  e2fsprogs              1.29                    # tune2fs
-o  jfsutils               1.1.3                   # fsck.jfs -V
-o  reiserfsprogs          3.6.3                   # reiserfsck -V 2>&1|grep reiserfsprogs
-o  xfsprogs               2.6.0                   # xfs_db -V
-o  pcmciautils            004                     # pccardctl -V
-o  quota-tools            3.09                    # quota -V
-o  PPP                    2.4.0                   # pppd --version
-o  isdn4k-utils           3.1pre1                 # isdnctrl 2>&1|grep version
-o  nfs-utils              1.0.5                   # showmount --version
-o  procps                 3.2.0                   # ps --version
-o  oprofile               0.9                     # oprofiled --version
-o  udev                   081                     # udevinfo -V
-o  grub                   0.93                    # grub --version
-
-Kernel compilation
-==================
-
-GCC
----
-
-The gcc version requirements may vary depending on the type of CPU in your
-computer.
-
-Make
-----
-
-You will need Gnu make 3.79.1 or later to build the kernel.
-
-Binutils
---------
-
-Linux on IA-32 has recently switched from using as86 to using gas for
-assembling the 16-bit boot code, removing the need for as86 to compile
-your kernel. This change does, however, mean that you need a recent
-release of binutils.
-
-System utilities
-================
-
-Architectural changes
----------------------
-
-DevFS has been obsoleted in favour of udev
-(http://www.kernel.org/pub/linux/utils/kernel/hotplug/)
-
-32-bit UID support is now in place. Have fun!
-
-Linux documentation for functions is transitioning to inline
-documentation via specially-formatted comments near their
-definitions in the source. These comments can be combined with the
-SGML templates in the Documentation/DocBook directory to make DocBook
-files, which can then be converted by DocBook stylesheets to PostScript,
-HTML, PDF files, and several other formats. In order to convert from
-DocBook format to a format of your choice, you'll need to install Jade as
-well as the desired DocBook
Re: sched_yield: delete sysctl_sched_compat_yield
On Friday 30 November 2007 14:15, Zhang, Yanmin wrote:
> On Fri, 2007-11-30 at 13:46 +1100, Nick Piggin wrote:
> > On Wednesday 28 November 2007 09:57, Arjan van de Ven wrote:
> > > sounds like a bad idea; volanomark (well, technically the jvm behind
> > > it) is abusing sched_yield() by assuming it does something it really
> > > doesn't do, and as it happens some of the earlier 2.6 schedulers
> > > accidentally happened to behave in a way that was nice for this
> > > benchmark.
> >
> > OK, why is this still happening? Haven't we been asking JVMs to use
> > futexes or posix locking for years and years now? Are there any sane
> > jvms that _don't_ use yield?
>
> I think it's an issue of volanomark (a kind of java application) instead of
> JVM.

volanomark itself and not the jvm is calling sched_yield()? Do we have any non-toy threaded java apps? (what's JAVA in the kernel-perf tests?)

> > > Todays kernel has a different behavior somewhat (and before people
> > > scream "regression"; sched_yield() behavior isn't really specified and
> > > doesn't make any sense at all, whatever you get is what you get
> > > it's pretty much an insane defacto behavior that is incredibly tied to
> > > which decisions the scheduler makes how, and no app can depend on that
> >
> > It is a performance regression. Is there any reason *not* to use the
> > "compat" yield by default?
>
> There is no, so I suggest to set sched_compat_yield=1 by default.
> If sched_compat_yield=0, kernel almost does nothing but returns. When
> sched_compat_yield=1, it is closer to the meaning of sched_yield man page.

sched_yield() is really only defined for posix realtime scheduling AFAIK, which talks about priority lists. SCHED_OTHER is defined to be a single priority, below the rest of the realtime priorities. So at first you *might* say that the process should then be made to run only after all other SCHED_OTHER processes, however there is no such ordering requirement for SCHED_OTHER scheduling. The SCHED_OTHER scheduler can run any task at any time.

That said, I think people would *expect* that call be much closer to the compat behaviour than the current default. And that's definitely what Linux has done in the past. So there really does need to be a good reason to change it like this IMO.

> > As you say, for SCHED_OTHER tasks, yield
> > can do almost anything. We may as well do something that isn't a
> > regression...
>
> I just found SCHED_OTHER in man sched_setscheduler. Is it SCHED_NORMAL in
> the latest kernel?

Yes, SCHED_NORMAL is SCHED_OTHER. Don't know why it got renamed...
[PATCH] Documentation/Changes -> Documentation/Requirements
Change Documentation/Changes to Documentation/Requirements, and at least begin to separate the runtime requirements from the kernel compilation requirements.

There are definitely kernel compilation requirements that are not listed in this file. It would be good to get them uncovered.

This document is obviously woefully incomplete, for one thing it has absolutely no per-architecture information, except "may depend on the CPU in your system." Hopefully this will encourage people to
---
As far as I can tell, Documentation/Changes is the only thing we have that even attempts to document the basic requirements. This attempts to formalize that fact.

 Documentation/Changes      | 396
 Documentation/Requirements | 394 +++
 2 files changed, 394 insertions(+), 396 deletions(-)
 delete mode 100644 Documentation/Changes
 create mode 100644 Documentation/Requirements

diff --git a/Documentation/Changes b/Documentation/Changes
deleted file mode 100644
index cb2b141..000
--- a/Documentation/Changes
+++ /dev/null
@@ -1,396 +0,0 @@
-Intro
-=====
-
-This document is designed to provide a list of the minimum levels of
-software necessary to run the 2.6 kernels, as well as provide brief
-instructions regarding any other "Gotchas" users may encounter when
-trying life on the Bleeding Edge. If upgrading from a pre-2.4.x
-kernel, please consult the Changes file included with 2.4.x kernels for
-additional information; most of that information will not be repeated
-here. Basically, this document assumes that your system is already
-functional and running at least 2.4.x kernels.
-
-This document is originally based on my "Changes" file for 2.0.x kernels
-and therefore owes credit to the same people as that file (Jared Mauch,
-Axel Boldt, Alessandro Sigala, and countless other users all over the
-'net).
-
-Current Minimal Requirements
-============================
-
-Upgrade to at *least* these software revisions before thinking you've
-encountered a bug! If you're unsure what version you're currently
-running, the suggested command should tell you.
-
-Again, keep in mind that this list assumes you are already
-functionally running a Linux 2.4 kernel. Also, not all tools are
-necessary on all systems; obviously, if you don't have any ISDN
-hardware, for example, you probably needn't concern yourself with
-isdn4k-utils.
-
-o  Gnu C                  3.2                     # gcc --version
-o  Gnu make               3.79.1                  # make --version
-o  binutils               2.12                    # ld -v
-o  util-linux             2.10o                   # fdformat --version
-o  module-init-tools      0.9.10                  # depmod -V
-o  e2fsprogs              1.29                    # tune2fs
-o  jfsutils               1.1.3                   # fsck.jfs -V
-o  reiserfsprogs          3.6.3                   # reiserfsck -V 2>&1|grep reiserfsprogs
-o  xfsprogs               2.6.0                   # xfs_db -V
-o  pcmciautils            004                     # pccardctl -V
-o  quota-tools            3.09                    # quota -V
-o  PPP                    2.4.0                   # pppd --version
-o  isdn4k-utils           3.1pre1                 # isdnctrl 2>&1|grep version
-o  nfs-utils              1.0.5                   # showmount --version
-o  procps                 3.2.0                   # ps --version
-o  oprofile               0.9                     # oprofiled --version
-o  udev                   081                     # udevinfo -V
-o  grub                   0.93                    # grub --version
-
-Kernel compilation
-==================
-
-GCC
----
-
-The gcc version requirements may vary depending on the type of CPU in your
-computer.
-
-Make
-----
-
-You will need Gnu make 3.79.1 or later to build the kernel.
-
-Binutils
---------
-
-Linux on IA-32 has recently switched from using as86 to using gas for
-assembling the 16-bit boot code, removing the need for as86 to compile
-your kernel. This change does, however, mean that you need a recent
-release of binutils.
-
-System utilities
-================
-
-Architectural changes
----------------------
-
-DevFS has been obsoleted in favour of udev
-(http://www.kernel.org/pub/linux/utils/kernel/hotplug/)
-
-32-bit UID support is now in place. Have fun!
-
-Linux documentation for functions is transitioning to inline
-documentation via specially-formatted comments near their
-definitions in the source. These comments can be combined with the
-SGML templates in the Documentation/DocBook directory to make DocBook
-files, which can then be converted by DocBook stylesheets to PostScript,
-HTML, PDF files, and several other formats. In order to convert from
-DocBook format to a format of your choice, you'll need to install Jade as
-well as the desired DocBook stylesheets.
-
-Util-linux
-----------
-
-New versions of util-linux provide *fdisk support for
Re: sched_yield: delete sysctl_sched_compat_yield
On Fri, 2007-11-30 at 13:46 +1100, Nick Piggin wrote: > On Wednesday 28 November 2007 09:57, Arjan van de Ven wrote: > > On Tue, 27 Nov 2007 17:33:05 +0800 > > > > "Zhang, Yanmin" <[EMAIL PROTECTED]> wrote: > > > If echo "1">/proc/sys/kernel/sched_compat_yield before starting > > > volanoMark testing, the result is very good with kernel 2.6.24-rc3 on > > > my 16-core tigerton. > > > > > > 1) If /proc/sys/kernel/sched_compat_yield=1, comparing with 2.6.22, > > > 2.6.24-rc3 has more than 70% improvement; > > > 2) If /proc/sys/kernel/sched_compat_yield=0, comparing with 2.6.22, > > > 2.6.24-rc3 has more than 80% regression; > > > > > > On other machines, the volanoMark result also has much improvement if > > > /proc/sys/kernel/sched_compat_yield=1. > > > > > > Would you like to change function yield_task_fair to delete codes > > > around sysctl_sched_compat_yield, or just initiate it to 1? > > > > sounds like a bad idea; volanomark (well, technically the jvm behind > > it) is abusing sched_yield() by assuming it does something it really > > doesn't do, and as it happens some of the earlier 2.6 schedulers > > accidentally happened to behave in a way that was nice for this > > benchmark. > > OK, why is this still happening? Haven't we been asking JVMs to use > futexes or posix locking for years and years now? Are there any sane > jvms that _don't_ use yield? I think it's an issue of volanomark (a kind of java application) instead of JVM. > > > > Todays kernel has a different behavior somewhat (and before people > > scream "regression"; sched_yield() behavior isn't really specified and > > doesn't make any sense at all, whatever you get is what you get > > it's pretty much an insane defacto behavior that is incredibly tied to > > which decisions the scheduler makes how, and no app can depend on that > > It is a performance regression. Is there any reason *not* to use the > "compat" yield by default? There is no, so I suggest to set sched_compat_yield=1 by default. 
If sched_compat_yield=0, kernel almost does nothing but returns. When sched_compat_yield=1, it is closer to the meaning of sched_yield man page. > As you say, for SCHED_OTHER tasks, yield > can do almost anything. We may as well do something that isn't a > regression... I just found SCHED_OTHER in man sched_setscheduler. Is it SCHED_NORMAL in the latest kernel? > > > > in any way. In fact, I've proposed to make sched_yield() just do an > > msleep(1)... that'd be closer to what sched_yield is supposed to do > > standard wise than any of the current behaviors ;_ > > What makes you say that? IIRC of all the things that sched_yield can > do, it is not allowed to block. So this is about the only thing that > will break the standard...
Re: What can we do to get ready for memory controller merge in 2.6.25
Nick Piggin wrote:
> On Friday 30 November 2007 01:43, Balbir Singh wrote:
>> They say better strike when the iron is hot.
>>
>> Since we have so many people discussing the memory controller, I would
>> like to assess the readiness of the memory controller for mainline
>> merge. Given that we have some time until the merge window, I'd like to
>> set aside some time (from my other work items) to work on the memory
>> controller, fix review comments and defects.
>>
>> In the past, we've received several useful comments from Rik Van Riel,
>> Lee Schermerhorn, Peter Zijlstra, Hugh Dickins, Nick Piggin, Paul Menage
>> and code contributions and bug fixes from Hugh Dickins, Pavel Emelianov,
>> Lee Schermerhorn, YAMAMOTO-San, Andrew Morton and KAMEZAWA-San. I
>> apologize if I missed out any other names or contributions
>>
>> At the VM-Summit we decided to try the current double LRU approach for
>> memory control. At this juncture in the space-time continuum, I seek
>> your support, feedback, comments and help to move the memory controller
>
> Do you have any test cases, performance numbers, etc.? And also some
> results or even anecdotes of where this is going to be used would be
> interesting...
>

Some test results were posted at

http://lkml.org/lkml/2007/8/17/69
http://lkml.org/lkml/2007/8/19/36
http://lwn.net/Articles/242554/

Some results for the RSS controller can be found in the OLS paper

https://ols2006.108.redhat.com/2007/Reprints/singh-Reprint.pdf

and at

http://lkml.org/lkml/2007/5/18/1

As far as test cases are concerned, I have a simple test case that I use that allocates memory and touches all the allocated memory in a loop. I can post that out if required. It uses various types of allocation

1. mmaped memory
2. anonymous memory
3. shared memory

I also run various benchmarks inside a control group, limited to 400 MB of RAM. One interesting thing that I noticed was that when I booted with mem= and created a container with the same . The swapout test case ran much faster in the container (NOTE: This was prior to the swap cache changes).

KAMEZAWA-San posted some test results on background reclaim and per zone reclaim

http://forum.openvz.org/index.php?t=tree=4696=23964&==

The simplest use cases that come to mind are

1. Memory control for containers/virtualization
2. Job Isolation

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
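The allocate-and-touch test described above might look roughly like the following for the anonymous-memory case. This is only a sketch under assumptions, not the actual test program from this thread (which was not posted); the helper name `touch_anonymous_memory` is made up for illustration, and the mmaped-file and shared-memory variants are left out.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on strict-ISO builds */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test case): map `bytes` of
 * private anonymous memory, write one byte into every page so each page
 * is actually faulted in, then unmap it.  Returns the number of pages
 * touched, or -1 if the arguments are invalid or the mapping fails.
 */
static long touch_anonymous_memory(size_t bytes)
{
	long page_size = sysconf(_SC_PAGESIZE);
	long touched = 0;
	unsigned char *mem;
	size_t off;

	if (page_size <= 0 || bytes == 0)
		return -1;

	mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	for (off = 0; off < bytes; off += (size_t)page_size) {
		mem[off] = 1;	/* the write forces a real page allocation */
		touched++;
	}

	munmap(mem, bytes);
	return touched;
}
```

Calling something like this in a loop, with `bytes` larger than the limit set on the control group, is presumably what pushes the group into reclaim; the sizes to use depend on the limit being tested.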
Re: [PATCH] Avoid overflows in kernel/time.c
Chris Snook wrote:
> H. Peter Anvin wrote:
>> NOTE: This patch uses a bc(1) script to compute the appropriate constants.
>
> Perhaps dc would be more appropriate? That's included in busybox.

Perhaps it would, but I think there is more variability between dc implementations -- consider if the busybox version is broken, for example. Either way, how many people compile their kernels in a busybox environment?

Anyway, I don't think compiling bc is hard on anything which has a C compiler.

-hpa
Re: sched_yield: delete sysctl_sched_compat_yield
On Friday 30 November 2007 13:51, Arjan van de Ven wrote:
> On Fri, 30 Nov 2007 13:46:22 +1100
>
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> > > Todays kernel has a different behavior somewhat (and before people
> > > scream "regression"; sched_yield() behavior isn't really specified
> > > and doesn't make any sense at all, whatever you get is what you
> > > get it's pretty much an insane defacto behavior that is
> > > incredibly tied to which decisions the scheduler makes how, and no
> > > app can depend on that
> >
> > It is a performance regression. Is there any reason *not* to use the
> > "compat" yield by default? As you say, for SCHED_OTHER tasks, yield
> > can do almost anything. We may as well do something that isn't a
> > regression..
>
> it just makes OTHER tests/benchmarks regress this is one of those
> things where you just can't win.

OK, which ones? Because java is slightly important...

> > > in any way. In fact, I've proposed to make sched_yield() just do an
> > > msleep(1)... that'd be closer to what sched_yield is supposed to do
> > > standard wise than any of the current behaviors ;_
> >
> > What makes you say that? IIRC of all the things that sched_yield can
> > do, it is not allowed to block. So this is about the only thing that
> > will break the standard...
>
> sched_yield OF COURSE can block.. it's a schedule call after all!

In unix, blocking ~= removed from runqueue, no? OF COURSE it is allowed to cooperatively schedule another task, but I don't see why you think it should so obviously be allowed to block / sleep. It breaks the basically only invariant of sched_yield in that the task will no longer run when there is nothing else running.
Re: [PATCH] Avoid overflows in kernel/time.c
Andrew Morton wrote:
>> NOTE: This patch uses a bc(1) script to compute the appropriate constants.
>
> Does this add the first dependency upon the availability of bc?

I believe it does. I used bc because doing it in C would have required arbitrary-precision code or have added a dependency on libgmp.

-hpa
Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)
On Thu, 29 Nov 2007, Linus Torvalds wrote:

> Heh. It definitely hasn't gotten lost by "the git software".

No, it sure hasn't. In fact it was staring me right in the face and I didn't realize it.

> In fact, with
> the kinds of hints you already gave, git makes it really _trivial_ to find
> it.
>
> Here's what you do:
>
>     git log v2.6.23.. --author=Wilcox
>
> and then just search for "scan_mutex", in the hope that Matthew wrote a
> nice commit message. And yes, he did, so in less than a blink you get:
>
>     commit 6b7f123f378743d739377871c0cbfbaf28c7d25a
>     Author: Matthew Wilcox <[EMAIL PROTECTED]>
>     Date: Tue Jun 26 15:18:51 2007 -0600
>
>         [SCSI] Fix async scanning double-add problems
>
>         Stress-testing and some thought has revealed some places where
>         asynchronous scanning needs some more attention to locking.
>
>         - Since async_scan is a bit, we need to hold the host_lock while
>           modifying it to prevent races against other CPUs modifying the word
>           that bit is in. This is probably a theoretical race for the moment,
>           but other patches may change that.
>         - The async_scan bit means not only that this host is being scanned
>           asynchronously, but that all the devices attached to this host are not
>           yet added to sysfs. So we must ensure that this bit is always in sync.
>           I've chosen to do this with the scan_mutex since it's already acquired
>           in most of the right places.
>         ...
>
> which I assume is the commit you're talking about.

Yep, that's the one.

Alan Stern
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
Ben Woodard <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> Vivek Goyal <[EMAIL PROTECTED]> writes:
>>
>>> Ok. Got it. So in this case we route the interrupts directly through LAPIC
>>> and put LVT0 in ExtInt mode and IOAPIC is bypassed.
>>>
>>> I am looking at Intel Multiprocessor specification v1.4 and as per figure
>>> 3-3 on page 3-9, 8259 is connected to LINTIN0 line, which in turn is
>>> connected to LINTIN0 pin on all processors. If that is the case, even in
>>> this mode, all the CPU should see the timer interrupts (which is coming
>>> from 8259)?
>>
>> However things are implemented completely differently now. I don't think
>> the coherent hypertransport domain of AMD processors actually routes
>> ExtINT interrupts to all cpus but instead one (the default route?) is
>> picked.
>>
>> So I think for the kdump case we pretty much need to use an IOAPIC
>> in virtual wire mode for recent AMD systems.
>>
>> For current Intel systems I believe either scenario still works.
>>
>>> Can you print the LAPIC registers (print_local_APIC) during normal boot
>>> and during kdump boot and paste here?
>>
>> It's worth a look.
>>
>> I still think we need to just use apic mode at kernel startup, and
>> be done with it.
>>
>
> Neil whipped up a patch to try this and evidently it worked on his test boxes
> but it didn't work very well on our problem tests box. It hung after the kernel
> printed "Ready". i.e. on a normal boot I get:

Interesting can you please try an early_printk console. I expect you made it a fair ways and it just didn't show up because you didn't get as far as the normal serial port setup.

You don't have any output from your linux kernel.

Eric
Re: sched_yield: delete sysctl_sched_compat_yield
On Fri, 30 Nov 2007 13:46:22 +1100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> > Todays kernel has a different behavior somewhat (and before people
> > scream "regression"; sched_yield() behavior isn't really specified
> > and doesn't make any sense at all, whatever you get is what you
> > get it's pretty much an insane defacto behavior that is
> > incredibly tied to which decisions the scheduler makes how, and no
> > app can depend on that
>
> It is a performance regression. Is there any reason *not* to use the
> "compat" yield by default? As you say, for SCHED_OTHER tasks, yield
> can do almost anything. We may as well do something that isn't a
> regression..

it just makes OTHER tests/benchmarks regress this is one of those things where you just can't win.

> >
> > in any way. In fact, I've proposed to make sched_yield() just do an
> > msleep(1)... that'd be closer to what sched_yield is supposed to do
> > standard wise than any of the current behaviors ;_
>
> What makes you say that? IIRC of all the things that sched_yield can
> do, it is not allowed to block. So this is about the only thing that
> will break the standard...

sched_yield OF COURSE can block.. it's a schedule call after all!

--
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
Re: sched_yield: delete sysctl_sched_compat_yield
On Wednesday 28 November 2007 09:57, Arjan van de Ven wrote: > On Tue, 27 Nov 2007 17:33:05 +0800 > > "Zhang, Yanmin" <[EMAIL PROTECTED]> wrote: > > If echo "1">/proc/sys/kernel/sched_compat_yield before starting > > volanoMark testing, the result is very good with kernel 2.6.24-rc3 on > > my 16-core tigerton. > > > > 1) If /proc/sys/kernel/sched_compat_yield=1, comparing with 2.6.22, > > 2.6.24-rc3 has more than 70% improvement; > > 2) If /proc/sys/kernel/sched_compat_yield=0, comparing with 2.6.22, > > 2.6.24-rc3 has more than 80% regression; > > > > On other machines, the volanoMark result also has much improvement if > > /proc/sys/kernel/sched_compat_yield=1. > > > > Would you like to change function yield_task_fair to delete codes > > around sysctl_sched_compat_yield, or just initiate it to 1? > > sounds like a bad idea; volanomark (well, technically the jvm behind > it) is abusing sched_yield() by assuming it does something it really > doesn't do, and as it happens some of the earlier 2.6 schedulers > accidentally happened to behave in a way that was nice for this > benchmark. OK, why is this still happening? Haven't we been asking JVMs to use futexes or posix locking for years and years now? Are there any sane jvms that _don't_ use yield? > Todays kernel has a different behavior somewhat (and before people > scream "regression"; sched_yield() behavior isn't really specified and > doesn't make any sense at all, whatever you get is what you get > it's pretty much an insane defacto behavior that is incredibly tied to > which decisions the scheduler makes how, and no app can depend on that It is a performance regression. Is there any reason *not* to use the "compat" yield by default? As you say, for SCHED_OTHER tasks, yield can do almost anything. We may as well do something that isn't a regression... > in any way. In fact, I've proposed to make sched_yield() just do an > msleep(1)... 
that'd be closer to what sched_yield is supposed to do > standard wise than any of the current behaviors ;_ What makes you say that? IIRC of all the things that sched_yield can do, it is not allowed to block. So this is about the only thing that will break the standard...
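For readers unfamiliar with the pattern being argued about, a yield-based wait loop looks roughly like the sketch below. It is not code from volanomark or from any JVM; the name `wait_with_yield` and its spin limit are invented for illustration. How soon the caller gets to run again after `sched_yield()` under SCHED_OTHER is exactly the part the thread says is unspecified, which is why futexes or POSIX locking are suggested instead.

```c
#include <sched.h>
#include <stdatomic.h>

/*
 * Hypothetical example: poll `flag` until another thread sets it,
 * calling sched_yield() between checks.  Gives up after `max_spins`
 * yields.  Returns the number of yields performed, or -1 on timeout.
 */
static int wait_with_yield(atomic_int *flag, int max_spins)
{
	int spins = 0;

	while (atomic_load(flag) == 0) {
		if (spins >= max_spins)
			return -1;
		sched_yield();	/* may return almost immediately under SCHED_OTHER */
		spins++;
	}
	return spins;
}
```

With the non-compat behaviour described in the thread, a loop like this can burn CPU without letting the thread that would set the flag make much progress, which would be consistent with the benchmark numbers quoted above.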
Re: [PATCH] Fix kmem_cache_free performance regression in slab
On Thu, 29 Nov 2007 12:05:13 -0700 Matthew Wilcox <[EMAIL PROTECTED]> wrote:

> The database performance group have found that half the cycles spent
> in kmem_cache_free are spent in this one call to BUG_ON. Moving it
> into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a
> performance win of almost 0.5% on their particular benchmark.
>
> The call was added as part of commit ddc2e812d592457747c4367fb73edcaa8e1e49ff
> with the comment that "overhead should be minimal". It may have been
> minimal at the time, but it isn't now.
>

It is worth noting that the offending commit hit mainline in June 2006.

It takes a very long time for some performance regressions to be discovered. By which time it is effectively too late to fix it.
Re: Trailing periods in kernel messages
On Fri, 2007-11-30 at 09:54 +0800, Li Zefan wrote:
> So it doesn't deserve the effort to eliminate these periods, isn't it?

I hope these will eventually disappear.

> Or we can add a check to checkpatch.pl to prevent new ones.

Perhaps that's a good idea.

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index cbb4258..707f84c 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1390,6 +1390,10 @@ sub process {
 		if ($line =~ /\*\s*\)\s*k[czm]alloc\b/) {
 			WARN("unnecessary cast may hide bugs, see http://c-faq.com/malloc/mallocnocast.html\n" . $herecurr);
 		}
+
+		if ($rawline =~ /(print|pr_(emerg|alert|crit|err|warning|notice|info|debug)).*\.\\n\"/) {
+			WARN("unnecessary period before newline\n" . $herecurr);
+		}
 	}
 
 	if ($chk_patch && !$is_patch) {
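To make clear what the proposed pattern matches, here is the same test rendered as a small C function. This is an illustrative sketch only (checkpatch itself is Perl, and `has_period_before_newline` is a made-up name); it looks for a print-style call name followed, later on the same source line, by a period immediately before an escaped newline that closes a string literal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical C rendering of the proposed checkpatch test: report
 * whether a line of source text contains a print-style call followed,
 * later on the same line, by the four characters  . \ n "
 */
static bool has_period_before_newline(const char *source_line)
{
	static const char *const callers[] = {
		"print", "pr_emerg", "pr_alert", "pr_crit", "pr_err",
		"pr_warning", "pr_notice", "pr_info", "pr_debug",
	};
	size_t i;

	for (i = 0; i < sizeof(callers) / sizeof(callers[0]); i++) {
		const char *call = strstr(source_line, callers[i]);

		/* look for the pattern only after the matched call name */
		if (call && strstr(call + strlen(callers[i]), ".\\n\""))
			return true;
	}
	return false;
}
```

Note that, like the regular expression in the patch, this matches any identifier containing "print" (printk, dev_printk, snprintf and so on), so it would also flag lines where the trailing period may be intentional.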
Re: pnpacpi : exceeded the max number of IO resources
On Fri, 2007-11-30 at 03:18 +0100, Rene Herman wrote:
> On 29-11-07 10:11, Dave Young wrote:
>
> > The pnpacpi rsparser.c report warnings of:
> > exceeded the max number of IO resources: 24
> >
> > dmesg|grep exceeded|wc
> > 66 594 3564
>
> Heavens... (added CCs of people who just upped it from 8 -- I suppose the
> problem is not new then?)

Probably we should make it a bit bigger till Thomas's patch is ready. Thomas, your patch isn't 2.6.24 stuff, right?

Thanks,
Shaohua
Re: Something similar to inotify in 2.4.
On 29-11-07 18:09, Vitaliy Ivanov wrote:

> Can anyone advise whether there is something similar to inotify in 2.4
> kernel?

inotify is 2.6 (dnotify 2.4).

Rene
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Thursday 29 November 2007 10:36:06 Christoph Lameter wrote:
> The code becomes much simpler if gs would point to the beginning of the
> per cpu area and if the __per_cpu_offset[i] would do the same. No weird
> __per_cpu_start offsetting anymore.

It is a little weird, but it gave flexibility for most archs. ISTR I had issues relocating the percpu area to 0, but I look forward to your code!

> The generic write/readpercpu functionality introduced by the cpu_alloc
> patchset works best with offsets relative to an arch dependent
> register. All per cpu data (pda, percpu and allocpercpu) is handled as an
> offset relative to the start of the per cpu data.

Hmm, did someone cc me on the patchset and I missed it?

> If the current offset by __per_cpu_start is kept then a per cpu allocator
> may have to dish out addresses that go beyond __per_cpu_end.

Of course; you just need congruence in your allocation across CPUs. It's possible, but no worse than the requirements on other schemes where you can reach a variable with a single addition for the CPU.

> I think dealing with a per cpu variable as if it would be an offset
> relative to a base is natural for the typical addressing of cpus based on
> an offset relative to some register.

We've had practical problems getting the compiler to eke out the potential benefit. That's why we settled for an offset between where the compiler expected and where the variable actually was.

Cheers,
Rusty.
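The "offset relative to a base" view of per-CPU data can be illustrated with a small userspace simulation. This is only a sketch under assumptions, not kernel code and not the cpu_alloc patchset: `percpu_base[]` stands in for the per-CPU base register (gs on x86_64), and `NR_CPUS`, `struct percpu_area`, `percpu_init` and `percpu_counter` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* Hypothetical layout of one CPU's private area. */
struct percpu_area {
	long counter;
	int state;
};

/* percpu_base[cpu] stands in for the per-CPU base register. */
static unsigned char *percpu_base[NR_CPUS];

/* Allocate one zeroed area per simulated CPU; returns 0 on success. */
static int percpu_init(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		percpu_base[cpu] = calloc(1, sizeof(struct percpu_area));
		if (!percpu_base[cpu])
			return -1;
	}
	return 0;
}

/* A per-CPU variable is (base of that CPU's area) + (fixed offset). */
static long *percpu_counter(int cpu)
{
	return (long *)(percpu_base[cpu] +
			offsetof(struct percpu_area, counter));
}
```

In this model every variable's offset starts from zero at the beginning of the area, which is the arrangement Christoph argues for. The scheme Rusty describes instead keeps the compiler's link-time address for the variable and adds a per-CPU delta to it, so the same single addition reaches the variable from either starting point.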
Re: pnpacpi : exceeded the max number of IO resources
On 29-11-07 10:11, Dave Young wrote:

> The pnpacpi rsparser.c report warnings of:
> exceeded the max number of IO resources: 24
>
> dmesg|grep exceeded|wc
> 66 594 3564

Heavens... (added CCs of people who just upped it from 8 -- I suppose the problem is not new then?)

Rene.
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Friday 30 November 2007 03:53:34 Arjan van de Ven wrote:
> On Mon, 26 Nov 2007 10:25:33 -0800
>
> > Agreed. On first glance, I was intrigued but:
> >
> > 1) Why is everyone so concerned that export symbol space is large?
> > - does it cost cpu or running memory?
>
> yes. about 120 bytes per symbol

But this patch makes that worse, not better.

> > - does it cause bugs?
>
> yes, bad apis are causing bugs... sys_open is just the starter of that.

Sure, but this doesn't change the APIs, either. We seem to have fixed sys_open the right way, and since we're not supposed to care about out-of-tree modules...

Rusty.
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
Eric W. Biederman wrote: Vivek Goyal <[EMAIL PROTECTED]> writes: Ok. Got it. So in this case we route the interrupts directly through LAPIC and put LVT0 in ExtInt mode and IOAPIC is bypassed. I am looking at Intel Multiprocessor specification v1.4 and as per figure 3-3 on page 3-9, 8259 is connected to LINTIN0 line, which in turn is connected to LINTIN0 pin on all processors. If that is the case, even in this mode, all the CPU should see the timer interrupts (which is coming from 8259)? However things are implemented completely differently now. I don't think the coherent hypertransport domain of AMD processors actually routes ExtINT interrupts to all cpus but instead one (the default route?) is picked. So I think for the kdump case we pretty much need to use an IOAPIC in virtual wire mode for recent AMD systems. For current Intel systems I believe either scenario still works. Can you print the LAPIC registers (print_local_APIC) during normal boot and during kdump boot and paste here? It's worth a look. I still think we need to just use apic mode at kernel startup, and be done with it. Neil whipped up a patch to try this and evidently it worked on his test boxes but it didn't work very well on our problem tests box. It hung after the kernel printed "Ready". i.e. on a normal boot I get: 2007-11-29 13:48:29 Loading vmlinuz-2.6.18-13chaos.ben.test 2007-11-29 13:48:29 Loading initrd-2.6.18-13chaos.ben.test. .. 2007-11-29 13:48:29 Ready. 2007-11-29 13:48:30 Linux version 2.6.18-13chaos.ben.test ([EMAIL PROTECTED]) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14 )) #10 SMP Thu Nov 29 13:11:49 PST 2007 2007-11-29 13:48:30 Command line: initrd=initrd-2.6.18-13chaos.ben.test loglevel=8 console=ttyS0,115200n8 [EMAIL PROTECTED] elevator=deadline swiotlb=65536 selinux=0 apic=debug BOOT_IMAGE=vmlinuz-2.6.18-13chaos.ben.test BOOTIF= 01-00-30-48-57-91-56 With Neil's patch: 2007-11-29 17:12:55 PXELINUX 2.11 2004-08-16 Copyright (C) 1994-2004 H. 
Peter Anvin 2007-11-29 17:12:55 Boot options [default: 2.6.18-54.el5.bz336371]: 2007-11-29 17:12:55 linux-2.6.18-13chaos.ben.test-2.6.18-54.el5.bz336371 2007-11-29 17:12:55 linux 2007-11-29 17:12:55 linux-2.6.18-54.el5.bz336371 2007-11-29 17:12:55 linux-2.6.18-52.el5 2007-11-29 17:12:55 linux-2.6.18-13chaos.ben.test-2.6.18-13chaos.ben.test 2007-11-29 17:12:55 linux-2.6.23-0.214.rc8.git2.fc8 2007-11-29 17:12:55 linux-2.6.18-8.1.14.el5 2007-11-29 17:12:55 linux-2.6.18-7chaos 2007-11-29 17:12:55 boot: 2007-11-29 17:13:02 Loading vmlinuz-2.6.18-13chaos.ben.test 2007-11-29 17:13:02 Loading initrd-2.6.18-13chaos.ben.test. .. 2007-11-29 17:13:02 Ready. (END) That's all she wrote. End of story. Had to reboot to another kernel to make get it back. Neil's patch: --- linux-2.6.18.noarch/arch/x86_64/kernel/i8259.c.orig 2007-11-28 18:00:31.0 -0500 +++ linux-2.6.18.noarch/arch/x86_64/kernel/i8259.c 2007-11-29 10:37:14.0 -0500 @@ -599,4 +599,30 @@ if (!acpi_ioapic) setup_irq(2, ); + + /* + * Switch from PIC to APIC mode. + */ +connect_bsp_APIC(); +setup_local_APIC(); + +if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) { +panic("Boot APIC ID in local APIC unexpected (%d vs %d)", + GET_APIC_ID(apic_read(APIC_ID)), boot_cpu_id); +/* Or can we switch back to PIC here? */ +} + +/* + * Now start the IO-APICs + */ +if (!skip_ioapic_setup && nr_ioapics) +setup_IO_APIC(); +else +nr_ioapics = 0; + + /* +* Disable local irqs here so start_kernel doesn't complain +*/ + local_irq_disable(); + } --- linux-2.6.18.noarch/arch/x86_64/kernel/smpboot.c.orig 2007-11-28 18:07:33.0 -0500 +++ linux-2.6.18.noarch/arch/x86_64/kernel/smpboot.c2007-11-29 10:37:59.0 -0500 @@ -1088,26 +1088,6 @@ /* -* Switch from PIC to APIC mode. -*/ - connect_bsp_APIC(); - setup_local_APIC(); - - if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) { - panic("Boot APIC ID in local APIC unexpected (%d vs %d)", - GET_APIC_ID(apic_read(APIC_ID)), boot_cpu_id); - /* Or can we switch back to PIC here? 
*/ - } - - /* -* Now start the IO-APICs -*/ - if (!skip_ioapic_setup && nr_ioapics) - setup_IO_APIC(); - else - nr_ioapics = 0; - - /* * Set up local APIC timer on boot CPU. */ Eric ___
Re: What can we do to get ready for memory controller merge in 2.6.25
On Friday 30 November 2007 01:43, Balbir Singh wrote:
> They say better strike when the iron is hot.
>
> Since we have so many people discussing the memory controller, I would
> like to assess the readiness of the memory controller for mainline
> merge. Given that we have some time until the merge window, I'd like to
> set aside some time (from my other work items) to work on the memory
> controller, fix review comments and defects.
>
> In the past, we've received several useful comments from Rik Van Riel,
> Lee Schermerhorn, Peter Zijlstra, Hugh Dickins, Nick Piggin, Paul Menage
> and code contributions and bug fixes from Hugh Dickins, Pavel Emelianov,
> Lee Schermerhorn, YAMAMOTO-San, Andrew Morton and KAMEZAWA-San. I
> apologize if I missed out any other names or contributions
>
> At the VM-Summit we decided to try the current double LRU approach for
> memory control. At this juncture in the space-time continuum, I seek
> your support, feedback, comments and help to move the memory controller

Do you have any test cases, performance numbers, etc.? And also some results or even anecdotes of where this is going to be used would be interesting...

Thanks,
Nick
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
Vivek Goyal wrote: On Wed, Nov 28, 2007 at 11:02:06AM -0500, Neil Horman wrote: On Wed, Nov 28, 2007 at 10:36:49AM -0500, Vivek Goyal wrote: On Tue, Nov 27, 2007 at 03:24:35PM -0800, Ben Woodard wrote: Andi Kleen wrote: Are we putting the system back in PIC mode or virtual wire mode? I have not seen systems which support PIC mode. All latest systems seems to be having virtual wire mode. I think in case of PIC mode, interrupts Yes it's probably virtual wire. For real PIC mode we would need really old systems without APIC. can be delivered to cpu0 only. In virt wire mode, one can program IOAPIC to deliver interrupt to any of the cpus and that's what we have been The code doesn't try to program anything specific, it just restores the state that was left over originally by the BIOS. So if the BIOS originally left the IOAPIC in a state where the timer interrupts were only going to CPU0 then by restoring that state we could be bringing this problem upon ourselves when we restore that state. Hi Ben, Apart from restoring the original state (Bring APICS back to virtual wire mode), we also reprogram IOAPIC so that timer interrupt can go to crashing cpu (and not necessarily cpu0). Look at following code in disable_IO_APIC. entry.dest.physical.physical_dest = GET_APIC_ID(apic_read(APIC_ID)); Here we read the apic id of crashing cpu and program IOAPIC accordingly. This will make sure that even in virtual wire mode, timer interrupts will be delivered to crashing cpu APIC. Yes, but according to Bens last debug effort, the APIC printout regarding the timer setup, indicates that ioapic_i8259.pin == -1, meaning that the 8259 is not routed through the ioapic. In those cases, disable_IO_APIC does not take us through the path you reference above, and does not revert to virtual wire mode. 
Instead, it simply disables legacy vector 0, which if I understand this correctly, simply tells the ioapic to not handle timer interrupts, trusting that the 8259 in the system will deliver that interrupt where it needs to be. If the 8259 is wired to deliver timer interrupts to cpu0 only, then you get the problem that we have, do you? Ok. Got it. So in this case we route the interrupts directly through LAPIC and put LVT0 in ExtInt mode and IOAPIC is bypassed. I am looking at Intel Multiprocessor specification v1.4 and as per figure 3-3 on page 3-9, 8259 is connected to LINTIN0 line, which in turn is connected to LINTIN0 pin on all processors. If that is the case, even in this mode, all the CPU should see the timer interrupts (which is coming from 8259)? Can you print the LAPIC registers (print_local_APIC) during normal boot and during kdump boot and paste here? Here are the ones from a normal bootup. I was unable to get info from a kdump boot. I haven't figured out why yet. With the same patch that I used to capture this, when I tried to kdump the kernel, it paused a second or two after the backtrace and then dropped to BIOS and came up normally. Here is a little trick, at the point where we are trying to get the info to print out, the kernel command line hasn't been completely parsed yet. That tricked me for part of the day. I had apic=debug on the command line but the logic in print_local_APIC saw the default value because the kernel command line had yet to be parsed. 2007-11-29 17:58:07 ***Here is the info you requested 2007-11-29 17:58:07 2007-11-29 17:58:07 printing local APIC contents on CPU#0/0: 2007-11-29 17:58:07 ... APIC ID: (0) 2007-11-29 17:58:07 ... APIC VERSION: 80050010 2007-11-29 17:58:07 ... APIC TASKPRI: (00) 2007-11-29 17:58:07 ... APIC ARBPRI: (00) 2007-11-29 17:58:07 ... APIC PROCPRI: 2007-11-29 17:58:07 ... APIC EOI: 2007-11-29 17:58:07 ... APIC RRR: 0002 2007-11-29 17:58:07 ... APIC LDR: 2007-11-29 17:58:07 ... 
APIC DFR: 2007-11-29 17:58:07 ... APIC SPIV: 010f 2007-11-29 17:58:07 ... APIC ISR field: 2007-11-29 17:58:07 ... APIC TMR field: 2007-11-29 17:58:07 ... APIC IRR field: 2007-11-29 17:58:07 ... APIC ESR: 2007-11-29 17:58:07 ... APIC ICR: 4630 2007-11-29 17:58:07 ... APIC ICR2: 0700 2007-11-29 17:58:07 ... APIC LVTT: 0001 2007-11-29 17:58:07 ... APIC LVTPC: 0001 2007-11-29 17:58:07 ... APIC LVT0: 0700 2007-11-29 17:58:07 ... APIC LVT1: 0400 2007-11-29 17:58:07 ... APIC LVTERR: 0001000f 2007-11-29 17:58:07 ... APIC TMICT: 8000 2007-11-29 17:58:07 ... APIC TMCCT: 2007-11-29 17:58:07 ... APIC TDCR: 2007-11-29 17:58:07 2007-11-29 17:58:07 number of MP IRQ sources: 15. 2007-11-29 17:58:07 number of IO-APIC #8 registers: 0. 2007-11-29 17:58:07 number of IO-APIC #9 registers: 0. 2007-11-29 17:58:07 number of IO-APIC #10 registers: 0. 2007-11-29 17:58:07 testing the IO APIC... 2007-11-29 17:58:07 2007-11-29 17:58:07 IO APIC #8.. 2007-11-29 17:58:07 register #00:
Re: kondemand: kernel BUG at kernel/workqueue.c:258!
On Thu, 29 Nov 2007 13:47:34 -0800 "Pallipadi, Venkatesh" <[EMAIL PROTECTED]> wrote:
>
>
> >-Original Message-
> >From: Jiri Slaby [mailto:[EMAIL PROTECTED]
> >Sent: Thursday, November 29, 2007 1:43 PM
> >To: Pallipadi, Venkatesh; Nakajima, Jun
> >Cc: Linux kernel mailing list
> >Subject: kondemand: kernel BUG at kernel/workqueue.c:258!
> >
> >Hi,
> >
> >while trying to evoke another bug by endlessly changing
> >governors, this appeared:
> >kernel BUG at .../kernel/workqueue.c:258!
> >invalid opcode: [1] PREEMPT SMP
> >CPU 0
> >Modules linked in: iwl3945 mac80211 cfg80211 tun
> >cpufreq_userspace rfcomm
> >l2cap hci_usb bluetooth kvm_intel arc4 ecb blkcipher kvm cryptomgr
> >crypto_algapi acpi_cpufreq fglrx(P) asus_laptop sr_mod cdrom ehci_hcd
> >uhci_hcd battery
> >Pid: 443, comm: kondemand/0 Tainted: P 2.6.23 #38
>
> Kernel version?

It is on the same line as the tainted flag, and two lines below the binary module that is in use.

I assume Jiri is now working on reproducing this untainted ... ;)
Re: [PATCH] Avoid overflows in kernel/time.c
H. Peter Anvin wrote:
> NOTE: This patch uses a bc(1) script to compute the appropriate constants.

Perhaps dc would be more appropriate? That's included in busybox.

-- Chris
Re: Trailing periods in kernel messages
Joe Perches wrote:
> On Fri, 2007-11-30 at 09:12 +0800, Li Zefan wrote:
>> Just a rough grep:
>> # grep -r -P --include=*.[ch] 'printk.*\.\\n' * | wc -l
>> 6025
>> # grep -r -P --include=*.[ch] '\.\\n' * | wc -l
>> 12723
>
> Inequivalent.
>
> Try:
> grep -rP --include=*.[ch] 'printk.*\.\\n' * | wc -l
> and
> grep -rP --include=*.[ch] 'printk.*[^\.]\\n' * | wc -l
>
> 6k/38k
>

My second grep counts how many strings are terminated with '.'. Those strings may eventually be passed to printk(). So it isn't worth the effort to eliminate these periods, is it? Or we can add a check to checkpatch.pl to prevent new ones.
Re: [PATCH] Avoid overflows in kernel/time.c
On Thu, 29 Nov 2007 16:19:51 -0800 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
> When the conversion factor between jiffies and milli- or microseconds
> is not a single multiply or divide, as for the case of HZ == 300, we
> currently do a multiply followed by a divide. The intervening
> result, however, is subject to overflows, especially since the
> fraction is not simplified (for HZ == 300, we multiply by 300 and
> divide by 1000).
>
> This is exposed to the user when passing a large timeout to poll(),
> for example.
>
> This patch replaces the multiply-divide with a reciprocal
> multiplication on 32-bit platforms. When the input is an unsigned
> long, there is no portable way to do this on 64-bit platforms, since it
> requires a 128-bit intermediate
> result (which gcc does support on 64-bit platforms but may generate
> libgcc calls, e.g. on 64-bit s390), but since the output is a 32-bit
> integer in the cases affected, just simplify the multiply-divide
> (*3/10 instead of *300/1000).
>
> The reciprocal multiply used can have off-by-one errors in the upper
> half of the valid output range. This could be avoided at the expense
> of having to deal with a potential 65-bit intermediate result. Since
> the intent is to avoid overflow problems and most of the other time
> conversions are only semiexact, the off-by-one errors were considered
> an acceptable tradeoff.
>
> NOTE: This patch uses a bc(1) script to compute the appropriate
> constants.

Does this add the first dependency upon the availability of bc?
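The overflow described in the quoted changelog can be illustrated with a minimal userspace sketch for HZ == 300. This is not the patch itself: the function names and the constant MUL32 are assumptions made up for illustration, the direction (milliseconds to jiffies) is inferred from the poll() remark, and all three variants truncate rather than round. MUL32 is ceil(2^32 * 300 / 1000), so the reciprocal variant can be off by one for inputs in the upper half of the 32-bit range, as the changelog notes.

```c
#include <stdint.h>

#define HZ 300u

/* Hypothetical constant: ceil(2^32 * HZ / 1000) for HZ == 300. */
#define MUL32 UINT64_C(1288490189)

/* Naive multiply-divide: the 32-bit product m * 300 wraps once m
 * exceeds about 14.3 million milliseconds (roughly four hours). */
static uint32_t msecs_to_jiffies_naive(uint32_t m)
{
	return (uint32_t)(m * UINT32_C(300)) / 1000u;
}

/* Simplified fraction: *3/10 instead of *300/1000. The product still
 * wraps, but only once m exceeds 2^32 / 3. */
static uint32_t msecs_to_jiffies_simplified(uint32_t m)
{
	return (uint32_t)(m * UINT32_C(3)) / 10u;
}

/* Reciprocal multiply: one 32x32 -> 64 bit multiply and a shift,
 * with no division and no overflow of the intermediate result. */
static uint32_t msecs_to_jiffies_reciprocal(uint32_t m)
{
	return (uint32_t)(((uint64_t)m * MUL32) >> 32);
}
```

For an input of 20,000,000 ms the naive variant wraps and returns a wrong value, while the other two return 6,000,000 jiffies.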
Re: [PATCH 0/4, v3] Physical PCI slot objects
Hi Kenji-san, * Kenji Kaneshige <[EMAIL PROTECTED]>: > > Hi Gary, Kenji-san, et. al, > > > > * Gary Hade <[EMAIL PROTECTED]>: > >> Alex, What I was trying to suggest is a boot-time kernel > >> option, not a kernel configuration option. The basic idea is > >> to give the user (with a single binary kernel) the ability to > >> include your ACPI-PCI slot driver feature changes only when > >> they are really needed. In addition to reducing the number of > >> system/PCI hotplug driver combinations where your changes would > >> need to be validated, I believe would also help alleviate other > >> worries (e.g. Andi Kleen's memory consumption concern). I > >> believe this goal could also be achieved with the kernel config > >> option by making the pci_slot module runtime loadable with the > >> PCI hotplug drivers only visiting your new code when the > >> pci_slot driver is loaded, although I think this would be more > >> difficult to implement. > > > > I have modified my patch series so that the final patch that > > introduces my ACPI-PCI slot driver is a full-fledged module, that > > has a tristate Kconfig option. > > > > Thank you for your good job. Thanks for testing. :) > I tested shpchp and pciehp both with and without pci_slot > module. There seems no regression from shpchp and pciehp's > point of view. (I had a little concern about the hotplug > slots' name that vary depending on whether pci_slot > functionality is enabled or disabled. But, now that we can > build pci_slot driver as a kernel module, I don't think it is a > big problem). Hm, you are right. 
On my machine, if I load pciehp first and acpiphp second (even without loading pci_slot), I will see the following: [EMAIL PROTECTED] slots]# ls 0016_0006 0197_0005 10 3 4 7 8 9 [EMAIL PROTECTED] slots]# lsmod | grep pci_slot [EMAIL PROTECTED] slots]# lsmod | grep hp acpiphp 115984 0 pciehp140616 0 pci_hotplug 123972 2 acpiphp,pciehp On the other hand, if I do load pci_slot first, and then pciehp, you are right, I will see something like this: [EMAIL PROTECTED] slots]# ls 1 10 2 3 4 5 6 7 8 9 [EMAIL PROTECTED] slots]# lsmod | grep pci_slot pci_slot 74436 0 [EMAIL PROTECTED] slots]# lsmod | grep hp pciehp140616 0 pci_hotplug 123972 1 pciehp But I do agree, people don't need to load pci_slot at all if they don't want it, and they won't be bothered. > Only the problems is that I got Call Traces with the following > error messages when pci_slot driver was loaded, and one strange > slot named '1023' was registered (other slots are fine). This > is the same problem I reported before. > > sysfs: duplicate filename '1023' can not be created > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() > > kobject_add failed for 1023 with -EEXIST, don't try to > register things with the same name in the same directory. > > On my system, hotplug slots themselves can be added, removed > and replaced with the ohter type of I/O box. The ACPI firmware > tells OS the presence of those slots using _STA method (That > is, it doesn't use 'LoadTable()' AML operator). On the other > hand, current pci_slot driver doesn't check _STA. As a result, > pci_slot driver tryied to register the invalid (non-existing) > slots. The ACPI firmware of my system returns '1023' if the > invalid slot's _SUN is evaluated. This is the cause of Call > Traces mentioned above. To fix this problem, pci_slot driver > need to check _STA when scanning ACPI Namespace. Now this is very curious. 
The relevant line in pci_slot is: check_slot() status = acpi_evaluate_integer(handle, "_SUN", NULL, sun); if (ACPI_FAILURE(status)) return -1; Why does your firmware return the error information inside sun, instead of returning an error in status? That doesn't seem right to me... > I'm sorry for reporting this so late. I'm attaching the patch > to fix the problem. This is against 2.6.24-rc3 with your > patches applied. Could you try it? Applying this patch causes me to only detect populated slots in my system, which isn't what I want -- otherwise, I could have just enumerated the PCI bus and found the devices that way. :) Maybe on your machine, checking existence of _STA might do the right thing, but I don't think we should actually be looking at any of the actual bits returned. If we check ACPI_STA_DEVICE_PRESENT, then we will not detect empty slots on my system. Can you try this patch to see if at least the first call to acpi_evaluate_integer helps? If that doesn't help, maybe the second block will help you, but it breaks my machine... Thanks. /ac diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c index 724f4f0..63a4dc8 100644 --- a/drivers/acpi/pci_slot.c +++ b/drivers/acpi/pci_slot.c @@ -55,9 +65,21 @@ static struct acpi_pci_driver acpi_pci_slot_driver = { static int check_slot(acpi_handle handle, int *device, unsigned long
Re: Out of tree module using LSM
On Thu, Nov 29, 2007 at 03:12:38PM -0700, Justin Banks wrote:
> It's not perfect, but as was recently pointed out, if you can only get
> 98% of the way there rather than 100% is that a reason for not trying to
> make it possible?

BTW, that's a fine example of a common fallacy: "$FOO is 98% of the way to $TARGET" does not allow one to interpolate the properties of $TARGET to those of $FOO. Saying that a condom is a 98% approximation to the platonic ideal of such is not particularly useful, especially if it turns out that what this number really means is that there's a hole on its tip covering 2% of the surface...
Re: [PATCH 0/2] Markers Implementation for RCU Tracing
On Fri, Nov 30, 2007 at 12:11:28AM +0530, K. Prasad wrote:
> Hi,
> Please review the ensuing set of patches which convert the
> existing RCU tracing mechanism for Preempt RCU and RCU Boost into
> markers.
>
> These patches are based upon the 2.6.24-rc2-rt1 kernel tree.
>
> Along with marker transition, the RCU Tracing infrastructure has also
> been modularised to be built as a kernel module, thereby enabling
> runtime changes to the RCU Tracing infrastructure.
>
> Patch [1/2] - Patch that converts the Preempt RCU tracing in
> rcupreempt.c into markers.
>
> Patch [2/2] - Patch that converts the Preempt RCU Boost tracing in
> rcupreempt-boost.c into markers.

Looks good to me, though I do not pretend to understand the markers implementation. I presume that the markers implementation forces the varargs usage -- though the markers do seem quite a bit nicer in allowing the formatting to be specified more naturally.

Thanx, Paul
Re: [patch 1/1] Writeback fix for concurrent large and small file writes
On Thu, Nov 29, 2007 at 12:16:36PM -0800, Michael Rubin wrote: > Due to my faux pas of top posting (see > http://www.zip.com.au/~akpm/linux/patches/stuff/top-posting.txt) I am > resending this email. > > On Nov 28, 2007 4:34 PM, Fengguang Wu <[EMAIL PROTECTED]> wrote: > > Could you demonstrate the situation? Or if I guess it right, could it > > be fixed by the following patch? (not a nack: If so, your patch could > > also be considered as a general purpose improvement, instead of a bug > > fix.) > > > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > > index 0fca820..62e62e2 100644 > > --- a/fs/fs-writeback.c > > +++ b/fs/fs-writeback.c > > @@ -301,7 +301,7 @@ __sync_single_inode(struct inode *inode, struct > > writeback_control *wbc) > > * Someone redirtied the inode while were writing > > back > > * the pages. > > */ > > - redirty_tail(inode); > > + requeue_io(inode); > > } else if (atomic_read(>i_count)) { > > /* > > * The inode is clean, inuse > > > > By testing the situation I can confirm that the one line patch above > fixes the problem. > > I will continue testing some other cases to see if it cause any other > issues but I don't expect it to. One major concern could be whether a continuous writer dirting pages at the 'right' pace will generate a steady flow of write I/Os which are _tiny_hence_inefficient_. I have gathered some timing info about writeback speed in http://lkml.org/lkml/2007/10/4/468. For ext3, it takes wb_kupdate() ~15ms to submit 4MB. Whereas one disk I/O typically takes ~5ms. So if there are too many tiny write I/Os, they will simply get delayed and merged into bigger ones. So it's not a problem in *theory* :-) > I will post this change for 2.6.24 and list Feng as author. If that's > ok with Feng. Thank you. > As for the original patch I will resubmit it for 2.6.25 as a general > purpose improvement. 
There are some discussions and patches on inode number based writeback clustering which you may want to reference/compare with: http://lkml.org/lkml/2007/8/21/396 http://lkml.org/lkml/2007/8/27/45 Cheers, Fengguang
Re: Trailing periods in kernel messages
On Fri, 2007-11-30 at 09:12 +0800, Li Zefan wrote:
> Just a rough grep:
> # grep -r -P --include=*.[ch] 'printk.*\.\\n' * | wc -l
> 6025
> # grep -r -P --include=*.[ch] '\.\\n' * | wc -l
> 12723

Inequivalent.

Try:
grep -rP --include=*.[ch] 'printk.*\.\\n' * | wc -l
and
grep -rP --include=*.[ch] 'printk.*[^\.]\\n' * | wc -l

6k/38k

cheers, Joe
Re: [PATCH] [RESEND] crypto test: use print_hex_dump from kernel.h instead
On Nov 29, 2007 7:13 PM, Herbert Xu <[EMAIL PROTECTED]> wrote:
...
> > uninlining this function shrinks crypto/tcrypt.o's .text from 20,009 bytes
> > down to 19,701.
> >
> > inlining is almost always wrong.
>
> I agree. Please do as Andrew suggests and resubmit.

inline disabled.

Cc: Randy Dunlap <[EMAIL PROTECTED]>
Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 24141fb..13efc72 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -83,10 +83,9 @@ static char *check[] = {
 
 static void hexdump(unsigned char *buf, unsigned int len)
 {
-	while (len--)
-		printk("%02x", *buf++);
-
-	printk("\n");
+	print_hex_dump(KERN_CONT, "", DUMP_PREFIX_OFFSET,
+			16, 1,
+			buf, len, false);
 }
 
 static void tcrypt_complete(struct crypto_async_request *req, int err)
--
Denis Cheng
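Note that the replacement changes the output layout as well as the code size. The userspace sketch below contrasts the two layouts; it is an approximation written for illustration, not the kernel's lib/hexdump.c. The names hexdump_flat and hexdump_rows are hypothetical, snprintf() into a caller-supplied buffer stands in for printk(), and the row format (8-digit hex offset, colon, space-separated bytes) is assumed to resemble what DUMP_PREFIX_OFFSET with a row size of 16 and a group size of 1 produces.

```c
#include <stddef.h>
#include <stdio.h>

#define ROWSIZE 16

/* Old layout: every byte as two hex digits on a single line. */
static size_t hexdump_flat(const unsigned char *buf, size_t len,
			   char *out, size_t outsz)
{
	size_t pos = 0;

	if (outsz == 0)
		return 0;
	out[0] = '\0';
	/* Keep room for two digits plus the terminating NUL. */
	for (size_t i = 0; i < len && pos + 3 <= outsz; i++)
		pos += (size_t)snprintf(out + pos, outsz - pos,
					"%02x", buf[i]);
	if (pos + 2 <= outsz)
		pos += (size_t)snprintf(out + pos, outsz - pos, "\n");
	return pos;
}

/* Row layout: offset prefix, then up to ROWSIZE bytes per line. */
static size_t hexdump_rows(const unsigned char *buf, size_t len,
			   char *out, size_t outsz)
{
	size_t pos = 0;

	if (outsz == 0)
		return 0;
	out[0] = '\0';
	for (size_t off = 0; off < len; off += ROWSIZE) {
		size_t n = len - off < ROWSIZE ? len - off : ROWSIZE;

		/* 9 chars of prefix, 3 per byte, newline and NUL. */
		if (pos + 3 * n + 11 > outsz)
			break;
		pos += (size_t)snprintf(out + pos, outsz - pos,
					"%08zx:", off);
		for (size_t i = 0; i < n; i++)
			pos += (size_t)snprintf(out + pos, outsz - pos,
						" %02x", buf[off + i]);
		pos += (size_t)snprintf(out + pos, outsz - pos, "\n");
	}
	return pos;
}
```

Anything that parses the old single-line dump would need adjusting for the row-based form.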
Re: Question regarding mutex locking
Larry Finger wrote:
> If a particular routine needs to lock a mutex, but it may be entered with that mutex already locked, would the following code be SMP safe?
>
> hold_lock = mutex_trylock()

The common way to deal with this is first to restructure your function into two. One always acquires the lock, and the other (often written with a "__" prefix) never acquires it. The never-acquire code does the actual work, and the always-acquire function calls it. You then refactor the callers so that you don't have any code paths on which you can't predict whether or not the lock will be held.
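The split described above can be sketched in userspace C with pthreads. This is an assumption made so the example is self-contained; kernel code would use a struct mutex with mutex_lock() and mutex_unlock() instead. The names counter_lock, counter_add, __counter_add and counter_add_twice are hypothetical, and the "__" prefix simply mirrors the kernel naming convention mentioned in the reply.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Never acquires the lock: the caller must already hold counter_lock.
 * This function does the actual work. */
static int __counter_add(int delta)
{
	counter += delta;
	return counter;
}

/* Always acquires the lock, then calls the never-acquire worker. */
static int counter_add(int delta)
{
	int ret;

	pthread_mutex_lock(&counter_lock);
	ret = __counter_add(delta);
	pthread_mutex_unlock(&counter_lock);
	return ret;
}

/* A caller that holds the lock across a larger operation uses the
 * "__" variant directly, so the lock state is known on every path. */
static int counter_add_twice(int delta)
{
	int ret;

	pthread_mutex_lock(&counter_lock);
	__counter_add(delta);
	ret = __counter_add(delta);
	pthread_mutex_unlock(&counter_lock);
	return ret;
}
```

With this structure no function ever needs mutex_trylock() to guess whether its caller already took the lock.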
Re: [PATCH 0/4, v3] Physical PCI slot objects
Hi Gary, First, thanks for all the help and testing -- I really appreciate it. * Gary Hade <[EMAIL PROTECTED]>: > > I'm getting back to you but unfortunately with not so good > news. Sorry Alex. :-/ > On the x3950 (configured single node) I encountered the below > problem when attempting to hotplug a PCIe adapter when 'pci_slot' > was loaded prior to 'acpiphp'. I did not see the problem when > the drivers were loaded in the opposite order. Very bizarre, especially given the stack trace below, which doesn't really make any sense to me at all. > FYI, the node contains 2 hotpluggable PCIe slots and 5 > non-hotpluggable PCIe slots but 'pci_slot' only exposed > the 2 hotpluggable slots. This does not appear to be due > to a 'pci_slot' driver problem since I looked at the DSDT > and SSDT and found that there are currently no _SUN methods > for the non-hotpluggable slots. Ok, this is not too surprising, but it's a different can o' worms. ;) Let's save this for another day... > invalid opcode: [1] SMP > CPU 1 > Modules linked in: acpiphp pci_slot e1000 aic79xx scsi_transport_spi shpchp > dock pci_hotplug ipt_LOG xt_limit xt_pkttype button battery ac power_supply > ip6t_REJECT xt_tcpudp ipt_REJECT iptable_mangle iptable_filter > ip6table_mangle ip_tables ip6table_filter ip6_tables x_tables ipv6 usbhid > ff_memless ext3 jbd loop dm_mod ehci_hcd uhci_hcd usbcore ide_cd bnx2 cdrom > rng_core reiserfs ata_piix ahci libata thermal processor piix sg megaraid_sas > fan edd sd_mod scsi_mod ide_disk ide_core > Pid: 121, comm: kacpi_notify Not tainted 2.6.24-rc3-gh-smp #1 > RIP: 0010:[] [] > :pci_slot:__this_module+0x21c4/0xf204 > RSP: 0018:81103fa43ea8 EFLAGS: 00010216 > RAX: 81103f944a18 RBX: 81103d4fe910 RCX: 000f > RDX: RSI: RDI: 8110400d13d0 > RBP: 8032d97b R08: 8110400fc7e0 R09: 0002 > R10: R11: 8021d193 R12: 811040105cf0 > R13: R14: 80635820 R15: > FS: () GS:8110400ed8c0() knlGS: > CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b > CR2: 2b266d876471 CR3: 00103c825000 CR4: 06e0 > DR0: 
DR1: DR2: > DR3: DR6: 0ff0 DR7: 0400 > Process kacpi_notify (pid: 121, threadinfo 81103fa42000, task > 81103f9f8040) > Stack: 809c 81103d119a00 8032d99e 81103f9fc540 > 8024618d 81103f9fc540 81103f9fc540 8024696c > 80246a46 81103f9f8040 80249ada > Call Trace: > [] acpi_ev_notify_dispatch+0x57/0x60 > [] acpi_os_execute_notify+0x23/0x2c > [] run_workqueue+0x7f/0x10b > [] worker_thread+0x0/0xe4 > [] worker_thread+0xda/0xe4 > [] autoremove_wake_function+0x0/0x2e > [] kthread+0x47/0x73 > [] child_rip+0xa/0x12 > [] kthread+0x0/0x73 > [] child_rip+0x0/0x12 Maybe we're trying to kick off a hotplug event on the wrong slot? I really have no idea... > Code: ff ff ff ff 40 23 2c 88 ff ff ff ff 00 c8 c6 3b 10 81 ff ff > RIP [] :pci_slot:__this_module+0x21c4/0xf204 > RSP Can you apply this debug patch on top of your tree, and send me the output? I'd be curious to see the output for your failure case: # modprobe pci_slot debug=1 # modprobe acpiphp debug=1 Thanks. /ac diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c index 724f4f0..5a62def 100644 --- a/drivers/acpi/pci_slot.c +++ b/drivers/acpi/pci_slot.c @@ -30,12 +30,16 @@ #include #include +static int debug; + #define DRIVER_VERSION "0.1" #define DRIVER_AUTHOR "Alex Chiang <[EMAIL PROTECTED]>" #define DRIVER_DESC"ACPI PCI Slot Detection Driver" MODULE_AUTHOR(DRIVER_AUTHOR); MODULE_DESCRIPTION(DRIVER_DESC); MODULE_LICENSE("GPL"); +MODULE_PARM_DESC(debug, "Debugging mode enabled or not"); +module_param(debug, bool, 0644); #define _COMPONENT ACPI_PCI_COMPONENT ACPI_MODULE_NAME("pci_slot"); @@ -43,6 +47,12 @@ ACPI_MODULE_NAME("pci_slot"); #define MY_NAME "pci_slot" #define err(format, arg...) printk(KERN_ERR "%s: " format , MY_NAME , ## arg) #define info(format, arg...) 
printk(KERN_INFO "%s: " format , MY_NAME , ## arg) +#define dbg(format, arg...)\ + do {\ + if (debug) \ + printk(KERN_DEBUG "%s: " format,\ + MY_NAME , ## arg); \ + } while (0) static int acpi_pci_slot_add(acpi_handle handle); static void acpi_pci_slot_remove(acpi_handle handle); @@ -125,6 +135,9 @@ register_slot(acpi_handle handle, u32 lvl, void *context, void **rv) if (IS_ERR(pci_slot)) err("pci_create_slot returned %ld\n", PTR_ERR(pci_slot)); +
Re: Trailing periods in kernel messages
Andrew Morton wrote:
> On Thu, 29 Nov 2007 11:20:18 +0100 Frans Pop <[EMAIL PROTECTED]> wrote:
>
>> Well, for one it needlessly increases the size of log files.
>> It also IMO just looks weird to have a trailing period only for some
>> messages and it certainly is completely inappropriate for messages like:
>
> I'll confess to stealthily deleting some of those periods when nobody is
> looking.
> I don't find them to have any value and they do have some cost, including
> screen real estate at the source-code level.
>
>

Just a rough grep:

# grep -r -P --include=*.[ch] 'printk.*\.\\n' * | wc -l
6025
# grep -r -P --include=*.[ch] '\.\\n' * | wc -l
12723

:)
Re: [Bluez-users] Lost connections - mouse and keyboard
On Nov 30, 2007 4:43 AM, Jiri Kosina <[EMAIL PROTECTED]> wrote:
> On Thu, 29 Nov 2007, Marcel Holtmann wrote:
>
> > > >Nov 28 18:53:39 pico kernel: WARNING: at drivers/hid/hid-core.c:784
> [ ... ]
>
> > > Does bluetooth input devices have something to do with usbhid? I don't
> > > know, perhaps this is another problem in kernel.
> > in case you have a HID proxy dongle the usbhid driver can be involved. And
> > since this is hiddev, then it will be caused by the hid2hci program.
>
> Absolutely.
>
> This particular warning means, that someone (usually indeed hid2hci)
> passed usage through hiddev that was out of bounds, with respect to the
> device's report descriptor.

Is this behaviour the normal one? IMHO, a userspace program should not cause
kernel warnings like this, no matter what input it gets from users.

> This usually means that hid2hci has chosen the wrong method to switch the
> modes. Unfortunately, it's not easy to implement always the switching
> properly, if we don't know the vendor-specific packet that has to be sent.
>
> --
> Jiri Kosina
Re: [patch 1/3] bdi patches
On Thursday November 29, [EMAIL PROTECTED] wrote:
> > http://programming.kicks-ass.net/kernel-patches/foo/
> >
> > bdi-task-dirty.patch
> > bdi-sysfs.patch
> > bdi-min.patch
> > bdi-max.patch
> >
> >
> > Is my current rather experimental stack, I just wrote the max part after
> > having slept on it. I'm not fond of the multiplication there, but I
> > don't see a way around it.
> >
> > Compile tested only.
>
> I've done some testing on these patches and did some changes. So here
> they go.
>
> Thanks,
> Miklos
>
> -
> Subject: mm: sysfs: expose the BDI object in sysfs
>
> Provide a place in sysfs for the backing_dev_info object.
> This allows us to see and set the various BDI specific variables.

You don't say what the place is, and I'm not quite familiar enough
with sysfs internals to figure it out myself. Help?

And while I was looking I noticed that bdi_register (and bdi_init_fmt)
takes a second argument 'parent', which is always NULL, and which is
undocumented as to purpose. If no-one would ever add another call to
bdi_register, why have the second arg, and if they might, how would
they know what to put there?

Finally, the omission of NFS bothers me - and makes me wonder if the
choice of name in sysfs is appropriate.

Would a program ever want to generate the name (in sysfs) for a
particular bdi? If so, how would it do it?

It seems to me after a fairly quick look that a bdi is always
associated with a device number. For block devices the device number
is obvious. For NFS and FUSE, the device number is an anon device
number allocated at mount time.

Maybe the name of the bdi should be based on that number. Then it
would be possible to map directly from e.g. a file to the bdi that the
file would be written to.

NeilBrown
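To make the last suggestion concrete: if a bdi were named after its device number, a program could derive that name for any file from stat(2). The sketch below only formats "major:minor" for the device backing a path. The helper name backing_dev_name() is hypothetical, and where such a name would appear in sysfs is not decided in this thread, so no sysfs path is assumed.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical helper (not part of the posted patches): write the
 * "major:minor" of the device backing `path` into `buf`.
 * For NFS and FUSE mounts st_dev is the anonymous device number
 * allocated at mount time, as described above.
 * Returns 0 on success, -1 if stat() fails.
 */
static int backing_dev_name(const char *path, char *buf, size_t len)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	snprintf(buf, len, "%u:%u", major(st.st_dev), minor(st.st_dev));
	return 0;
}
```

A caller would pass any file on the filesystem of interest, e.g. backing_dev_name("/home/user/file", buf, sizeof(buf)), and use the resulting string as the lookup key.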
Re: Out of tree module using LSM
On Thu, 29 Nov 2007, Al Viro wrote: > Incidentally, I would really love to see the threat profile we are talking > about. Exactly. Please come up with a set of requirements that can be reviewed by the core kernel folk, and perhaps then focus on how to meet those requirements once they have been accepted. - James -- James Morris <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/6] time: fix typo in comments
>> >> -/* Suppose we want to devide two numbers NOM and DEN: NOM/DEN, the we can >> +/* Suppose we want to devide two numbers NOM and DEN: NOM/DEN, then we can > > divide > Yes, I missed it. >> - * which, buy the way, it can do, but it take more code and at least 2 >> + * which, buy the way, it can do, but it takes more code and at least 2 > > by the way > (and does this really add anything to the sentence?) > Thanks for pointing it out :) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: WARNING: at kernel/resource.c:189 __release_resource
On Thu, 29 Nov 2007 16:40:37 -0700 Bjorn Helgaas <[EMAIL PROTECTED]> wrote: > On Monday 26 November 2007 11:05:38 pm Andrew Morton wrote: > > On Thu, 22 Nov 2007 22:41:16 +0100 Jiri Slaby <[EMAIL PROTECTED]> wrote: > > > Ok, I hit the bug, suspend of 00:06 device complains about it: > > > WARNING: at .../kernel/resource.c:185 __release_resource() > > > > > > Call Trace: > > > [] release_resource+0xb5/0xf0 > > > [] pnp_release_resources+0x70/0x130 > > > [] pnp_stop_dev+0x45/0x90 > > > [] pnp_bus_suspend+0x92/0xb0 > > > [] suspend_device+0x113/0x180 > > > [] device_suspend+0x200/0x320 > > > [] suspend_devices_and_enter+0xa5/0x170 > > > [] enter_state+0x209/0x270 > > > [] state_store+0xaf/0xf0 > > > [] kobj_attr_store+0x17/0x20 > > > [] sysfs_write_file+0xce/0x140 > > > [] vfs_write+0xc7/0x170 > > > [] sys_write+0x50/0x90 > > > [] system_call+0x7e/0x83 > > > > > > # LANG=en ll /sys/devices/pnp0/00:06/ > > > total 0 > > > lrwxrwxrwx 1 root root0 Nov 22 22:35 driver -> > > > ../../../bus/pnp/drivers/serial > > > -r--r--r-- 1 root root 4096 Nov 22 22:35 id > > > -r--r--r-- 1 root root 4096 Nov 22 22:35 options > > > drwxr-xr-x 2 root root0 Nov 22 22:35 power > > > -rw-r--r-- 1 root root 4096 Nov 22 22:35 resources > > > lrwxrwxrwx 1 root root0 Nov 22 22:35 subsystem -> ../../../bus/pnp > > > drwxr-xr-x 3 root root0 Nov 22 22:35 tty > > > -rw-r--r-- 1 root root 4096 Nov 22 22:35 uevent > > > > I suppose that's a genuine leak, presumably in 8250_pnp. > > We used to have only the serial driver resource reservation. We now > have an additional 00:06 resource that is the parent of the serial > resource, e.g., > > 03f8-03ff : 00:06 > 03f8-03ff : serial > > I think this problem happens because pnp_bus_suspend() calls > serial_pnp_suspend(), which suspends the driver but does nothing > with the resources. Then it calls pnp_stop_dev(), which releases > the 00:06 resource, which still has a serial child resource. 
> > The corresponding PCI code in pci_device_suspend() does not do > any generic device disable or resource release. I don't know > why PNP disables the device on suspend. I glanced through the > ACPI spec but didn't see a requirement for it. Maybe Pierre [1] > remembers. > > Maybe we could either remove the pnp_{stop,start}_dev() calls > from the suspend/resume path, or move the PNP resource management > out of pnp_{start,stop}_dev(). > > Bjorn > > [1] http://lkml.org/lkml/2005/11/30/39 So was this particular problem caused/exposed by pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch, or is it in mainline? Thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: constant_tsc and TSC unstable
Paul Rolland wrote:
> Total of 2 processors activated (6919.15 BogoMIPS).
> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> checking TSC synchronization [CPU#0 -> CPU#1]:
> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
> Marking TSC unstable due to: check_tsc_sync_source failed.
> Brought up 2 CPUs
> ...

Not sure if this is related, but thought I'd contribute it anyway...

I've got a Pentium D system (dual core, single processor) and on some boots
I get "Marking TSC unstable due to check_tsc_sync_source failed" with some
cycles warp between CPUs, while most boots are OK.

This kind of inconsistency seems more due to a failure in the kernel to deal
with differences between boots than with something inherent to the hardware.
I conclude that because basically I never have any problems with the system
once it has booted and the TSC check has passed.

From my kern.logs since Oct 26, I get the following data:

2.6.23+cfs:  2 passes
2.6.23.1:    1 pass; 1 failure (48 cycles warp)
2.6.24-rc1: 15 passes
2.6.24-rc2: 13 passes; 1 failure (8 cycles warp)
2.6.24-rc3:  5 passes; 3 failures (8, 8 and 16 cycles warp)

Note that this is not a new issue. For 2.6.21/2.6.23-RCx kernels I reported
similar data in http://lkml.org/lkml/2007/9/16/45.

Cheers,
FJP
[PATCH] Avoid overflows in kernel/time.c
When the conversion factor between jiffies and milli- or microseconds
is not a single multiply or divide, as for the case of HZ == 300, we
currently do a multiply followed by a divide. The intervening
result, however, is subject to overflows, especially since the
fraction is not simplified (for HZ == 300, we multiply by 300 and
divide by 1000).

This is exposed to the user when passing a large timeout to poll(),
for example.

This patch replaces the multiply-divide with a reciprocal
multiplication on 32-bit platforms. On 64-bit platforms, where the
input is a 64-bit unsigned long, there is no portable way to do this,
since it requires a 128-bit intermediate result (which gcc does
support on 64-bit platforms but may generate libgcc calls, e.g. on
64-bit s390). But since the output is a 32-bit integer in the cases
affected, just simplify the multiply-divide there (*3/10 instead of
*300/1000).

The reciprocal multiply used can have off-by-one errors in the upper
half of the valid output range. This could be avoided at the expense
of having to deal with a potential 65-bit intermediate result. Since
the intent is to avoid overflow problems and most of the other time
conversions are only semiexact, the off-by-one errors were considered
an acceptable tradeoff.

NOTE: This patch uses a bc(1) script to compute the appropriate
constants.

Signed-off-by: H.
Peter Anvin <[EMAIL PROTECTED]> --- kernel/Makefile |8 +++ kernel/time.c | 29 +--- kernel/timeconst.bc | 123 +++ 3 files changed, 152 insertions(+), 8 deletions(-) create mode 100644 kernel/timeconst.bc diff --git a/kernel/Makefile b/kernel/Makefile index dfa9695..f136d18 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -80,3 +80,11 @@ quiet_cmd_ikconfiggz = IKCFG $@ targets += config_data.h $(obj)/config_data.h: $(obj)/config_data.gz FORCE $(call if_changed,ikconfiggz) + +$(obj)/time.o: $(obj)/timeconst.h + +quiet_cmd_timeconst = BC $@ + cmd_timeconst = (echo $(CONFIG_HZ) | bc -q $<) > $@ +targets += timeconst.h +$(obj)/timeconst.h: $(src)/timeconst.bc $(wildcard include/config/hz.h) FORCE + $(call if_changed,timeconst) diff --git a/kernel/time.c b/kernel/time.c index 09d3c45..8e790b5 100644 --- a/kernel/time.c +++ b/kernel/time.c @@ -39,6 +39,8 @@ #include #include +#include "timeconst.h" + /* * The timezone where the local system is located. Used as a default by some * programs who obtain this value by using gettimeofday. @@ -93,7 +95,8 @@ asmlinkage long sys_stime(time_t __user *tptr) #endif /* __ARCH_WANT_SYS_TIME */ -asmlinkage long sys_gettimeofday(struct timeval __user *tv, struct timezone __user *tz) +asmlinkage long sys_gettimeofday(struct timeval __user *tv, +struct timezone __user *tz) { if (likely(tv != NULL)) { struct timeval ktv; @@ -118,7 +121,7 @@ asmlinkage long sys_gettimeofday(struct timeval __user *tv, struct timezone __us * hard to make the program warp the clock precisely n hours) or * compile in the timezone information into the kernel. Bad, bad * - * - TYT, 1992-01-01 + * - TYT, 1992-01-01 * * The best thing to do is to keep the CMOS clock in universal time (UTC) * as real UNIX machines always do it. 
This avoids all headaches about @@ -239,7 +242,11 @@ unsigned int inline jiffies_to_msecs(const unsigned long j) #elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC) return (j + (HZ / MSEC_PER_SEC) - 1)/(HZ / MSEC_PER_SEC); #else - return (j * MSEC_PER_SEC) / HZ; +# if BITS_PER_LONG == 32 + return ((u64)HZ_TO_MSEC_MUL32 * j) >> HZ_TO_MSEC_SHR32; +# else + return (j * HZ_TO_MSEC_NUM) / HZ_TO_MSEC_DEN; +# endif #endif } EXPORT_SYMBOL(jiffies_to_msecs); @@ -251,7 +258,11 @@ unsigned int inline jiffies_to_usecs(const unsigned long j) #elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC) return (j + (HZ / USEC_PER_SEC) - 1)/(HZ / USEC_PER_SEC); #else - return (j * USEC_PER_SEC) / HZ; +# if BITS_PER_LONG == 32 + return ((u64)HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32; +# else + return (j * HZ_TO_USEC_NUM) / HZ_TO_USEC_DEN; +# endif #endif } EXPORT_SYMBOL(jiffies_to_usecs); @@ -351,7 +362,7 @@ EXPORT_SYMBOL(mktime); * normalize to the timespec storage format * * Note: The tv_nsec part is always in the range of - * 0 <= tv_nsec < NSEC_PER_SEC + * 0 <= tv_nsec < NSEC_PER_SEC * For negative values only the tv_sec field is negative ! */ void set_normalized_timespec(struct timespec *ts, time_t sec, long nsec) @@ -452,12 +463,13 @@ unsigned long msecs_to_jiffies(const unsigned int m) /* * Generic case - multiply, round and divide. But first * check that if we are doing a net multiplication, that -* we wouldnt overflow: +* we
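For readers who want to see the effect described in the changelog, here is a minimal user-space sketch of the 32-bit case for HZ == 300. The constants MUL32 and SHR32 are illustrative values computed by hand as ceil((1000/300) * 2^30); they are an assumption and are not taken from the output of the patch's timeconst.bc script, which is not shown in this message. uint32_t stands in for a 32-bit unsigned long.

```c
#include <stdint.h>

#define HZ		300u
#define MSEC_PER_SEC	1000u

/* Illustrative constants (assumed, not from the patch): ceil(10/3 * 2^30). */
#define MUL32	UINT32_C(0xD5555556)
#define SHR32	30

/*
 * Old scheme with a 32-bit unsigned long: the product j * 1000 wraps
 * modulo 2^32 once j exceeds 4294967, so large timeouts come out wrong.
 */
static uint32_t naive_jiffies_to_msecs(uint32_t j)
{
	return (j * MSEC_PER_SEC) / HZ;
}

/*
 * Reciprocal multiply: a 32x32 -> 64 bit product never overflows, and
 * the shift replaces the division.  As the changelog notes, results
 * can be off by one for large inputs.
 */
static uint32_t recip_jiffies_to_msecs(uint32_t j)
{
	return (uint32_t)(((uint64_t)MUL32 * j) >> SHR32);
}
```

With j = 6000000 (20000 seconds at HZ == 300) the naive version yields 5683442 ms because the intermediate product wrapped, while the reciprocal version yields the expected 20000000 ms.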
Relation between nr_dirty and nr_inactive
Hi,

I am running an older kernel (CentOS 2.6.9-34 SMP) on a 32-bit arch. Some of
my systems hung while trying to write data to disk. All of those systems show
the same pattern: at the time of the hang, /proc/meminfo reports
'Inactive' < 'Dirty'. All of the machines have 2G of physical memory, and
~1.5G of it is locked (via mlock).

I tried reading the code but could not establish any direct relationship
between Zone->in_active pages vs. per-cpu_page_state->nr_dirty.

Has anybody seen a system in this kind of state before? And do these 2
parameters affect each other?

Thanks
-Kunal
Re: 2.6.24-rc3-mm2 (bugfix for memory cgroup per-zone-struct allocation.)
On Thu, 29 Nov 2007 16:25:33 -0500 Lee Schermerhorn <[EMAIL PROTECTED]> wrote: > > - pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node); > > + /* > > +* This routine is called against possible nodes. > > +* But it's BUG to call kmalloc() against offline node. > > +* > > +* TODO: this routine can waste much memory for nodes which will > > +* never be onlined. It's better to use memory hotplug callback > > +* function. > > +*/ > > + if (node_state(node, N_HIGH_MEMORY)) > > + pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node); > > + else > > + pn = kmalloc(sizeof(*pn), GFP_KERNEL); > > if (!pn) > > return 1; > > > > > > This worked for me. Can boot 24-rc3-mm2 [if I turn off async scsi scan, > that is--not related to mem controller]. > Thank you ! > Just FYI, on my ia64 platform, with NODES_SHIFT == 8 [RHEL & SLES ship > with 10, I believe], the size of the mem_cgroup structure is ~10KB. > Yes. But... I'll ask Goto-san how memory hotplug callback works and try it. Thanks, -Kame - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] x86 setup: don't recalculate ss:esp unless really necessary
Hi Linus,

It appears that unconditionally resetting the stack, which fixes old LILO,
breaks LOADLIN after all. This patch should work with either, as well as
work around the command-line truncation bug in old versions of SYSLINUX.

Please pull:

  git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup.git for-linus

Jens Rottmann (1):
      x86 setup: don't recalculate ss:esp unless really necessary

 arch/x86/boot/header.S |   41 ++++++++++++++++-------------------------
 1 files changed, 16 insertions(+), 25 deletions(-)

commit 16252da654800461e0e1c32697cb59f4cda15aa9
Author: Jens Rottmann <[EMAIL PROTECTED]>
Date:   Tue Nov 27 12:35:13 2007 +0100

    x86 setup: don't recalculate ss:esp unless really necessary

    In order to work around old LILO versions providing an invalid ss
    register, the current setup code always sets up a new stack,
    immediately following .bss and the heap. But this breaks LOADLIN.

    This rewrite of the workaround checks for an invalid stack (ss!=ds)
    first, and leaves ss:sp alone otherwise (apart from aligning esp).

    [hpa note: LOADLIN has a number of arbitrary hard-coded limits that
    are being pushed up against. Without some major revision of LOADLIN
    itself it will not be sustainable keeping it alive. This gives it
    another brief lease on life, however. This patch also helps the
    cmdline truncation problem with old versions of SYSLINUX.]

    Signed-off-by: Jens Rottmann
    Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 6ef5a06..4cc5b04 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -236,39 +236,30 @@ start_of_setup:
 	movw	%ax, %es
 	cld
 
-# Apparently some ancient versions of LILO invoked the kernel
-# with %ss != %ds, which happened to work by accident for the
-# old code.  If the CAN_USE_HEAP flag is set in loadflags, or
-# %ss != %ds, then adjust the stack pointer.
+# Apparently some ancient versions of LILO invoked the kernel with %ss != %ds,
+# which happened to work by accident for the old code.  Recalculate the stack
+# pointer if %ss is invalid.  Otherwise leave it alone, LOADLIN sets up the
+# stack behind its own code, so we can't blindly put it directly past the heap.
 
-	# Smallest possible stack we can tolerate
-	movw	$(_end+STACK_SIZE), %cx
-
-	movw	heap_end_ptr, %dx
-	addw	$512, %dx
-	jnc	1f
-	xorw	%dx, %dx	# Wraparound - whole segment available
-1:	testb	$CAN_USE_HEAP, loadflags
-	jnz	2f
-
-	# No CAN_USE_HEAP
 	movw	%ss, %dx
 	cmpw	%ax, %dx	# %ds == %ss?
 	movw	%sp, %dx
-	# If so, assume %sp is reasonably set, otherwise use
-	# the smallest possible stack.
-	jne	4f		# -> Smallest possible stack...
+	je	2f		# -> assume %sp is reasonably set
+
+	# Invalid %ss, make up a new stack
+	movw	$_end, %dx
+	testb	$CAN_USE_HEAP, loadflags
+	jz	1f
+	movw	heap_end_ptr, %dx
+1:	addw	$STACK_SIZE, %dx
+	jnc	2f
+	xorw	%dx, %dx	# Prevent wraparound
 
-	# Make sure the stack is at least minimum size.  Take a value
-	# of zero to mean "full segment."
-2:
+2:	# Now %dx should point to the end of our stack space
 	andw	$~3, %dx	# dword align (might as well...)
 	jnz	3f
 	movw	$0xfffc, %dx	# Make sure we're not zero
-3:	cmpw	%cx, %dx
-	jnb	5f
-4:	movw	%cx, %dx	# Minimum value we can possibly use
-5:	movw	%ax, %ss
+3:	movw	%ax, %ss
 	movzwl	%dx, %esp	# Clear upper half of %esp
 	sti			# Now we should have a working stack
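The control flow of the new stack selection is easier to follow outside of 16-bit assembly, so here is a rough C model of the "+" lines of the hunk above. It is only a reading aid, not kernel code: the function name pick_stack_top() is made up, the stack size is passed in as a parameter rather than assuming the kernel's STACK_SIZE value, and the CAN_USE_HEAP bit value (0x80) is an assumption about the loadflags layout.

```c
#include <stdint.h>

#define CAN_USE_HEAP	0x80	/* assumed loadflags bit, see note above */

/*
 * Model of the new logic: returns the value that ends up in %sp
 * (and, zero-extended, in %esp).  `end` stands for the _end symbol.
 */
static uint16_t pick_stack_top(uint16_t ss, uint16_t ds, uint16_t sp,
			       uint16_t end, uint16_t heap_end_ptr,
			       uint8_t loadflags, uint16_t stack_size)
{
	uint16_t dx = sp;		/* %ss == %ds: trust the boot loader */

	if (ss != ds) {			/* invalid %ss: make up a new stack */
		uint32_t sum;

		dx = end;
		if (loadflags & CAN_USE_HEAP)
			dx = heap_end_ptr;

		sum = (uint32_t)dx + stack_size;
		dx = (sum > 0xffff) ? 0 : (uint16_t)sum;  /* carry set -> 0 */
	}

	dx &= (uint16_t)~3u;		/* dword align */
	if (dx == 0)
		dx = 0xfffc;		/* zero would mean no stack at all */

	return dx;
}
```

The point of the change shows up in the first branch: when the boot loader (LOADLIN in this case) hands over a valid %ss, its %sp is kept and merely aligned, instead of being replaced by an address computed from the heap end.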
Re: + proc-fix-the-threaded-proc-self.patch added to -mm tree
* Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> > You'll never run out of this sort of problem. Keeping Linux lean and
> > simple would be far better.
>
> Nah. The control group stuff has all kinds of corner cases because it
> is a new and untested API. The namespace work after we get the code
> cleaned up so it is maintainable and we can work with it is usually
> just finding our globals through a pointer instead of from a static
> variable. Hardly a measurable cost on the best day.

yeah - anyone who claims that containers are 'fat' has likely not even
looked at the code. Even maintenance-wise there are very visible positive
effects: we do discover and properly map our "global resource"
dependencies and abstract them. That increases cleanliness of our code
and APIs all around.

	Ingo
Re: [PATCH x86/mm 01/11] x86-32 thread_struct.debugreg
On Thu, Nov 29, 2007 at 01:50:55PM -0800, Roland McGrath wrote: > UML is also a good test, though I have never been set up to verify > anything beyond "UML seems to boot far enough to complain I don't > have a userland filesystem for it". BTW, this doesn't exercise ptrace at all. Interesting ptrace things only start happening when userspace runs. Grab an interesting-looking image from http://uml.nagafix.co.uk, uncompress it, and run ./linux ubda=the-filesystem-image Jeff -- Work email - jdike at linux dot intel dot com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH x86/mm 11/11] x86 ptrace merge removals
On Thu, Nov 29, 2007 at 02:38:03PM -0800, Roland McGrath wrote: > > Can you make sure that UML still runs when you're done with ptrace? > > I'd be glad to, especially if you give me some advice on testing (.config > for um-i386 and um-x86_64, what do try that constitutes "UML still runs"). Use defconfig and boot it. If you break ptrace, I think it's overwhelmingly likely that UML will stop booting. So if UML boots, I'd say you're good to go, with one caveat. That is, UML should report at boot that PTRACE_SYSEMU works. I put in a fallback from PTRACE_SYSEMU to PTRACE_SYSCALL when Fedora broke PTRACE_SYSEMU. > Right now (before these), UML > doesn't build for x86_64 or i386 from this tree to begin with. For current -mm, you'll need http://marc.info/?l=linux-kernel=119635496908681=raw to build. Jeff -- Work email - jdike at linux dot intel dot com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Possibly SATA related freeze killed networking and RAID
Phillip Susi wrote:
> Tejun Heo wrote:
>> Agreed. Nobody cared on ATA controllers is usually very effective at
>> taking the whole machine down. Is there any reason why we don't turn on
>> irqpoll on turned off IRQs automatically?
>
> Why does a single spurious interrupt cause it to be shut down? I can
> see if the interrupt is stuck on and keeps interrupting constantly, but
> if it's just the occasional spurious interrupt, why not just ignore it
> and move on?

I'm not certain offhand, but I think there may be such a threshold.
However, an occasional spurious interrupt isn't likely. For a
level-triggered interrupt, an unhandled interrupt will keep interrupting
forever since nobody knows how to clear it (until we decide to disable
the IRQ entirely).

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: Possibly SATA related freeze killed networking and RAID
Phillip Susi wrote: > Tejun Heo wrote: >> Agreed. Nobody cared on ATA controllers is usually very effective at >> taking the whole machine down. Is there any reason why we don't turn on >> irqpoll on turned off IRQs automatically? > > Why does a single spurious interrupt cause it to be shut down? I can > see if the interrupt is stuck on and keeps interrupting constantly, but > if it's just the occasional spurious interrupt, why not just ignore it > and move on? Because SFF ATA controller don't have IRQ pending bit. You don't know whether IRQ is raised or not. Plus, accessing the status register which clears pending IRQ can be very slow on PATA machines. It has to go through the PCI and ATA bus and come back. So, unconditionally trying to clear IRQ by accessing Status can incur noticeable overhead if the IRQ is shared with devices which raise a lot of IRQs. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 0/2] x86, ptrace: support for branch trace store(BTS)
On Thu, 29 Nov 2007 08:14:10 - "Metzger, Markus T" <[EMAIL PROTECTED]> wrote:

> Support for Intel's last branch recording to ptrace. This gives debuggers
> access to this hardware feature and allows them to show an execution trace
> of the debugged application.
>
> Last branch recording (see section 18.5 in the Intel 64 and IA-32
> Architectures Software Developer's Manual) allows taking an execution
> trace of the running application without instrumentation. When a branch
> is executed, the hardware logs the source and destination address in a
> cyclic buffer given to it by the OS.
>
> This can be a great debugging aid. It shows you how exactly you got
> where you currently are without requiring you to do lots of single
> stepping and rerunning.
>
> This patch manages the various buffers, configures the trace
> hardware, disentangles the trace, and provides a user interface via
> ptrace. On the high-level design:
> - there is one optional trace buffer per thread_struct
> - upon a context switch, the trace hardware is reconfigured to either
>   disable tracing or to use the appropriate buffer for the new task.
> - tracing induces ~20% overhead as branch records are sent out on
>   the bus.
> - the hardware collects trace per processor. To disentangle the
>   traces for different tasks, we use separate buffers and reconfigure
>   the trace hardware.
> - the low-level data layout is configured at cpu initialization time
> - different processors use different branch record formats
>
>
> patch 1/2 contains the kernel changes
> patch 2/2 contains changes to the ptrace man pages
>

Is there any userspace code available which people can use to play with
this?

How do you envisage it being used in the long term? Do you expect any of
the standard performance tuning tools will be tweaked to understand this
feature and if so which ones? I'm generally wondering "how will developers
be using this in a year or two's time?"

Please cc Michael Kerrisk <[EMAIL PROTECTED]> on future versions of
these patches.

The patches were horridly wordwrapped.

Is there any likelihood that any other CPUs do now or will in the future
support any similar feature to this? If so, is an implementation which is
100% contained to arch/x86 appropriate?
Reproducible data corruption with sendfile+vsftp - splice regression?
Hi -

This regular Linux user and lkml lurker just noticed data corruption in
ftp'ed files and narrowed it down to vsftpd using sendfile(). This has never
caused problems in the past; I have not noticed it with 2.6.22.x but may have
missed it. I do remember reading about some changes to the underlying splice
stuff since .23, so that may have something to do with it.

The scenario:
- created a file with known bit pattern on Linux server
- ftp-got this file to Windows client: file has bad crc (yes, binary)
- verified with another client: same result

I have thus far eliminated (to the best of my knowledge) NICs, switches,
cables, the Windows FTP clients, the hard disk in the server (SATA, ext3):
nothing suspicious in any logs. Box is an AMD Sempron 2600+ with 1.5 GB RAM,
added rt8169 card, Gentoo, vsftpd stable 2.0.5 - nothing fancy.

Transferring the file with samba (interestingly with sendfile enabled) and
via ftp but from /dev/shm repeatably works fine; pulling from disk creates
bad crc, every time. The file is readable and can be copied, verified etc.
over and over so I'm sure that I'm not falling prey to a false positive.
ifconfig indicates no dropped or otherwise corrupted packets.

I noticed this first with 2.6.24-rc3, but also just tried the latest stable
2.6.23.9 with the same config, with no change in behaviour. After setting
vsftpd to use_sendfile=NO, gigs can be transferred without corruption.

The data corruption is sporadic, but absolutely repeatable. The file with
the known good pattern just contains multiple lines of:

012345678901234567890123456789012345678901234567890
012345678901234567890123456789012345678901234567890
012345678901234567890123456789012345678901234567890
..etc..

A corrupted file is missing random characters, so that the corrupted lines
look like this (line numbers added by me):

19785: 012345678901234567890123456789012345678901234567890
19786: 01234567890123456789012345678901234567890123678901234567890
19787: 012345678901234567890123456789012345678901234567890

or:

20074: 012345678901234567890123456789012345678901234567890
20075: 01234567890123456789012345678901234567890123012345678901234567890123456789012345678901234567890
20076: 012345678901234567890123456789012345678901234567890

Again, other network or hd traffic shows no signs of gremlins; the box is
perfectly stable, and turning sendfile on or off triggers/untriggers the
corruption reliably.

I will try 2.6.22.x over the weekend, and before I bother lkml with
dmesg/.config etc. I wanted to fish for initial thoughts.

thanks
Holger
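Anyone trying to reproduce the report above needs a way to locate damaged lines in the received file. The following is a minimal sketch of such a checker, assuming the test file consists solely of the 51-character line quoted in the report. The function name count_bad_lines() and the 4096-byte line buffer are arbitrary choices, not something taken from the original message.

```c
#include <stdio.h>
#include <string.h>

/* Reference line from the report: "0123456789" five times plus a final "0". */
static const char PATTERN[] =
	"012345678901234567890123456789012345678901234567890";

/*
 * Scan `fp` line by line and print "lineno: text" for every line that
 * differs from PATTERN.  Returns the number of bad lines found.
 * Lines longer than the buffer are read in pieces and would therefore
 * be reported as several bad lines.
 */
static long count_bad_lines(FILE *fp)
{
	char line[4096];
	long lineno = 0, bad = 0;

	while (fgets(line, sizeof(line), fp)) {
		lineno++;
		line[strcspn(line, "\r\n")] = '\0';	/* strip CR/LF */
		if (strcmp(line, PATTERN) != 0) {
			bad++;
			printf("%ld: %s\n", lineno, line);
		}
	}
	return bad;
}
```

Running this over the file fetched with sendfile enabled and again over the one fetched with use_sendfile=NO would give the line numbers of the corrupted regions, which can then be compared between runs to see whether the damage falls at stable offsets.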
Re: [PATCH] xfs: revert to double-buffering readdir
On Sun, 25 Nov 2007, Christoph Hellwig wrote:
> This patch does exactly that and reverts xfs_file_readdir to what's
> basically the 2.6.23 version minus the uio and vnops junk.

Thanks, works here too (without nordirplus as a mountoption).
Am I supposed to close the bug[0] or do you guys want to leave this
open to track the Real Fix (TM) for 2.6.25?

Again, thank you for the fix!
Christian.

[0] http://bugzilla.kernel.org/show_bug.cgi?id=9400
--
BOFH excuse #112: The monitor is plugged into the serial port
Re: WARNING: at kernel/resource.c:189 __release_resource
On Monday 26 November 2007 11:05:38 pm Andrew Morton wrote: > On Thu, 22 Nov 2007 22:41:16 +0100 Jiri Slaby <[EMAIL PROTECTED]> wrote: > > Ok, I hit the bug, suspend of 00:06 device complains about it: > > WARNING: at .../kernel/resource.c:185 __release_resource() > > > > Call Trace: > > [] release_resource+0xb5/0xf0 > > [] pnp_release_resources+0x70/0x130 > > [] pnp_stop_dev+0x45/0x90 > > [] pnp_bus_suspend+0x92/0xb0 > > [] suspend_device+0x113/0x180 > > [] device_suspend+0x200/0x320 > > [] suspend_devices_and_enter+0xa5/0x170 > > [] enter_state+0x209/0x270 > > [] state_store+0xaf/0xf0 > > [] kobj_attr_store+0x17/0x20 > > [] sysfs_write_file+0xce/0x140 > > [] vfs_write+0xc7/0x170 > > [] sys_write+0x50/0x90 > > [] system_call+0x7e/0x83 > > > > # LANG=en ll /sys/devices/pnp0/00:06/ > > total 0 > > lrwxrwxrwx 1 root root0 Nov 22 22:35 driver -> > > ../../../bus/pnp/drivers/serial > > -r--r--r-- 1 root root 4096 Nov 22 22:35 id > > -r--r--r-- 1 root root 4096 Nov 22 22:35 options > > drwxr-xr-x 2 root root0 Nov 22 22:35 power > > -rw-r--r-- 1 root root 4096 Nov 22 22:35 resources > > lrwxrwxrwx 1 root root0 Nov 22 22:35 subsystem -> ../../../bus/pnp > > drwxr-xr-x 3 root root0 Nov 22 22:35 tty > > -rw-r--r-- 1 root root 4096 Nov 22 22:35 uevent > > I suppose that's a genuine leak, presumably in 8250_pnp. We used to have only the serial driver resource reservation. We now have an additional 00:06 resource that is the parent of the serial resource, e.g., 03f8-03ff : 00:06 03f8-03ff : serial I think this problem happens because pnp_bus_suspend() calls serial_pnp_suspend(), which suspends the driver but does nothing with the resources. Then it calls pnp_stop_dev(), which releases the 00:06 resource, which still has a serial child resource. The corresponding PCI code in pci_device_suspend() does not do any generic device disable or resource release. I don't know why PNP disables the device on suspend. I glanced through the ACPI spec but didn't see a requirement for it. 
Maybe Pierre [1] remembers. Maybe we could either remove the pnp_{stop,start}_dev() calls from the suspend/resume path, or move the PNP resource management out of pnp_{start,stop}_dev(). Bjorn [1] http://lkml.org/lkml/2005/11/30/39 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
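The parent/child situation described in this message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct sketch_res`, `sketch_request()` and `sketch_release()` are hypothetical names, not the kernel's resource API, and sibling-overlap checks are left out for brevity.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a resource tree node. */
struct sketch_res {
	unsigned long start, end;
	const char *name;
	struct sketch_res *parent, *child, *sibling;
};

/* Link 'res' under 'parent' if its range fits inside the parent's range.
 * Returns 0 on success, -1 if the range does not fit. */
static int sketch_request(struct sketch_res *parent, struct sketch_res *res)
{
	if (res->end < res->start ||
	    res->start < parent->start || res->end > parent->end)
		return -1;
	res->parent = parent;
	res->sibling = parent->child;
	parent->child = res;
	return 0;
}

/* Unlink 'old' from its parent.
 * Returns 0 if it had no children, 1 if it was released while a child was
 * still attached (the case discussed above), -1 if it was not linked. */
static int sketch_release(struct sketch_res *old)
{
	struct sketch_res **p;

	if (!old->parent)
		return -1;
	for (p = &old->parent->child; *p; p = &(*p)->sibling) {
		if (*p == old) {
			*p = old->sibling;
			old->parent = NULL;
			old->sibling = NULL;
			return old->child ? 1 : 0;
		}
	}
	return -1;
}
```

With a root range, a "00:06" entry at 03f8-03ff and a "serial" entry nested inside it, releasing "00:06" first returns 1 in this model, which corresponds to the ordering that pnp_stop_dev() appears to produce on suspend.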
Re: Out of tree module using LSM
On Thu, 2007-11-29 at 21:45 +, Alan Cox wrote: > > Jargon File in all its glory. And if you still think you could look for > > patterns, how about executable code that self-modifies in random ways > > but when executed as a whole actually has the functionality of fetchmail > > embedded within it? How would you guard against that? > > Thats a problem for whoever writes the ESR detection tool and to what > level it works. The question for the kernel is how do we provide a > mechanism to allow (to some extent at least) this kind of tool to run. Right. I'm just saying reading a single page out of context (no pun intended) is not going to be very useful. They need to scan the entire file, which means that there are limited ways this is practical - it's not practical to do that on every write into a shared mapping, hence a solution that scans on open, etc. is probably the best there is. (I know you know this) Jon. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Out of tree module using LSM
On Thu, 2007-11-29 at 15:56 -0500, [EMAIL PROTECTED] wrote: > On Thu, 29 Nov 2007 14:45:51 EST, Jon Masters said: > > Ah, but I could write a sequence of pages that on their own looked > > garbage, but in reality, when executed would print out a copy of the > > Jargon File in all its glory. And if you still think you could look for > > patterns, how about executable code that self-modifies in random ways > > but when executed as a whole actually has the functionality of fetchmail > > embedded within it? How would you guard against that? > > So, just because Fred Cohen showed in his PhD thesis that *perfect* > virus/malware > scanning is equivalent to the Turing Halting Problem, we should abandon > efforts to make a 99.9998% workable system? I think you misread what I said. I implied the exact opposite :-) I'm trying to show that I understand the problem by saying the above, that doing this perfectly is impossible, but I also happen to believe that there are those who have solutions that provide a level of protection to their users, who ask for such things. Hence my point is that it's not really our place to debate whether virus scanning is good/bad but more how to provide a sane API. I'll get a spec. Jon. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: remap_file_pages() broken in 2.6.23?
On Thu, Nov 29, 2007 at 02:45:23PM -0500, Chuck Ebbert wrote: > Original report: https://bugzilla.redhat.com/show_bug.cgi?id=404201 > > The test case below, taken from the LTP test code, prints -1 (as > expected) on 2.6.22 and 0 on 2.6.23. It tries to remap an out-of-range > page. Proposed patch follows the program. Bug was apparently caused by > commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7. Ah, that's not such good behaviour anyway. mmap is allowed to map outside the file offset, so you're telling me that remap_file_pages just magically should not be allowed to remap these...? > Patch: > > Signed-off-by: Supriya Kannery <[EMAIL PROTECTED]> > > --- linux-2.6.23/mm/fremap.c.orig 2007-11-22 00:56:09.0 -0600 > +++ linux-2.6.23/mm/fremap.c 2007-11-26 03:08:55.0 -0600 > @@ -124,6 +124,7 @@ asmlinkage long sys_remap_file_pages(uns > struct vm_area_struct *vma; > int err = -EINVAL; > int has_write_lock = 0; > + unsigned long f_size = 0; > > if (__prot) > return err; > @@ -181,6 +182,14 @@ asmlinkage long sys_remap_file_pages(uns > goto retry; > } > mapping = vma->vm_file->f_mapping; > + > + f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1; > + f_size = f_size >> PAGE_CACHE_SHIFT; > + if ((pgoff + size >> PAGE_CACHE_SHIFT) > f_size) { > + err = -EINVAL; > + goto out; > + } > + > /* >* page_mkclean doesn't work on nonlinear vmas, so if >* dirty pages need to be accounted, emulate with linear I don't think there is anything preventing truncate races here. Theoretically we could do it by taking i_mutex around here, but anyway then a subsequent truncate is just going to be able to cause the mapping to be out of bounds anyway. If it were any other syscall than remap_file_pages, I'd be much more hesitant to say this: I propose we change the test case instead. I also changed other elements of the API, and we had the result tested and verified by Oracle... 
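The bounds check in the quoted patch can be sketched in userspace as follows. This is a minimal sketch, not the kernel code: the `SKETCH_` constants and both function names are hypothetical, 4 KiB pages are assumed, and it assumes `pgoff` is counted in pages while `size` is counted in bytes. Note that in C `+` binds tighter than `>>`, so the quoted expression `pgoff + size >> PAGE_CACHE_SHIFT` groups as `(pgoff + size) >> PAGE_CACHE_SHIFT`; the sketch converts the size to pages before adding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants: 4 KiB pages (an assumption, as on x86). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Number of pages needed to hold file_bytes, rounding up. */
static unsigned long sketch_pages_in_file(uint64_t file_bytes)
{
	return (unsigned long)((file_bytes + SKETCH_PAGE_SIZE - 1)
			       >> SKETCH_PAGE_SHIFT);
}

/* True if the page range starting at pgoff and covering size_bytes
 * ends at or before the last page of the file.
 * Overflow of pgoff + npages is not handled in this sketch. */
static bool sketch_remap_in_range(unsigned long pgoff,
				  unsigned long size_bytes,
				  uint64_t file_bytes)
{
	unsigned long npages = size_bytes >> SKETCH_PAGE_SHIFT;

	return pgoff + npages <= sketch_pages_in_file(file_bytes);
}
```

As the reply points out, a check like this only reflects the file size at the moment it runs; a later truncate can still leave the mapping out of bounds.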
Re: [PATCH] Reduce stack used by lib/hexdump.c
On Thu, 2007-11-29 at 22:07 +0100, Jan Engelhardt wrote:
> I'd add GFP_ATOMIC here. Who knows whether tomorrow, the oops dumper
> or warn_on will use print_hex_dump.

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

diff --git a/lib/hexdump.c b/lib/hexdump.c
index 70e23fb..be94934 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -140,13 +140,20 @@ EXPORT_SYMBOL(hex_dump_to_buffer);
  * Example output using %DUMP_PREFIX_ADDRESS and 4-byte mode:
  * 88089af0: 73727170 77767574 7b7a7978 7f7e7d7c pqrstuvwxyz{|}~.
  */
+
+#define HEX_LINE_SIZE 200
+
 void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
 			int rowsize, int groupsize,
 			const void *buf, size_t len, bool ascii)
 {
 	const u8 *ptr = buf;
 	int i, linelen, remaining = len;
-	unsigned char linebuf[200];
+	unsigned char *linebuf;
+
+	linebuf = kmalloc(HEX_LINE_SIZE, GFP_ATOMIC);
+	if (!linebuf) {
+		WARN_ON(1);
+		return;
+	}
 
 	if (rowsize != 16 && rowsize != 32)
 		rowsize = 16;
@@ -155,7 +162,7 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
 		linelen = min(remaining, rowsize);
 		remaining -= rowsize;
 		hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize,
-				linebuf, sizeof(linebuf), ascii);
+				linebuf, HEX_LINE_SIZE, ascii);
 
 		switch (prefix_type) {
 		case DUMP_PREFIX_ADDRESS:
@@ -170,6 +177,7 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
 			break;
 		}
 	}
+	kfree(linebuf);
 }
 EXPORT_SYMBOL(print_hex_dump);
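The idea behind the patch (moving the 200-byte line buffer off the stack, bailing out if the allocation fails, and freeing it on exit) can be shown with a userspace analogue. This is a sketch only: `sketch_hex_line()` and `sketch_hex_dump()` are hypothetical names, `malloc()` stands in for `kmalloc(..., GFP_ATOMIC)`, and the output format is simplified compared with lib/hexdump.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_LINE_SIZE 200

/* Format up to 16 bytes as "xx xx ..." into out.
 * Returns the number of characters written, excluding the NUL. */
static int sketch_hex_line(const unsigned char *p, size_t n,
			   char *out, size_t outlen)
{
	size_t i;
	int used = 0;

	if (outlen == 0)
		return 0;
	out[0] = '\0';
	for (i = 0; i < n && i < 16; i++) {
		int w = snprintf(out + used, outlen - used, "%s%02x",
				 i ? " " : "", p[i]);
		if (w < 0 || (size_t)w >= outlen - used) {
			out[used] = '\0';	/* drop the partial item */
			break;
		}
		used += w;
	}
	return used;
}

/* Dump len bytes to fp, 16 per row, using a heap-allocated line buffer
 * instead of a large on-stack array.
 * Returns 0 on success, -1 if the buffer could not be allocated. */
static int sketch_hex_dump(FILE *fp, const void *buf, size_t len)
{
	const unsigned char *ptr = buf;
	char *line = malloc(SKETCH_LINE_SIZE);
	size_t i;

	if (!line)
		return -1;
	for (i = 0; i < len; i += 16) {
		size_t n = len - i < 16 ? len - i : 16;

		sketch_hex_line(ptr + i, n, line, SKETCH_LINE_SIZE);
		fprintf(fp, "%08zx: %s\n", i, line);
	}
	free(line);
	return 0;
}
```

The trade-off is the same as in the patch: less stack use per call, at the cost of an allocation that can fail and must be handled.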
Re: sata NCQ blacklist entry
On 11/29/07, Tejun Heo <[EMAIL PROTECTED]> wrote: > > I now have affected drives on my desk and am gonna try reproduce it. My > gut feeling says it's timing related problem on controller / driver > side. Please wait a bit. > Okay, no problem, I am just curious. > > by the way, and OT, did the Plextor DVD-RW drive reach you, Tejun? > > No, not yet. Do you have a tracking number or something? > No, I haven't... all I got is the bill, but that doesn't help because we chose shipment without insurance, so there is no tracking number. Hmm, that sucks... I can't get rid of the bad feeling that it got lost. But I'll try to make some checks. CU Bjoern
RE: constant_tsc and TSC unstable
>-----Original Message----- >From: [EMAIL PROTECTED] >[mailto:[EMAIL PROTECTED] On Behalf Of Paul >Rolland >Sent: Thursday, November 29, 2007 8:12 AM >To: Linux Kernel >Cc: [EMAIL PROTECTED] >Subject: constant_tsc and TSC unstable > >Hello, > >I've a machine with a Core2Duo CPU. /proc/cpuinfo reports the flag >constant_tsc, but at boot time, I have the log : > >... >Total of 2 processors activated (6919.15 BogoMIPS). >ENABLING IO-APIC IRQs >..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1 >checking TSC synchronization [CPU#0 -> CPU#1]: >Measured 3978592228 cycles TSC warp between CPUs, turning off >TSC clock. >Marking TSC unstable due to: check_tsc_sync_source failed. >Brought up 2 CPUs >... > >This machine is running 2.6.23.1-21.fc7. I know I should >report to Fedora, >but I was wondering if this is a bug or a feature ;) > TSCs on Core 2 Duo are supposed to be in sync unless the CPU supports deep idle states like C2 or C3. Can you send the full /proc/cpuinfo and full dmesg? Thanks, Venki
Re: Add the infamous Huawei E220 to option.c
On Thursday, 29 November 2007 19:53:39, Jaime Velasco Juan wrote: > Hi, > > On Thu, 29 Nov 2007 at 15:05:50 +0100, Johann Wilhelm wrote: > > If everything's working please also add code to also support the other > > E220 device... so both PID 0x1003 and 0x1004 should be treated the same > > way... > > > > to test the device with the 0x1004-PID maybe Jaime Velasco > > <[EMAIL PROTECTED]> could be asked.. he initially added the lines for > > this device in option.c > > The following patch works for me (on kernel 2.6.23). Jaime, please add your Signed-off-by line and resend the patch with both lines to Greg. Signed-off-by: Oliver Neukum <[EMAIL PROTECTED]> Regards Oliver
Re: [RFC] [PATCH] hugetlbfs :shmget with SHM_HUGETLB only works as root
On Fri, Nov 30, 2007 at 12:02:32AM +0530, Ciju Rajan K wrote: > I tested your patch. But that is not solving the problem. > If the code change to user_shm_lock() is not a good solution, could > you please suggest a method so that the normal user is able to allocate > the huge pages, if his gid is added to /proc/sys/vm/hugetlb_shm_group The patch I posted resolves a race unrelated to your issue. Raising your locked memory limits should not be difficult. /etc/limits.conf or similar should set it up for you. You can also change the default rlimit in the kernel and compile it with default limits elevated to what you want your unprivileged process to have to start with if you're truly having lots of trouble getting userspace to set the default limits properly. I'd look in include/asm-generic/resource.h -- wli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
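For the userspace side of the advice above, a process can inspect and raise its own locked-memory soft limit with the standard getrlimit()/setrlimit() calls. This is a minimal sketch; `sketch_raise_memlock()` is a hypothetical helper name, and it only raises the soft limit up to the existing hard limit, which an unprivileged process is allowed to do. Raising the hard limit itself still has to come from the login configuration or a privileged process.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_MEMLOCK to the current hard limit.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
static int sketch_raise_memlock(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;	/* soft limit may go up to the hard limit */
	return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

A program would call this once at startup, before its shmget() call with SHM_HUGETLB.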
Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)
On Thu, 29 Nov 2007, Alan Stern wrote: > > Yes indeed. I wish I could point you to the exact patch containing the > fix, but the git software seems to have lost track of it (it's combined > in with a large number of other patches with no obvious way to separate > it out). It's also available in the various mailing list archives, but > I don't have a pointer to it and there's no reasonable way to search > for it. > > The patch in question was written by Matthew Wilcox; it added code to > the SCSI async-scanning routines to utilize the scan_mutex. IMO it > should have been applied to 2.6.23 but it wasn't. Heh. It definitely hasn't gotten lost by "the git software". In fact, with the kinds of hints you already gave, git makes it really _trivial_ to find it. Here's what you do: git log v2.6.23.. --author=Wilcox and then just search for "scan_mutex", in the hope that Matthew wrote a nice commit message. And yes, he did, so in less than a blink you get: commit 6b7f123f378743d739377871c0cbfbaf28c7d25a Author: Matthew Wilcox <[EMAIL PROTECTED]> Date: Tue Jun 26 15:18:51 2007 -0600 [SCSI] Fix async scanning double-add problems Stress-testing and some thought has revealed some places where asynchronous scanning needs some more attention to locking. - Since async_scan is a bit, we need to hold the host_lock while modifying it to prevent races against other CPUs modifying the word that bit is in. This is probably a theoretical race for the moment, but other patches may change that. - The async_scan bit means not only that this host is being scanned asynchronously, but that all the devices attached to this host are not yet added to sysfs. So we must ensure that this bit is always in sync. I've chosen to do this with the scan_mutex since it's already acquired in most of the right places. ... which I assume is the commit you're talking about. 
Linus
alloc_page_vma: should be called from a module? (not exported in x86_64)
Hi, We've developed a driver for an image acquisition card, which maps kernel-allocated buffers into user-space VMAs. We use alloc_page + remap_pfn_range in the driver's mmap file_operation. After looking at alloc_page_vma, I thought that it might be more appropriate than alloc_page in this context. However, if CONFIG_NUMA=y (x86_64), this function is not visible to modules. Is this limitation intentional? We allocate RAM on a page-by-page basis. Is vm_insert_page more appropriate than remap_pfn_range? Thanks a lot for your help. Alejandro
Re: Race between generic_forget_inode() and sync_sb_inodes()?
On Friday November 30, [EMAIL PROTECTED] wrote: > On Fri, Nov 30, 2007 at 09:07:06AM +1100, Neil Brown wrote: > > > > Hi David, > > > > On Friday November 30, [EMAIL PROTECTED] wrote: > > > > > > > > > I came across this because I've been making changes to XFS to avoid the > > > inode hash, and I've found that I need to remove the inode from the > > > dirty list when setting I_WILL_FREE to avoid this race. I can't see > > > how this race is avoided when inodes are hashed, so I'm wondering > > > if we've just been lucky or there's something that I'm missing that > > > means the above does not occur. > > > > Looking at inode.c in 2.6.23-mm1, in generic_forget_inode, I see code: > > > > if (!hlist_unhashed(&inode->i_hash)) { > > if (!(inode->i_state & (I_DIRTY|I_SYNC))) > > list_move(&inode->i_list, &inode_unused); > > > > so it looks to me like: > >If the inode is hashed and dirty, then move it (off the s_dirty > >list) to inode_unused. > > That check is for if the inode is _not_ dirty or being sync, right? > Or have I just not had enough coffee this morning? :-) And I cannot even blame the lack of coffee as I don't drink it. My second guess is that we have been lucky, which is hard to believe. I wonder if iput (and even iget) should BUG on I_WILL_FREE as well... Perplexed. NeilBrown
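The point of contention is the sense of the flag test in the quoted snippet. A tiny standalone sketch makes it explicit; the `SKETCH_` flag values and the function name are hypothetical and chosen only for illustration, they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define SKETCH_I_DIRTY 0x1u
#define SKETCH_I_SYNC  0x2u

/* Mirrors the quoted condition !(i_state & (I_DIRTY|I_SYNC)):
 * true only when the inode is neither dirty nor being synced. */
static bool sketch_moves_to_unused(unsigned int i_state)
{
	return !(i_state & (SKETCH_I_DIRTY | SKETCH_I_SYNC));
}
```

So under this reading a dirty inode is not moved to the unused list by that branch, which matches the correction in the reply rather than the original interpretation.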
Re: [PATCH x86/mm 01/11] x86-32 thread_struct.debugreg
On 11/29/2007 04:50 PM, Roland McGrath wrote: > Jan Kratochvil has helped me a great deal with ptrace testing lately. > We have started to collect a small regression test suite, see > http://sourceware.org/systemtap/wiki/utrace/tests for pointers. That > has tests for individual problems that have come up, and not anything > exhaustive for testing all ptrace functionality. You could contribute them to LTP? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH x86/mm 6/6] x86-64 ia32 ptrace get/putreg32 current task
On 11/29/2007 05:21 PM, Roland McGrath wrote: >>> case offsetof(struct user32, regs.gs): >>> *val = child->thread.gsindex; >>> + if (child == current) >>> + asm("movl %%gs,%0" : "=r" (*val)); >> Won't this return the kernel's GS instead of the user's? > [...] >> But this is x86_64, where swapgs is done on kernel entry. > > As I understand it, and from what the documentation I have says, swapgs has > nothing to do with the %gs selector. It affects the "GS base register", > i.e. the MSR. > Yep, I confused the GS selector with the base address in the descriptor. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sata NCQ blacklist entry
Bjoern Olausson wrote: > On 11/7/07, Tejun Heo <[EMAIL PROTECTED]> wrote: >> Thanks. We're currently trying to find out what's actually going on >> with all these drives. At first, drives which got blacklisted aren't >> many and made sense (had other problems with NCQ, etc..) but with new >> generation drives from many vendors showing the same symptom, we aren't >> too sure now. >> >> I'll keep your email in my todo list and add the drive to the blacklist >> once the problem is verified. >> >> Thanks. > > Something new on the NCQ front? > Just asking if you need someone to test some of your ideas? > > I got the "WDC WD740ADFD-00NLR1" I now have affected drives on my desk and am gonna try reproduce it. My gut feeling says it's timing related problem on controller / driver side. Please wait a bit. > by the way, and OT, did the Plextor DVD-RW drive reach you, Tejun? No, not yet. Do you have a tracking number or something? Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Out of tree module using LSM
Alan Cox <[EMAIL PROTECTED]> writes: > > The simple case is > open > write cathedral and bazaar in some order > close >process -> label eric_t> > > open (eric_t) - SELinux "no" > > > Anyone smart will then write it out of order and keep the file open, or That would assume Eric already has a program running on your system optimized to inject his works in an obfuscated way. And if he has a program running he can do nearly everything already. You have already lost the game. The normal case Tvrtko et al. are trying to handle would be more the work getting downloaded from somewhere or read from a USB stick using normal programs like web browsers or file managers, which don't do any out-of-order writing tricks or other obfuscation. Important exceptions might be things like BitTorrent, which writes out of order, or parallel downloaders that cheat TCP congestion control. Or simply tar+gzip with automatic unpacking in desktops. There are probably more and it's probably tricky, but it is not a "need to handle arbitrary nastiness by a determined attacker" situation. Anyway, I'm not saying that pattern matching is a useful security measure (just the interaction with compression and encryption makes it very dubious), but if you're talking hypothetically you should at least look closely at the hypothetical use cases @) -Andi