Re: [RFC v2] Documentation about unaligned memory access

2007-11-29 Thread DM
On Nov 29, 2007 5:15 PM, Daniel Drake <[EMAIL PROTECTED]> wrote:
[...]
> To avoid the unaligned memory access, you would rewrite it as follows:
>
>void myfunc(u8 *data, u32 value)
>{
>[...]
>value = cpu_to_le32(value);
>put_unaligned(value, data);
>[...]
>}
>
> The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
> memory and you wish to avoid unaligned access, its usage is as follows:
>
>u32 value = get_unaligned(data);
>
> These macros work for memory accesses of any length (not just 32 bits as
> in the examples above). Be aware that when compared to standard access of
> aligned memory, using these macros to access unaligned memory can be costly in
> terms of performance.
>

Given the implied context of myfunc (at least as I read it), the
get_unaligned call above will not do what you intended. Since data is a
u8 *, get_unaligned() will fetch only one byte. To avoid misunderstandings
the code should probably read:

u32 value = get_unaligned((u32 *)data);
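To illustrate why the cast matters (get_unaligned() takes its access width from the pointer type), here is a userspace sketch. The macros sketch_get_unaligned() and sketch_put_unaligned() are hypothetical stand-ins, not the kernel's implementation; they are built on memcpy() plus the GNU C __typeof__ and statement-expression extensions, and they store values in host byte order (the cpu_to_le32() step from myfunc is left out).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of get_unaligned(ptr): the width of
 * the load is sizeof(*(ptr)), and memcpy() keeps it safe for any
 * alignment.  Uses GNU C __typeof__ and statement expressions.
 */
#define sketch_get_unaligned(ptr) ({				\
	__typeof__(*(ptr)) sketch_v_;				\
	memcpy(&sketch_v_, (ptr), sizeof(sketch_v_));		\
	sketch_v_; })

/* Matching analogue of put_unaligned(val, ptr). */
#define sketch_put_unaligned(val, ptr) do {			\
	__typeof__(*(ptr)) sketch_v_ = (val);			\
	memcpy((ptr), &sketch_v_, sizeof(sketch_v_));		\
} while (0)

/*
 * Store a 32-bit value at a possibly misaligned byte pointer and read
 * it back.  The (uint32_t *) casts select a 4-byte access; passing the
 * u8 pointer uncast would move a single byte.
 */
static uint32_t sketch_roundtrip(uint8_t *data, uint32_t value)
{
	assert(data != NULL);
	sketch_put_unaligned(value, (uint32_t *)data);
	return sketch_get_unaligned((uint32_t *)data);
}
```

With an uncast uint8_t pointer, sketch_get_unaligned() returns just the byte at that address, which is the pitfall in the quoted example.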

/DM
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xfs: revert to double-buffering readdir

2007-11-29 Thread David Chinner
On Fri, Nov 30, 2007 at 12:45:05AM +0100, Christian Kujau wrote:
> On Sun, 25 Nov 2007, Christoph Hellwig wrote:
> >This patch does exactly that and reverts xfs_file_readdir to what's
> >basically the 2.6.23 version minus the uio and vnops junk.
> 
> Thanks, works here too (without nordirplus as a mount option).
> Am I supposed to close the bug[0] or do you guys want to leave this
> open to track the Real Fix (TM) for 2.6.25?

I've been giving the fix some QA - that change appears to have caused
a different regression as well so I'm holding off for a little bit
until we know what the cause of the other regression is before deciding
whether to take this fix or back the entire change out.

Either way we'll include the fix in 2.6.24.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: + proc-fix-the-threaded-proc-self.patch added to -mm tree

2007-11-29 Thread Albert Cahalan
On Nov 29, 2007 4:40 PM, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> "Albert Cahalan" <[EMAIL PROTECTED]> writes:
>
> > On Nov 28, 2007 6:31 AM, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> >> Ingo Molnar <[EMAIL PROTECTED]> writes:
> >> > * Albert Cahalan <[EMAIL PROTECTED]> wrote:
> >> >> On Nov 27, 2007 7:49 PM, Guillaume Chazarain <[EMAIL PROTECTED]> wrote:

> Linux tasks when used in one particular way can fulfill the posix
> requirements for single threaded processes.
>
> Linux task groups when used in one particular way can fulfill the
> posix requirements for processes.

Right. Once you leave this, weirdness happens.
POSIX defines things in terms of processes and threads.
POSIX defines many of our interfaces. That includes
kernel behavior, the C library, and numerous programs.

> As for where /proc/self points: given that procps seems to read
> files like /proc/self/stat, it looks to me like we have a clear
> case of a user space application that cares about the current
> behavior and would break if we changed things.

I wasn't saying procps would break, though it would if
/proc/self/task went away. I'm more concerned about
multi-threaded things that look in their own /proc/self
directory. The procps programs are single-threaded.

In procps, the self link is used:

a. to see if the wchan file exists
b. to see if the task directory exists
c. to find the tty number

(that last one: there might not be a file descriptor
for the tty, and anyway I need it with the bits in all
the same places as what I get for the other processes)

I'll bet that something reads /proc/self/stat to see
CPU usage.

> > Note that it was intended that non-legacy additions
> > would normally be added to either the process directory
> > or the thread directory, not both. I think somebody may
> > have ripped out the ability to do this; at the very least
> > there have been numerous illogical additions.
>
> The rationale was not conveyed and the policy you describe
> seems like deprecating the /proc/ directory in favor
> of the /proc//task//.  Which was a pattern
> never established and it doesn't seem to make anything better
> so I don't see the point there.

For the stuff that is logically per-task, yes.
For the rest, no. Oh well...

It does make things better because redundant info
is a source of confusion.

> >> I'm still trying to understand which will break user space more,
> >> adding /proc/task or changing /proc/self.
> >
> > Changing /proc/self makes you get per-thread data
> > when you asked for per-process data. That's bad.
>
> /proc/self used to ask for per task data.  Which is why there
> is some confusion.

Heh. Well, /proc/self used to ask for per process data.
It was all the same. I think it matters that /proc/self was
always documented as being per-process.

> >> >> This one is probably best:
> >> >> /proc/task -> 123/task/456
> >> >> (with both numbers showing)
> >> >
> >> > this sounds good to me. If it's a symlink then there's not much other
> >> > choice because the thread PIDs do not even show up under /proc anymore.
> >>
> >> The name sounds good to me.
>
> I will see about writing the patch for this in a bit and sending
> it to Andrew.

Nice.

> Nope.  /proc/mounts was a symlink to /proc/self/mounts long before
> /proc/self was modified to stop pointing at the task directory and
> to point at the new task group directory instead.

Having the filesystem namespace be per-process is wild enough.
We really don't need it to be per-thread. (and yes, I'm using the
POSIX terms on purpose)

> Frankly from what I have seen of the code the task-group work
> seems to be a larger source of bugs, and complications, because
> people have a darn hard time wrapping their head around how it
> is supposed to behave, and all of the corner cases were not
> resolved at the time it was developed.

People look at me like I have two heads when I explain to
them that the Linux kernel source uses "pid" to mean
a thread. The bad terminology probably promotes bad thinking.
It would be lovely if that could somehow get fixed.
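The terminology clash is easy to see from userspace: getpid() returns what POSIX calls the process ID (the kernel's thread-group ID), while the raw gettid system call returns the per-thread ID that kernel code calls a "pid". The sketch below builds a path of the shape proposed earlier in the thread for /proc/task (123/task/456). The helper names are hypothetical, and the sketch assumes glibc's syscall() wrapper and SYS_gettid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Format "/proc/<tgid>/task/<tid>"; returns snprintf()'s result. */
static int sketch_task_path(char *buf, size_t len, long tgid, long tid)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "/proc/%ld/task/%ld", tgid, tid);
}

/* POSIX process ID, i.e. the thread-group ID (tgid) inside the kernel. */
static long sketch_tgid(void)
{
	return (long)getpid();
}

/* Kernel task ID of the calling thread: what kernel code calls a "pid". */
static long sketch_tid(void)
{
	return (long)syscall(SYS_gettid);
}

/* Path of the calling thread's own task directory. */
static int sketch_self_task_path(char *buf, size_t len)
{
	return sketch_task_path(buf, len, sketch_tgid(), sketch_tid());
}
```

In a single-threaded program the two IDs coincide, which is one reason the distinction is easy to overlook.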

> My favorite ongoing issue is what is needed to allow a threaded
> init to actually function properly.  I think enough fixes have
> gone in that it might even work.

My "favorite" is the multi-threaded debugger. By this I
mean the debugger itself wants to be multi-threaded,
issuing ptrace commands from multiple threads.


circular locking dependency detected

2007-11-29 Thread Aneesh Kumar K.V

===
[ INFO: possible circular locking dependency detected ]
2.6.24-rc3 #6
---
bash/2294 is trying to acquire lock:
(>j_list_lock){--..}, at: [] journal_try_to_free_buffers+0x76/0x10c

but task is already holding lock:
(inode_lock){--..}, at: [] drop_pagecache+0x48/0xd8

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (inode_lock){--..}:
[] __lock_acquire+0xa31/0xc1a
[] lock_acquire+0x7a/0x94
[] _spin_lock+0x2e/0x58
[] __mark_inode_dirty+0xd8/0x15e
[] __set_page_dirty+0xfb/0x10a
[] mark_buffer_dirty+0x80/0x86
[] __journal_temp_unlink_buffer+0xc1/0xc5
[] __journal_unfile_buffer+0xb/0x15
[] __journal_refile_buffer+0x3b/0x85
[] journal_commit_transaction+0xe7f/0x10ec
[] kjournald+0x131/0x35f
[] kthread+0x3b/0x62
[] kernel_thread_helper+0x7/0x10
[] 0x

-> #0 (>j_list_lock){--..}:
[] __lock_acquire+0x921/0xc1a
[] lock_acquire+0x7a/0x94
[] _spin_lock+0x2e/0x58
[] journal_try_to_free_buffers+0x76/0x10c
[] ext3_releasepage+0x68/0x74
[] try_to_release_page+0x33/0x44
[] __invalidate_mapping_pages+0x74/0xe0
[] drop_pagecache+0x70/0xd8
[] drop_caches_sysctl_handler+0x36/0x4e
[] proc_sys_write+0x6b/0x85
[] vfs_write+0x90/0x119
[] sys_write+0x3d/0x61
[] sysenter_past_esp+0x5f/0xa5
[] 0x

other info that might help us debug this:

2 locks held by bash/2294:
#0:  (>s_umount_key#16){}, at: [] drop_pagecache+0x38/0xd8
#1:  (inode_lock){--..}, at: [] drop_pagecache+0x48/0xd8

stack backtrace:
[] show_trace_log_lvl+0x1a/0x2f
[] show_trace+0x12/0x14
[] dump_stack+0x16/0x18
[] print_circular_bug_tail+0x5f/0x68
[] __lock_acquire+0x921/0xc1a
[] lock_acquire+0x7a/0x94
[] _spin_lock+0x2e/0x58
[] journal_try_to_free_buffers+0x76/0x10c
[] ext3_releasepage+0x68/0x74
[] try_to_release_page+0x33/0x44
[] __invalidate_mapping_pages+0x74/0xe0
[] drop_pagecache+0x70/0xd8
[] drop_caches_sysctl_handler+0x36/0x4e
[] proc_sys_write+0x6b/0x85
[] vfs_write+0x90/0x119
[] sys_write+0x3d/0x61
[] sysenter_past_esp+0x5f/0xa5
===
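As I read the report, the two stacks show opposite acquisition orders: under #1, kjournald's journal_commit_transaction() path takes inode_lock (via __mark_inode_dirty()) while j_list_lock is held, and under #0, drop_pagecache() holds inode_lock and then asks for j_list_lock in journal_try_to_free_buffers(). The toy tracker below is a hypothetical simplification, not lockdep's algorithm (lockdep works on lock classes and searches longer dependency chains); it records "took B while holding A" edges and flags the reverse order for two locks.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_lock {
	SKETCH_INODE_LOCK,
	SKETCH_J_LIST_LOCK,
	SKETCH_NR_LOCKS
};

/* seen_order[a][b] is true once some path took lock b while holding lock a. */
static bool seen_order[SKETCH_NR_LOCKS][SKETCH_NR_LOCKS];

/*
 * Record that 'next' is being acquired while 'held' is held.  Returns
 * true if the opposite order was recorded earlier, i.e. the two paths
 * could deadlock against each other (an ABBA inversion).
 */
static bool sketch_note_acquire(enum sketch_lock held, enum sketch_lock next)
{
	bool inverted = seen_order[next][held];

	seen_order[held][next] = true;
	return inverted;
}
```

Replaying the two stacks in the order lockdep saw them triggers the warning on the second call.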


Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-29 Thread Kamalesh Babulal
Andrew Morton wrote:
> On Thu, 29 Nov 2007 23:00:47 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
>> On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <[EMAIL PROTECTED]> wrote:
>>
>>> On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
>>>> ten million is close enough to infinity for me to assume that we broke the
>>>> driver and that's never going to terminate.

>>> how about this? doesn't break things on my pa8800:
>>>
>>> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
>>> index 463f119..ef01cb1 100644
>>> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
>>> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
>>> @@ -1037,10 +1037,13 @@ restart_test:
>>> /*
>>>  *  Wait 'til done (with timeout)
>>>  */
>>> -   for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
>>> +   do {
>>> if (INB(np, nc_istat) & (INTF|SIP|DIP))
>>> break;
>>> -   if (i>=SYM_SNOOP_TIMEOUT) {
>>> +   msleep(10);
>>> +   } while (i++ < SYM_SNOOP_TIMEOUT);
>>> +
>>> +   if (i >= SYM_SNOOP_TIMEOUT) {
>>> printf ("CACHE TEST FAILED: timeout.\n");
>>> return (0x20);
>>> }
>>> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
>>> index ad07880..85c483b 100644
>>> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
>>> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
>>> @@ -339,7 +339,7 @@
>>>  /*
>>>   *  Misc.
>>>   */
>>> -#define SYM_SNOOP_TIMEOUT (1000)
>>> +#define SYM_SNOOP_TIMEOUT (1000)
>>>  #define BUS_8_BIT  0
>>>  #define BUS_16_BIT 1
>>>  
>> That might be the fix, but do we know what we're actually fixing?  afaik
>> 2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we
>> don't know why?
>>
> 
> So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?

Yes, 2.6.24-rc3 was OK; this is seen in 2.6.24-rc3-git2, -git3 and -git4.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [PATCH] xfs: revert to double-buffering readdir

2007-11-29 Thread Timothy Shimmin

Christoph Hellwig wrote:

The current readdir implementation deadlocks on btree buffer locks
because nfsd calls back into ->lookup from the filldir callback.  The
only short-term fix for this is to revert to the old inefficient
double-buffering scheme.



Probably why Steve did this: :)

xfs_file.c

revision 1.40
date: 2001/03/15 23:33:20;  author: lord;  state: Exp;  lines: +54 -17
modid: 2.4.x-xfs:slinx:90125a
Change linvfs_readdir to allocate a buffer, call xfs to fill it, and
then call the filldir function on each entry. This is instead of doing the
filldir deep in the bowels of xfs which causes locking problems.
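The buffer-then-replay approach described in that log message (and revived by Christoph's patch) can be sketched in userspace as below. All names are hypothetical; the record layout only loosely follows struct hack_dirent and uses the C99 flexible array member (name[]) that Tim comments on. One assumption of this sketch: each record length is rounded up to 8 bytes so the next header stays aligned for its 64-bit fields, whereas the filldir hunk shown here advances by sizeof(struct hack_dirent) + namlen without rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Record header followed by the (not NUL-terminated) name. */
struct sketch_dirent {
	int		namlen;
	int64_t		offset;
	uint64_t	ino;
	char		name[];		/* C99 flexible array member */
};

struct sketch_buf {
	char	*dirent;
	size_t	len;
	size_t	used;
};

typedef int (*sketch_filldir_t)(void *ctx, const char *name, int namlen,
				int64_t offset, uint64_t ino);

/* Header plus name, rounded up to a multiple of 8 bytes (sketch assumption). */
static size_t sketch_reclen(int namlen)
{
	size_t raw = offsetof(struct sketch_dirent, name) + (size_t)namlen;

	return (raw + 7) & ~(size_t)7;
}

/* Allocate the bounce buffer, halving the request while it is >= 1 KiB. */
static int sketch_buf_init(struct sketch_buf *buf, size_t len)
{
	buf->used = 0;
	for (buf->len = len; buf->len >= 1024; buf->len >>= 1) {
		buf->dirent = malloc(buf->len);
		if (buf->dirent)
			return 0;
	}
	buf->dirent = NULL;
	return -1;
}

/* Phase 1: called while the directory walker holds its locks; only copies. */
static int sketch_fill(struct sketch_buf *buf, const char *name, int namlen,
		       int64_t offset, uint64_t ino)
{
	size_t reclen = sketch_reclen(namlen);
	struct sketch_dirent *de;

	assert(namlen >= 0);
	if (buf->used + reclen > buf->len)
		return -1;	/* full: replay, then resume from 'offset' */
	de = (struct sketch_dirent *)(buf->dirent + buf->used);
	de->namlen = namlen;
	de->offset = offset;
	de->ino = ino;
	memcpy(de->name, name, (size_t)namlen);
	buf->used += reclen;
	return 0;
}

/* Phase 2: after the walker has returned, hand each record to the real callback. */
static int sketch_replay(const struct sketch_buf *buf, sketch_filldir_t filldir,
			 void *ctx)
{
	size_t pos = 0;
	int n = 0;

	while (pos < buf->used) {
		const struct sketch_dirent *de =
			(const struct sketch_dirent *)(buf->dirent + pos);

		if (filldir(ctx, de->name, de->namlen, de->offset, de->ino))
			break;	/* caller's buffer is full */
		pos += sketch_reclen(de->namlen);
		n++;
	}
	return n;
}

/* Example callback: acc[0] counts entries, acc[1] sums name lengths. */
static int sketch_count_cb(void *ctx, const char *name, int namlen,
			   int64_t offset, uint64_t ino)
{
	long *acc = ctx;

	(void)name;
	(void)offset;
	(void)ino;
	acc[0] += 1;
	acc[1] += namlen;
	return 0;
}
```

The key property is that sketch_fill() never calls out of the filesystem, so a callback that re-enters the filesystem (as nfsd's ->lookup does) only runs during sketch_replay().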



Yes it looks like it is done equivalently to before (minus the uio stuff etc).
I don't know what the 7fff* masking is about but we did that previously.
I hadn't come across the name[] struct field before;
I was used to name[0] (or name[1] in times gone by) but found that it is
a kosher way of doing things too for the variable-length string at the end.

Hmmm, don't see the point of "eof" local var now.
Previously bhv_vop_readdir() returned eof.
I presume if we don't move the offset (offset == startoffset) then
we're done and break out?
So we lost eof when going to the filldir in the getdents code etc...

--Tim


This patch does exactly that and reverts xfs_file_readdir to what's
basically the 2.6.23 version minus the uio and vnops junk.

I'll try to find something more optimal for 2.6.25 or at least find a
way to use the proper version for local access.


Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_file.c
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_file.c  2007-11-25 11:41:20.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c   2007-11-25 17:14:27.0 +0100
@@ -218,6 +218,15 @@
 }
 #endif /* CONFIG_XFS_DMAPI */
 
+/*
+ * Unfortunately we can't just use the clean and simple readdir implementation
+ * below, because nfs might call back into ->lookup from the filldir callback
+ * and that will deadlock the low-level btree code.
+ *
+ * Hopefully we'll find a better workaround that allows us to use the optimal
+ * version at least for local readdirs for 2.6.25.
+ */
+#if 0
 STATIC int
 xfs_file_readdir(
struct file *filp,
@@ -249,6 +258,121 @@
return -error;
return 0;
 }
+#else
+
+struct hack_dirent {
+   int namlen;
+   loff_t  offset;
+   u64 ino;
+   unsigned int    d_type;
+   char    name[];
+};
+
+struct hack_callback {
+   char    *dirent;
+   size_t  len;
+   size_t  used;
+};
+
+STATIC int
+xfs_hack_filldir(
+   void    *__buf,
+   const char  *name,
+   int namlen,
+   loff_t  offset,
+   u64 ino,
+   unsigned int    d_type)
+{
+   struct hack_callback *buf = __buf;
+   struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used);
+
+   if (buf->used + sizeof(struct hack_dirent) + namlen > buf->len)
+   return -EINVAL;
+
+   de->namlen = namlen;
+   de->offset = offset;
+   de->ino = ino;
+   de->d_type = d_type;
+   memcpy(de->name, name, namlen);
+   buf->used += sizeof(struct hack_dirent) + namlen;
+   return 0;
+}
+
+STATIC int
+xfs_file_readdir(
+   struct file *filp,
+   void    *dirent,
+   filldir_t   filldir)
+{
+   struct inode    *inode = filp->f_path.dentry->d_inode;
+   xfs_inode_t *ip = XFS_I(inode);
+   struct hack_callback buf;
+   struct hack_dirent *de;
+   int error;
+   loff_t  size;
+   int eof = 0;
+   xfs_off_t   start_offset, curr_offset, offset;
+
+   /*
+* Try fairly hard to get memory
+*/
+   buf.len = PAGE_CACHE_SIZE;
+   do {
+   buf.dirent = kmalloc(buf.len, GFP_KERNEL);
+   if (buf.dirent)
+   break;
+   buf.len >>= 1;
+   } while (buf.len >= 1024);
+
+   if (!buf.dirent)
+   return -ENOMEM;
+
+   curr_offset = filp->f_pos;
+   if (curr_offset == 0x7fff)
+   offset = 0x;
+   else
+   offset = filp->f_pos;
+
+   while (!eof) {
+   int reclen;
+   start_offset = offset;
+
+   buf.used = 0;
+   error = -xfs_readdir(ip, &buf, buf.len, &offset,
+xfs_hack_filldir);
+   if (error || offset == start_offset) {
+   size = 0;
+   break;
+   }
+
+   size = buf.used;
+   de = (struct hack_dirent *)buf.dirent;
+   while (size > 0) {
+   if (filldir(dirent, de->name, de->namlen,
+

Re: [PATCH] Documentation/Changes -> Documentation/Requirements (resend without truncated comment text)

2007-11-29 Thread Jarek Poplawski
On 30-11-2007 04:32, H. Peter Anvin wrote:
...
> As far as I can tell, Documentation/Changes is the only thing we have
> that even attempts to document the basic requirements.  This attempts
> to formalize that fact.
> 
>  Documentation/Changes  |  396
>  Documentation/Requirements |  394 +++

...But there are a few more 'things' which mention Documentation/Changes.

Regards,
Jarek P.


Re: constant_tsc and TSC unstable

2007-11-29 Thread H. Peter Anvin

Paul Rolland (ポール・ロラン) wrote:

>> Note that once TSC is disabled (it's using "jiffies" as far
>> as I can see), ntpd constantly speeds up and slows down the
>> clock, it jumps +/- 0.5sec every several minutes or hours -
>> I guess that's when ntpd process gets moved from one core
>> to another for whatever reason.  And an interesting thing
>> is that with a 64-bit kernel this TSC problem does not occur
>> on this very machine.
>
> Hmm, that could make it a problem related to the kernel rather than the CPU.
>
>> Something similar is reported on AMD X2 64 machines as well --
>> can't check right now.
>
> If I recall correctly, issues with AMD X2 were related to TSC being
> independent for each core and not constant (speed depending on C state).
> But the reason I raise the issue is that the Core2 reports constant TSC,
> so there is (IMHO) no reason for that.

Well, "constant" doesn't mean "synchronized", but it might very well be 
that the Core2 could really benefit from synchronizing the TSCs manually 
like we used to.


On the other hand, I notice that most of the TSC warp values are 
relatively close to 2^32, so this could be a specific bug.
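To put numbers on the 2^32 observation: 2^32 is 4294967296, and the warp values in the log Paul posted in this thread sit between about 198 and 316 million cycles below it. The helper names below are hypothetical, and the sketch only quantifies the gap; it says nothing about the cause. The 1730000 kHz figure used in the note is the T5300's nominal 1.73 GHz, assumed here for the conversion.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TWO_POW_32 (UINT64_C(1) << 32)	/* 4294967296 */

/* Cycles by which a measured warp falls short of 2^32 (0 if it does not). */
static uint64_t sketch_gap_below_2_32(uint64_t warp)
{
	return warp < SKETCH_TWO_POW_32 ? SKETCH_TWO_POW_32 - warp : 0;
}

/* Convert a cycle count to milliseconds for a clock given in kHz. */
static double sketch_cycles_to_ms(uint64_t cycles, uint64_t cpu_khz)
{
	assert(cpu_khz != 0);
	return (double)cycles / (double)cpu_khz;
}
```

For example, the 3978592228-cycle warp is 316375068 cycles short of 2^32; at 1730000 kHz that gap is roughly 183 ms, while the full warp is about 2.3 s.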


-hpa



Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-29 Thread Andrew Morton
On Thu, 29 Nov 2007 23:00:47 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> > > ten million is close enough to infinity for me to assume that we broke the
> > > driver and that's never going to terminate.
> > > 
> > 
> > how about this? doesn't break things on my pa8800:
> > 
> > diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > index 463f119..ef01cb1 100644
> > --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > @@ -1037,10 +1037,13 @@ restart_test:
> > /*
> >  *  Wait 'til done (with timeout)
> >  */
> > -   for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
> > +   do {
> > if (INB(np, nc_istat) & (INTF|SIP|DIP))
> > break;
> > -   if (i>=SYM_SNOOP_TIMEOUT) {
> > +   msleep(10);
> > +   } while (i++ < SYM_SNOOP_TIMEOUT);
> > +
> > +   if (i >= SYM_SNOOP_TIMEOUT) {
> > printf ("CACHE TEST FAILED: timeout.\n");
> > return (0x20);
> > }
> > diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > index ad07880..85c483b 100644
> > --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > @@ -339,7 +339,7 @@
> >  /*
> >   *  Misc.
> >   */
> > -#define SYM_SNOOP_TIMEOUT (1000)
> > +#define SYM_SNOOP_TIMEOUT (1000)
> >  #define BUS_8_BIT  0
> >  #define BUS_16_BIT 1
> >  
> 
> That might be the fix, but do we know what we're actually fixing?  afaik
> 2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we
> don't know why?
> 

So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?


Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-29 Thread Andrew Morton
On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <[EMAIL PROTECTED]> wrote:

> On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> > ten million is close enough to infinity for me to assume that we broke the
> > driver and that's never going to terminate.
> > 
> 
> how about this? doesn't break things on my pa8800:
> 
> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> index 463f119..ef01cb1 100644
> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> @@ -1037,10 +1037,13 @@ restart_test:
>   /*
>*  Wait 'til done (with timeout)
>*/
> - for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
> + do {
>   if (INB(np, nc_istat) & (INTF|SIP|DIP))
>   break;
> - if (i>=SYM_SNOOP_TIMEOUT) {
> + msleep(10);
> + } while (i++ < SYM_SNOOP_TIMEOUT);
> +
> + if (i >= SYM_SNOOP_TIMEOUT) {
>   printf ("CACHE TEST FAILED: timeout.\n");
>   return (0x20);
>   }
> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> index ad07880..85c483b 100644
> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> @@ -339,7 +339,7 @@
>  /*
>   *  Misc.
>   */
> -#define SYM_SNOOP_TIMEOUT (1000)
> +#define SYM_SNOOP_TIMEOUT (1000)
>  #define BUS_8_BIT0
>  #define BUS_16_BIT   1
>  

That might be the fix, but do we know what we're actually fixing?  afaik
2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we
don't know why?
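For comparison, a bounded poll-and-sleep loop of the kind Kyle's patch introduces can be sketched in userspace as follows. The names are hypothetical, nanosleep() stands in for msleep(), and the ready callback stands in for the INB(np, nc_istat) & (INTF|SIP|DIP) test. Two details deliberately differ from the quoted hunk as I read it: the counter is initialised by the loop itself (the hunk does not show i being set before the do loop), and success is returned directly, so a device that becomes ready on the very last poll is not reported as a timeout by a later i >= SYM_SNOOP_TIMEOUT check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical readiness check standing in for the nc_istat test. */
typedef bool (*sketch_ready_fn)(void *ctx);

/*
 * Poll up to max_polls times, sleeping sleep_ms between polls.
 * Returns true as soon as ready() succeeds, false on timeout.
 */
static bool sketch_poll_with_timeout(sketch_ready_fn ready, void *ctx,
				     int max_polls, long sleep_ms)
{
	struct timespec ts = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };

	assert(sleep_ms >= 0);
	for (int i = 0; i < max_polls; i++) {
		if (ready(ctx))
			return true;
		nanosleep(&ts, NULL);	/* userspace stand-in for msleep() */
	}
	return false;
}

/* Test helper: reports ready once it has been polled *(int *)ctx times. */
static bool sketch_ready_after(void *ctx)
{
	int *remaining = ctx;

	return --(*remaining) <= 0;
}
```

Sleeping between polls is what turns a ten-million-iteration busy loop into something the softlockup detector will not flag, at the cost of a longer worst-case wait.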




Re: constant_tsc and TSC unstable

2007-11-29 Thread ポール・ロラン
Hello,

On Fri, 30 Nov 2007 00:26:47 +0300
Michael Tokarev <[EMAIL PROTECTED]> wrote:

> H. Peter Anvin wrote:
> > Paul Rolland (ポール・ロラン) wrote:
> []
> >> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
> >> Marking TSC unstable due to: check_tsc_sync_source failed.
> []
> >> but I was wondering if this is a bug or a feature ;)
> 
> > The problem you're having is that the TSCs of your two cores are
> > completely different, over a second apart.  This is a bug, unrelated to
> > constant_tsc.
> 
> A bug where: in the CPU or in the kernel?
Good question!
 
> The thing is that all our dual-core machines show something like
> that.
> 
> (not as huge a difference as Paul reported, but still "unstable".
> The same happens with 2.6.23)
I've been checking my logs, and the difference is quite constant and
huge:
[EMAIL PROTECTED] log]# grep 'cycles TSC warp' messages*
messages:Nov 26 08:27:56 tux kernel: Measured 4078687691 cycles TSC warp between CPUs, turning off TSC clock.
messages:Nov 26 17:21:21 tux kernel: Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
messages.1:Nov 18 22:52:23 tux kernel: Measured 4063102940 cycles TSC warp between CPUs, turning off TSC clock.
messages.1:Nov 19 07:19:02 tux kernel: Measured 4057192061 cycles TSC warp between CPUs, turning off TSC clock.
messages.1:Nov 23 20:50:12 tux kernel: Measured 4064589321 cycles TSC warp between CPUs, turning off TSC clock.
messages.2:Nov 12 08:06:44 tux kernel: Measured 4072130361 cycles TSC warp between CPUs, turning off TSC clock.
messages.2:Nov 13 19:42:47 tux kernel: Measured 4049899451 cycles TSC warp between CPUs, turning off TSC clock.
messages.2:Nov 17 09:27:22 tux kernel: Measured 4066629060 cycles TSC warp between CPUs, turning off TSC clock.
messages.3:Nov  5 08:25:08 tux kernel: Measured 4086386109 cycles TSC warp between CPUs, turning off TSC clock.
messages.3:Nov  8 13:07:08 tux kernel: Measured 4041945934 cycles TSC warp between CPUs, turning off TSC clock.
messages.3:Nov  9 23:31:24 tux kernel: Measured 4092303059 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Oct 29 07:28:23 tux kernel: Measured 4096946373 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Oct 31 17:07:21 tux kernel: Measured 4046765372 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Oct 31 17:15:09 tux kernel: Measured 4039328228 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Oct 31 23:19:00 tux kernel: Measured 4069714246 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov  1 20:33:02 tux kernel: Measured 4088199726 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov  2 11:53:17 tux kernel: Measured 4079927527 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov  3 09:37:16 tux kernel: Measured 4071112656 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov  3 10:51:29 tux kernel: Measured 3986266219 cycles TSC warp between CPUs, turning off TSC clock.
messages.4:Nov  4 18:14:56 tux kernel: Measured 4074214144 cycles TSC warp between CPUs, turning off TSC clock.

> Note that once TSC is disabled (it's using "jiffies" as far
> as I can see), ntpd constantly speeds up and slows down the
> clock, it jumps +/- 0.5sec every several minutes or hours -
> I guess that's when ntpd process gets moved from one core
> to another for whatever reason.  And an interesting thing
> is that with a 64-bit kernel this TSC problem does not occur
> on this very machine.
Hmm, that could make it a problem related to the kernel rather than the CPU.
 
> Something similar is reported on AMD X2 64 machines as well --
> can't check right now.
If I recall correctly, issues with AMD X2 were related to TSC being
independent for each core and not constant (speed depending on C state).
But the reason I raise the issue is that the Core2 reports constant TSC,
so there is (IMHO) no reason for that.

Paul



Re: constant_tsc and TSC unstable

2007-11-29 Thread Paul Rolland (ポール・ロラン)
Hello,

On Thu, 29 Nov 2007 15:29:49 -0800
"Pallipadi, Venkatesh" <[EMAIL PROTECTED]> wrote:



> TSCs on Core 2 Duo are supposed to be in sync unless CPU supports deep idle
> states like C2, C3. Can you send the full /proc/cpuinfo and full dmesg.
> 
Sure I can...
[EMAIL PROTECTED] log]# cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 CPU T5300  @ 1.73GHz
stepping: 2
cpu MHz : 800.000
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
bogomips: 3461.13
clflush size: 64

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 CPU T5300  @ 1.73GHz
stepping: 2
cpu MHz : 800.000
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
bogomips: 3458.02
clflush size: 64
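Both cores above advertise constant_tsc. A small sketch for checking such a flag in the flags value of /proc/cpuinfo follows; the helper name is hypothetical. It matches whole space-separated tokens, so "tsc" does not match inside "constant_tsc". As hpa points out in this thread, constant_tsc says the TSC rate is constant, not that the TSCs of different cores are synchronized.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * True if 'flag' occurs as a whole token in a /proc/cpuinfo "flags"
 * value (tokens separated by spaces, optionally ending in a newline).
 */
static bool sketch_has_cpu_flag(const char *flags, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags;

	if (n == 0)
		return false;
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == flags) || p[-1] == ' ';
		bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

		if (starts && ends)
			return true;
		p++;	/* keep searching after this partial match */
	}
	return false;
}
```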

Regards,
Paul

dmesg
Description: Binary data


[patch 3/3] x86_64: Make the x86_32 percpu operations usable on x86_64

2007-11-29 Thread Christoph Lameter
Relocate the x86_64 percpu variables to begin at zero. Then
we can directly use the x86_32 percpu operations. x86_32
offsets %fs by __per_cpu_start. x86_64 has %gs pointing
directly to the pda and the per cpu area if they start at zero.

Access to the pda with the x86_64 pda operations is still
possible in addition to access to the per cpu variables
using x86_32 percpu operations.
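The addressing change can be modelled in userspace. In the sketch below (all names hypothetical; a struct stands in for the per cpu section, with the pda first as patch 2/3 arranges), the old form adds data_offset = area - section start to a variable's link address, while the zero-based form treats the variable's offset as its address, so the per-CPU base is simply the area's address, matching data_offset = (unsigned long)ptr in this patch's setup64.c hunk. In the kernel the final addition happens implicitly through the %gs segment base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_CPUS 4

/* Stand-in for the per cpu section: pda first, then an ordinary variable. */
struct sketch_percpu_section {
	long	pda_word;
	long	counter;
};

/* Template loaded with the image (cf. __per_cpu_load / __per_cpu_start). */
static struct sketch_percpu_section sketch_template;

/* One private copy per CPU, as setup_per_cpu_areas() allocates. */
static struct sketch_percpu_section sketch_areas[SKETCH_NR_CPUS];

/* Zero-based scheme: the value %gs would hold on each CPU. */
static uintptr_t sketch_seg_base[SKETCH_NR_CPUS];

static void sketch_setup_areas(void)
{
	for (int i = 0; i < SKETCH_NR_CPUS; i++) {
		memcpy(&sketch_areas[i], &sketch_template, sizeof(sketch_template));
		sketch_seg_base[i] = (uintptr_t)&sketch_areas[i];
	}
}

/* Zero-based access: segment base + the variable's offset in the section. */
static long *sketch_counter_zero_based(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return (long *)(sketch_seg_base[cpu] +
			offsetof(struct sketch_percpu_section, counter));
}

/* Old access: link address of the variable + (area - section start). */
static long *sketch_counter_old_style(int cpu)
{
	uintptr_t data_offset = (uintptr_t)&sketch_areas[cpu] -
				(uintptr_t)&sketch_template;

	return (long *)((uintptr_t)&sketch_template.counter + data_offset);
}
```

Both accessors reach the same per-CPU copy; the zero-based one simply needs no subtraction of the section start.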

Hopefully this is helpful for arch integration.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86/Kconfig |5 +
 arch/x86/kernel/setup64.c|4 ++--
 arch/x86/kernel/vmlinux_64.lds.S |1 +
 include/asm-x86/percpu.h |   12 +++-
 4 files changed, 19 insertions(+), 3 deletions(-)

Index: linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/percpu.h  2007-11-29 22:13:54.806575787 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h   2007-11-29 22:21:42.383571603 -0800
@@ -17,6 +17,12 @@
 
 #define per_cpu_offset(x) (__per_cpu_offset(x))
 
+#define __percpu_seg "%%gs:"
+
+#else
+
+#define __percpu_seg ""
+
 #endif
 #include 
 
@@ -81,6 +87,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
 /* We can use this directly for local CPU (faster). */
 DECLARE_PER_CPU(unsigned long, this_cpu_off);
 
+#endif /* __ASSEMBLY__ */
+#endif /* !CONFIG_X86_64 */
+
+#ifndef __ASSEMBLY__
+
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -138,5 +149,4 @@ extern void __bad_percpu_size(void);
 #define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
 #define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 #endif /* _ASM_X86_PERCPU_H_ */
Index: linux-2.6.24-rc3-mm2/arch/x86/Kconfig
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/Kconfig  2007-11-29 22:05:39.003576212 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/Kconfig   2007-11-29 22:12:53.942575452 -0800
@@ -123,6 +123,11 @@ config GENERIC_TIME_VSYSCALL
 config ARCH_SETS_UP_PER_CPU_AREA
def_bool X86_64
 
+config PERCPU_ZERO_BASED
+   bool
+   depends on X86_64 && SMP
+   default y
+
 config ZONE_DMA32
bool
default X86_64
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c 2007-11-29 22:12:08.962826086 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c  2007-11-29 22:12:53.942575452 -0800
@@ -111,11 +111,11 @@ void __init setup_per_cpu_areas(void)
}
if (!ptr)
panic("Cannot allocate cpu data for CPU %d\n", i);
-   memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+   memcpy(ptr, __per_cpu_load, __per_cpu_size);
/* Relocate the pda */
memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
cpu_pda(i) = (struct x8664_pda *)ptr;
-   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
+   cpu_pda(i)->data_offset = (unsigned long)ptr;
}
/* Fix up pda for this processor  */
pda_init(0);
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/vmlinux_64.lds.S
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/vmlinux_64.lds.S  2007-11-29 22:05:38.987576338 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/vmlinux_64.lds.S   2007-11-29 22:12:53.930825752 -0800
@@ -16,6 +16,7 @@ jiffies_64 = jiffies;
 _proxy_pda = 1;
 PHDRS {
text PT_LOAD FLAGS(5);  /* R_E */
+   percpu PT_LOAD FLAGS(4);/* R__ */
data PT_LOAD FLAGS(7);  /* RWE */
user PT_LOAD FLAGS(7);  /* RWE */
data.init PT_LOAD FLAGS(7); /* RWE */

-- 


[patch 2/3] X86_64: Declare pda as per cpu data thereby moving it into the cpu area

2007-11-29 Thread Christoph Lameter
Declare the pda as a per cpu variable. This will have the effect of moving
the pda data into the cpu area managed by cpu alloc.

The boot_pdas are only needed in head64.c so move the declaration
over there and make it static.

Remove the code that allocates special pda data structures.

The pda is moved to the beginning of the per cpu area. Since gs points to the
pda, gs: now also points to the per cpu area of the current processor. A per
cpu variable can then be reached at

%gs:[_cpu_ - __per_cpu_start]
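A rough user-space model of the addressing rule above, assuming an array of base pointers (cpu_base[], hypothetical) stands in for %gs and a static struct stands in for the linked .data.percpu section. Each CPU gets a copy of the template, and a variable's copy is found at base + (&var - start). The names percpu_template, PER_CPU_START and percpu_ptr are illustrative only, not kernel API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical template standing in for the linked .data.percpu section. */
struct percpu_section {
	long pda_like[4];	/* the pda is placed first in the area */
	int counter;		/* an ordinary per cpu variable */
};

static struct percpu_section percpu_template;
#define PER_CPU_START	((char *)&percpu_template)

#define MODEL_NR_CPUS	4
static char *cpu_base[MODEL_NR_CPUS];	/* what %gs would hold on each CPU */

/* Give every CPU a private copy of the template (never freed here). */
static int model_setup_per_cpu_areas(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		cpu_base[i] = malloc(sizeof(percpu_template));
		if (!cpu_base[i])
			return -1;
		memcpy(cpu_base[i], &percpu_template, sizeof(percpu_template));
	}
	return 0;
}

/* base + (&var - __per_cpu_start): address of var's copy on cpu. */
#define percpu_ptr(var, cpu)						\
	((__typeof__(&(var)))(cpu_base[cpu] +				\
			      ((char *)&(var) - PER_CPU_START)))
```

Writing through percpu_ptr(percpu_template.counter, 1) changes only CPU 1's copy; the template and the other CPUs' copies are untouched.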

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86/kernel/head64.c  |6 ++
 arch/x86/kernel/setup64.c |   13 ++---
 arch/x86/kernel/smpboot_64.c  |   16 
 include/asm-generic/vmlinux.lds.h |1 +
 include/asm-x86/pda.h |1 -
 include/linux/percpu.h|4 
 6 files changed, 21 insertions(+), 20 deletions(-)

Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c 2007-11-28 20:59:13.124188194 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c  2007-11-28 21:08:50.473347382 -0800
@@ -30,7 +30,9 @@ cpumask_t cpu_initialized __cpuinitdata 
 
 struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
+DEFINE_PER_CPU_FIRST(struct x8664_pda, pda);
+EXPORT_PER_CPU_SYMBOL(pda);
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
@@ -109,10 +111,15 @@ void __init setup_per_cpu_areas(void)
}
if (!ptr)
panic("Cannot allocate cpu data for CPU %d\n", i);
-   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+   /* Relocate the pda */
+   memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
+   cpu_pda(i) = (struct x8664_pda *)ptr;
+   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
}
-} 
+   /* Fix up pda for this processor  */
+   pda_init(0);
+}
 
 void pda_init(int cpu)
 { 
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/smpboot_64.c  2007-11-28 20:59:13.136188167 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c   2007-11-28 20:59:35.399937395 -0800
@@ -556,22 +556,6 @@ static int __cpuinit do_boot_cpu(int cpu
return -1;
}
 
-   /* Allocate node local memory for AP pdas */
-   if (cpu_pda(cpu) == _cpu_pda[cpu]) {
-   struct x8664_pda *newpda, *pda;
-   int node = cpu_to_node(cpu);
-   pda = cpu_pda(cpu);
-   newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
- node);
-   if (newpda) {
-   memcpy(newpda, pda, sizeof (struct x8664_pda));
-   cpu_pda(cpu) = newpda;
-   } else
-   printk(KERN_ERR
-   "Could not allocate node local PDA for CPU %d on node %d\n",
-   cpu, node);
-   }
-
alternatives_smp_switch(1);
 
c_idle.idle = get_idle_for_cpu(cpu);
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/head64.c  2007-11-28 20:59:13.152187359 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c   2007-11-28 20:59:35.403937534 -0800
@@ -22,6 +22,12 @@
 #include 
 #include 
 
+/*
+ * Only used before the per cpu areas are setup. The use for the non possible
+ * cpus continues after boot
+ */
+static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
 static void __init zap_identity_mappings(void)
 {
pgd_t *pgd = pgd_offset_k(0UL);
Index: linux-2.6.24-rc3-mm2/include/asm-x86/pda.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/pda.h 2007-11-28 20:59:13.164187921 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/pda.h  2007-11-28 20:59:35.403937534 -0800
@@ -39,7 +39,6 @@ struct x8664_pda {
 } cacheline_aligned_in_smp;
 
 extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
 extern void pda_init(int);
 
 #define cpu_pda(i) (_cpu_pda[i])
Index: linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/vmlinux.lds.h 2007-11-28 20:59:13.176187886 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h  2007-11-28 20:59:35.403937534 -0800
@@ -259,6 +259,7 @@
. = ALIGN(align);   

[patch 1/3] Percpu infrastructure to rebase the per cpu area to 0UL

2007-11-29 Thread Christoph Lameter
Support an option

CONFIG_PERCPU_ZERO_BASED

that makes offsets for per cpu variables start at zero.

If a percpu area starts at zero then

1. We do not need RELOC_HIDE anymore

2. Indexes off the per cpu area for each processor are small

3. The percpu area "addresses" are offsets and we can then
   have allocpercpu/cpu_alloc in the future also use these
   offsets so that percpu functions can take any type of
   percpu address if it is provided by a percpu variable
   or a pointer obtained via allocpercpu/cpu_alloc.

The linker variables that mark the area boundaries are different for
zero-based percpu segments:

__per_cpu_load  -> The address at which the percpu area was loaded
__per_cpu_size  -> The length of the per cpu area
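A user-space sketch of the zero-based idea, assuming a fixed 64-byte area and malloc'ed per-CPU copies (all model_*/MODEL_* names are hypothetical). The load image is copied for each CPU, the CPU's data offset is simply the copy's address (as in the patched setup_per_cpu_areas(), which sets data_offset = (unsigned long)ptr), and the same offset works on any CPU once that base is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of a zero-based per cpu area: a per cpu "address" is
 * an offset from 0, and adding a CPU's base turns it into a real pointer.
 */
#define MODEL_NR_CPUS	2
#define MODEL_AREA_SIZE	64

static unsigned char model_per_cpu_load[MODEL_AREA_SIZE];	/* load image */
static size_t model_per_cpu_size;		/* bytes handed out so far */
static unsigned long model_data_offset[MODEL_NR_CPUS];

/*
 * Hand out the next offset. Static per cpu variables and cpu_alloc-style
 * allocations would draw from the same offset space. Sizes are rounded up
 * to sizeof(long) to keep every offset long-aligned.
 */
static size_t model_percpu_reserve(size_t size)
{
	size_t off = model_per_cpu_size;

	size = (size + sizeof(long) - 1) & ~(sizeof(long) - 1);
	assert(off + size <= MODEL_AREA_SIZE);
	model_per_cpu_size += size;
	return off;
}

/* Copy the load image for each CPU; the base itself is the data offset. */
static int model_setup_per_cpu_areas(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		void *ptr = malloc(MODEL_AREA_SIZE);

		if (!ptr)
			return -1;
		memcpy(ptr, model_per_cpu_load, model_per_cpu_size);
		model_data_offset[i] = (unsigned long)ptr;
	}
	return 0;
}

/* Zero-based SHIFT_PTR: no RELOC_HIDE, just base + offset. */
#define MODEL_SHIFT_PTR(type, off, cpu)				\
	((type *)(model_data_offset[cpu] + (off)))
```

The model assumes unsigned long is pointer-sized, as the kernel does on x86_64, and that all offsets are reserved before the areas are set up.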


Removes the &__per_cpu_x in lockdep. AFAICT the __per_cpu_x are already
pointers.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/asm-generic/percpu.h  |7 ++-
 include/asm-generic/sections.h|   10 ++
 include/asm-generic/vmlinux.lds.h |   15 +++
 init/main.c   |   17 +
 kernel/lockdep.c  |4 ++--
 5 files changed, 42 insertions(+), 11 deletions(-)

Index: linux-2.6.24-rc3-mm2/include/asm-generic/percpu.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/percpu.h  2007-11-29 22:05:58.359576450 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/percpu.h   2007-11-29 22:06:22.750825804 -0800
@@ -42,8 +42,13 @@ extern unsigned long __per_cpu_offset[NR
  * Only S390 provides its own means of moving the pointer.
  */
 #ifndef SHIFT_PTR
+#ifdef CONFIG_PERCPU_ZERO_BASED
+#define SHIFT_PTR(__p, __offset) \
+   ((__typeof(__p))(((void *)(__p)) + (__offset)))
+#else
 #define SHIFT_PTR(__p, __offset)   RELOC_HIDE((__p), (__offset))
-#endif
+#endif /* CONFIG_PERCPU_ZERO_BASED */
+#endif /* SHIFT_PTR */
 
 /*
  * A percpu variable may point to a discarded reghions. The following are
Index: linux-2.6.24-rc3-mm2/include/asm-generic/sections.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/sections.h 2007-11-29 22:05:58.367576240 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/sections.h 2007-11-29 22:06:22.754826440 -0800
@@ -9,7 +9,17 @@ extern char __bss_start[], __bss_stop[];
 extern char __init_begin[], __init_end[];
 extern char _sinittext[], _einittext[];
 extern char _end[];
+#ifdef CONFIG_PERCPU_ZERO_BASED
+extern char __per_cpu_load[];
+extern char per_cpu_size[];
+#define __per_cpu_size ((unsigned long)&per_cpu_size)
+#define __per_cpu_start ((char *)0)
+#define __per_cpu_end ((char *)__per_cpu_size)
+#else
 extern char __per_cpu_start[], __per_cpu_end[];
+#define __per_cpu_load __per_cpu_start
+#define __per_cpu_size (__per_cpu_end - __per_cpu_start)
+#endif
 extern char __kprobes_text_start[], __kprobes_text_end[];
 extern char __initdata_begin[], __initdata_end[];
 extern char __start_rodata[], __end_rodata[];
Index: linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-generic/vmlinux.lds.h 2007-11-29 22:06:03.486826118 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-generic/vmlinux.lds.h  2007-11-29 22:06:22.754826440 -0800
@@ -255,6 +255,20 @@
*(.initcall7.init)  \
*(.initcall7s.init)
 
+#ifdef CONFIG_PERCPU_ZERO_BASED
+#define PERCPU(align)  \
+   . = ALIGN(align);   \
+   percpu : { } :percpu\
+   __per_cpu_load = .; \
+   .data.percpu 0 : AT(__per_cpu_load - LOAD_OFFSET) { \
+   *(.data.percpu.first)   \
+   *(.data.percpu) \
+   *(.data.percpu.shared_aligned)  \
+   per_cpu_size = .;   \
+   }   \
+   . = __per_cpu_load + per_cpu_size;  \
+   data : { } :data
+#else
 #define PERCPU(align)  \
. = ALIGN(align);   \
__per_cpu_start = .;\
@@ -263,3 +277,4 @@
*(.data.percpu.shared_aligned)  \
}   \
__per_cpu_end = .;
+#endif
Index: linux-2.6.24-rc3-mm2/init/main.c
===
--- linux-2.6.24-rc3-mm2.orig/init/main.c   2007-11-29 

[patch 0/3] Per cpu relocation to ZERO and x86_32 percpu ops on x86_64

2007-11-29 Thread Christoph Lameter
This patchset allows the use of x86_32 percpu ops on x86_64 while maintaining
%gs pointing to the pda. It does that by moving the x86_64 pda into
the percpu area (thereby pointing %gs at the per cpu area) and then
relocating the x86_64 per cpu variables to start at 0.

Patch applies on top of the per cpu cleanup patches V2.
See http://marc.info/?l=linux-kernel&m=119628478316525&w=2

Ultimately I think we can make the per cpu accessors arch independent
(see the RFC at http://marc.info/?l=linux-kernel&m=119552126330405&w=2).
There is a performance benefit from using these in core code.



Re: [RFC] Sample kset/ktype/kobject implementation

2007-11-29 Thread Greg KH
On Thu, Nov 29, 2007 at 05:11:35PM -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Greg KH wrote:
> 
> > > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > > where the name can be freed.
> > > > 
> > > > No, kobject_register() should have handled that for us, right?
> > > 
> > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> > 
> > Crap.  If I can't get this code right in an example, the API is messed
> > up.  Time to take Kay seriously and start to revamp the basic kobject
> > api :)
> 
> The rule is simple enough.  After calling kobject_register() you should 
> always use kobject_put() -- even if kobject_register() failed.

Yes.

> In fact, after calling kobject_init() you should use kobject_put().  
> The first rule follows from this one, since kobject_register() calls 
> kobject_init() internally.

Yes, that makes sense, time to write it all down :)

thanks,

greg k-h
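A minimal user-space sketch of the rule Alan describes, assuming a hypothetical struct obj in place of a kobject: init takes the first reference and may allocate the name, so a failure after init is unwound with a put rather than a plain free. obj_init, obj_add and obj_put are illustrative names, not the kobject API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct obj {
	int refcount;
	char *name;
	void (*release)(struct obj *);
};

static int released;	/* counts releases, for the demo */

static void obj_release(struct obj *o)
{
	free(o->name);
	free(o);
	released++;
}

static void obj_init(struct obj *o, const char *name)
{
	o->refcount = 1;		/* init hands the caller one reference */
	o->name = strdup(name);		/* owned by the object from now on */
	o->release = obj_release;
}

static void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		o->release(o);
}

/* Hypothetical registration step that can fail after init. */
static int obj_add(struct obj *o, int fail)
{
	(void)o;
	return fail ? -1 : 0;
}

static struct obj *obj_create(const char *name, int fail)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	obj_init(o, name);
	if (obj_add(o, fail) != 0) {
		/* free(o) here would leak o->name; drop the init reference. */
		obj_put(o);
		return NULL;
	}
	return o;
}
```

Because the release function owns all cleanup, the error path and the normal teardown path go through the same code.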


Re: pnpacpi : exceeded the max number of IO resources

2007-11-29 Thread Valdis . Kletnieks
On Fri, 30 Nov 2007 10:21:28 +0800, Zhao Yakui said:
> Thanks for the acpidump & dmesg.
>   In the acpidump there are so many IO resource definitions in the device
> of mem2 and the number exceeds the predefined number(24).

On a semi-related note, I'm seeing 7 of these at each boot on a Dell Latitude 
D820:

pnpacpi: exceeded the max number of mem resources: 12

2.6.24-rc3-mm2 does it; 2.6.23-mm1 didn't.

pnp-increase-the-maximum-number-of-resources.patch raised it from 4 to 12, but
I don't understand why it didn't complain at 4 in 23-mm1, but it does at 12 now.






Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-29 Thread Kyle McMartin
On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> ten million is close enough to infinity for me to assume that we broke the
> driver and that's never going to terminate.
> 

how about this? doesn't break things on my pa8800:

diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
index 463f119..ef01cb1 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
@@ -1037,10 +1037,13 @@ restart_test:
/*
 *  Wait 'til done (with timeout)
 */
-   for (i=0; i=SYM_SNOOP_TIMEOUT) {
+   msleep(10);
+   } while (i++ < SYM_SNOOP_TIMEOUT);
+
+   if (i >= SYM_SNOOP_TIMEOUT) {
printf ("CACHE TEST FAILED: timeout.\n");
return (0x20);
}
diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
index ad07880..85c483b 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
@@ -339,7 +339,7 @@
 /*
  *  Misc.
  */
-#define SYM_SNOOP_TIMEOUT (10000000)
+#define SYM_SNOOP_TIMEOUT (1000)
 #define BUS_8_BIT  0
 #define BUS_16_BIT 1
 
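One way to write such a bounded wait in plain C, sketched in user space with nanosleep() standing in for msleep(); poll_with_timeout and ready_after are hypothetical names. Here the success/timeout decision comes from the predicate's return value rather than from the loop counter's final value, which sidesteps off-by-one questions at the boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Call done(arg) up to max_tries times, sleeping sleep_ms after each failed
 * attempt. Returns true as soon as done() succeeds, false on timeout.
 */
static bool poll_with_timeout(bool (*done)(void *), void *arg,
			      unsigned int max_tries, unsigned int sleep_ms)
{
	struct timespec ts = {
		.tv_sec = sleep_ms / 1000,
		.tv_nsec = (long)(sleep_ms % 1000) * 1000000L,
	};

	for (unsigned int i = 0; i < max_tries; i++) {
		if (done(arg))
			return true;
		nanosleep(&ts, NULL);	/* stands in for msleep() */
	}
	return false;
}

/* Example predicate: reports success on the n-th call. */
static bool ready_after(void *arg)
{
	int *remaining = arg;

	return --*remaining <= 0;
}
```

With max_tries = 1000 and sleep_ms = 10 the worst case is roughly ten seconds of sleeping instead of a busy loop of ten million iterations.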


Re: Out of tree module using LSM

2007-11-29 Thread Valdis . Kletnieks
On Thu, 29 Nov 2007 18:34:33 EST, Jon Masters said:
> 
> On Thu, 2007-11-29 at 21:45 +, Alan Cox wrote:
> > > Jargon File in all its glory. And if you still think you could look for
> > > patterns, how about executable code that self-modifies in random ways
> > > but when executed as a whole actually has the functionality of fetchmail
> > > embedded within it? How would you guard against that?
> > 
> > Thats a problem for whoever writes the ESR detection tool and to what
> > level it works. The question for the kernel is how do we provide a
> > mechanism to allow (to some extent at least) this kind of tool to run.
> 
> Right. I'm just saying reading a single page out of context (no pun
> intended) is not going to be very useful. 

Fortunately for all concerned, although Alan's self-modifying code is indeed a
possibility, it's much less of an issue than the sort of malware that can be
found with a simple "find this 27-byte sequence, which will be found in either
block 36 or 37 of the file".

And I'll make the prediction that we won't see anything doing the sorts of
things that Alan's program does, until that's the *easiest* way to get into
a system.  Until that time, they're either going to be sending simpler stuff
that a scanner can easily template and find, or using other means of attacks
that are outside the scope of a scanner.

Remember guys - we want to think about *realistic* threat models.  The e-mail
virus scanners we use catch hundreds to thousands of known viruses *every day*.
But I can count on the fingers of both hands the number of times I've had to
deal with a *real* "0-day" in a quarter century.  The scanner doesn't have to
be perfect - it just has to make it hard enough to bypass to render it
economically infeasible.  If you're targeted by a military/govt/political/
religious group that doesn't *care* if it's economically viable, you have
other, bigger problems to deal with...





Re: [PATCH] [RESEND] crypto test: use print_hex_dump from kernel.h instead

2007-11-29 Thread Herbert Xu
On Fri, Nov 30, 2007 at 09:20:34AM +0800, rae l wrote:
> 
> Cc: Randy Dunlap <[EMAIL PROTECTED]>
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Patch applied.  Thanks a lot Denis!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC] Sample kset/ktype/kobject implementation

2007-11-29 Thread Dave Young
On Fri, Nov 30, 2007 at 01:07:37PM +0800, Dave Young wrote:
> On Nov 30, 2007 6:11 AM, Alan Stern <[EMAIL PROTECTED]> wrote:
> > On Thu, 29 Nov 2007, Greg KH wrote:
> >
> > > > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > > > where the name can be freed.
> > > > >
> > > > > No, kobject_register() should have handled that for us, right?
> > > >
> > > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> > >
> > > Crap.  If I can't get this code right in an example, the API is messed
> > > up.  Time to take Kay seriously and start to revamp the basic kobject
> > > api :)
> >
> > The rule is simple enough.  After calling kobject_register() you should
> > always use kobject_put() -- even if kobject_register() failed.
> >
> > In fact, after calling kobject_init() you should use kobject_put().
> > The first rule follows from this one, since kobject_register() calls
> > kobject_init() internally.
> >
> Hi,
> The behavior is not very clear here, the root problem is that :
> 
> 1. Should we call kobject_put so cleanup work can be done by refcount
> touch zero or call kfree every time after kobject_register failed?
> 
> 2. If kobject_put calling is true, should this be done in
> kobject_register error handling codes or by hand after
> kobject_register failed?
> 
IMO, I'd rather choose kobject_put, because the kobj name should also be released.
After searching for kobject_register callers, I found one such leak in
pktcdvd.

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
drivers/block/pktcdvd.c |4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff -upr linux/drivers/block/pktcdvd.c linux.new/drivers/block/pktcdvd.c
--- linux/drivers/block/pktcdvd.c   2007-11-30 13:13:44.0 +0800
+++ linux.new/drivers/block/pktcdvd.c   2007-11-30 13:24:08.0 +0800
@@ -117,8 +117,10 @@ static struct pktcdvd_kobj* pkt_kobj_cre
p->kobj.parent = parent;
p->kobj.ktype = ktype;
p->pd = pd;
-   if (kobject_register(&p->kobj) != 0)
+   if (kobject_register(&p->kobj) != 0) {
+   kobject_put(&p->kobj);
return NULL;
+   }
return p;
 }
 /*
> Regards
> dave
> > Alan Stern


Re: [PATCH] keyspan: init termios properly

2007-11-29 Thread Borislav Petkov
On Mon, Nov 26, 2007 at 02:18:52PM -0800, Andrew Morton wrote:
> On Sun, 18 Nov 2007 14:11:30 +0100
> Borislav Petkov <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, Nov 15, 2007 at 01:10:16PM -0800, Lucy McCoy wrote:

...

> > yes, after testing this i can confirm that this one fixes the NULL ptr
> > problem here so you might want to submit a proper patch to Greg.
> 
> I'll merge revert-keyspan-init-termios-properly.patch soon, but afaik we
> are still awaiting the real fix for this problem?

Hi Andrew,
sorry for the late reply - i was away from the country and couldn't read mail.
Yes, we are still awaiting the real fix afaik but the code fragment above
removes the NULL ptr deref so we should at least merge that. Will prepare a
patch for this later today...

-- 
Regards/Gruß,
Boris.


Re: [BUG] 2.6.24-rc3-mm2 soft lockup while running tbench

2007-11-29 Thread Kamalesh Babulal
Andrew Morton wrote:
> On Wed, 28 Nov 2007 20:03:22 +0530
> Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
>> Hi Andrew,
>>
>> while running tbench on the powerpc with 2.6.24-rc3-mm2 softlock up occurs
>>
>> BUG: soft lockup - CPU#0 stuck for 11s! [tbench:12183]
>> NIP: c00ac978 LR: c00acff0 CTR: c005c648
>> REGS: C0076F0F3200 TRAP: 0901   Not tainted  (2.6.24-rc3-mm2-autotest)
>> MSR: 80009032   CR: 44000482  XER: 
>> TASK = C0076F4BC000[12183] 'tbench' THREAD: C0076F0F CPU: 0
>> NIP [c00ac978] .get_page_from_freelist+0x1cc/0x754
>> LR [c00acff0] .__alloc_pages+0xb0/0x3a8
>> Call Trace:
>> [c0076f0f3480] [c0076f0f3560] 0xc0076f0f3560 (unreliable)
>> [c0076f0f3590] [c00acff0] .__alloc_pages+0xb0/0x3a8
>> [c0076f0f3680] [c00ce2e4] .alloc_pages_current+0xa8/0xc8
>> [c0076f0f3710] [c00ac6ec] .__get_free_pages+0x20/0x70
>> [c0076f0f3790] [c00d75c8] .__kmalloc_node_track_caller+0x60/0x148
>> [c0076f0f3840] [c02c22b0] .__alloc_skb+0x98/0x184
>> [c0076f0f38f0] [c0306cd8] .tcp_sendmsg+0x1fc/0xe24
>> [c0076f0f3a10] [c02b963c] .sock_sendmsg+0xe4/0x128
>> [c0076f0f3c10] [c02ba4ec] .sys_sendto+0xd4/0x120
>> [c0076f0f3d90] [c02df2f8] .compat_sys_socketcall+0x148/0x214
>> [c0076f0f3e30] [c000872c] syscall_exit+0x0/0x40
>> Instruction dump:
>> 720b0001 eb97 40820070 7202 4182000c e8bc 4818 72080004 
>> 4182000c e8bc0008 4808 e8bc0010  7f83e378 7de407b4 7e078378 
>>
> 
> hm.  Beats me.  Does the machine recover OK?
> -
Hi Andrew,

In the set of test cases run serially, the soft lockup is seen in tbench;
the remaining test cases then run successfully after the soft lockup.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [RFC] Sample kset/ktype/kobject implementation

2007-11-29 Thread Dave Young
On Nov 30, 2007 6:11 AM, Alan Stern <[EMAIL PROTECTED]> wrote:
> On Thu, 29 Nov 2007, Greg KH wrote:
>
> > > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > > where the name can be freed.
> > > >
> > > > No, kobject_register() should have handled that for us, right?
> > >
> > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> >
> > Crap.  If I can't get this code right in an example, the API is messed
> > up.  Time to take Kay seriously and start to revamp the basic kobject
> > api :)
>
> The rule is simple enough.  After calling kobject_register() you should
> always use kobject_put() -- even if kobject_register() failed.
>
> In fact, after calling kobject_init() you should use kobject_put().
> The first rule follows from this one, since kobject_register() calls
> kobject_init() internally.
>
Hi,
The behavior is not very clear here; the root questions are:

1. Should we call kobject_put, so that the cleanup work is done when the
refcount drops to zero, or call kfree every time kobject_register fails?

2. If calling kobject_put is right, should it be done in kobject_register's
error-handling code, or by hand after kobject_register fails?

Regards
dave
> Alan Stern


Re: [linux-usb-devel] [BUG] USB_PERSIST

2007-11-29 Thread Raymano Garibaldi
On 11/29/07, Alan Stern <[EMAIL PROTECTED]> wrote:
> On Thu, 29 Nov 2007, Raymano Garibaldi wrote:
>
> > The feature does work as long as the device remains plugged in and
> > that is what I have said in my previous postings too. What I'm saying
> > that should work and worked under 2.6.21 and is not working currently
> > is the ability to unplug and plug back in the device while the
> > computer is suspended before resuming without losing the mount.
>
> Okay, guess I misunderstood what you wrote before.
>
> The patch below for 2.6.23 should do what you want (and more besides).
> It forces the USB Persist feature to apply to all persist-enabled
> devices, whether they were unplugged or not.
>
> There's no chance of this getting accepted into the official kernel in
> such a simple form, but at least it will allow you to do what you want.
>
> Alan Stern
>
>
> --- 2.6.23/drivers/usb/core/driver.c1   2007-11-29 10:57:36.0 -0500
> +++ 2.6.23/drivers/usb/core/driver.c2007-11-29 11:01:44.0 -0500
> @@ -1550,6 +1550,9 @@
> if (!(udev->reset_resume && udev->do_remote_wakeup))
> return -EPERM;
> }
> +
> +   /* Force all system resumes to be reset-resumes */
> +   udev->reset_resume = 1;
> return usb_external_resume_device(udev);
>  }
>
>
>

Alan,

Thank you! Thank you! Thank you!

Who'd have thought such a simple patch could make someone so happy?

That did the trick. I just tried it and it works beautifully whether
the device remains plugged in during suspend or if it's unplugged and
plugged back in during suspend and before resume.

Now if this could only become the default behavior ;-)

Thanks again,
Raymano G.


Re: sched_yield: delete sysctl_sched_compat_yield

2007-11-29 Thread Zhang, Yanmin
On Fri, 2007-11-30 at 14:29 +1100, Nick Piggin wrote:
> On Friday 30 November 2007 14:15, Zhang, Yanmin wrote:
> > On Fri, 2007-11-30 at 13:46 +1100, Nick Piggin wrote:
> > > On Wednesday 28 November 2007 09:57, Arjan van de Ven wrote:
> 
> > > > sounds like a bad idea; volanomark (well, technically the jvm behind
> > > > it) is abusing sched_yield() by assuming it does something it really
> > > > doesn't do, and as it happens some of the earlier 2.6 schedulers
> > > > accidentally happened to behave in a way that was nice for this
> > > > benchmark.
> > >
> > > OK, why is this still happening? Haven't we been asking JVMs to use
> > > futexes or posix locking for years and years now? Are there any sane
> > > jvms that _don't_ use yield?
> >
> > I think it's an issue of volanomark (a kind of java application) instead of
> > JVM.
> 
> volanomark itself and not the jvm is calling sched_yield()? Do we have
> any non-toy threaded java apps? (what's JAVA in the kernel-perf tests?)
I run lots of well-known benchmarks, and VolanoMark is the one that gets the
largest impact from sched_yield.

As for real applications that use sched_yield, most of them are not open
source. Yesterday I learned that someone was using sched_yield in his network
C programs, but he didn't want to share the sources with me.

> 
> 
> > > > Todays kernel has a different behavior somewhat (and before people
> > > > scream "regression"; sched_yield() behavior isn't really specified and
> > > > doesn't make any sense at all, whatever you get is what you get
> > > > it's pretty much an insane defacto behavior that is incredibly tied to
> > > > which decisions the scheduler makes how, and no app can depend on that
> > >
> > > It is a performance regression. Is there any reason *not* to use the
> > > "compat" yield by default?
> >
> > There is no, so I suggest to set sched_compat_yield=1 by default.
> > If sched_compat_yield=0, kernel almost does nothing but returns. When
> > sched_compat_yield=1, it is closer to the meaning of sched_yield man page.
> 
> sched_yield() is really only defined for posix realtime scheduling
> AFAIK, which talks about priority lists. 
> 
> SCHED_OTHER is defined to be a single priority, below the rest of the
> realtime priorities. So at first you *might* say that the process
> should then be made to run only after all other SCHED_OTHER processes,
> however there is no such ordering requirement for SCHED_OTHER
> scheduling. The SCHED_OTHER scheduler can run any task at any time.
> 
> That said, I think people would *expect* that call be much closer to
> the compat behaviour than the current default. And that's definitely
> what Linux has done in the past. So there really does need to be a
> good reason to change it like this IMO.
That's indeed what I am thinking.

I am running many tests (SPECjbb/SPECjbb2005/cpu2000/iozone/dbench/tbench...)
to see whether there is any regression with sched_compat_yield=1. I don't
expect any; the testing is just to double-check.

> 
> 
> > > As you say, for SCHED_OTHER tasks, yield
> > > can do almost anything. We may as well do something that isn't a
> > > regression...
> >
> > I just found SCHED_OTHER in man sched_setscheduler. Is it SCHED_NORMAL in
> > the latest kernel?
> 
> Yes, SCHED_NORMAL is SCHED_OTHER. Don't know why it got renamed...
Thanks.
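For comparison, a minimal POSIX sketch of the futex/posix-locking style Nick mentions: a waiter blocks on a condition variable instead of spinning with sched_yield() until a flag changes. The struct handoff, producer and consume names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical one-shot hand-off between two threads. */
struct handoff {
	pthread_mutex_t lock;
	pthread_cond_t ready_cv;
	int ready;
	int value;
};

static void *producer(void *arg)
{
	struct handoff *h = arg;

	pthread_mutex_lock(&h->lock);
	h->value = 42;
	h->ready = 1;
	pthread_cond_signal(&h->ready_cv);	/* wake the sleeping waiter */
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

/* Returns the produced value, or -1 if the thread could not be started. */
static int consume(void)
{
	struct handoff h = { .ready = 0, .value = 0 };
	pthread_t t;
	int v;

	pthread_mutex_init(&h.lock, NULL);
	pthread_cond_init(&h.ready_cv, NULL);

	if (pthread_create(&t, NULL, producer, &h) != 0) {
		v = -1;
		goto out;
	}

	pthread_mutex_lock(&h.lock);
	while (!h.ready)	/* re-check: wakeups may be spurious */
		pthread_cond_wait(&h.ready_cv, &h.lock);
	v = h.value;
	pthread_mutex_unlock(&h.lock);
	pthread_join(t, NULL);
out:
	pthread_cond_destroy(&h.ready_cv);
	pthread_mutex_destroy(&h.lock);
	return v;
}
```

The waiter's progress no longer depends on what the scheduler happens to do inside sched_yield(); it sleeps until the producer signals.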


Re: [PATCH] Avoid overflows in kernel/time.c

2007-11-29 Thread H. Peter Anvin

Arjan van de Ven wrote:
>>
>> Anyway, I don't think compiling bc is hard on anything which has a C
>> compiler.
>
> alternative is to just also ship the precomputed values ;-)
>

Oh, come on... it's not like bc is some obscure thing.  It's a POSIX
utility.

-hpa


Re: [PATCH] Avoid overflows in kernel/time.c

2007-11-29 Thread Arjan van de Ven
On Thu, 29 Nov 2007 19:04:36 -0800
"H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> Chris Snook wrote:
> > H. Peter Anvin wrote:
> >> NOTE: This patch uses a bc(1) script to compute the appropriate
> >> constants.
> > 
> > Perhaps dc would be more appropriate?  That's included in busybox.
> > 
> 
> Perhaps it would, but I think there is more variability between dc 
> implementations -- consider if the busybox version is broken, for
> example.
> 
> Either way, how many people compile their kernels in a busybox
> environment?
> 
> Anyway, I don't think compiling bc is hard on anything which has a C 
> compiler.

alternative is to just also ship the precomputed values ;-)

> 
>   -hpa


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


RFC - organize include/linux/kernel.h, add include/linux/logging.h

2007-11-29 Thread Joe Perches
2.6.25 material.

kernel.h has become a bit disorganized over a long time.
Here's an attempt to clean it up a bit.

Something for everyone to like or dislike...

Groups externs and functions by module/function
Creates a "logging.h" for printk, KERN_
Changes some macros to statement expressions
DIV_ROUND_UP, roundup and __ALIGN_MASK
Removes the unused PTR_ALIGN
Conforms to coding style and 80 columns
Passes checkpatch, except for what I consider coding style defects in
checkpatch itself: statement expressions don't need a space between ";" and "})",
and "do {} while" loops don't need one between ";" and "}"

 include/linux/kernel.h  |  458 +--
 include/linux/logging.h |  154 

These files used macros to declare array elements.
Statement expressions can't be used for that,
so these now use direct calculations instead.

 include/linux/bitops.h  |2 +-
 lib/radix-tree.c|5 +-
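A small illustration of the constraint described above, using a plain DIV_ROUND_UP like the one being replaced and a GNU statement-expression variant (DIV_ROUND_UP_ONCE is a hypothetical name, not the one in the patch; needs GCC or Clang). The plain form is a constant expression and can size a file-scope array, but evaluates its arguments twice; the statement-expression form evaluates each argument once, but is not allowed outside a function body at all.

```c
#include <assert.h>

/* Plain macro: a constant expression for constant arguments. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Statement expression: single evaluation, but only inside functions. */
#define DIV_ROUND_UP_ONCE(n, d) ({		\
	__typeof__(n) _n = (n);			\
	__typeof__(d) _d = (d);			\
	(_n + _d - 1) / _d;			\
})

static int table[DIV_ROUND_UP(10, 4)];	/* 3 elements */
/* static int bad[DIV_ROUND_UP_ONCE(10, 4)];  would not compile here */

static int calls;	/* counts how often the divisor is evaluated */

static int next_divisor(void)
{
	calls++;
	return 4;
}
```

This is why array declarations have to keep a plain arithmetic form even when the general-purpose macro becomes a statement expression.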

This one used the ALIGN macro, but I'm not inclined to
figure out what it actually does right now, so I copied
the old macro to this file and renamed it.

 include/net/neighbour.h |5 +-
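For reference, ALIGN(x, a) rounds x up to the next multiple of a, and that only works when a is a power of two, because a - 1 is then a mask of the low bits. A sketch using the definitions from the removed kernel.h lines (with typeof spelled __typeof__ so it builds outside GNU mode); align_up is a hypothetical helper that checks the power-of-two precondition.

```c
#include <assert.h>

#define __ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))
#define ALIGN(x, a)		__ALIGN_MASK(x, (__typeof__(x))(a) - 1)

/* Round x up to a multiple of a; a must be a nonzero power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return ALIGN(x, a);
}

/*
 * With a non-power-of-two alignment the mask trick silently gives a wrong
 * answer: ALIGN(10u, 6) is 10, not 12.
 */
```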

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 94bc996..2783ed9 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -1,403 +1,273 @@
 #ifndef _LINUX_KERNEL_H
 #define _LINUX_KERNEL_H
 
 /*
  * 'kernel.h' contains some often-used function prototypes etc
  */
 
 #ifdef __KERNEL__
 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
-extern const char linux_banner[];
-extern const char linux_proc_banner[];
-
+/* could be in an include linux/limits.h */
 #define INT_MAX((int)(~0U>>1))
 #define INT_MIN(-INT_MAX - 1)
 #define UINT_MAX   (~0U)
 #define LONG_MAX   ((long)(~0UL>>1))
 #define LONG_MIN   (-LONG_MAX - 1)
 #define ULONG_MAX  (~0UL)
 #define LLONG_MAX  ((long long)(~0ULL>>1))
 #define LLONG_MIN  (-LLONG_MAX - 1)
 #define ULLONG_MAX (~0ULL)
 
-#define STACK_MAGIC0xdeadbeef
+/* useful macros */
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
+#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
 
-#define ALIGN(x,a) __ALIGN_MASK(x,(typeof(x))(a)-1)
-#define __ALIGN_MASK(x,mask)   (((x)+(mask))&~(mask))
-#define PTR_ALIGN(p, a) ((typeof(p))ALIGN((unsigned long)(p), (a)))
-#define IS_ALIGNED(x,a)(((x) % ((typeof(x))(a))) == 0)
+/*
+ * Check at compile time that something is of a particular type.
+ * Always evaluates to 1 so you may use it easily in comparisons.
+ */
+#define typecheck(type, x) \
+   ({type _dummy; typeof(x) _dummy2; (void)(&_dummy == &_dummy2); 1;})
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
+/*
+ * Check at compile time that 'function' is a certain type, or is a pointer
+ * to that type (needs to use typedef for the function type.)
+ */
+#define typecheck_fn(type, function)   \
+   ({typeof(type) _x = function; (void)_x;})
 
-#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
-#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
-#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+/**
+ * container_of - cast a member of a structure out to the containing structure
+ * @ptr:   the pointer to the member.
+ * @type:  the type of the container struct this is embedded in.
+ * @member: the name of the member within the struct.
+ *
+ */
+#define container_of(ptr, type, member) ({ \
+   const typeof(((type *)0)->member) *__mptr = (ptr);  \
+   (type *)((char *)__mptr - offsetof(type, member));})
 
-#ifdef CONFIG_LBD
-# include 
-# define sector_div(a, b) do_div(a, b)
-#else
-# define sector_div(n, b)( \
-{ \
-   int _res; \
-   _res = (n) % (b); \
-   (n) /= (b); \
-   _res; \
-} \
-)
-#endif
+/*
+ * min()/max() macros that also do strict type-checking..
+ * See the "unnecessary" pointer comparison.
+ */
+#define min(x, y) ({   \
+   typeof(x) _x = (x); \
+   typeof(y) _y = (y); \
+   (void)(&_x == &_y); \
+   _x < _y ? _x : _y;})
+
+#define max(x, y) ({   \
+   typeof(x) _x = (x); \
+   typeof(y) _y = (y); \
+   (void)(&_x == &_y); \
+   _x > _y ? _x : _y;})
+
+/*
+ * ..and if you can't take the strict
+ * types, you can specify one yourself.
+ *
+ * Or not use min/max at all, of course.
+ */
+#define min_t(type, x, y) \
+   ({type _x = (x); type _y = (y); _x < _y ? _x: _y;})
+
+#define max_t(type, x, y) \
+   ({type _x = (x); type _y = (y); _x > _y ? _x: _y;})
+
+#define abs(x) ({int _x = (x); (_x < 0) ? -_x : _x;})
 
 /**
  * upper_32_bits - return bits 32-63 of a number
  * @n: the number we're accessing
  *
  * A basic shift-right of a 64- or 32-bit quantity.  Use this to suppress
  * the "right shift count >= width of type" warning when that 

[PATCH] Documentation/Changes -> Documentation/Requirements (resend without truncated comment text)

2007-11-29 Thread H. Peter Anvin
Change Documentation/Changes to Documentation/Requirements, and at
least begin to separate the runtime requirements from the kernel
compilation requirements.

There are definitely kernel compilation requirements that are not
listed in this file.  It would be good to get them uncovered.

This document is obviously woefully incomplete, for one thing it has
absolutely no per-architecture information, except "may depend on the
CPU in your system."  Hopefully this will encourage people to document
those per-architecture requirements.

Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
---

As far as I can tell, Documentation/Changes is the only thing we have
that even attempts to document the basic requirements.  This attempts
to formalize that fact.

 Documentation/Changes  |  396 
 Documentation/Requirements |  394 +++
 2 files changed, 394 insertions(+), 396 deletions(-)
 delete mode 100644 Documentation/Changes
 create mode 100644 Documentation/Requirements

diff --git a/Documentation/Changes b/Documentation/Changes
deleted file mode 100644
index cb2b141..000
--- a/Documentation/Changes
+++ /dev/null
@@ -1,396 +0,0 @@
-Intro
-=====
-
-This document is designed to provide a list of the minimum levels of
-software necessary to run the 2.6 kernels, as well as provide brief
-instructions regarding any other "Gotchas" users may encounter when
-trying life on the Bleeding Edge.  If upgrading from a pre-2.4.x
-kernel, please consult the Changes file included with 2.4.x kernels for
-additional information; most of that information will not be repeated
-here.  Basically, this document assumes that your system is already
-functional and running at least 2.4.x kernels.
-
-This document is originally based on my "Changes" file for 2.0.x kernels
-and therefore owes credit to the same people as that file (Jared Mauch,
-Axel Boldt, Alessandro Sigala, and countless other users all over the
-'net).
-
-Current Minimal Requirements
-============================
-
-Upgrade to at *least* these software revisions before thinking you've
-encountered a bug!  If you're unsure what version you're currently
-running, the suggested command should tell you.
-
-Again, keep in mind that this list assumes you are already
-functionally running a Linux 2.4 kernel.  Also, not all tools are
-necessary on all systems; obviously, if you don't have any ISDN
-hardware, for example, you probably needn't concern yourself with
-isdn4k-utils.
-
-o  Gnu C               3.2       # gcc --version
-o  Gnu make            3.79.1    # make --version
-o  binutils            2.12      # ld -v
-o  util-linux          2.10o     # fdformat --version
-o  module-init-tools   0.9.10    # depmod -V
-o  e2fsprogs           1.29      # tune2fs
-o  jfsutils            1.1.3     # fsck.jfs -V
-o  reiserfsprogs       3.6.3     # reiserfsck -V 2>&1|grep reiserfsprogs
-o  xfsprogs            2.6.0     # xfs_db -V
-o  pcmciautils         004       # pccardctl -V
-o  quota-tools         3.09      # quota -V
-o  PPP                 2.4.0     # pppd --version
-o  isdn4k-utils        3.1pre1   # isdnctrl 2>&1|grep version
-o  nfs-utils           1.0.5     # showmount --version
-o  procps              3.2.0     # ps --version
-o  oprofile            0.9       # oprofiled --version
-o  udev                081       # udevinfo -V
-o  grub                0.93      # grub --version
-
-Kernel compilation
-==================
-
-GCC
----
-
-The gcc version requirements may vary depending on the type of CPU in your
-computer.
-
-Make
-----
-
-You will need Gnu make 3.79.1 or later to build the kernel.
-
-Binutils
---------
-
-Linux on IA-32 has recently switched from using as86 to using gas for
-assembling the 16-bit boot code, removing the need for as86 to compile
-your kernel.  This change does, however, mean that you need a recent
-release of binutils.
-
-System utilities
-================
-
-Architectural changes
----------------------
-
-DevFS has been obsoleted in favour of udev
-(http://www.kernel.org/pub/linux/utils/kernel/hotplug/)
-
-32-bit UID support is now in place.  Have fun!
-
-Linux documentation for functions is transitioning to inline
-documentation via specially-formatted comments near their
-definitions in the source.  These comments can be combined with the
-SGML templates in the Documentation/DocBook directory to make DocBook
-files, which can then be converted by DocBook stylesheets to PostScript,
-HTML, PDF files, and several other formats.  In order to convert from
-DocBook format to a format of your choice, you'll need to install Jade as
-well as the desired DocBook 

Re: sched_yield: delete sysctl_sched_compat_yield

2007-11-29 Thread Nick Piggin
On Friday 30 November 2007 14:15, Zhang, Yanmin wrote:
> On Fri, 2007-11-30 at 13:46 +1100, Nick Piggin wrote:
> > On Wednesday 28 November 2007 09:57, Arjan van de Ven wrote:

> > > sounds like a bad idea; volanomark (well, technically the jvm behind
> > > it) is abusing sched_yield() by assuming it does something it really
> > > doesn't do, and as it happens some of the earlier 2.6 schedulers
> > > accidentally happened to behave in a way that was nice for this
> > > benchmark.
> >
> > OK, why is this still happening? Haven't we been asking JVMs to use
> > futexes or posix locking for years and years now? Are there any sane
> > jvms that _don't_ use yield?
>
> I think it's an issue of volanomark (a kind of java application) instead of
> JVM.

volanomark itself and not the jvm is calling sched_yield()? Do we have
any non-toy threaded java apps? (what's JAVA in the kernel-perf tests?)


> > > Today's kernel has a different behavior somewhat (and before people
> > > scream "regression"; sched_yield() behavior isn't really specified and
> > > doesn't make any sense at all, whatever you get is what you get;
> > > it's pretty much an insane de facto behavior that is incredibly tied to
> > > which decisions the scheduler makes how, and no app can depend on that
> >
> > It is a performance regression. Is there any reason *not* to use the
> > "compat" yield by default?
>
> There is none, so I suggest setting sched_compat_yield=1 by default.
> If sched_compat_yield=0, the kernel does almost nothing but return. When
> sched_compat_yield=1, it is closer to the meaning of the sched_yield man page.

sched_yield() is really only defined for posix realtime scheduling
AFAIK, which talks about priority lists. 

SCHED_OTHER is defined to be a single priority, below the rest of the
realtime priorities. So at first you *might* say that the process
should then be made to run only after all other SCHED_OTHER processes,
however there is no such ordering requirement for SCHED_OTHER
scheduling. The SCHED_OTHER scheduler can run any task at any time.
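On Linux the single SCHED_OTHER priority is visible from userspace: sched_get_priority_min() and sched_get_priority_max() both report 0 for SCHED_OTHER, while SCHED_FIFO gets a range of realtime priorities (1 to 99). A minimal sketch; policy_range() is a hypothetical helper:

```c
#include <assert.h>
#include <sched.h>

struct prio_range {
	int min;
	int max;
};

/* Ask the kernel for the static priority range of a scheduling policy. */
static struct prio_range policy_range(int policy)
{
	struct prio_range r;

	r.min = sched_get_priority_min(policy);
	r.max = sched_get_priority_max(policy);
	assert(r.min != -1 && r.max != -1);	/* -1 would mean EINVAL */
	return r;
}
```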

That said, I think people would *expect* that call to be much closer to
the compat behaviour than the current default. And that's definitely
what Linux has done in the past. So there really does need to be a
good reason to change it like this IMO.


> > As you say, for SCHED_OTHER tasks, yield
> > can do almost anything. We may as well do something that isn't a
> > regression...
>
> I just found SCHED_OTHER in man sched_setscheduler. Is it SCHED_NORMAL in
> the latest kernel?

Yes, SCHED_NORMAL is SCHED_OTHER. Don't know why it got renamed...


[PATCH] Documentation/Changes -> Documentation/Requirements

2007-11-29 Thread H. Peter Anvin
Change Documentation/Changes to Documentation/Requirements, and at
least begin to separate the runtime requirements from the kernel
compilation requirements.

There are definitely kernel compilation requirements that are not
listed in this file.  It would be good to get them uncovered.

This document is obviously woefully incomplete, for one thing it has
absolutely no per-architecture information, except "may depend on the
CPU in your system."  Hopefully this will encourage people to
---

As far as I can tell, Documentation/Changes is the only thing we have
that even attempts to document the basic requirements.  This attempts
to formalize that fact.

 Documentation/Changes  |  396 
 Documentation/Requirements |  394 +++
 2 files changed, 394 insertions(+), 396 deletions(-)
 delete mode 100644 Documentation/Changes
 create mode 100644 Documentation/Requirements

diff --git a/Documentation/Changes b/Documentation/Changes
deleted file mode 100644
index cb2b141..000
--- a/Documentation/Changes
+++ /dev/null
@@ -1,396 +0,0 @@
-Intro
-=====
-
-This document is designed to provide a list of the minimum levels of
-software necessary to run the 2.6 kernels, as well as provide brief
-instructions regarding any other "Gotchas" users may encounter when
-trying life on the Bleeding Edge.  If upgrading from a pre-2.4.x
-kernel, please consult the Changes file included with 2.4.x kernels for
-additional information; most of that information will not be repeated
-here.  Basically, this document assumes that your system is already
-functional and running at least 2.4.x kernels.
-
-This document is originally based on my "Changes" file for 2.0.x kernels
-and therefore owes credit to the same people as that file (Jared Mauch,
-Axel Boldt, Alessandro Sigala, and countless other users all over the
-'net).
-
-Current Minimal Requirements
-============================
-
-Upgrade to at *least* these software revisions before thinking you've
-encountered a bug!  If you're unsure what version you're currently
-running, the suggested command should tell you.
-
-Again, keep in mind that this list assumes you are already
-functionally running a Linux 2.4 kernel.  Also, not all tools are
-necessary on all systems; obviously, if you don't have any ISDN
-hardware, for example, you probably needn't concern yourself with
-isdn4k-utils.
-
-o  Gnu C               3.2       # gcc --version
-o  Gnu make            3.79.1    # make --version
-o  binutils            2.12      # ld -v
-o  util-linux          2.10o     # fdformat --version
-o  module-init-tools   0.9.10    # depmod -V
-o  e2fsprogs           1.29      # tune2fs
-o  jfsutils            1.1.3     # fsck.jfs -V
-o  reiserfsprogs       3.6.3     # reiserfsck -V 2>&1|grep reiserfsprogs
-o  xfsprogs            2.6.0     # xfs_db -V
-o  pcmciautils         004       # pccardctl -V
-o  quota-tools         3.09      # quota -V
-o  PPP                 2.4.0     # pppd --version
-o  isdn4k-utils        3.1pre1   # isdnctrl 2>&1|grep version
-o  nfs-utils           1.0.5     # showmount --version
-o  procps              3.2.0     # ps --version
-o  oprofile            0.9       # oprofiled --version
-o  udev                081       # udevinfo -V
-o  grub                0.93      # grub --version
-
-Kernel compilation
-==================
-
-GCC
----
-
-The gcc version requirements may vary depending on the type of CPU in your
-computer.
-
-Make
-----
-
-You will need Gnu make 3.79.1 or later to build the kernel.
-
-Binutils
---------
-
-Linux on IA-32 has recently switched from using as86 to using gas for
-assembling the 16-bit boot code, removing the need for as86 to compile
-your kernel.  This change does, however, mean that you need a recent
-release of binutils.
-
-System utilities
-================
-
-Architectural changes
----------------------
-
-DevFS has been obsoleted in favour of udev
-(http://www.kernel.org/pub/linux/utils/kernel/hotplug/)
-
-32-bit UID support is now in place.  Have fun!
-
-Linux documentation for functions is transitioning to inline
-documentation via specially-formatted comments near their
-definitions in the source.  These comments can be combined with the
-SGML templates in the Documentation/DocBook directory to make DocBook
-files, which can then be converted by DocBook stylesheets to PostScript,
-HTML, PDF files, and several other formats.  In order to convert from
-DocBook format to a format of your choice, you'll need to install Jade as
-well as the desired DocBook stylesheets.
-
-Util-linux
-----------
-
-New versions of util-linux provide *fdisk support for 

Re: sched_yield: delete sysctl_sched_compat_yield

2007-11-29 Thread Zhang, Yanmin
On Fri, 2007-11-30 at 13:46 +1100, Nick Piggin wrote:
> On Wednesday 28 November 2007 09:57, Arjan van de Ven wrote:
> > On Tue, 27 Nov 2007 17:33:05 +0800
> >
> > "Zhang, Yanmin" <[EMAIL PROTECTED]> wrote:
> > > If I run echo "1">/proc/sys/kernel/sched_compat_yield before starting
> > > volanoMark testing, the result is very good with kernel 2.6.24-rc3 on
> > > my 16-core tigerton.
> > >
> > > 1) If /proc/sys/kernel/sched_compat_yield=1, comparing with 2.6.22,
> > > 2.6.24-rc3 has more than 70% improvement;
> > > 2) If /proc/sys/kernel/sched_compat_yield=0, comparing with 2.6.22,
> > > 2.6.24-rc3 has more than 80% regression;
> > >
> > > On other machines, the volanoMark result also has much improvement if
> > > /proc/sys/kernel/sched_compat_yield=1.
> > >
> > > Would you like to change function yield_task_fair to delete the code
> > > around sysctl_sched_compat_yield, or just initialize it to 1?
> >
> > sounds like a bad idea; volanomark (well, technically the jvm behind
> > it) is abusing sched_yield() by assuming it does something it really
> > doesn't do, and as it happens some of the earlier 2.6 schedulers
> > accidentally happened to behave in a way that was nice for this
> > benchmark.
> 
> OK, why is this still happening? Haven't we been asking JVMs to use
> futexes or posix locking for years and years now? Are there any sane
> jvms that _don't_ use yield?
I think it's an issue of volanomark (a kind of java application) instead of JVM.

> 
> 
> > Today's kernel has a different behavior somewhat (and before people
> > scream "regression"; sched_yield() behavior isn't really specified and
> > doesn't make any sense at all, whatever you get is what you get;
> > it's pretty much an insane de facto behavior that is incredibly tied to
> > which decisions the scheduler makes how, and no app can depend on that
> 
> It is a performance regression. Is there any reason *not* to use the
> "compat" yield by default?
There is none, so I suggest setting sched_compat_yield=1 by default.
If sched_compat_yield=0, the kernel does almost nothing but return. When
sched_compat_yield=1, it is closer to the meaning of the sched_yield man page.

> As you say, for SCHED_OTHER tasks, yield
> can do almost anything. We may as well do something that isn't a
> regression...
I just found SCHED_OTHER in man sched_setscheduler. Is it SCHED_NORMAL in
the latest kernel?

> 
> 
> > in any way. In fact, I've proposed to make sched_yield() just do an
> > msleep(1)... that'd be closer to what sched_yield is supposed to do
> > standard wise than any of the current behaviors  ;_
> 
> What makes you say that? IIRC of all the things that sched_yield can
> do, it is not allowed to block. So this is about the only thing that
> will break the standard...


Re: What can we do to get ready for memory controller merge in 2.6.25

2007-11-29 Thread Balbir Singh
Nick Piggin wrote:
> On Friday 30 November 2007 01:43, Balbir Singh wrote:
>> They say it's best to strike while the iron is hot.
>>
>> Since we have so many people discussing the memory controller, I would
>> like to assess the readiness of the memory controller for mainline
>> merge. Given that we have some time until the merge window, I'd like to
>> set aside some time (from my other work items) to work on the memory
>> controller, fix review comments and defects.
>>
>> In the past, we've received several useful comments from Rik Van Riel,
>> Lee Schermerhorn, Peter Zijlstra, Hugh Dickins, Nick Piggin, Paul Menage
>> and code contributions and bug fixes from Hugh Dickins, Pavel Emelianov,
>> Lee Schermerhorn, YAMAMOTO-San, Andrew Morton and KAMEZAWA-San. I
>> apologize if I missed out any other names or contributions
>>
>> At the VM-Summit we decided to try the current double LRU approach for
>> memory control. At this juncture in the space-time continuum, I seek
>> your support, feedback, comments and help to move the memory controller
> 
> Do you have any test cases, performance numbers, etc.? And also some
> results or even anecdotes of where this is going to be used would be
> interesting...
> 

Some test results were posted at

http://lkml.org/lkml/2007/8/17/69
http://lkml.org/lkml/2007/8/19/36
http://lwn.net/Articles/242554/

Some results for the RSS controller can be found in the OLS paper

https://ols2006.108.redhat.com/2007/Reprints/singh-Reprint.pdf

and at

http://lkml.org/lkml/2007/5/18/1

As far as test cases are concerned, I have a simple test case that
allocates memory and touches all the allocated memory in a loop. I
can post it if required. It uses various types of allocation:

1. mmaped memory
2. anonymous memory
3. shared memory
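The test program itself isn't included in this mail. As a hypothetical sketch of a test in that spirit (the touch_region() name and its parameters are made up), the following maps a region, writes to every page in a loop so each page is actually faulted in, and returns the number of pages touched per pass. It covers private and shared anonymous memory; a file-backed mmap would follow the same pattern with a real file descriptor instead of MAP_ANONYMOUS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map 'bytes' of memory with the given mmap flags, write to every page
 * 'passes' times, and return the number of pages touched in the last
 * pass (or -1 if the mapping failed).
 */
static long touch_region(size_t bytes, int flags, int passes)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	long pages = 0;

	assert(page > 0);
	p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	for (int pass = 0; pass < passes; pass++) {
		pages = 0;
		for (size_t off = 0; off < bytes; off += (size_t)page) {
			p[off] = (unsigned char)pass;	/* write fault brings the page in */
			pages++;
		}
	}
	munmap(p, bytes);
	return pages;
}
```

Run inside a control group whose limit is smaller than the region, the same loop would presumably exercise the controller's reclaim path.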

I also run various benchmarks inside a control group, limited to 400 MB
of RAM.

One interesting thing I noticed was that when I booted with mem= and
created a container with the same , the swapout test case ran much
faster in the container (NOTE: this was prior to the swap cache changes).

KAMEZAWA-San posted some test results on background reclaim and per zone
reclaim

http://forum.openvz.org/index.php?t=tree=4696=23964&==

The simplest use cases that come to mind are

1. Memory control for containers/virtualization
2. Job Isolation


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [PATCH] Avoid overflows in kernel/time.c

2007-11-29 Thread H. Peter Anvin

Chris Snook wrote:

H. Peter Anvin wrote:

NOTE: This patch uses a bc(1) script to compute the appropriate
constants.


Perhaps dc would be more appropriate?  That's included in busybox.



Perhaps it would, but I think there is more variability between dc 
implementations -- consider if the busybox version is broken, for example.


Either way, how many people compile their kernels in a busybox environment?

Anyway, I don't think compiling bc is hard on anything which has a C 
compiler.


-hpa


Re: sched_yield: delete sysctl_sched_compat_yield

2007-11-29 Thread Nick Piggin
On Friday 30 November 2007 13:51, Arjan van de Ven wrote:
> On Fri, 30 Nov 2007 13:46:22 +1100
>
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> > > Today's kernel has a different behavior somewhat (and before people
> > > scream "regression"; sched_yield() behavior isn't really specified
> > > and doesn't make any sense at all, whatever you get is what you
> > > get; it's pretty much an insane de facto behavior that is
> > > incredibly tied to which decisions the scheduler makes how, and no
> > > app can depend on that
> >
> > It is a performance regression. Is there any reason *not* to use the
> > "compat" yield by default? As you say, for SCHED_OTHER tasks, yield
> > can do almost anything. We may as well do something that isn't a
> > regression..
>
> it just makes OTHER tests/benchmarks regress; this is one of those
> things where you just can't win.

OK, which ones? Because java is slightly important...


> > > in any way. In fact, I've proposed to make sched_yield() just do an
> > > msleep(1)... that'd be closer to what sched_yield is supposed to do
> > > standard wise than any of the current behaviors  ;_
> >
> > What makes you say that? IIRC of all the things that sched_yield can
> > do, it is not allowed to block. So this is about the only thing that
> > will break the standard...
>
> sched_yield OF COURSE can block.. it's a schedule call after all!

In unix, blocking ~= removed from runqueue, no?

OF COURSE it is allowed to cooperatively schedule another task, but
I don't see why you think it should so obviously be allowed to block
/ sleep.

It breaks basically the only invariant of sched_yield, in that the
task would no longer run even when there is nothing else to run.
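The distinction can be sketched from userspace. sched_yield() leaves the caller runnable, so if nothing else wants the CPU it simply returns (on Linux it always returns 0), whereas a sleep-based "yield" like the proposed msleep(1) takes the caller off the runqueue for at least the requested time, even on an otherwise idle CPU. In this sketch nanosleep() stands in for the in-kernel msleep(), and the helper names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* A plain yield: the caller stays runnable and never sleeps. */
static int plain_yield(void)
{
	return sched_yield();
}

/*
 * A sleep-based "yield" in the spirit of msleep(1): the caller leaves
 * the runqueue for at least 1 ms.  Returns elapsed nanoseconds.
 */
static long long sleeping_yield_ns(void)
{
	struct timespec req = { 0, 1000000L }, rem;
	long long start = now_ns();

	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;	/* finish the remaining time after a signal */
	return now_ns() - start;
}
```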


Re: [PATCH] Avoid overflows in kernel/time.c

2007-11-29 Thread H. Peter Anvin

Andrew Morton wrote:


NOTE: This patch uses a bc(1) script to compute the appropriate
constants.


Does this add the first dependency upon the availability of bc?


I believe it does.  I used bc because doing it in C would have required 
arbitrary-precision code or have added a dependency on libgmp.
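As a rough illustration of why plain C would need extended precision, here is a sketch of one common use of such constants: replacing x * n / d with a multiply and shift, (x * mul) >> shift. Whether the bc script computes exactly these constants is an assumption, and the names are hypothetical. With shift = 32 + ceil(log2(d)) and mul = ceil(2^shift * n / d), the result equals floor(x * n / d) for every 32-bit x, but the product x * mul does not fit in 64 bits in general, so the sketch leans on GCC/Clang's unsigned __int128 where bc simply has arbitrary precision.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

struct muldiv {
	uint64_t mul;
	unsigned int shift;
};

/*
 * Constants for computing floor(x * n / d) as (x * mul) >> shift for any
 * 32-bit x.  Assumes 0 < n, d <= 65535 so that mul fits in 64 bits.
 */
static struct muldiv make_muldiv(uint32_t n, uint32_t d)
{
	struct muldiv m;
	unsigned int lg = 0;

	assert(n > 0 && n <= 65535 && d > 0 && d <= 65535);
	while ((1u << lg) < d)
		lg++;			/* lg = ceil(log2(d)) */
	m.shift = 32 + lg;
	/* mul = ceil(2^shift * n / d), computed in 128-bit arithmetic */
	m.mul = (uint64_t)((((u128)n << m.shift) + d - 1) / d);
	return m;
}

static uint64_t apply_muldiv(struct muldiv m, uint32_t x)
{
	/* the product can exceed 64 bits, hence the 128-bit multiply */
	return (uint64_t)(((u128)x * m.mul) >> m.shift);
}
```

The error term x * (mul - 2^shift * n / d) / 2^shift stays below 2^32 / 2^shift, which is at most 1/d, so it can never push the result past the next integer.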


-hpa


Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

2007-11-29 Thread Alan Stern
On Thu, 29 Nov 2007, Linus Torvalds wrote:

> Heh. It definitely hasn't gotten lost by "the git software".

No, it sure hasn't.  In fact it was staring me right in the face and I 
didn't realize it.

> In fact, with 
> the kinds of hints you already gave, git makes it really _trivial_ to find 
> it.
> 
> Here's what you do:
> 
>   git log v2.6.23.. --author=Wilcox
> 
> and then just search for "scan_mutex", in the hope that Matthew wrote a 
> nice commit message. And yes, he did, so in less than a blink you get:
> 
>   commit 6b7f123f378743d739377871c0cbfbaf28c7d25a
>   Author: Matthew Wilcox <[EMAIL PROTECTED]>
>   Date:   Tue Jun 26 15:18:51 2007 -0600
>   
>   [SCSI] Fix async scanning double-add problems
> 
>   Stress-testing and some thought has revealed some places where
>   asynchronous scanning needs some more attention to locking.
>   
>   - Since async_scan is a bit, we need to hold the host_lock while
>     modifying it to prevent races against other CPUs modifying the word
>     that bit is in.  This is probably a theoretical race for the moment,
>     but other patches may change that.
>   - The async_scan bit means not only that this host is being scanned
>     asynchronously, but that all the devices attached to this host are not
>     yet added to sysfs.  So we must ensure that this bit is always in sync.
>     I've chosen to do this with the scan_mutex since it's already acquired
>     in most of the right places.
>   ...
> 
> which I assume is the commit you're talking about.
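The first point in that commit message is a general C hazard: adjacent bitfields share one storage word, so setting one bit is a read-modify-write of the whole word, and two CPUs updating different bits without a common lock can lose an update. A minimal userspace sketch of the remedy, with a pthread mutex standing in for the SCSI host_lock spinlock; struct host_flags and the helper names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>

/* Two flags packed into one word, as with the async_scan bit. */
struct host_flags {
	unsigned int async_scan:1;
	unsigned int other_flag:1;
};

/* Holds on the usual Linux ABIs: both bits really do share storage. */
static_assert(sizeof(struct host_flags) == sizeof(unsigned int),
	      "bitfields share one word");

static struct host_flags flags;
static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every writer of any bit in the word takes the same lock. */
static void *set_async_scan(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&host_lock);
	flags.async_scan = 1;
	pthread_mutex_unlock(&host_lock);
	return NULL;
}

static void *set_other_flag(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&host_lock);
	flags.other_flag = 1;
	pthread_mutex_unlock(&host_lock);
	return NULL;
}

/* Set both flags from two threads; returns 1 if neither update was lost. */
static int set_both_flags(void)
{
	pthread_t a, b;
	int ok;

	flags.async_scan = 0;
	flags.other_flag = 0;
	if (pthread_create(&a, NULL, set_async_scan, NULL) != 0)
		return 0;
	if (pthread_create(&b, NULL, set_other_flag, NULL) != 0) {
		pthread_join(a, NULL);
		return 0;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	pthread_mutex_lock(&host_lock);
	ok = flags.async_scan && flags.other_flag;
	pthread_mutex_unlock(&host_lock);
	return ok;
}
```

Dropping the lock would usually still pass this check, because the race window is tiny; that is why such a race can stay "theoretical" until something changes the timing.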

Yep, that's the one.

Alan Stern



Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

2007-11-29 Thread Eric W. Biederman
Ben Woodard <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> Vivek Goyal <[EMAIL PROTECTED]> writes:
>>
>>> Ok. Got it. So in this case we route the interrupts directly through LAPIC
>>> and put LVT0 in ExtInt mode and IOAPIC is bypassed.
>>>
>>> I am looking at Intel Multiprocessor specification v1.4 and as per figure
>>> 3-3 on page 3-9, 8259 is connected to LINTIN0 line, which in turn is
>>> connected to LINTIN0 pin on all processors. If that is the case, even in
>>> this mode, all the CPUs should see the timer interrupts (which are coming
>>> from 8259)?
>>
>> However things are implemented completely differently now.  I don't think
>> the coherent hypertransport domain of AMD processors actually routes
>> ExtINT interrupts to all cpus but instead one (the default route?) is
>> picked.
>>
>> So I think for the kdump case we pretty much need to use an IOAPIC
>> in virtual wire mode for recent AMD systems.
>>
>> For current Intel systems I believe either scenario still works.
>>
>>> Can you print the LAPIC registers (print_local_APIC) during normal boot
>>> and during kdump boot and paste here?
>>
>> It's worth a look.
>>
>> I still think we need to just use apic mode at kernel startup, and
>> be done with it.
>>
>
> Neil whipped up a patch to try this and evidently it worked on his test boxes
> but it didn't work very well on our problem test box. It hung after the kernel
> printed "Ready". i.e. on a normal boot I get:

Interesting. Can you please try an early_printk console?


I expect you made it a fair ways and it just didn't show up because you didn't
get as far as the normal serial port setup.

You don't have any output from your linux kernel.

Eric


Re: sched_yield: delete sysctl_sched_compat_yield

2007-11-29 Thread Arjan van de Ven
On Fri, 30 Nov 2007 13:46:22 +1100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> > Today's kernel has a different behavior somewhat (and before people
> > scream "regression"; sched_yield() behavior isn't really specified
> > and doesn't make any sense at all, whatever you get is what you
> > get; it's pretty much an insane de facto behavior that is
> > incredibly tied to which decisions the scheduler makes how, and no
> > app can depend on that
> 
> It is a performance regression. Is there any reason *not* to use the
> "compat" yield by default? As you say, for SCHED_OTHER tasks, yield
> can do almost anything. We may as well do something that isn't a
> regression..

it just makes OTHER tests/benchmarks regress; this is one of those
things where you just can't win.

> 
> 
> > in any way. In fact, I've proposed to make sched_yield() just do an
> > msleep(1)... that'd be closer to what sched_yield is supposed to do
> > standard wise than any of the current behaviors  ;_
> 
> What makes you say that? IIRC of all the things that sched_yield can
> do, it is not allowed to block. So this is about the only thing that
> will break the standard...

sched_yield OF COURSE can block.. it's a schedule call after all!



-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: sched_yield: delete sysctl_sched_compat_yield

2007-11-29 Thread Nick Piggin
On Wednesday 28 November 2007 09:57, Arjan van de Ven wrote:
> On Tue, 27 Nov 2007 17:33:05 +0800
>
> "Zhang, Yanmin" <[EMAIL PROTECTED]> wrote:
> > If I run echo "1">/proc/sys/kernel/sched_compat_yield before starting
> > volanoMark testing, the result is very good with kernel 2.6.24-rc3 on
> > my 16-core tigerton.
> >
> > 1) If /proc/sys/kernel/sched_compat_yield=1, comparing with 2.6.22,
> > 2.6.24-rc3 has more than 70% improvement;
> > 2) If /proc/sys/kernel/sched_compat_yield=0, comparing with 2.6.22,
> > 2.6.24-rc3 has more than 80% regression;
> >
> > On other machines, the volanoMark result also has much improvement if
> > /proc/sys/kernel/sched_compat_yield=1.
> >
> > Would you like to change function yield_task_fair to delete the code
> > around sysctl_sched_compat_yield, or just initialize it to 1?
>
> sounds like a bad idea; volanomark (well, technically the jvm behind
> it) is abusing sched_yield() by assuming it does something it really
> doesn't do, and as it happens some of the earlier 2.6 schedulers
> accidentally happened to behave in a way that was nice for this
> benchmark.

OK, why is this still happening? Haven't we been asking JVMs to use
futexes or posix locking for years and years now? Are there any sane
jvms that _don't_ use yield?
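To make the contrast concrete, here is a minimal sketch (not taken from any JVM) of a lock that spins on sched_yield() next to a POSIX mutex, which glibc implements on top of futexes. Both produce the right count. The difference is that the yield-based lock's performance depends entirely on what sched_yield() happens to do under the current scheduler, whereas contended mutex waiters sleep in the kernel until the lock is released. The names yield_lock(), yield_unlock() and run_counter() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Lock that spins and calls sched_yield() while someone else holds it. */
static atomic_flag spin_flag = ATOMIC_FLAG_INIT;

static void yield_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&spin_flag, memory_order_acquire))
		sched_yield();	/* relies on the scheduler running the holder */
}

static void yield_unlock(void)
{
	atomic_flag_clear_explicit(&spin_flag, memory_order_release);
}

/* POSIX mutex: on Linux, contended waiters sleep on a futex instead. */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

static long counter;
static int use_mutex;
static long iterations;

static void *worker(void *unused)
{
	(void)unused;
	for (long i = 0; i < iterations; i++) {
		if (use_mutex) {
			pthread_mutex_lock(&counter_mutex);
			counter++;
			pthread_mutex_unlock(&counter_mutex);
		} else {
			yield_lock();
			counter++;
			yield_unlock();
		}
	}
	return NULL;
}

/* Run 'nthreads' workers (1..8) with either lock; return the final count. */
static long run_counter(int nthreads, long iters, int mutex)
{
	pthread_t tids[8];
	int started;

	assert(nthreads >= 1 && nthreads <= 8);
	counter = 0;
	use_mutex = mutex;
	iterations = iters;
	for (started = 0; started < nthreads; started++)
		if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
			break;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return counter;
}
```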


> Today's kernel has a different behavior somewhat (and before people
> scream "regression"; sched_yield() behavior isn't really specified and
> doesn't make any sense at all, whatever you get is what you get;
> it's pretty much an insane de facto behavior that is incredibly tied to
> which decisions the scheduler makes how, and no app can depend on that

It is a performance regression. Is there any reason *not* to use the
"compat" yield by default? As you say, for SCHED_OTHER tasks, yield
can do almost anything. We may as well do something that isn't a
regression...


> in any way. In fact, I've proposed to make sched_yield() just do an
> msleep(1)... that'd be closer to what sched_yield is supposed to do
> standard wise than any of the current behaviors  ;_

What makes you say that? IIRC of all the things that sched_yield can
do, it is not allowed to block. So this is about the only thing that
will break the standard...


Re: [PATCH] Fix kmem_cache_free performance regression in slab

2007-11-29 Thread Andrew Morton
On Thu, 29 Nov 2007 12:05:13 -0700 Matthew Wilcox <[EMAIL PROTECTED]> wrote:

> The database performance group have found that half the cycles spent
> in kmem_cache_free are spent in this one call to BUG_ON.  Moving it
> into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a
> performance win of almost 0.5% on their particular benchmark.
> 
> The call was added as part of commit ddc2e812d592457747c4367fb73edcaa8e1e49ff
> with the comment that "overhead should be minimal".  It may have been
> minimal at the time, but it isn't now.
> 

It is worth noting that the offending commit hit mainline in June 2006.

It takes a very long time for some performance regressions to be
discovered.  By which time it is effectively too late to fix it.
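The shape of the fix Matthew describes, moving a check off the free fast path into a function that only does work when CONFIG_SLAB_DEBUG is set, can be sketched in userspace as follows. struct obj_cache and obj_cache_free() are hypothetical stand-ins, the check is illustrative rather than the actual BUG_ON from that commit, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object cache; not the slab allocator's real structures. */
struct obj_cache {
	size_t obj_size;
	size_t nr_free;
};

#ifdef CONFIG_SLAB_DEBUG
/* Debug builds pay for the sanity check on every free. */
static void cache_free_debugcheck(struct obj_cache *c, void *obj)
{
	assert(obj != NULL && c->obj_size != 0);
}
#else
/* Production builds compile the check away entirely. */
static inline void cache_free_debugcheck(struct obj_cache *c, void *obj)
{
	(void)c;
	(void)obj;
}
#endif

static void obj_cache_free(struct obj_cache *c, void *obj)
{
	cache_free_debugcheck(c, obj);
	c->nr_free++;	/* fast path: no unconditional check here */
}
```

Building with -DCONFIG_SLAB_DEBUG re-enables the check; a normal build pays nothing for it.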



Re: Trailing periods in kernel messages

2007-11-29 Thread Joe Perches
On Fri, 2007-11-30 at 09:54 +0800, Li Zefan wrote:
> So it isn't worth the effort to eliminate these periods, is it?

I hope these will eventually disappear.

> Or we can add a check to checkpatch.pl to prevent new ones.

Perhaps that's a good idea.

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index cbb4258..707f84c 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1390,6 +1390,10 @@ sub process {
if ($line =~ /\*\s*\)\s*k[czm]alloc\b/) {
WARN("unnecessary cast may hide bugs, see http://c-faq.com/malloc/mallocnocast.html\n" . $herecurr);
}
+
+   if ($rawline =~ /(print|pr_(emerg|alert|crit|err|warning|notice|info|debug)).*\.\\n\"/) {
+   WARN("unnecessary period before newline\n" . $herecurr);
+   }
}
 
if ($chk_patch && !$is_patch) {
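The new rule flags print/pr_* lines whose format string has a period right before the \n escape. A rough C analogue of that check, for illustration only: it uses plain substring matching and accepts any pr_ prefix, so it is looser than the Perl regex in the patch, and has_period_before_newline() is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Rough analogue of the proposed checkpatch.pl rule: does this line
 * contain "print" or "pr_" followed somewhere later by the four
 * characters  . \ n "  (a period just before the \n escape)?
 */
static bool has_period_before_newline(const char *line)
{
	const char *call = strstr(line, "print");

	if (!call)
		call = strstr(line, "pr_");
	if (!call)
		return false;
	return strstr(call, ".\\n\"") != NULL;
}
```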




Re: pnpacpi : exceeded the max number of IO resources

2007-11-29 Thread Shaohua Li

On Fri, 2007-11-30 at 03:18 +0100, Rene Herman wrote:
> On 29-11-07 10:11, Dave Young wrote:
> 
> > The pnpacpi rsparser.c report warnings of:
> > exceeded the max number of IO resources: 24
> > 
> > dmesg|grep exceeded|wc
> >      66     594    3564
> 
> Heavens... (added CCs of people who just upped it from 8 -- I suppose the 
> problem is not new then?)
Probably we should make it a bit bigger until Thomas's patch is ready.
Thomas, your patch isn't 2.6.24 stuff, right?

Thanks,
Shaohua
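For readers unfamiliar with this class of warning, a minimal sketch of how a fixed-size resource table produces it: once the table is full, each further resource is dropped and logged. This does not reproduce rsparser.c; MAX_IO_RESOURCES, struct resource_table and add_io_resource() are hypothetical, with the limit chosen to match the warning text.

```c
#include <stdio.h>

#define MAX_IO_RESOURCES 24	/* hypothetical limit, matching the warning */

struct io_resource {
	unsigned long start, end;
};

struct resource_table {
	struct io_resource io[MAX_IO_RESOURCES];
	int nr_io;
	int nr_dropped;
};

/* Returns 0 on success, -1 if the table is already full. */
static int add_io_resource(struct resource_table *t,
			   unsigned long start, unsigned long end)
{
	if (t->nr_io >= MAX_IO_RESOURCES) {
		t->nr_dropped++;
		fprintf(stderr, "exceeded the max number of IO resources: %d\n",
			MAX_IO_RESOURCES);
		return -1;
	}
	t->io[t->nr_io].start = start;
	t->io[t->nr_io].end = end;
	t->nr_io++;
	return 0;
}
```

In this sketch, raising MAX_IO_RESOURCES only moves the point at which the warning starts.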


Re: Something similar to inotify in 2.4.

2007-11-29 Thread Rene Herman

On 29-11-07 18:09, Vitaliy Ivanov wrote:


Can anyone advise whether there is something similar to inotify in the 2.4
kernel?


inotify is 2.6 (dnotify 2.4).

Rene
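For reference, dnotify is driven through fcntl(2) on an open directory descriptor: F_NOTIFY selects the DN_* events, and a signal is delivered when one occurs. A minimal sketch, assuming a Linux kernel built with dnotify support; watch_dir() and dir_changed are hypothetical names. SIGRTMIN is selected with F_SETSIG because the default SIGIO would terminate a process that has no handler for it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t dir_changed = 0;

static void on_dir_event(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)si;
	(void)ctx;
	dir_changed = 1;
}

/* Start watching a directory; returns the watched fd, or -1 on error. */
static int watch_dir(const char *path)
{
	struct sigaction sa;
	int fd;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_dir_event;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGRTMIN, &sa, NULL) < 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Deliver SIGRTMIN instead of SIGIO, and keep the watch armed. */
	if (fcntl(fd, F_SETSIG, SIGRTMIN) < 0 ||
	    fcntl(fd, F_NOTIFY, DN_CREATE | DN_DELETE | DN_MODIFY | DN_MULTISHOT) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Closing the returned descriptor removes the watch.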


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-29 Thread Rusty Russell
On Thursday 29 November 2007 10:36:06 Christoph Lameter wrote:
> The code becomes much simpler if gs would point to the beginning of the
> per cpu area and if the __per_cpu_offset[i] would do the same. No weird
> __per_cpu_start offsetting anymore.

It is a little weird, but it gave flexibility for most archs.

ISTR I had issues relocating the percpu area to 0, but I look forward to your 
code!

> The generic write/readpercpu functionality introduced by the cpu_alloc
> patchset works best with offsets relative to an arch dependent
> register. All per cpu data (pda, percpu and allocpercpu) is handled as an
> offset relative to the start of the per cpu data.

Hmm, did someone cc me on the patchset and I missed it?

> If the current offset by __per_cpu_start is kept then a per cpu allocator
> may have to dish out addresses that go beyond __per_cpu_end.

Of course; you just need congruence in your allocation across CPUs.  It's 
possible, but no worse than the requirements on other schemes where you can 
reach a variable with a single addition for the CPU.

> I think dealing with a per cpu variable as if it would be an offset
> relative to a base is natural for the typical addressing of cpus based on
> an offset relative to some register.

We've had practical problems getting the compiler to eke out the potential 
benefit.  That's why we settled for an offset between where the compiler 
expected and where the variable actually was.

Cheers,
Rusty.
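To make the addressing scheme Christoph describes concrete: each CPU gets its own copy of the per-CPU section, a per-CPU base (the role of %gs or __per_cpu_offset[cpu]) points at the start of that copy, and a per-CPU variable is simply an offset from the start of the section. The userspace model below is a hypothetical sketch of that idea only; none of its names are kernel APIs, and it ignores the linker-section and __per_cpu_start details discussed in the thread.

```c
#include <stddef.h>
#include <string.h>

#define NR_CPUS_SKETCH 4

/* Stand-in for the per-CPU section laid out by the linker. */
struct percpu_section {
	int counter;
	long stats[2];
};

static const struct percpu_section percpu_initial = { 0, { 0, 0 } };
static struct percpu_section percpu_copies[NR_CPUS_SKETCH];

/* Per-CPU base pointers: the role of %gs or __per_cpu_offset[cpu]. */
static char *percpu_base[NR_CPUS_SKETCH];

static void setup_percpu_copies(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		memcpy(&percpu_copies[cpu], &percpu_initial, sizeof(percpu_initial));
		percpu_base[cpu] = (char *)&percpu_copies[cpu];
	}
}

/* A per-CPU "variable" is just its offset from the section start. */
#define PERCPU_VAR_OFFSET(field) offsetof(struct percpu_section, field)

/* Resolve a variable on a given CPU with a single addition. */
static void *percpu_ptr(size_t offset, int cpu)
{
	return percpu_base[cpu] + offset;
}
```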


Re: pnpacpi : exceeded the max number of IO resources

2007-11-29 Thread Rene Herman

On 29-11-07 10:11, Dave Young wrote:


The pnpacpi rsparser.c report warnings of:
exceeded the max number of IO resources: 24

dmesg|grep exceeded|wc
     66     594    3564


Heavens... (added CCs of people who just upped it from 8 -- I suppose the 
problem is not new then?)


Rene.



Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-29 Thread Rusty Russell
On Friday 30 November 2007 03:53:34 Arjan van de Ven wrote:
> On Mon, 26 Nov 2007 10:25:33 -0800
>
> > Agreed. On first glance, I was intrigued but:
> >
> > 1) Why is everyone so concerned that export symbol space is large?
> > - does it cost cpu or running memory?
>
> yes. about 120 bytes per symbol

But this patch makes that worse, not better.

> > - does it cause bugs?
>
> yes, bad apis are causing bugs... sys_open is just the starter of that.

Sure, but this doesn't change the APIs, either.  We seem to have fixed 
sys_open the right way, and since we're not supposed to care about 
out-of-tree modules...

Rusty.


Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

2007-11-29 Thread Ben Woodard

Eric W. Biederman wrote:

Vivek Goyal <[EMAIL PROTECTED]> writes:


Ok. Got it. So in this case we route the interrupts directly through LAPIC
and put LVT0 in ExtInt mode and IOAPIC is bypassed.

I am looking at Intel Multiprocessor specification v1.4 and as per figure
3-3 on page 3-9, 8259 is connected to LINTIN0 line, which in turn is 
connected to LINTIN0 pin on all processors. If that is the case, even in
this mode, all the CPU should see the timer interrupts (which is coming
from 8259)?


However things are implemented completely differently now.  I don't think
the coherent hypertransport domain of AMD processors actually routes
ExtINT interrupts to all cpus but instead one (the default route?) is
picked.

So I think for the kdump case we pretty much need to use an IOAPIC
in virtual wire mode for recent AMD systems.

For current Intel systems I believe either scenario still works.


Can you print the LAPIC registers (print_local_APIC) during normal boot
and during kdump boot and paste here?


It's worth a look.

I still think we need to just use apic mode at kernel startup, and
be done with it.



Neil whipped up a patch to try this and evidently it worked on his test 
boxes, but it didn't work very well on our problem test box. It hung 
after the kernel printed "Ready.", i.e. on a normal boot I get:



2007-11-29 13:48:29 Loading vmlinuz-2.6.18-13chaos.ben.test
2007-11-29 13:48:29 Loading initrd-2.6.18-13chaos.ben.test...
2007-11-29 13:48:29 Ready.
2007-11-29 13:48:30 Linux version 2.6.18-13chaos.ben.test ([EMAIL PROTECTED]) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #10 SMP Thu Nov 29 13:11:49 PST 2007
2007-11-29 13:48:30 Command line: initrd=initrd-2.6.18-13chaos.ben.test loglevel=8 console=ttyS0,115200n8 [EMAIL PROTECTED] elevator=deadline swiotlb=65536 selinux=0 apic=debug BOOT_IMAGE=vmlinuz-2.6.18-13chaos.ben.test BOOTIF=01-00-30-48-57-91-56

With Neil's patch:
2007-11-29 17:12:55 PXELINUX 2.11 2004-08-16  Copyright (C) 1994-2004 H. Peter Anvin

2007-11-29 17:12:55 Boot options [default: 2.6.18-54.el5.bz336371]:
2007-11-29 17:12:55 linux-2.6.18-13chaos.ben.test-2.6.18-54.el5.bz336371
2007-11-29 17:12:55 linux
2007-11-29 17:12:55 linux-2.6.18-54.el5.bz336371
2007-11-29 17:12:55 linux-2.6.18-52.el5
2007-11-29 17:12:55 linux-2.6.18-13chaos.ben.test-2.6.18-13chaos.ben.test
2007-11-29 17:12:55 linux-2.6.23-0.214.rc8.git2.fc8
2007-11-29 17:12:55 linux-2.6.18-8.1.14.el5
2007-11-29 17:12:55 linux-2.6.18-7chaos
2007-11-29 17:12:55 boot:
2007-11-29 17:13:02 Loading vmlinuz-2.6.18-13chaos.ben.test
2007-11-29 17:13:02 Loading initrd-2.6.18-13chaos.ben.test...
2007-11-29 17:13:02 Ready.
(END)
That's all she wrote. End of story. Had to reboot to another kernel to 
get it back.


Neil's patch:

--- linux-2.6.18.noarch/arch/x86_64/kernel/i8259.c.orig 2007-11-28 18:00:31.0 -0500
+++ linux-2.6.18.noarch/arch/x86_64/kernel/i8259.c  2007-11-29 10:37:14.0 -0500

@@ -599,4 +599,30 @@

if (!acpi_ioapic)
setup_irq(2, &irq2);
+
+   /*
+ * Switch from PIC to APIC mode.
+ */
+connect_bsp_APIC();
+setup_local_APIC();
+
+if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) {
+panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
+  GET_APIC_ID(apic_read(APIC_ID)), boot_cpu_id);
+/* Or can we switch back to PIC here? */
+}
+
+/*
+ * Now start the IO-APICs
+ */
+if (!skip_ioapic_setup && nr_ioapics)
+setup_IO_APIC();
+else
+nr_ioapics = 0;
+
+   /*
+* Disable local irqs here so start_kernel doesn't complain
+*/
+   local_irq_disable();
+
 }
--- linux-2.6.18.noarch/arch/x86_64/kernel/smpboot.c.orig 2007-11-28 18:07:33.0 -0500
+++ linux-2.6.18.noarch/arch/x86_64/kernel/smpboot.c  2007-11-29 10:37:59.0 -0500

@@ -1088,26 +1088,6 @@


/*
-* Switch from PIC to APIC mode.
-*/
-   connect_bsp_APIC();
-   setup_local_APIC();
-
-   if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) {
-   panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
- GET_APIC_ID(apic_read(APIC_ID)), boot_cpu_id);
-   /* Or can we switch back to PIC here? */
-   }
-
-   /*
-* Now start the IO-APICs
-*/
-   if (!skip_ioapic_setup && nr_ioapics)
-   setup_IO_APIC();
-   else
-   nr_ioapics = 0;
-
-   /*
 * Set up local APIC timer on boot CPU.
 */



Eric


Re: What can we do to get ready for memory controller merge in 2.6.25

2007-11-29 Thread Nick Piggin
On Friday 30 November 2007 01:43, Balbir Singh wrote:
> They say it's better to strike while the iron is hot.
>
> Since we have so many people discussing the memory controller, I would
> like to assess the readiness of the memory controller for mainline
> merge. Given that we have some time until the merge window, I'd like to
> set aside some time (from my other work items) to work on the memory
> controller, fix review comments and defects.
>
> In the past, we've received several useful comments from Rik Van Riel,
> Lee Schermerhorn, Peter Zijlstra, Hugh Dickins, Nick Piggin, Paul Menage
> and code contributions and bug fixes from Hugh Dickins, Pavel Emelianov,
> Lee Schermerhorn, YAMAMOTO-San, Andrew Morton and KAMEZAWA-San. I
> apologize if I missed out any other names or contributions.
>
> At the VM-Summit we decided to try the current double LRU approach for
> memory control. At this juncture in the space-time continuum, I seek
> your support, feedback, comments and help to move the memory controller

Do you have any test cases, performance numbers, etc.? And also some
results or even anecdotes of where this is going to be used would be
interesting...

Thanks,
Nick


Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

2007-11-29 Thread Ben Woodard

Vivek Goyal wrote:

On Wed, Nov 28, 2007 at 11:02:06AM -0500, Neil Horman wrote:

On Wed, Nov 28, 2007 at 10:36:49AM -0500, Vivek Goyal wrote:

On Tue, Nov 27, 2007 at 03:24:35PM -0800, Ben Woodard wrote:

Andi Kleen wrote:

Are we putting the system back in PIC mode or virtual wire mode? I have
not seen systems which support PIC mode. All latest systems seems
to be having virtual wire mode. I think in case of PIC mode, interrupts

Yes it's probably virtual wire. For real PIC mode we would need really
old systems without APIC.


can be delivered to cpu0 only. In virt wire mode, one can program IOAPIC
to deliver interrupt to any of the cpus and that's what we have been

The code doesn't try to program anything specific, it just restores the state
that was left over originally by the BIOS.

So if the BIOS originally left the IOAPIC in a state where the timer 
interrupts were only going to CPU0, then by restoring that state we could be 
bringing this problem upon ourselves.



Hi Ben,

Apart from restoring the original state (Bring APICS back to virtual wire
mode), we also reprogram IOAPIC so that timer interrupt can go to crashing
cpu (and not necessarily cpu0). Look at following code in disable_IO_APIC.

entry.dest.physical.physical_dest =
GET_APIC_ID(apic_read(APIC_ID));

Here we read the apic id of crashing cpu and program IOAPIC accordingly.
This will make sure that even in virtual wire mode, timer interrupts
will be delivered to crashing cpu APIC.


Yes, but according to Ben's last debug effort, the APIC printout regarding the
timer setup indicates that ioapic_i8259.pin == -1, meaning that the 8259 is not
routed through the ioapic.  In those cases, disable_IO_APIC does not take us
through the path you reference above, and does not revert to virtual wire mode.
Instead, it simply disables legacy vector 0, which, if I understand this
correctly, tells the ioapic not to handle timer interrupts, trusting that
the 8259 in the system will deliver that interrupt where it needs to be.  If the
8259 is wired to deliver timer interrupts to cpu0 only, then you get the problem
that we have, don't you?



Ok. Got it. So in this case we route the interrupts directly through LAPIC
and put LVT0 in ExtInt mode and IOAPIC is bypassed.

I am looking at Intel Multiprocessor specification v1.4 and as per figure
3-3 on page 3-9, 8259 is connected to LINTIN0 line, which in turn is 
connected to LINTIN0 pin on all processors. If that is the case, even in
this mode, all the CPU should see the timer interrupts (which is coming
from 8259)?

Can you print the LAPIC registers (print_local_APIC) during normal boot
and during kdump boot and paste here?


Here are the ones from a normal bootup.

I was unable to get info from a kdump boot. I haven't figured out why 
yet. With the same patch that I used to capture this, when I tried to 
kdump the kernel, it paused a second or two after the backtrace and then 
dropped to BIOS and came up normally.


Here is a little trick, at the point where we are trying to get the info 
to print out, the kernel command line hasn't been completely parsed yet. 
That tricked me for part of the day. I had apic=debug on the command 
line but the logic in print_local_APIC saw the default value because the 
kernel command line had yet to be parsed.


2007-11-29 17:58:07 ***Here is the info you requested
2007-11-29 17:58:07
2007-11-29 17:58:07 printing local APIC contents on CPU#0/0:
2007-11-29 17:58:07 ... APIC ID:   (0)
2007-11-29 17:58:07 ... APIC VERSION: 80050010
2007-11-29 17:58:07 ... APIC TASKPRI:  (00)
2007-11-29 17:58:07 ... APIC ARBPRI:  (00)
2007-11-29 17:58:07 ... APIC PROCPRI: 
2007-11-29 17:58:07 ... APIC EOI: 
2007-11-29 17:58:07 ... APIC RRR: 0002
2007-11-29 17:58:07 ... APIC LDR: 
2007-11-29 17:58:07 ... APIC DFR: 
2007-11-29 17:58:07 ... APIC SPIV: 010f
2007-11-29 17:58:07 ... APIC ISR field:
2007-11-29 17:58:07 ... APIC TMR field:
2007-11-29 17:58:07 ... APIC IRR field:
2007-11-29 17:58:07 ... APIC ESR: 
2007-11-29 17:58:07 ... APIC ICR: 4630
2007-11-29 17:58:07 ... APIC ICR2: 0700
2007-11-29 17:58:07 ... APIC LVTT: 0001
2007-11-29 17:58:07 ... APIC LVTPC: 0001
2007-11-29 17:58:07 ... APIC LVT0: 0700
2007-11-29 17:58:07 ... APIC LVT1: 0400
2007-11-29 17:58:07 ... APIC LVTERR: 0001000f
2007-11-29 17:58:07 ... APIC TMICT: 8000
2007-11-29 17:58:07 ... APIC TMCCT: 
2007-11-29 17:58:07 ... APIC TDCR: 
2007-11-29 17:58:07
2007-11-29 17:58:07 number of MP IRQ sources: 15.
2007-11-29 17:58:07 number of IO-APIC #8 registers: 0.
2007-11-29 17:58:07 number of IO-APIC #9 registers: 0.
2007-11-29 17:58:07 number of IO-APIC #10 registers: 0.
2007-11-29 17:58:07 testing the IO APIC...
2007-11-29 17:58:07
2007-11-29 17:58:07 IO APIC #8..
2007-11-29 17:58:07  register #00: 
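When reading local APIC dumps like this one, it helps to decode the LVT entries. Per the Intel SDM, bits 7:0 of an LVT entry hold the vector, bits 10:8 the delivery mode (111b = ExtINT, 100b = NMI) and bit 16 the mask bit. Read that way, LVT0 = 0700 is an unmasked ExtINT entry, consistent with the "LVT0 in ExtInt mode" routing Vivek describes, and LVT1 = 0400 is NMI. Some fields in the archived dump look truncated, so treat the other values with care. The helpers below are a hypothetical sketch, not kernel functions.

```c
#include <stdint.h>

/* Local APIC LVT entry fields (LINT0/LINT1), per the Intel SDM. */
#define LVT_VECTOR(v)        ((uint32_t)(v) & 0xffu)
#define LVT_DELIVERY_MODE(v) (((uint32_t)(v) >> 8) & 0x7u)
#define LVT_MASKED(v)        ((((uint32_t)(v) >> 16) & 0x1u) != 0)

static const char *lvt_delivery_mode_name(uint32_t lvt)
{
	switch (LVT_DELIVERY_MODE(lvt)) {
	case 0x0: return "Fixed";
	case 0x2: return "SMI";
	case 0x4: return "NMI";
	case 0x5: return "INIT";
	case 0x7: return "ExtINT";
	default:  return "reserved";
	}
}
```

The same macros read LVTERR = 0001000f as masked, vector 0x0f.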

Re: kondemand: kernel BUG at kernel/workqueue.c:258!

2007-11-29 Thread Arjan van de Ven
On Thu, 29 Nov 2007 13:47:34 -0800
"Pallipadi, Venkatesh" <[EMAIL PROTECTED]> wrote:

>  
> 
> >-Original Message-
> >From: Jiri Slaby [mailto:[EMAIL PROTECTED] 
> >Sent: Thursday, November 29, 2007 1:43 PM
> >To: Pallipadi, Venkatesh; Nakajima, Jun
> >Cc: Linux kernel mailing list
> >Subject: kondemand: kernel BUG at kernel/workqueue.c:258!
> >
> >Hi,
> >
> >while trying to evoke another bug by endlessly changing 
> >governors, this appeared:
> >kernel BUG at .../kernel/workqueue.c:258!
> >invalid opcode:  [1] PREEMPT SMP
> >CPU 0
> >Modules linked in: iwl3945 mac80211 cfg80211 tun 
> >cpufreq_userspace rfcomm
> >l2cap hci_usb bluetooth kvm_intel arc4 ecb blkcipher kvm cryptomgr
> >crypto_algapi acpi_cpufreq fglrx(P) asus_laptop sr_mod cdrom ehci_hcd
> >uhci_hcd battery
> >Pid: 443, comm: kondemand/0 Tainted: P        2.6.23 #38
> 
> Kernel version?


It's on the same line as the tainted flag, two lines below the binary module
that is in use. I assume Jiri is now working on reproducing this
untainted... ;)


Re: [PATCH] Avoid overflows in kernel/time.c

2007-11-29 Thread Chris Snook

H. Peter Anvin wrote:

NOTE: This patch uses a bc(1) script to compute the appropriate
constants.


Perhaps dc would be more appropriate?  That's included in busybox.

-- Chris


Re: Trailing periods in kernel messages

2007-11-29 Thread Li Zefan
Joe Perches wrote:
> On Fri, 2007-11-30 at 09:12 +0800, Li Zefan wrote:
>> Just a rough grep:
>> # grep -r -P --include=*.[ch] 'printk.*\.\\n' * | wc -l
>> 6025
>> # grep -r -P --include=*.[ch] '\.\\n' * | wc -l
>> 12723
> 
> Inequivalent.
> 
> Try:
>   grep -rP --include=*.[ch] 'printk.*\.\\n' * | wc -l
> and
>   grep -rP --include=*.[ch] 'printk.*[^\.]\\n' * | wc -l
> 
> 6k/38k
> 

My second grep counts how many strings are terminated with '.'.
Those strings may eventually be passed to printk().

So it isn't worth the effort to eliminate these periods, is it?
Or we can add a check to checkpatch.pl to prevent new ones.


Re: [PATCH] Avoid overflows in kernel/time.c

2007-11-29 Thread Andrew Morton
On Thu, 29 Nov 2007 16:19:51 -0800 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> When the conversion factor between jiffies and milli- or microseconds
> is not a single multiply or divide, as for the case of HZ == 300, we
> currently do a multiply followed by a divide.  The intervening
> result, however, is subject to overflows, especially since the
> fraction is not simplified (for HZ == 300, we multiply by 300 and
> divide by 1000).
> 
> This is exposed to the user when passing a large timeout to poll(),
> for example.
> 
> This patch replaces the multiply-divide with a reciprocal
> multiplication on 32-bit platforms.  When the input is an unsigned
> long, there is no portable way to do this on 64-bit platforms, since
> it requires a 128-bit intermediate result (which gcc does support on
> 64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390).
> However, since the output is a 32-bit integer in the cases affected,
> we just simplify the multiply-divide (*3/10 instead of *300/1000).
> 
> The reciprocal multiply used can have off-by-one errors in the upper
> half of the valid output range.  This could be avoided at the expense
> of having to deal with a potential 65-bit intermediate result.  Since
> the intent is to avoid overflow problems and most of the other time
> conversions are only semiexact, the off-by-one errors were considered
> an acceptable tradeoff.
> 
> NOTE: This patch uses a bc(1) script to compute the appropriate
> constants.

Does this add the first dependency upon the availability of bc?
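To make the overflow concrete for HZ == 300: computing msecs * 300 / 1000 in 32 bits wraps once msecs exceeds about 14.3 million (roughly four hours), while a reciprocal multiply with a 64-bit intermediate does not. The sketch below is illustrative only. The function names are hypothetical, it truncates rather than reproducing the kernel's rounding, and it computes the reciprocal constant with a C constant expression rather than with the patch's bc script.

```c
#include <stdint.h>

/* ceil(2^32 * 3 / 10): the 300/1000 ratio, scaled by 2^32. */
#define HZ300_MSEC_RECIP ((uint32_t)(((UINT64_C(1) << 32) * 3 + 9) / 10))

/* Naive form: the 32-bit intermediate msecs * 300 can wrap. */
static uint32_t msecs_to_jiffies_naive(uint32_t msecs)
{
	return msecs * 300u / 1000u;
}

/* Simplified fraction: same result, but wraps only above ~1.4e9 ms. */
static uint32_t msecs_to_jiffies_simplified(uint32_t msecs)
{
	return msecs * 3u / 10u;
}

/*
 * Reciprocal multiply: (msecs * ceil(2^32 * 0.3)) >> 32. The rounded-up
 * constant adds an error of 0.2 * msecs / 2^32, which can push the
 * result up by one for msecs >= 2^31, consistent with the off-by-one
 * trade-off described in the patch.
 */
static uint32_t msecs_to_jiffies_recip(uint32_t msecs)
{
	return (uint32_t)(((uint64_t)msecs * HZ300_MSEC_RECIP) >> 32);
}
```

For example, msecs = 4294967283 yields 1288490185 from the reciprocal form, where the exact floor is 1288490184.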


Re: [PATCH 0/4, v3] Physical PCI slot objects

2007-11-29 Thread Alex Chiang
Hi Kenji-san,

* Kenji Kaneshige <[EMAIL PROTECTED]>:
> > Hi Gary, Kenji-san, et. al,
> > 
> > * Gary Hade <[EMAIL PROTECTED]>:
> >> Alex, What I was trying to suggest is a boot-time kernel
> >> option, not a kernel configuration option.  The basic idea is
> >> to give the user (with a single binary kernel) the ability to
> >> include your ACPI-PCI slot driver feature changes only when
> >> they are really needed.  In addition to reducing the number of
> >> system/PCI hotplug driver combinations where your changes would
> >> need to be validated, I believe it would also help alleviate other
> >> worries (e.g. Andi Kleen's memory consumption concern).  I
> >> believe this goal could also be achieved with the kernel config
> >> option by making the pci_slot module runtime loadable with the
> >> PCI hotplug drivers only visiting your new code when the
> >> pci_slot driver is loaded, although I think this would be more
> >> difficult to implement.
> > 
> > I have modified my patch series so that the final patch that
> > introduces my ACPI-PCI slot driver is a full-fledged module, that
> > has a tristate Kconfig option.
> > 
> 
> Thank you for your good job.

Thanks for testing. :)

> I tested shpchp and pciehp both with and without pci_slot
> module. There seems no regression from shpchp and pciehp's
> point of view.  (I had a little concern about the hotplug
> slots' name that vary depending on whether pci_slot
> functionality is enabled or disabled. But, now that we can
> build pci_slot driver as a kernel module, I don't think it is a
> big problem).

Hm, you are right. On my machine, if I load pciehp first and
acpiphp second (even without loading pci_slot), I will see the
following:

[EMAIL PROTECTED] slots]# ls
0016_0006  0197_0005  10  3  4  7  8  9

[EMAIL PROTECTED] slots]# lsmod | grep pci_slot
[EMAIL PROTECTED] slots]# lsmod | grep hp
acpiphp               115984  0 
pciehp                140616  0 
pci_hotplug           123972  2 acpiphp,pciehp

On the other hand, if I do load pci_slot first, and then pciehp,
you are right, I will see something like this:

[EMAIL PROTECTED] slots]# ls
1  10  2  3  4  5  6  7  8  9

[EMAIL PROTECTED] slots]# lsmod | grep pci_slot
pci_slot               74436  0 
[EMAIL PROTECTED] slots]# lsmod | grep hp
pciehp                140616  0 
pci_hotplug           123972  1 pciehp

But I do agree, people don't need to load pci_slot at all if they
don't want it, and they won't be bothered.

> The only problem is that I got Call Traces with the following
> error messages when the pci_slot driver was loaded, and one strange
> slot named '1023' was registered (other slots are fine). This
> is the same problem I reported before.
> 
> sysfs: duplicate filename '1023' can not be created
> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> 
> kobject_add failed for 1023 with -EEXIST, don't try to
> register things with the same name in the same directory.
> 
> On my system, hotplug slots themselves can be added, removed
> and replaced with the other type of I/O box. The ACPI firmware
> tells the OS the presence of those slots using the _STA method (that
> is, it doesn't use the 'LoadTable()' AML operator). On the other
> hand, the current pci_slot driver doesn't check _STA.  As a result,
> the pci_slot driver tried to register the invalid (non-existing)
> slots. The ACPI firmware of my system returns '1023' if the
> invalid slot's _SUN is evaluated. This is the cause of the Call
> Traces mentioned above. To fix this problem, the pci_slot driver
> needs to check _STA when scanning the ACPI namespace.

Now this is very curious. The relevant line in pci_slot is:

check_slot()
	status = acpi_evaluate_integer(handle, "_SUN", NULL, sun);
	if (ACPI_FAILURE(status))
		return -1;

Why does your firmware return the error information inside sun,
instead of returning an error in status? That doesn't seem right
to me...
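For reference, the check Kenji proposes amounts to the following decision. Per the ACPI specification, bit 0 of the _STA result means "present", and a device with no _STA method is treated as present (0x0F). The helper is a hypothetical sketch, not pci_slot.c code. As Alex notes, gating on the present bit like this would also skip the slots his firmware reports as empty.

```c
#include <stdbool.h>

#define STA_PRESENT 0x01UL	/* _STA bit 0: device present */

/*
 * Hypothetical helper: should a slot object be registered?
 * No _STA method means "assume present"; otherwise honour bit 0.
 */
static bool slot_should_register(bool has_sta, unsigned long sta)
{
	if (!has_sta)
		return true;
	return (sta & STA_PRESENT) != 0;
}
```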

> I'm sorry for reporting this so late. I'm attaching the patch
> to fix the problem. This is against 2.6.24-rc3 with your
> patches applied. Could you try it?

Applying this patch causes me to only detect populated slots in
my system, which isn't what I want -- otherwise, I could have
just enumerated the PCI bus and found the devices that way. :)

Maybe on your machine, checking existence of _STA might do the
right thing, but I don't think we should actually be looking at
any of the actual bits returned. 

If we check ACPI_STA_DEVICE_PRESENT, then we will not detect
empty slots on my system. Can you try this patch to see if at
least the first call to acpi_evaluate_integer helps? If that
doesn't help, maybe the second block will help you, but it breaks
my machine...

Thanks.

/ac


diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c
index 724f4f0..63a4dc8 100644
--- a/drivers/acpi/pci_slot.c
+++ b/drivers/acpi/pci_slot.c
@@ -55,9 +65,21 @@ static struct acpi_pci_driver acpi_pci_slot_driver = {
 static int
 check_slot(acpi_handle handle, int *device, unsigned long 

Re: Out of tree module using LSM

2007-11-29 Thread Al Viro
On Thu, Nov 29, 2007 at 03:12:38PM -0700, Justin Banks wrote:

> It's not perfect, but as was recently pointed out, if you can only get
> 98% of the way there rather than 100% is that a reason for not trying to
> make it possible?

BTW, that's a fine example of a common fallacy: "$FOO is 98% of the way to
$TARGET" does not allow to interpolate the properties of $TARGET to those
of $FOO.

Telling that a condom is a 98% approximation to platonic ideal of such is
not particularly useful, especially if it turns out that what this number 
really means is that there's a hole on its tip covering 2% of surface...


Re: [PATCH 0/2] Markers Implementation for RCU Tracing

2007-11-29 Thread Paul E. McKenney
On Fri, Nov 30, 2007 at 12:11:28AM +0530, K. Prasad wrote:
> Hi,
>   Please review the ensuing set of patches which convert the
> existing RCU tracing mechanism for Preempt RCU and RCU Boost into
> markers.
> 
> These patches are based upon the 2.6.24-rc2-rt1 kernel tree.
> 
> Along with marker transition, the RCU Tracing infrastructure has also
> been modularised to be built as a kernel module, thereby enabling
> runtime changes to the RCU Tracing infrastructure.
> 
> Patch [1/2] - Patch that converts the Preempt RCU tracing in
> rcupreempt.c into markers.
> 
> Patch [2/2] - Patch that converts the Preempt RCU Boost tracing in
> rcupreempt-boost.c into markers.

Looks good to me, though I do not pretend to understand the markers
implementation.  I presume that the markers implementation forces the
varargs usage -- though the markers do seem quite a bit nicer in allowing
the formatting to be specified more naturally.

Thanx, Paul
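As a rough illustration of why a marker ends up with a varargs interface: the instrumentation site passes its arguments through "...", and whichever probe is connected receives them as a va_list together with the format string. This is a toy userspace model with hypothetical names (struct toy_marker, toy_marker_hit()); it is not the kernel's trace_mark() implementation.

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*toy_probe_fn)(const char *name, const char *fmt, va_list args);

struct toy_marker {
	const char *name;
	const char *format;
	toy_probe_fn probe;	/* NULL while no probe is connected */
};

/* Instrumentation site: forwards its varargs to the connected probe. */
static void toy_marker_hit(const struct toy_marker *m, ...)
{
	va_list args;

	if (!m->probe)
		return;		/* disabled marker: cheap early exit */
	va_start(args, m);
	m->probe(m->name, m->format, args);
	va_end(args);
}

/* Example probe: formats "name: <formatted args>" into a buffer. */
static char last_event[128];

static void buffer_probe(const char *name, const char *fmt, va_list args)
{
	int n = snprintf(last_event, sizeof(last_event), "%s: ", name);

	if (n < 0 || (size_t)n >= sizeof(last_event))
		return;
	vsnprintf(last_event + n, sizeof(last_event) - n, fmt, args);
}
```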


Re: [patch 1/1] Writeback fix for concurrent large and small file writes

2007-11-29 Thread Fengguang Wu
On Thu, Nov 29, 2007 at 12:16:36PM -0800, Michael Rubin wrote:
> Due to my faux pas of top posting (see
> http://www.zip.com.au/~akpm/linux/patches/stuff/top-posting.txt) I am
> resending this email.
> 
> On Nov 28, 2007 4:34 PM, Fengguang Wu <[EMAIL PROTECTED]> wrote:
> > Could you demonstrate the situation? Or if I guess it right, could it
> > be fixed by the following patch? (not a nack: If so, your patch could
> > also be considered as a general purpose improvement, instead of a bug
> > fix.)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 0fca820..62e62e2 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -301,7 +301,7 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
> >  * Someone redirtied the inode while were writing back
> >  * the pages.
> >  */
> > -   redirty_tail(inode);
> > +   requeue_io(inode);
> > } else if (atomic_read(&inode->i_count)) {
> > /*
> >  * The inode is clean, inuse
> >
> 
> By testing the situation I can confirm that the one line patch above
> fixes the problem.
> 
> I will continue testing some other cases to see if it cause any other
> issues but I don't expect it to.

One major concern could be whether a continuous writer dirtying pages
at the 'right' pace will generate a steady flow of write I/Os which are
_tiny_hence_inefficient_.

I have gathered some timing info about writeback speed in
http://lkml.org/lkml/2007/10/4/468. For ext3, it takes wb_kupdate()
~15ms to submit 4MB. Whereas one disk I/O typically takes ~5ms. So if
there are too many tiny write I/Os, they will simply get delayed and
merged into bigger ones.

So it's not a problem in *theory* :-)

> I will post this change for 2.6.24 and list Feng as author. If that's
> ok with Feng.

Thank you.

> As for the original patch I will resubmit it for 2.6.25 as a general
> purpose improvement.

There are some discussions and patches on inode number based writeback
clustering which you may want to reference/compare with:
http://lkml.org/lkml/2007/8/21/396
http://lkml.org/lkml/2007/8/27/45

Cheers,
Fengguang



Re: Trailing periods in kernel messages

2007-11-29 Thread Joe Perches
On Fri, 2007-11-30 at 09:12 +0800, Li Zefan wrote:
> Just a rough grep:
> # grep -r -P --include=*.[ch] 'printk.*\.\\n' * | wc -l
> 6025
> # grep -r -P --include=*.[ch] '\.\\n' * | wc -l
> 12723

Inequivalent.

Try:
grep -rP --include=*.[ch] 'printk.*\.\\n' * | wc -l
and
grep -rP --include=*.[ch] 'printk.*[^\.]\\n' * | wc -l

6k/38k

cheers, Joe



Re: [PATCH] [RESEND] crypto test: use print_hex_dump from kernel.h instead

2007-11-29 Thread rae l
On Nov 29, 2007 7:13 PM, Herbert Xu <[EMAIL PROTECTED]> wrote:
...
> > uninlining this function shrinks crypto/tcrypt.o's .text from 20,009 bytes
> > down to 19,701.
> >
> > inlining is almost always wrong.
>
> I agree.  Please do as Andrew suggests and resubmit.
inline disabled.

Cc: Randy Dunlap <[EMAIL PROTECTED]>
Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 24141fb..13efc72 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -83,10 +83,9 @@ static char *check[] = {

 static void hexdump(unsigned char *buf, unsigned int len)
 {
-   while (len--)
-   printk("%02x", *buf++);
-
-   printk("\n");
+   print_hex_dump(KERN_CONT, "", DUMP_PREFIX_OFFSET,
+   16, 1,
+   buf, len, false);
 }

 static void tcrypt_complete(struct crypto_async_request *req, int err)

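The old loop printed every byte with "%02x" on a single line with no separators. The new print_hex_dump() call (DUMP_PREFIX_OFFSET, 16 bytes per row, 1-byte groups, no ASCII column) splits the output into rows, each labelled with its offset. The userspace sketch below builds roughly that layout into a buffer so you can see the difference. hexdump_rows() is a hypothetical helper written for this illustration. It is not the kernel's hex_dump_to_buffer(), and the kernel's exact spacing may differ.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format len bytes as rows of up to 16 bytes, one byte
 * per group, each row prefixed with its offset. This roughly mirrors
 * print_hex_dump(..., DUMP_PREFIX_OFFSET, 16, 1, buf, len, false).
 * The output is NUL-terminated and truncated if out is too small.
 * Returns the number of characters stored, excluding the NUL.
 */
size_t hexdump_rows(const unsigned char *buf, size_t len,
		    char *out, size_t outlen)
{
	size_t pos = 0;

	if (outlen == 0)
		return 0;
	out[0] = '\0';

	for (size_t row = 0; row < len && pos < outlen; row += 16) {
		/* snprintf returns the untruncated length, so pos may pass outlen */
		pos += (size_t)snprintf(out + pos, outlen - pos, "%08zx:", row);
		for (size_t i = row; i < len && i < row + 16 && pos < outlen; i++)
			pos += (size_t)snprintf(out + pos, outlen - pos, " %02x", buf[i]);
		if (pos < outlen)
			pos += (size_t)snprintf(out + pos, outlen - pos, "\n");
	}
	return pos < outlen ? pos : outlen - 1;
}
```

For 18 bytes 0x00..0x11 this produces two rows, "00000000: 00 01 ... 0f" and "00000010: 10 11". The old hexdump() would have printed all 36 hex digits run together on one line.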
-- 
Denis Cheng


Re: Question regarding mutex locking

2007-11-29 Thread Bryan O'Sullivan

Larry Finger wrote:
> If a particular routine needs to lock a mutex, but it may be entered with that
> mutex already locked, would the following code be SMP safe?
>
> hold_lock = mutex_trylock()

The common way to deal with this is first to restructure your function 
into two.  One always acquires the lock, and the other (often written 
with a "__" prefix) never acquires it.  The never-acquire code does the 
actual work, and the always-acquire function calls it.


You then refactor the callers so that you don't have any code paths on 
which you can't predict whether or not the lock will be held.
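The split described above can be sketched in userspace C with POSIX mutexes. It avoids the trylock approach for a concrete reason: a failed trylock only says that *someone* holds the mutex, not that the current caller does. The struct and function names here are hypothetical. Kernel code would typically spell the never-lock variant with a "__" prefix, but identifiers starting with a double underscore are reserved in userspace C, so this sketch uses a "_locked" suffix instead.

```c
#include <pthread.h>

struct counter {
	pthread_mutex_t lock;
	long value;
};

/*
 * Does the actual work and never takes the lock.
 * The caller must already hold c->lock.
 */
static void counter_add_locked(struct counter *c, long delta)
{
	c->value += delta;
}

/* Always takes the lock, then calls the never-lock variant. */
void counter_add(struct counter *c, long delta)
{
	pthread_mutex_lock(&c->lock);
	counter_add_locked(c, delta);
	pthread_mutex_unlock(&c->lock);
}

/*
 * A caller that needs the lock for a larger critical section takes it once
 * and calls the _locked variant directly. No code path has to guess whether
 * the lock is already held.
 */
void counter_add_twice(struct counter *c, long delta)
{
	pthread_mutex_lock(&c->lock);
	counter_add_locked(c, delta);
	counter_add_locked(c, delta);
	pthread_mutex_unlock(&c->lock);
}
```

Every caller knows statically whether it holds the lock and picks the matching function, so the question of whether the lock is "already held" never comes up at run time.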




Re: [PATCH 0/4, v3] Physical PCI slot objects

2007-11-29 Thread Alex Chiang
Hi Gary,

First, thanks for all the help and testing -- I really appreciate
it.

* Gary Hade <[EMAIL PROTECTED]>:
> 
> I'm getting back to you but unfortunately with not so good
> news.  Sorry Alex.

:-/

> On the x3950 (configured single node) I encountered the below
> problem when attempting to hotplug a PCIe adapter when 'pci_slot'
> was loaded prior to 'acpiphp'.  I did not see the problem when
> the drivers were loaded in the opposite order.

Very bizarre, especially given the stack trace below, which
doesn't really make any sense to me at all.

> FYI, the node contains 2 hotpluggable PCIe slots and 5
> non-hotpluggable PCIe slots but 'pci_slot' only exposed
> the 2 hotpluggable slots.  This does not appear to be due
> to a 'pci_slot' driver problem since I looked at the DSDT
> and SSDT and found that there are currently no _SUN methods
> for the non-hotpluggable slots.

Ok, this is not too surprising, but it's a different can o'
worms. ;) Let's save this for another day...

> invalid opcode:  [1] SMP 
> CPU 1 
> Modules linked in: acpiphp pci_slot e1000 aic79xx scsi_transport_spi shpchp 
> dock pci_hotplug ipt_LOG xt_limit xt_pkttype button battery ac power_supply 
> ip6t_REJECT xt_tcpudp ipt_REJECT iptable_mangle iptable_filter 
> ip6table_mangle ip_tables ip6table_filter ip6_tables x_tables ipv6 usbhid 
> ff_memless ext3 jbd loop dm_mod ehci_hcd uhci_hcd usbcore ide_cd bnx2 cdrom 
> rng_core reiserfs ata_piix ahci libata thermal processor piix sg megaraid_sas 
> fan edd sd_mod scsi_mod ide_disk ide_core
> Pid: 121, comm: kacpi_notify Not tainted 2.6.24-rc3-gh-smp #1
> RIP: 0010:[]  [] 
> :pci_slot:__this_module+0x21c4/0xf204
> RSP: 0018:81103fa43ea8  EFLAGS: 00010216
> RAX: 81103f944a18 RBX: 81103d4fe910 RCX: 000f
> RDX:  RSI:  RDI: 8110400d13d0
> RBP: 8032d97b R08: 8110400fc7e0 R09: 0002
> R10:  R11: 8021d193 R12: 811040105cf0
> R13:  R14: 80635820 R15: 
> FS:  () GS:8110400ed8c0() knlGS:
> CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: 2b266d876471 CR3: 00103c825000 CR4: 06e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process kacpi_notify (pid: 121, threadinfo 81103fa42000, task 
> 81103f9f8040)
> Stack:  809c 81103d119a00 8032d99e 81103f9fc540
>  8024618d 81103f9fc540 81103f9fc540 8024696c
>  80246a46  81103f9f8040 80249ada
> Call Trace:
>  [] acpi_ev_notify_dispatch+0x57/0x60
>  [] acpi_os_execute_notify+0x23/0x2c
>  [] run_workqueue+0x7f/0x10b
>  [] worker_thread+0x0/0xe4
>  [] worker_thread+0xda/0xe4
>  [] autoremove_wake_function+0x0/0x2e
>  [] kthread+0x47/0x73
>  [] child_rip+0xa/0x12
>  [] kthread+0x0/0x73
>  [] child_rip+0x0/0x12

Maybe we're trying to kick off a hotplug event on the wrong slot?
I really have no idea...

> Code: ff ff ff ff 40 23 2c 88 ff ff ff ff 00 c8 c6 3b 10 81 ff ff 
> RIP  [] :pci_slot:__this_module+0x21c4/0xf204
>  RSP 

Can you apply this debug patch on top of your tree, and send me
the output?

I'd be curious to see the output for your failure case:

  # modprobe pci_slot debug=1
  # modprobe acpiphp debug=1

Thanks.

/ac

diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c
index 724f4f0..5a62def 100644
--- a/drivers/acpi/pci_slot.c
+++ b/drivers/acpi/pci_slot.c
@@ -30,12 +30,16 @@
 #include 
 #include 
 
+static int debug;
+
 #define DRIVER_VERSION "0.1"
 #define DRIVER_AUTHOR  "Alex Chiang <[EMAIL PROTECTED]>"
 #define DRIVER_DESC"ACPI PCI Slot Detection Driver"
 MODULE_AUTHOR(DRIVER_AUTHOR);
 MODULE_DESCRIPTION(DRIVER_DESC);
 MODULE_LICENSE("GPL");
+MODULE_PARM_DESC(debug, "Debugging mode enabled or not");
+module_param(debug, bool, 0644);
 
 #define _COMPONENT ACPI_PCI_COMPONENT
 ACPI_MODULE_NAME("pci_slot");
@@ -43,6 +47,12 @@ ACPI_MODULE_NAME("pci_slot");
 #define MY_NAME "pci_slot"
 #define err(format, arg...) printk(KERN_ERR "%s: " format , MY_NAME , ## arg)
 #define info(format, arg...) printk(KERN_INFO "%s: " format , MY_NAME , ## arg)
+#define dbg(format, arg...)\
+   do {\
+   if (debug)  \
+   printk(KERN_DEBUG "%s: " format,\
+   MY_NAME , ## arg);  \
+   } while (0)
 
 static int acpi_pci_slot_add(acpi_handle handle);
 static void acpi_pci_slot_remove(acpi_handle handle);
@@ -125,6 +135,9 @@ register_slot(acpi_handle handle, u32 lvl, void *context, void **rv)
if (IS_ERR(pci_slot))
err("pci_create_slot returned %ld\n", PTR_ERR(pci_slot));
 
+   

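The dbg() macro added by the debug patch above uses the usual do { ... } while (0) wrapper. The userspace sketch below shows why that wrapper matters. To make the output checkable, it writes into a buffer with snprintf() instead of calling printk(KERN_DEBUG ...). The buffer, the `debug` flag (standing in for the module parameter), and report_slot() are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MY_NAME "pci_slot"

static bool debug;          /* stands in for the module parameter */
static char dbg_buf[256];   /* captures output instead of printk() */

/*
 * Same shape as the patch's dbg(). The do { ... } while (0) wrapper makes
 * the body a single statement that still needs a trailing ';'.
 * ##__VA_ARGS__ (a GNU extension, like the kernel's "arg...") drops the
 * comma when no arguments follow the format.
 */
#define dbg(format, ...)						\
	do {								\
		if (debug)						\
			snprintf(dbg_buf, sizeof(dbg_buf), "%s: " format, \
				 MY_NAME, ##__VA_ARGS__);		\
	} while (0)

/*
 * Hypothetical caller. If dbg() expanded to a bare "if (debug) ...",
 * the else below would bind to that inner if instead of to "if (ok)".
 */
void report_slot(int slot, bool ok)
{
	if (ok)
		dbg("slot %d registered\n", slot);
	else
		dbg("slot %d failed\n", slot);
}
```

With `debug` unset, nothing is written. With it set, each branch produces its own "pci_slot: ..." line.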
Re: Trailing periods in kernel messages

2007-11-29 Thread Li Zefan
Andrew Morton wrote:
> On Thu, 29 Nov 2007 11:20:18 +0100 Frans Pop <[EMAIL PROTECTED]> wrote:
> 
>> Well, for one it needlessly increases the size of log files.
>> It also IMO just looks weird to have a trailing period only for some 
>> messages and it certainly is completely inappropriate for messages like:
> 
> I'll confess to stealthily deleting some of those periods when nobody is 
> looking.
> I don't find them to have any value and they do have some cost, including 
> screen
> real estate at the source-code level.
> 
> 

Just a rough grep:

# grep -r -P --include=*.[ch] 'printk.*\.\\n' * | wc -l
6025
# grep -r -P --include=*.[ch] '\.\\n' * | wc -l
12723

:)


Re: [Bluez-users] Lost connections - mouse and keyboard

2007-11-29 Thread Dave Young
On Nov 30, 2007 4:43 AM, Jiri Kosina <[EMAIL PROTECTED]> wrote:
> On Thu, 29 Nov 2007, Marcel Holtmann wrote:
>
> > > >Nov 28 18:53:39 pico kernel: WARNING: at drivers/hid/hid-core.c:784
> [ ... ]
>
> > > Does bluetooth input devices have something to do with usbhid? I don't
> > > know, perhaps this is another problem in kernel.
> > in case you have a HID proxy dongle the usbhid driver can be involved. And
> > since this is hiddev, then it will be caused by the hid2hci program.
>
> Absolutely.
>
> This particular warning means, that someone (usually indeed hid2hci)
> passed usage through hiddev that was out of bounds, with respect to the
> device's report descriptor.

Is this the expected behaviour? IMHO, a userspace program should not be
able to cause kernel warnings like this, no matter what input it passes.

>
> This usually means that hid2hci has chosen the wrong method to switch the
> modes. Unfortunately, it's not easy to implement always the switching
> properly, if we don't know the vendor-specific packet that has to be sent.
>
> --
> Jiri Kosina
>


Re: [patch 1/3] bdi patches

2007-11-29 Thread Neil Brown
On Thursday November 29, [EMAIL PROTECTED] wrote:
> > http://programming.kicks-ass.net/kernel-patches/foo/
> > 
> > bdi-task-dirty.patch
> > bdi-sysfs.patch
> > bdi-min.patch
> > bdi-max.patch
> > 
> > 
> > Is my current rather experimental stack, I just wrote the max part after
> > having slept on it. I'm not fond of the multiplication there, but I
> > dno't see a way around it.
> > 
> > Compile tested only.
> 
> I've done some testing on these patches and did some changes. So here
> they go.
> 
> Thanks,
> Miklos
> 
> -
> Subject: mm: sysfs: expose the BDI object in sysfs
> 
> Provide a place in sysfs for the backing_dev_info object.
> This allows us to see and set the various BDI specific variables.

You don't say what the place is, and I'm not quite familiar enough
with sysfs internals to figure it out myself.  Help?

And while I was looking I noticed that bdi_register (and bdi_init_fmt)
takes a second argument 'parent', which is always NULL, and which is
undocumented as to purpose.
If no-one would ever add another call to bdi_register, why have the
second arg, and if they might, how would they know what to put there?

Finally, the omission of NFS bothers me - and makes me wonder if the
choice of name in sysfs is appropriate.

Would a program ever want to generate the name (in sysfs) for a
particular bdi?  If so, how would it do it?

It seems to me after a fairly quick look that a bdi is always
associated with a device number.  For block devices the device number
is obvious.  For NFS and FUSE, the device number is an anon device
number allocated at mount time.
Maybe the name of the bdi should be based on that number.  Then it
would be possible to map directly from e.g. a file to the bdi that the
file would be written to. 
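Neil's suggestion can be illustrated from userspace. For any file, stat() reports the device number of the filesystem it lives on, and major()/minor() split that number into a "MAJOR:MINOR" string. The naming scheme and bdi_name_for_path() are hypothetical here. This is a sketch of the proposal, not something the posted patches implement.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() on glibc and musl */
#include <sys/types.h>

/*
 * Hypothetical: name the bdi after the device number of the filesystem a
 * file lives on, written as "MAJOR:MINOR". For a block-device filesystem,
 * st_dev is the block device's number. For NFS or FUSE, it is the anonymous
 * device number allocated at mount time.
 * Returns 0 on success, -1 if stat() fails.
 */
int bdi_name_for_path(const char *path, char *buf, size_t len)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	snprintf(buf, len, "%u:%u", major(st.st_dev), minor(st.st_dev));
	return 0;
}
```

Because every mounted filesystem has an st_dev, this naming would cover NFS and FUSE as well as block devices without any special cases.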

NeilBrown


Re: Out of tree module using LSM

2007-11-29 Thread James Morris
On Thu, 29 Nov 2007, Al Viro wrote:

> Incidentally, I would really love to see the threat profile we are talking
> about.  

Exactly.

Please come up with a set of requirements that can be reviewed by the core 
kernel folk, and perhaps then focus on how to meet those requirements once 
they have been accepted.



- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: [PATCH 4/6] time: fix typo in comments

2007-11-29 Thread Li Zefan
>>  
>> -/* Suppose we want to devide two numbers NOM and DEN: NOM/DEN, the we can
>> +/* Suppose we want to devide two numbers NOM and DEN: NOM/DEN, then we can
> 
> divide
> 

Yes, I missed it.

>> - * which, buy the way, it can do, but it take more code and at least 2
>> + * which, buy the way, it can do, but it takes more code and at least 2
> 
> by the way 
> (and does this really add anything to the sentence?)
> 

Thanks for pointing it out :)


Re: WARNING: at kernel/resource.c:189 __release_resource

2007-11-29 Thread Andrew Morton
On Thu, 29 Nov 2007 16:40:37 -0700
Bjorn Helgaas <[EMAIL PROTECTED]> wrote:

> On Monday 26 November 2007 11:05:38 pm Andrew Morton wrote:
> > On Thu, 22 Nov 2007 22:41:16 +0100 Jiri Slaby <[EMAIL PROTECTED]> wrote:
> > > Ok, I hit the bug, suspend of 00:06 device complains about it:
> > > WARNING: at .../kernel/resource.c:185 __release_resource()
> > > 
> > > Call Trace:
> > >  [] release_resource+0xb5/0xf0
> > >  [] pnp_release_resources+0x70/0x130
> > >  [] pnp_stop_dev+0x45/0x90
> > >  [] pnp_bus_suspend+0x92/0xb0
> > >  [] suspend_device+0x113/0x180
> > >  [] device_suspend+0x200/0x320
> > >  [] suspend_devices_and_enter+0xa5/0x170
> > >  [] enter_state+0x209/0x270
> > >  [] state_store+0xaf/0xf0
> > >  [] kobj_attr_store+0x17/0x20
> > >  [] sysfs_write_file+0xce/0x140
> > >  [] vfs_write+0xc7/0x170
> > >  [] sys_write+0x50/0x90
> > >  [] system_call+0x7e/0x83
> > > 
> > > # LANG=en ll /sys/devices/pnp0/00:06/
> > > total 0
> > > lrwxrwxrwx 1 root root0 Nov 22 22:35 driver -> 
> > > ../../../bus/pnp/drivers/serial
> > > -r--r--r-- 1 root root 4096 Nov 22 22:35 id
> > > -r--r--r-- 1 root root 4096 Nov 22 22:35 options
> > > drwxr-xr-x 2 root root0 Nov 22 22:35 power
> > > -rw-r--r-- 1 root root 4096 Nov 22 22:35 resources
> > > lrwxrwxrwx 1 root root0 Nov 22 22:35 subsystem -> ../../../bus/pnp
> > > drwxr-xr-x 3 root root0 Nov 22 22:35 tty
> > > -rw-r--r-- 1 root root 4096 Nov 22 22:35 uevent
> > 
> > I suppose that's a genuine leak, presumably in 8250_pnp.
> 
> We used to have only the serial driver resource reservation.  We now
> have an additional 00:06 resource that is the parent of the serial
> resource, e.g.,
> 
>   03f8-03ff : 00:06
> 03f8-03ff : serial
> 
> I think this problem happens because pnp_bus_suspend() calls
> serial_pnp_suspend(), which suspends the driver but does nothing
> with the resources.  Then it calls pnp_stop_dev(), which releases
> the 00:06 resource, which still has a serial child resource.
> 
> The corresponding PCI code in pci_device_suspend() does not do
> any generic device disable or resource release.  I don't know
> why PNP disables the device on suspend.  I glanced through the
> ACPI spec but didn't see a requirement for it.  Maybe Pierre [1]
> remembers.
> 
> Maybe we could either remove the pnp_{stop,start}_dev() calls
> from the suspend/resume path, or move the PNP resource management
> out of pnp_{start,stop}_dev().
> 
> Bjorn
> 
> [1] http://lkml.org/lkml/2005/11/30/39

So was this particular problem caused/exposed by
pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch, or is
it in mainline?

Thanks.


Re: constant_tsc and TSC unstable

2007-11-29 Thread Frans Pop
Paul Rolland wrote:
> Total of 2 processors activated (6919.15 BogoMIPS).
> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> checking TSC synchronization [CPU#0 -> CPU#1]:
> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
> Marking TSC unstable due to: check_tsc_sync_source failed.
> Brought up 2 CPUs
> ...

Not sure if this is related, but thought I'd contribute it anyway...

I've got a Pentium D system (dual core, single processor) and on some
boots I get "Marking TSC unstable due to check_tsc_sync_source failed" with
some cycles of warp between CPUs, while most boots are OK. This kind of
inconsistency seems more likely due to a failure in the kernel to deal with
differences between boots than to something inherent in the hardware.

I conclude that because I basically never have any problems with the system
once it has booted and the TSC check has passed.

From my kern.logs since Oct 26, I get the following data:
2.6.23+cfs:  2 passes
2.6.23.1:1 pass;   1 failure  (48 cycles warp)
2.6.24-rc1: 15 passes
2.6.24-rc2: 13 passes; 1 failure  (8 cycles warp)
2.6.24-rc3:  5 passes; 3 failures (8, 8 and 16 cycles warp)

Note that this is not a new issue. For 2.6.21/2.6.23-RCx kernels I reported
similar data in http://lkml.org/lkml/2007/9/16/45.

Cheers,
FJP


[PATCH] Avoid overflows in kernel/time.c

2007-11-29 Thread H. Peter Anvin
When the conversion factor between jiffies and milli- or microseconds
is not a single multiply or divide, as for the case of HZ == 300, we
currently do a multiply followed by a divide.  The intervening
result, however, is subject to overflows, especially since the
fraction is not simplified (for HZ == 300, we multiply by 300 and
divide by 1000).

This is exposed to the user when passing a large timeout to poll(),
for example.

This patch replaces the multiply-divide with a reciprocal
multiplication on 32-bit platforms.  When the input is an unsigned
long, there is no portable way to do this on 64-bit platforms, since
it requires a 128-bit intermediate result (which gcc does support on
64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390).
However, since the output is a 32-bit integer in the cases affected,
we just simplify the multiply-divide (*3/10 instead of *300/1000).

The reciprocal multiply used can have off-by-one errors in the upper
half of the valid output range.  This could be avoided at the expense
of having to deal with a potential 65-bit intermediate result.  Since
the intent is to avoid overflow problems and most of the other time
conversions are only semiexact, the off-by-one errors were considered
an acceptable tradeoff.
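For HZ == 300, jiffies_to_msecs() must compute j * 1000 / 300. The sketch below contrasts the old 32-bit multiply-then-divide with a reciprocal multiplication. The constants SKETCH_MSEC_MUL32 (ceil(2^30 * 10 / 3)) and SKETCH_MSEC_SHR32 (30) were worked out by hand for this illustration and are not the values generated by kernel/timeconst.bc. The function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative constants for HZ == 300 only: msecs = jiffies * 10 / 3. */
#define SKETCH_MSEC_MUL32 3579139414u   /* ceil(2^30 * 10 / 3) */
#define SKETCH_MSEC_SHR32 30

/* Old approach on 32-bit: the intermediate j * 1000 can wrap around. */
uint32_t naive_jiffies_to_msecs(uint32_t j)
{
	return (j * 1000u) / 300u;
}

/*
 * Reciprocal multiplication: one 32x32->64 multiply and a shift.
 * The 64-bit product cannot overflow because MUL32 < 2^32 and j < 2^32.
 * As in the kernel, the result is truncated to 32 bits.
 */
uint32_t recip_jiffies_to_msecs(uint32_t j)
{
	return (uint32_t)(((uint64_t)SKETCH_MSEC_MUL32 * j) >> SKETCH_MSEC_SHR32);
}
```

At j = 4294968, the naive version wraps and returns 2, while the reciprocal version returns the correct 14316560. At j = 2^29 (536870912), the reciprocal version returns 1789569707 where the exact floor of j * 10 / 3 is 1789569706. This is the kind of off-by-one in the upper part of the range that the description above accepts.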

NOTE: This patch uses a bc(1) script to compute the appropriate
constants.

Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
---
 kernel/Makefile |8 +++
 kernel/time.c   |   29 +---
 kernel/timeconst.bc |  123 +++
 3 files changed, 152 insertions(+), 8 deletions(-)
 create mode 100644 kernel/timeconst.bc

diff --git a/kernel/Makefile b/kernel/Makefile
index dfa9695..f136d18 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -80,3 +80,11 @@ quiet_cmd_ikconfiggz = IKCFG   $@
 targets += config_data.h
 $(obj)/config_data.h: $(obj)/config_data.gz FORCE
$(call if_changed,ikconfiggz)
+
+$(obj)/time.o: $(obj)/timeconst.h
+
+quiet_cmd_timeconst  = BC  $@
+  cmd_timeconst = (echo $(CONFIG_HZ) | bc -q $<) > $@
+targets += timeconst.h
+$(obj)/timeconst.h: $(src)/timeconst.bc $(wildcard include/config/hz.h) FORCE
+   $(call if_changed,timeconst)
diff --git a/kernel/time.c b/kernel/time.c
index 09d3c45..8e790b5 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -39,6 +39,8 @@
 #include 
 #include 
 
+#include "timeconst.h"
+
 /*
  * The timezone where the local system is located.  Used as a default by some
  * programs who obtain this value by using gettimeofday.
@@ -93,7 +95,8 @@ asmlinkage long sys_stime(time_t __user *tptr)
 
 #endif /* __ARCH_WANT_SYS_TIME */
 
-asmlinkage long sys_gettimeofday(struct timeval __user *tv, struct timezone __user *tz)
+asmlinkage long sys_gettimeofday(struct timeval __user *tv,
+struct timezone __user *tz)
 {
if (likely(tv != NULL)) {
struct timeval ktv;
@@ -118,7 +121,7 @@ asmlinkage long sys_gettimeofday(struct timeval __user *tv, struct timezone __us
  * hard to make the program warp the clock precisely n hours)  or
  * compile in the timezone information into the kernel.  Bad, bad
  *
- * - TYT, 1992-01-01
+ * - TYT, 1992-01-01
  *
  * The best thing to do is to keep the CMOS clock in universal time (UTC)
  * as real UNIX machines always do it. This avoids all headaches about
@@ -239,7 +242,11 @@ unsigned int inline jiffies_to_msecs(const unsigned long j)
 #elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
return (j + (HZ / MSEC_PER_SEC) - 1)/(HZ / MSEC_PER_SEC);
 #else
-   return (j * MSEC_PER_SEC) / HZ;
+# if BITS_PER_LONG == 32
+   return ((u64)HZ_TO_MSEC_MUL32 * j) >> HZ_TO_MSEC_SHR32;
+# else
+   return (j * HZ_TO_MSEC_NUM) / HZ_TO_MSEC_DEN;
+# endif
 #endif
 }
 EXPORT_SYMBOL(jiffies_to_msecs);
@@ -251,7 +258,11 @@ unsigned int inline jiffies_to_usecs(const unsigned long j)
 #elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
return (j + (HZ / USEC_PER_SEC) - 1)/(HZ / USEC_PER_SEC);
 #else
-   return (j * USEC_PER_SEC) / HZ;
+# if BITS_PER_LONG == 32
+   return ((u64)HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32;
+# else
+   return (j * HZ_TO_USEC_NUM) / HZ_TO_USEC_DEN;
+# endif
 #endif
 }
 EXPORT_SYMBOL(jiffies_to_usecs);
@@ -351,7 +362,7 @@ EXPORT_SYMBOL(mktime);
  * normalize to the timespec storage format
  *
  * Note: The tv_nsec part is always in the range of
- * 0 <= tv_nsec < NSEC_PER_SEC
+ * 0 <= tv_nsec < NSEC_PER_SEC
  * For negative values only the tv_sec field is negative !
  */
 void set_normalized_timespec(struct timespec *ts, time_t sec, long nsec)
@@ -452,12 +463,13 @@ unsigned long msecs_to_jiffies(const unsigned int m)
/*
 * Generic case - multiply, round and divide. But first
 * check that if we are doing a net multiplication, that
-* we wouldnt overflow:
+* we 

Relation between nr_dirty and nr_inactive

2007-11-29 Thread Kunal Trivedi
Hi,
I am running an older kernel (CentOS 2.6.9-34 SMP) on a 32-bit arch. Some
of my systems hung while trying to write some data to disk. All of
those systems exhibit a similar pattern: during this time,
/proc/meminfo shows 'Inactive' < 'Dirty'. All of the machines have 2G
of physical memory and ~1.5G of memory is locked (via mlock).

I tried reading the code but could not establish any direct relationship
between Zone->in_active pages and per-cpu_page_state->nr_dirty.

Has anybody seen a system in this kind of state before? And do these two
parameters affect each other?

Thanks
-Kunal


Re: 2.6.24-rc3-mm2 (bugfix for memory cgroup per-zone-struct allocation.)

2007-11-29 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 16:25:33 -0500
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:
> > -   pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
> > +   /*
> > +* This routine is called against possible nodes.
> > +* But it's BUG to call kmalloc() against offline node.
> > +*
> > +* TODO: this routine can waste much memory for nodes which will
> > +*   never be onlined. It's better to use memory hotplug callback
> > +*   function.
> > +*/
> > +   if (node_state(node, N_HIGH_MEMORY))
> > +   pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
> > +   else
> > +   pn = kmalloc(sizeof(*pn), GFP_KERNEL);
> > if (!pn)
> > return 1;
> >  
> > 
> 
> This worked for me.  Can boot 24-rc3-mm2 [if I turn off async scsi scan,
> that is--not related to mem controller].  
> 
Thank you !

> Just FYI, on my ia64 platform, with NODES_SHIFT == 8 [RHEL & SLES ship
> with 10, I believe], the size of the mem_cgroup structure is ~10KB.
> 
Yes. But...
I'll ask Goto-san how memory hotplug callback works and try it.

Thanks,
-Kame




[GIT PULL] x86 setup: don't recalculate ss:esp unless really necessary

2007-11-29 Thread H. Peter Anvin
Hi Linus,

It appears that unconditionally resetting the stack, which fixes old
LILO, breaks LOADLIN after all.  This patch should work with either,
as well as work around the command-line truncation bug in old versions
of SYSLINUX.

Please pull:

  git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup.git for-linus

Jens Rottmann (1):
  x86 setup: don't recalculate ss:esp unless really necessary

 arch/x86/boot/header.S |   41 -
 1 files changed, 16 insertions(+), 25 deletions(-)

commit 16252da654800461e0e1c32697cb59f4cda15aa9
Author: Jens Rottmann <[EMAIL PROTECTED]>
Date:   Tue Nov 27 12:35:13 2007 +0100

x86 setup: don't recalculate ss:esp unless really necessary

In order to work around old LILO versions providing an invalid ss
register, the current setup code always sets up a new stack,
immediately following .bss and the heap. But this breaks LOADLIN.

This rewrite of the workaround checks for an invalid stack (ss!=ds)
first, and leaves ss:sp alone otherwise (apart from aligning esp).

[hpa note: LOADLIN has a number of arbitrary hard-coded limits that
are being pushed up against.  Without some major revision of LOADLIN
itself it will not be sustainable keeping it alive.  This gives it
another brief lease on life, however.  This patch also helps the
cmdline truncation problem with old versions of SYSLINUX.]

Signed-off-by: Jens Rottmann 
Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 6ef5a06..4cc5b04 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -236,39 +236,30 @@ start_of_setup:
movw%ax, %es
cld
 
-# Apparently some ancient versions of LILO invoked the kernel
-# with %ss != %ds, which happened to work by accident for the
-# old code.  If the CAN_USE_HEAP flag is set in loadflags, or
-# %ss != %ds, then adjust the stack pointer.
+# Apparently some ancient versions of LILO invoked the kernel with %ss != %ds,
+# which happened to work by accident for the old code.  Recalculate the stack
+# pointer if %ss is invalid.  Otherwise leave it alone, LOADLIN sets up the
+# stack behind its own code, so we can't blindly put it directly past the heap.
 
-   # Smallest possible stack we can tolerate
-   movw$(_end+STACK_SIZE), %cx
-
-   movwheap_end_ptr, %dx
-   addw$512, %dx
-   jnc 1f
-   xorw%dx, %dx# Wraparound - whole segment available
-1: testb   $CAN_USE_HEAP, loadflags
-   jnz 2f
-
-   # No CAN_USE_HEAP
movw%ss, %dx
cmpw%ax, %dx# %ds == %ss?
movw%sp, %dx
-   # If so, assume %sp is reasonably set, otherwise use
-   # the smallest possible stack.
-   jne 4f  # -> Smallest possible stack...
+   je  2f  # -> assume %sp is reasonably set
+
+   # Invalid %ss, make up a new stack
+   movw$_end, %dx
+   testb   $CAN_USE_HEAP, loadflags
+   jz  1f
+   movwheap_end_ptr, %dx
+1: addw$STACK_SIZE, %dx
+   jnc 2f
+   xorw%dx, %dx# Prevent wraparound
 
-   # Make sure the stack is at least minimum size.  Take a value
-   # of zero to mean "full segment."
-2:
+2: # Now %dx should point to the end of our stack space
andw$~3, %dx# dword align (might as well...)
jnz 3f
movw$0xfffc, %dx# Make sure we're not zero
-3: cmpw%cx, %dx
-   jnb 5f
-4: movw%cx, %dx# Minimum value we can possibly use
-5: movw%ax, %ss
+3: movw%ax, %ss
movzwl  %dx, %esp   # Clear upper half of %esp
sti # Now we should have a working stack
 
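The new stack selection in header.S can be read roughly as the C model below. choose_sp() and its parameters are hypothetical: `end` stands in for _end, `stack_size` for STACK_SIZE, and `can_use_heap` for the CAN_USE_HEAP bit in loadflags. The real code operates on %ss, %ds and %sp directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* C model of the patched stack setup; all values are 16-bit offsets. */
uint16_t choose_sp(uint16_t ss, uint16_t ds, uint16_t sp,
		   bool can_use_heap, uint16_t heap_end_ptr,
		   uint16_t end, uint16_t stack_size)
{
	uint16_t dx;

	if (ss == ds) {
		/* Valid %ss: assume the loader (e.g. LOADLIN) set %sp sensibly. */
		dx = sp;
	} else {
		/* Invalid %ss: put the stack after the heap, or after _end. */
		uint32_t top = can_use_heap ? heap_end_ptr : end;

		top += stack_size;
		/* A carry out of 16 bits becomes 0, as with "jnc 2f; xorw %dx,%dx". */
		dx = top > 0xffff ? 0 : (uint16_t)top;
	}
	dx &= (uint16_t)~3u;   /* dword align */
	if (dx == 0)
		dx = 0xfffc;   /* zero means "end of segment" */
	return dx;
}
```

Only a mismatched %ss now triggers a recomputed stack. Otherwise %sp is left alone apart from alignment, which is what keeps LOADLIN's own stack placement intact.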


Re: + proc-fix-the-threaded-proc-self.patch added to -mm tree

2007-11-29 Thread Ingo Molnar

* Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> > You'll never run out of this sort of problem. Keeping Linux lean and 
> > simple would be far better.
> 
> Nah.  The control group stuff has all kinds of corner cases because it 
> is a new and untested API.  The namespace work after we get the code 
> cleanup up so it is maintainable and we can work with it is usually 
> just finding our globals through a pointer instead of from a static 
> variable.  Hardly a measurable cost on the best day.

yeah - anyone who claims that containers are 'fat' has likely not even 
looked at the code. Even maintenance-wise there are very visible positive 
effects: we do discover and properly map our "global resource" 
dependencies and abstract them. That increases cleanliness of our code 
and APIs all around.

Ingo


Re: [PATCH x86/mm 01/11] x86-32 thread_struct.debugreg

2007-11-29 Thread Jeff Dike
On Thu, Nov 29, 2007 at 01:50:55PM -0800, Roland McGrath wrote:
> UML is also a good test, though I have never been set up to verify
> anything beyond "UML seems to boot far enough to complain I don't
> have a userland filesystem for it".  

BTW, this doesn't exercise ptrace at all.  Interesting ptrace things
only start happening when userspace runs.

Grab an interesting-looking image from http://uml.nagafix.co.uk,
uncompress it, and run
./linux ubda=the-filesystem-image

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: [PATCH x86/mm 11/11] x86 ptrace merge removals

2007-11-29 Thread Jeff Dike
On Thu, Nov 29, 2007 at 02:38:03PM -0800, Roland McGrath wrote:
> > Can you make sure that UML still runs when you're done with ptrace?
> 
> I'd be glad to, especially if you give me some advice on testing (.config
> for um-i386 and um-x86_64, what do try that constitutes "UML still runs").

Use defconfig and boot it.  If you break ptrace, I think it's
overwhelmingly likely that UML will stop booting.  So if UML boots,
I'd say you're good to go, with one caveat.  That is, UML should
report at boot that PTRACE_SYSEMU works.  I put in a fallback from
PTRACE_SYSEMU to PTRACE_SYSCALL when Fedora broke PTRACE_SYSEMU.

> Right now (before these), UML
> doesn't build for x86_64 or i386 from this tree to begin with.

For current -mm, you'll need
http://marc.info/?l=linux-kernel=119635496908681=raw to build.

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: Possibly SATA related freeze killed networking and RAID

2007-11-29 Thread Robert Hancock

Phillip Susi wrote:
> Tejun Heo wrote:
>> Agreed.  Nobody cared on ATA controllers is usually very effective at
>> taking the whole machine down.  Is there any reason why we don't turn on
>> irqpoll on turned off IRQs automatically?
>
> Why does a single spurious interrupt cause it to be shut down?  I can
> see if the interrupt is stuck on and keeps interrupting constantly, but
> if it's just the occasional spurious interrupt, why not just ignore it
> and move on?

I'm not certain offhand, but I think there may be such a threshold. 
However, an occasional spurious interrupt isn't likely. For a 
level-triggered interrupt, an unhandled interrupt will keep interrupting 
forever since nobody knows how to clear it (until we decide to disable 
the IRQ entirely).
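The difference between an occasional spurious interrupt and a stuck level-triggered line can be modelled with a window-based counter, in the spirit of note_interrupt() in kernel/irq/spurious.c. This is a simplified model, not that code. struct irq_stats and note_irq() are hypothetical. The window of 100000 interrupts with a limit of 99900 unhandled ones is used here for illustration.

```c
#include <stdbool.h>

/* Illustrative policy: disable only if nearly every interrupt is unhandled. */
#define WINDOW        100000u
#define MAX_UNHANDLED  99900u

struct irq_stats {
	unsigned int count;      /* interrupts seen in the current window */
	unsigned int unhandled;  /* how many of those no handler claimed */
	bool disabled;
};

/* Record one interrupt; returns true if the line has just been disabled. */
bool note_irq(struct irq_stats *s, bool handled)
{
	if (s->disabled)
		return false;
	if (!handled)
		s->unhandled++;
	if (++s->count < WINDOW)
		return false;
	/* End of window: decide, then start a fresh window. */
	s->disabled = s->unhandled > MAX_UNHANDLED;
	s->count = 0;
	s->unhandled = 0;
	return s->disabled;
}
```

One unhandled interrupt in a thousand never gets near the limit. A level-triggered line that nobody clears is unhandled every time, so it crosses the limit at the end of its first window.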


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Possibly SATA related freeze killed networking and RAID

2007-11-29 Thread Tejun Heo
Phillip Susi wrote:
> Tejun Heo wrote:
>> Agreed.  Nobody cared on ATA controllers is usually very effective at
>> taking the whole machine down.  Is there any reason why we don't turn on
>> irqpoll on turned off IRQs automatically?
> 
> Why does a single spurious interrupt cause it to be shut down?  I can
> see if the interrupt is stuck on and keeps interrupting constantly, but
> if it's just the occasional spurious interrupt, why not just ignore it
> and move on?

Because SFF ATA controllers don't have an IRQ pending bit.  You don't know
whether an IRQ is raised or not.  Plus, accessing the Status register, which
clears a pending IRQ, can be very slow on PATA machines.  It has to go
through the PCI and ATA buses and come back.  So, unconditionally trying
to clear the IRQ by accessing Status can incur noticeable overhead if the
IRQ is shared with devices which raise a lot of IRQs.

-- 
tejun


Re: [patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-11-29 Thread Andrew Morton
On Thu, 29 Nov 2007 08:14:10 -
"Metzger, Markus T" <[EMAIL PROTECTED]> wrote:

> Support for Intel's last branch recording to ptrace. This gives debuggers
> access to this hardware feature and allows them to show an execution trace
> of the debugged application.
> 
> Last branch recording (see section 18.5 in the Intel 64 and IA-32
> Architectures Software Developer's Manual) allows taking an execution
> trace of the running application without instrumentation. When a branch
> is executed, the hardware logs the source and destination address in a
> cyclic buffer given to it by the OS.
> 
> This can be a great debugging aid. It shows you how exactly you got
> where you currently are without requiring you to do lots of single
> stepping and rerunning.
> 
> This patch manages the various buffers, configures the trace
> hardware, disentangles the trace, and provides a user interface via
> ptrace. On the high-level design:
> - there is one optional trace buffer per thread_struct
> - upon a context switch, the trace hardware is reconfigured to either
>   disable tracing or to use the appropriate buffer for the new task.
>   - tracing induces ~20% overhead as branch records are sent out on
>     the bus.
>   - the hardware collects trace per processor. To disentangle the
>     traces for different tasks, we use separate buffers and reconfigure
>     the trace hardware.
> - the low-level data layout is configured at cpu initialization time
>   - different processors use different branch record formats
> 
> 
> patch 1/2 contains the kernel changes
> patch 2/2 contains changes to the ptrace man pages
> 
> 

Is there any userspace code available which people can use to play with
this?

How do you envisage it being used in the long term?  Do you expect any of
the standard performance tuning tools will be tweaked to understand this
feature and if so which ones?

I'm generally wondering "how will developers be using this in a year or
two's time?"

Please cc Michael Kerrisk <[EMAIL PROTECTED]> on future versions of
these patches.

The patches were horridly wordwrapped.

Is there any likelihood that any other CPUs do now or will in the future
support any similar feature to this?  If so, is an implementation which is
100% contained to arch/x86 appropriate?  



Reproducible data corruption with sendfile+vsftp - splice regression?

2007-11-29 Thread Holger Hoffstaette

Hi -

This regular Linux user and lkml lurker just noticed data corruption in
ftp'ed files and narrowed it down to vsftpd using sendfile(). This has
never caused problems before; I have not noticed this with 2.6.22.x but
may have missed it. I do remember reading about some changes to the
underlying splice code since .23, so that may have something to do
with it.

The scenario:

- created a file with known bit pattern on Linux server
- ftp-got this file to Windows client: file has bad crc (yes, binary)
- verified with another client: same result

I have thus far eliminated (to the best of my knowledge) NICs, switches,
cables, the Windows FTP clients, the hard disk in the server (SATA, ext3):
nothing suspicious in any logs. Box is an AMD Sempron 2600+ with 1.5 GB
RAM, added rt8169 card, Gentoo, vsftpd stable 2.0.5 - nothing fancy.
Transferring the file with samba (interestingly with sendfile enabled) and
via ftp but from /dev/shm repeatably works fine; pulling from disk creates
bad crc, every time. The file is readable and can be copied, verified etc.
over and over so I'm sure that I'm not falling prey to a false positive.
ifconfig indicates no dropped or otherwise corrupted packets.
I noticed this first with 2.6.24-rc3, but also just tried the latest stable
2.6.23.9 with the same config, with no change in behaviour. After setting
vsftpd to use_sendfile=NO, gigs can be transferred without corruption.

The data corruption is sporadic, but absolutely repeatable. The file with
the known good pattern just contains multiple lines of:

012345678901234567890123456789012345678901234567890
012345678901234567890123456789012345678901234567890
012345678901234567890123456789012345678901234567890
..etc..

A corrupted file is missing random characters, so that the corrupted lines
look like this (line numbers added by me):

19785: 012345678901234567890123456789012345678901234567890
19786: 01234567890123456789012345678901234567890123678901234567890
19787: 012345678901234567890123456789012345678901234567890

or:

20074: 012345678901234567890123456789012345678901234567890
20075: 01234567890123456789012345678901234567890123012345678901234567890123456789012345678901234567890
20076: 012345678901234567890123456789012345678901234567890

Again, other network or hd traffic shows no signs of gremlins; the box is
perfectly stable, and turning sendfile on or off triggers/untriggers the
corruption reliably.  I will try 2.6.22.x over the weekend, and before I
bother lkml with dmesg/.config etc. I wanted to fish for initial thoughts.

thanks
Holger




Re: [PATCH] xfs: revert to double-buffering readdir

2007-11-29 Thread Christian Kujau

On Sun, 25 Nov 2007, Christoph Hellwig wrote:
> This patch does exactly that and reverts xfs_file_readdir to what's
> basically the 2.6.23 version minus the uio and vnops junk.

Thanks, works here too (without nordirplus as a mountoption).
Am I supposed to close the bug[0] or do you guys want to leave this
open to track the Real Fix (TM) for 2.6.25?

Again, thank you for the fix!
Christian.

[0] http://bugzilla.kernel.org/show_bug.cgi?id=9400
--
BOFH excuse #112:

The monitor is plugged into the serial port


Re: WARNING: at kernel/resource.c:189 __release_resource

2007-11-29 Thread Bjorn Helgaas
On Monday 26 November 2007 11:05:38 pm Andrew Morton wrote:
> On Thu, 22 Nov 2007 22:41:16 +0100 Jiri Slaby <[EMAIL PROTECTED]> wrote:
> > Ok, I hit the bug, suspend of 00:06 device complains about it:
> > WARNING: at .../kernel/resource.c:185 __release_resource()
> > 
> > Call Trace:
> >  [] release_resource+0xb5/0xf0
> >  [] pnp_release_resources+0x70/0x130
> >  [] pnp_stop_dev+0x45/0x90
> >  [] pnp_bus_suspend+0x92/0xb0
> >  [] suspend_device+0x113/0x180
> >  [] device_suspend+0x200/0x320
> >  [] suspend_devices_and_enter+0xa5/0x170
> >  [] enter_state+0x209/0x270
> >  [] state_store+0xaf/0xf0
> >  [] kobj_attr_store+0x17/0x20
> >  [] sysfs_write_file+0xce/0x140
> >  [] vfs_write+0xc7/0x170
> >  [] sys_write+0x50/0x90
> >  [] system_call+0x7e/0x83
> > 
> > # LANG=en ll /sys/devices/pnp0/00:06/
> > total 0
> > lrwxrwxrwx 1 root root    0 Nov 22 22:35 driver -> ../../../bus/pnp/drivers/serial
> > -r--r--r-- 1 root root 4096 Nov 22 22:35 id
> > -r--r--r-- 1 root root 4096 Nov 22 22:35 options
> > drwxr-xr-x 2 root root    0 Nov 22 22:35 power
> > -rw-r--r-- 1 root root 4096 Nov 22 22:35 resources
> > lrwxrwxrwx 1 root root    0 Nov 22 22:35 subsystem -> ../../../bus/pnp
> > drwxr-xr-x 3 root root    0 Nov 22 22:35 tty
> > -rw-r--r-- 1 root root 4096 Nov 22 22:35 uevent
> 
> I suppose that's a genuine leak, presumably in 8250_pnp.

We used to have only the serial driver resource reservation.  We now
have an additional 00:06 resource that is the parent of the serial
resource, e.g.,

  03f8-03ff : 00:06
    03f8-03ff : serial

I think this problem happens because pnp_bus_suspend() calls
serial_pnp_suspend(), which suspends the driver but does nothing
with the resources.  Then it calls pnp_stop_dev(), which releases
the 00:06 resource, which still has a serial child resource.

The corresponding PCI code in pci_device_suspend() does not do
any generic device disable or resource release.  I don't know
why PNP disables the device on suspend.  I glanced through the
ACPI spec but didn't see a requirement for it.  Maybe Pierre [1]
remembers.

Maybe we could either remove the pnp_{stop,start}_dev() calls
from the suspend/resume path, or move the PNP resource management
out of pnp_{start,stop}_dev().

Bjorn

[1] http://lkml.org/lkml/2005/11/30/39


Re: Out of tree module using LSM

2007-11-29 Thread Jon Masters

On Thu, 2007-11-29 at 21:45 +, Alan Cox wrote:
> > Jargon File in all its glory. And if you still think you could look for
> > patterns, how about executable code that self-modifies in random ways
> > but when executed as a whole actually has the functionality of fetchmail
> > embedded within it? How would you guard against that?
> 
> Thats a problem for whoever writes the ESR detection tool and to what
> level it works. The question for the kernel is how do we provide a
> mechanism to allow (to some extent at least) this kind of tool to run.

Right. I'm just saying reading a single page out of context (no pun
intended) is not going to be very useful. They need to scan the entire
file, which means that there are limited ways this is practical - it's
not practical to do that on every write into a shared mapping, hence a
solution that scans on open, etc. is probably the best there is.

(I know you know this)

Jon.




Re: sata NCQ blacklist entry

2007-11-29 Thread Bjoern Olausson
On 11/29/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
>
> I now have affected drives on my desk and am gonna try reproduce it.  My
> gut feeling says it's timing related problem on controller / driver
> side.  Please wait a bit.
>

> > by the way, and OT, did the Plextor DVD-RW drive reach you, Tejun?
>
> No, not yet.  Do you have a tracking number or something?
>
> Thanks.


Re: Out of tree module using LSM

2007-11-29 Thread Jon Masters

On Thu, 2007-11-29 at 15:56 -0500, [EMAIL PROTECTED] wrote:
> On Thu, 29 Nov 2007 14:45:51 EST, Jon Masters said:
> > Ah, but I could write a sequence of pages that on their own looked
> > garbage, but in reality, when executed would print out a copy of the
> > Jargon File in all its glory. And if you still think you could look for
> > patterns, how about executable code that self-modifies in random ways
> > but when executed as a whole actually has the functionality of fetchmail
> > embedded within it? How would you guard against that?
> 
> So, just because Fred Cohen showed in his PhD thesis that *perfect* virus/malware
> scanning is equivalent to the Turing Halting Problem, we should abandon
> efforts to make a 99.9998% workable system?

I think you misread what I said. I implied the exact opposite :-)

I'm trying to show that I understand the problem by saying the above,
that doing this perfectly is impossible, but I also happen to believe
that there are those who have solutions that provide a level of
protection to their users, who ask for such things. Hence my point is
that it's not really our place to debate whether virus scanning is
good/bad but more how to provide a sane API. I'll get a spec.

Jon.




Re: remap_file_pages() broken in 2.6.23?

2007-11-29 Thread Nick Piggin
On Thu, Nov 29, 2007 at 02:45:23PM -0500, Chuck Ebbert wrote:
> Original report: https://bugzilla.redhat.com/show_bug.cgi?id=404201
> 
> The test case below, taken from the LTP test code, prints -1 (as
> expected) on 2.6.22 and 0 on 2.6.23. It tries to remap an out-of-range
> page. Proposed patch follows the program. Bug was apparently caused by
> commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7.

Ah, that's not such good behaviour anyway. mmap is allowed to map
outside the file offset, so you're telling me that remap_file_pages
just magically should not be allowed to remap these...?
 

> Patch:
> 
> Signed-off-by: Supriya Kannery <[EMAIL PROTECTED]>
> 
> --- linux-2.6.23/mm/fremap.c.orig 2007-11-22 00:56:09.0 -0600
> +++ linux-2.6.23/mm/fremap.c  2007-11-26 03:08:55.0 -0600
> @@ -124,6 +124,7 @@ asmlinkage long sys_remap_file_pages(uns
>   struct vm_area_struct *vma;
>   int err = -EINVAL;
>   int has_write_lock = 0;
> + unsigned long f_size = 0;
>  
>   if (__prot)
>   return err;
> @@ -181,6 +182,14 @@ asmlinkage long sys_remap_file_pages(uns
>   goto retry;
>   }
>   mapping = vma->vm_file->f_mapping;
> +
> + f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
> + f_size = f_size >> PAGE_CACHE_SHIFT;
> + if ((pgoff + size >> PAGE_CACHE_SHIFT) > f_size) {
> + err = -EINVAL;
> + goto out;
> + }
> +
>   /*
>* page_mkclean doesn't work on nonlinear vmas, so if
>* dirty pages need to be accounted, emulate with linear


I don't think there is anything preventing truncate races here. Theoretically
we could do it by taking i_mutex around here, but then a subsequent
truncate is just going to be able to cause the mapping to be out of bounds
anyway.

If it were any other syscall than remap_file_pages, I'd be much more
hesitant to say this: I propose we change the test case instead. I
also changed other elements of the API, and we had the result tested
and verified by Oracle...



Re: [PATCH] Reduce stack used by lib/hexdump.c

2007-11-29 Thread Joe Perches
On Thu, 2007-11-29 at 22:07 +0100, Jan Engelhardt wrote:
> I'd add GFP_ATOMIC here. Who knows whether tomorrow, the oops dumper
> or warn_on will use print_hex_dump.

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

diff --git a/lib/hexdump.c b/lib/hexdump.c
index 70e23fb..be94934 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -140,13 +140,20 @@ EXPORT_SYMBOL(hex_dump_to_buffer);
  * Example output using %DUMP_PREFIX_ADDRESS and 4-byte mode:
  * 88089af0: 73727170 77767574 7b7a7978 7f7e7d7c  pqrstuvwxyz{|}~.
  */
+
+#define HEX_LINE_SIZE 200
+
 void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
int rowsize, int groupsize,
const void *buf, size_t len, bool ascii)
 {
const u8 *ptr = buf;
int i, linelen, remaining = len;
-   unsigned char linebuf[200];
+   unsigned char *linebuf;
+
+   linebuf = kmalloc(HEX_LINE_SIZE, GFP_ATOMIC);
+   if (!linebuf) {
+   WARN_ON(1);
+   return;
+   }
 
if (rowsize != 16 && rowsize != 32)
rowsize = 16;
@@ -155,7 +162,7 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
linelen = min(remaining, rowsize);
remaining -= rowsize;
hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize,
-   linebuf, sizeof(linebuf), ascii);
+   linebuf, HEX_LINE_SIZE, ascii);
 
switch (prefix_type) {
case DUMP_PREFIX_ADDRESS:
@@ -170,6 +177,7 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
break;
}
}
+   kfree(linebuf);
 }
 EXPORT_SYMBOL(print_hex_dump);
 




Re: sata NCQ blacklist entry

2007-11-29 Thread Bjoern Olausson
On 11/29/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
>
> I now have affected drives on my desk and am gonna try reproduce it.  My
> gut feeling says it's timing related problem on controller / driver
> side.  Please wait a bit.
>
Okay, no problem, I am just curious.

> > by the way, and OT, did the Plextor DVD-RW drive reach you, Tejun?
>
> No, not yet.  Do you have a tracking number or something?
>
No, I haven't... all I got is the bill... but that doesn't help because
we chose to ship without insurance... there is no tracking
number. Mhhh, that sucks... I can't get rid of the bad feeling that it
got lost. But I'll try to make some checks.

CU
Bjoern


RE: constant_tsc and TSC unstable

2007-11-29 Thread Pallipadi, Venkatesh
 

>-Original Message-
>From: [EMAIL PROTECTED] 
>[mailto:[EMAIL PROTECTED] On Behalf Of Paul Rolland
>Sent: Thursday, November 29, 2007 8:12 AM
>To: Linux Kernel
>Cc: [EMAIL PROTECTED]
>Subject: constant_tsc and TSC unstable
>
>Hello,
>
>I've a machine with a Core2Duo CPU. /proc/cpuinfo reports the flag
>constant_tsc, but at boot time, I have the log :
>
>...
>Total of 2 processors activated (6919.15 BogoMIPS).
>ENABLING IO-APIC IRQs
>..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
>checking TSC synchronization [CPU#0 -> CPU#1]:
>Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
>Marking TSC unstable due to: check_tsc_sync_source failed.
>Brought up 2 CPUs
>...
>
>This machine is running 2.6.23.1-21.fc7. I know I should report to Fedora,
>but I was wondering if this is a bug or a feature ;)
>

TSCs on Core 2 Duo are supposed to be in sync unless the CPU supports deep idle
states like C2 or C3. Can you send the full /proc/cpuinfo and the full dmesg?

Thanks,
Venki 


Re: Add the infamous Huawei E220 to option.c

2007-11-29 Thread Oliver Neukum
On Thursday, 29 November 2007 19:53:39, Jaime Velasco Juan wrote:
> Hi,
>
> On Thu, 29 Nov 2007 at 15:05:50 +0100, Johann Wilhelm wrote:
> > If everything's working please also add code to also support the other
> > E220 device... so both PID 0x1003 and 0x1004 should be treated the same
> > way...
> >
> > to test the device with the 0x1004-PID maybe Jaime Velasco
> > <[EMAIL PROTECTED]> could be asked.. he initially added the lines for
> > this device in option.c
>
> The following patch works for me (on kernel 2.6.23).

Jaime, please add your signed off by line and resend the patch with
both lines to Greg.

Signed-off-by: Oliver Neukum <[EMAIL PROTECTED]>

Regards
Oliver


Re: [RFC] [PATCH] hugetlbfs :shmget with SHM_HUGETLB only works as root

2007-11-29 Thread William Lee Irwin III
On Fri, Nov 30, 2007 at 12:02:32AM +0530, Ciju Rajan K wrote:
>   I tested your patch. But that is not solving the problem.
>   If the code change to user_shm_lock() is not a good solution, could 
> you please suggest a method so that the normal user is able to allocate 
> the huge pages, if his gid is added to /proc/sys/vm/hugetlb_shm_group

The patch I posted resolves a race unrelated to your issue. Raising your
locked memory limits should not be difficult. /etc/limits.conf or similar
should set it up for you. You can also change the default rlimit in the
kernel and compile it with default limits elevated to what you want your
unprivileged process to have to start with if you're truly having lots
of trouble getting userspace to set the default limits properly. I'd
look in include/asm-generic/resource.h


-- wli


Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

2007-11-29 Thread Linus Torvalds


On Thu, 29 Nov 2007, Alan Stern wrote:
> 
> Yes indeed.  I wish I could point you to the exact patch containing the 
> fix, but the git software seems to have lost track of it (it's combined
> in with a large number of other patches with no obvious way to separate 
> it out).  It's also available in the various mailing list archives, but 
> I don't have a pointer to it and there's no reasonable way to search 
> for it.
> 
> The patch in question was written by Matthew Wilcox; it added code to 
> the SCSI async-scanning routines to utilize the scan_mutex.  IMO it 
> should have been applied to 2.6.23 but it wasn't.

Heh. It definitely hasn't gotten lost by "the git software". In fact, with 
the kinds of hints you already gave, git makes it really _trivial_ to find 
it.

Here's what you do:

git log v2.6.23.. --author=Wilcox

and then just search for "scan_mutex", in the hope that Matthew wrote a 
nice commit message. And yes, he did, so in less than a blink you get:

commit 6b7f123f378743d739377871c0cbfbaf28c7d25a
Author: Matthew Wilcox <[EMAIL PROTECTED]>
Date:   Tue Jun 26 15:18:51 2007 -0600

[SCSI] Fix async scanning double-add problems

Stress-testing and some thought has revealed some places where
asynchronous scanning needs some more attention to locking.

 - Since async_scan is a bit, we need to hold the host_lock while
   modifying it to prevent races against other CPUs modifying the word
   that bit is in.  This is probably a theoretical race for the moment,
   but other patches may change that.
 - The async_scan bit means not only that this host is being scanned
   asynchronously, but that all the devices attached to this host are not
   yet added to sysfs.  So we must ensure that this bit is always in sync.
   I've chosen to do this with the scan_mutex since it's already acquired
   in most of the right places.
...

which I assume is the commit you're talking about.

Linus


alloc_page_vma: should be called from a module? (not exported in x86_64)

2007-11-29 Thread Alejandro Homs Puron
Hi,

We've developed a driver for an image acquisition card, which maps
kernel alloc'ed buffers into user space vma's. We use alloc_page +
remap_pfn_range in the driver mmap file_operation.

After looking at alloc_page_vma, I thought that it might be more
appropriate than alloc_page in this context. However, if CONFIG_NUMA=y
(x86_64), this function is not visible to modules.

Is this limitation intentional?

We allocate RAM on a page-by-page basis. Is vm_insert_page more
appropriate than remap_pfn_range?

Thanks a lot for your help.

Alejandro


Re: Race between generic_forget_inode() and sync_sb_inodes()?

2007-11-29 Thread Neil Brown
On Friday November 30, [EMAIL PROTECTED] wrote:
> On Fri, Nov 30, 2007 at 09:07:06AM +1100, Neil Brown wrote:
> > 
> > Hi David,
> > 
> > On Friday November 30, [EMAIL PROTECTED] wrote:
> > > 
> > > 
> > > I came across this because I've been making changes to XFS to avoid the
> > > inode hash, and I've found that I need to remove the inode from the
> > > dirty list when setting I_WILL_FREE to avoid this race. I can't see
> > > how this race is avoided when inodes are hashed, so I'm wondering
> > > if we've just been lucky or there's something that I'm missing that
> > > means the above does not occur.
> > 
> > Looking at inode.c in 2.6.23-mm1, in generic_forget_inode, I see code:
> > 
> >     if (!hlist_unhashed(&inode->i_hash)) {
> >             if (!(inode->i_state & (I_DIRTY|I_SYNC)))
> >                     list_move(&inode->i_list, &inode_unused);
> > 
> > so it looks to me like:
> >If the inode is hashed and dirty, then move it (off the s_dirty
> >list) to inode_unused.
> 
> That check is for if the inode is _not_ dirty or being sync, right?
> Or have I just not had enough coffee this morning?

:-)  And I cannot even blame the lack of coffee as I don't drink it.

My second guess is that we have been lucky which is hard to believe.

I wonder if iput (and even iget) should BUG on I_WILL_FREE as well...

Perplexed.

NeilBrown


Re: [PATCH x86/mm 01/11] x86-32 thread_struct.debugreg

2007-11-29 Thread Chuck Ebbert
On 11/29/2007 04:50 PM, Roland McGrath wrote:
> Jan Kratochvil has helped me a great deal with ptrace testing lately.
> We have started to collect a small regression test suite, see
> http://sourceware.org/systemtap/wiki/utrace/tests for pointers.  That
> has tests for individual problems that have come up, and not anything
> exhaustive for testing all ptrace functionality.

You could contribute them to LTP?


Re: [PATCH x86/mm 6/6] x86-64 ia32 ptrace get/putreg32 current task

2007-11-29 Thread Chuck Ebbert
On 11/29/2007 05:21 PM, Roland McGrath wrote:
>>> case offsetof(struct user32, regs.gs):
>>> *val = child->thread.gsindex;
>>> +   if (child == current)
>>> +   asm("movl %%gs,%0" : "=r" (*val));
>> Won't this return the kernel's GS instead of the user's?
> [...]
>> But this is x86_64, where swapgs is done on kernel entry.
> 
> As I understand it, and from what the documentation I have says, swapgs has
> nothing to do with the %gs selector.  It affects the "GS base register",
> i.e. the MSR.
> 

Yep, I confused the GS selector with the base address in the descriptor.


Re: sata NCQ blacklist entry

2007-11-29 Thread Tejun Heo
Bjoern Olausson wrote:
> On 11/7/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
>> Thanks.  We're currently trying to find out what's actually going on
>> with all these drives.  At first, drives which got blacklisted aren't
>> many and made sense (had other problems with NCQ, etc..) but with new
>> generation drives from many vendors showing the same symptom, we aren't
>> too sure now.
>>
>> I'll keep your email in my todo list and add the drive to the blacklist
>> once the problem is verified.
>>
>> Thanks.
> 
> Something new on the NCQ front?
> Just asking if you need someone to test some of your ideas?
> 
> I got the "WDC WD740ADFD-00NLR1"

I now have affected drives on my desk and am gonna try reproduce it.  My
gut feeling says it's timing related problem on controller / driver
side.  Please wait a bit.

> by the way, and OT, did the Plextor DVD-RW drive reach you, Tejun?

No, not yet.  Do you have a tracking number or something?

Thanks.

-- 
tejun


Re: Out of tree module using LSM

2007-11-29 Thread Andi Kleen
Alan Cox <[EMAIL PROTECTED]> writes:
>
> The simple case is
>   open
>   write cathedral and bazaar in some order
>   close
>process -> label eric_t>
>
>   open (eric_t) - SELinux "no"
>
>
> Anyone smart will then write it out of order and keep the file open, or

That would assume Eric already has a program running on your system
optimized to inject his works in an obfuscated way. And if he has a
program running he can do nearly everything already.  You already
lost the game.

The normal case Tvrtko et.al. are trying to handle would be more the
work getting downloaded from somewhere or read from a usb stick using
normal programs like web browsers or file managers who don't do any
out of order writing tricks and other obfuscation.

Important exceptions might be things like BitTorrent, which writes
out of order, or parallel downloaders that cheat TCP congestion control.
Or simply tar+gzip with automatic unpacking on desktops.
There are probably more and it's probably tricky but it is not a 
"need to handle arbitary nastiness by a determined attacker" situation.

Anyways I'm not saying that pattern matching is a useful security
measure (just the interaction with compression and encryption makes it
very dubious), but if you're talking hypothetically you should at
least look closely at the hypothetical use cases @)

-Andi

