date:20070117

Re: [PATCH] Provide an interface to limit total page cache.

2007-01-17 Thread Eric W. Biederman

"Roy Huang" <[EMAIL PROTECTED]> writes:

> This patch provides an interface to limit the total page cache in
> /proc/sys/vm/pagecache_ratio. The default value is 90 percent. Any
> feedback is appreciated.

Anything except a default value of 100% will change the behavior
and probably reduce the performance on most systems.

> -Roy
>
> diff -urp a/include/linux/sysctl.h b/include/linux/sysctl.h
> --- a/include/linux/sysctl.h  2007-01-15 17:18:46.0 +0800
> +++ b/include/linux/sysctl.h  2007-01-15 17:03:09.0 +0800
> @@ -202,6 +202,7 @@ enum
>   VM_PANIC_ON_OOM=33, /* panic at out-of-memory */
>   VM_VDSO_ENABLED=34, /* map VDSO into new processes? */
>   VM_MIN_SLAB=35,  /* Percent pages ignored by zone reclaim */
> + VM_PAGECACHE_RATIO=36,  /* Percent memory is used as page cache */
> };
>
>
> diff -urp a/kernel/sysctl.c b/kernel/sysctl.c
> --- a/kernel/sysctl.c 2007-01-15 17:18:46.0 +0800
> +++ b/kernel/sysctl.c 2007-01-15 17:03:09.0 +0800
> @@ -1035,6 +1035,15 @@ static ctl_table vm_table[] = {
>   .extra1 = ,
>   },
> #endif
> + {
> + .ctl_name   = VM_PAGECACHE_RATIO,
> + .procname   = "pagecache_ratio",
> + .data   = &pagecache_ratio,
> + .maxlen = sizeof(pagecache_ratio),
> + .mode   = 0644,
> + .proc_handler   = &pagecache_ratio_sysctl_handler,
> + .strategy   = &sysctl_intvec,
> + },
>   { .ctl_name = 0 }
> };

This is broken.

You have allocated a binary number for use with sys_sysctl but
did not test it.

If you need a special proc_handler to take action when the
value is changed, you also need a special strategy routine.

So since you aren't going to test the binary interface and don't
care about it please don't allocate a number for it and just
use CTL_UNNUMBERED.

And of course please read the top of linux/sysctl.h

Thank you.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.6.19.2 : Oops

2007-01-17 Thread Nicolas Bareil


On 18 Jan 2007, Greg KH wrote:
> > Since 2.6.19, I get the following Oops once a day, always with the same
> > process, newspipe[1] which use a lot of CPU, threads and I/O.
> > ...
> Can you reproduce it without the grsec patch applied?

I'm compiling a new kernel and will try it soon!

-- 
Nicolas Bareil  http://chdir.org/~nico/
OpenPGP=0xAE4F7057 Fingerprint=34DB22091049FB2F33E6B71580F314DAAE4F7057


Re: after effects of a kernel API change

2007-01-17 Thread Eric W. Biederman

"Rajat Jain" <[EMAIL PROTECTED]> writes:

>> >
>> > Is there any way volunteers like me can help in this exercise?
>>
>> See the /APIchanges in the Kernel Janitors TODO list
>> http://kernelnewbies.org/KernelJanitors/Todo
>>
>
> Hi,
>
> This is regarding the link posted above.
>
> 1) How do I make sure if some one is NOT working on any of the
> mentioned bullet points? Who coordinates? On what mailing list?

Depends on the issue.  Release early and release often and there
won't be much duplicate work :)

> 2) Do any patches for the above Todo list have the chances of getting
> merged into the mainstream kernel? Who approves? I suppose the
> respective maintainer of the driver / subsystem getting affected?

Generally. Occasionally for small things other paths are available.

Eric

Re: [PATCH 2.6.20-rc5 4/4] sys_futex64 : allows 64bit futexes

2007-01-17 Thread Ingo Molnar

* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 17, 2007 at 10:04:53AM +0100, Pierre Peiffer wrote:
> > Hi,
> > 
> > This latest patch is an adaptation of the sys_futex64 syscall 
> > provided in -rt patch (originally written by Ingo). It allows the 
> > use of 64bit futex.
> 
> Big NACK here, we don't need yet another goddamn multiplexer.  Please 
> make this individual syscalls for the actual operations.

actually, we have a big multiplexer there already, so it's only 
symmetric. Nothing is served by doing it half-assed. I raised the issue 
of the multiplexer back when the first futex API was merged (years ago), 
and it was rejected. Now whether you like it or not we've got to live 
with that decision. You are certainly free to introduce a patchset with 
a completely new set of syscall vectors to demultiplex all futex APIs, 
but to just start a half-done demultiplexing makes zero sense.

> > +   if (!ret) {
> > +   switch (cmp) {
> > +   case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
> > +   case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
> 
> Please indent this properly, the ret = .. and break need to go onto 
> lines of their own.

this is the standard (already upstream) arithmetics style there for the 
futex cmp ops, and it expresses things in a compact way. See 
include/asm-i386/futex.h:

switch (cmp) {
case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
default: ret = -ENOSYS;
}

Pierre correctly matched the existing style.
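
For readers without the kernel tree at hand, the compact one-case-per-line style quoted above can be tried in userspace. The following is only a sketch: the `CMP_*` enum and the `eval_cmp()` name are hypothetical local stand-ins, not the kernel's `FUTEX_OP_CMP_*` definitions or the code in include/asm-i386/futex.h.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins for the kernel's FUTEX_OP_CMP_* constants (hypothetical
 * names; the real ones live in the kernel's futex headers). */
enum cmp_op { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Returns 1 or 0 for a known comparison, -ENOSYS for an unknown one. */
int eval_cmp(int cmp, int oldval, int cmparg)
{
	int ret;

	switch (cmp) {
	case CMP_EQ: ret = (oldval == cmparg); break;
	case CMP_NE: ret = (oldval != cmparg); break;
	case CMP_LT: ret = (oldval <  cmparg); break;
	case CMP_LE: ret = (oldval <= cmparg); break;
	case CMP_GT: ret = (oldval >  cmparg); break;
	case CMP_GE: ret = (oldval >= cmparg); break;
	default:     ret = -ENOSYS;
	}
	/* Every path yields 0, 1, or the negative errno. */
	assert(ret == 0 || ret == 1 || ret == -ENOSYS);
	return ret;
}
```

Keeping each case on one line makes the table of operators easy to scan, which is the argument made above for matching the existing style.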

Ingo

Re: [PATCH] futex null pointer timeout

2007-01-17 Thread Ingo Molnar


* Daniel Walker <[EMAIL PROTECTED]> wrote:

> This fix is mostly from Thomas ..
> 
> The problem was that a futex can be called with a zero timeout (0 
> seconds, 0 nanoseconds) and it's a valid expired timeout. However, the 
> current futex in -rt assumes a zero timeout is an infinite timeout.
> 
> Kevin Hilman found this using LTP's nptl01 test case which would soft 
> hang occasionally.
> 
> The patch reworks do_futex, and futex_wait* so a NULL pointer in the 
> timeout position is infinite, and anything else is evaluated as a real 
> timeout.

thanks, applied.
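
The distinction the patch description draws (NULL pointer means wait forever, a zero timespec means an already-expired timeout) can be sketched in userspace C. This is an illustration only: `classify_timeout()` and the `wait_kind` enum are hypothetical names and are not taken from do_futex or futex_wait.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

enum wait_kind { WAIT_FOREVER, WAIT_EXPIRED, WAIT_TIMED };

/* Hypothetical helper: classify a caller-supplied relative timeout. */
enum wait_kind classify_timeout(const struct timespec *ts)
{
	if (ts == NULL)
		return WAIT_FOREVER;	/* no timeout given: block indefinitely */

	/* A non-NULL pointer is always treated as a real timeout value. */
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);

	if (ts->tv_sec == 0 && ts->tv_nsec == 0)
		return WAIT_EXPIRED;	/* valid timeout that has already elapsed */

	return WAIT_TIMED;
}
```

The point of the sketch is that the pointer, not the value it points to, decides whether the wait is unbounded.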

Ingo

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Ingo Molnar

* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> > I'll be happy to move this over to the utrace setting, once it is 
> > merged.  Do you think it would be better to include the current 
> > version of kwatch now or to wait for utrace?
> > 
> > Roland, is there a schedule for when you plan to get utrace into 
> > -mm?
> 
> Even if it goes into mainline soon we'll need a lot of time for all 
> architectures to catch up, so I think kwatch should definitely come 
> first.

i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
/huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
years, precisely because it was done in the 'oh, lets get this feature 
added first, think about it later' manner. Roland's work is a large 
logistical undertaking and we should not make it more complex than it 
is. Once it's in we can add debugging features on top of that. To me work 
that cleans up existing mess takes precedence before work that adds to 
the mess.

Ingo

ps. please fix your mailer to not emit those silly Mail-Followup-To 
headers! It collapses To: and Cc: lines into one huge unnecessary To: 
line.

Re: PATCH: Update disable_IO_APIC to use 8-bit destination field (X86_64)

2007-01-17 Thread Eric W. Biederman

Vivek Goyal <[EMAIL PROTECTED]> writes:

>> Hi Eric,
>> 
>> In physical destination mode, the destination APIC is determined by
>> APIC ID and in logical destination mode, destination apics are determined
>> by the configurations based on LDR and DFR registers in APIC (Depending
>> on Flat mode or cluster mode).
>> 
>> Looks like previously one supported only 4bit apic ids if system is
>> operating in physical mode and 8bit ids if IOAPIC is put in logical
>> destination mode. That's why, struct IO_APIC_route_entry is containing
>> 4bits for physical apic id.
>> 
>> http://www.intel.com/design/chipsets/datashts/290566.htm
>> 
>> And now newer systems have switched to 8bit apic ids in physical mode.
>> That's why if somebody is crashing on a cpu whose apic id is 16 or
>> higher, kexec/kdump code will fail as 4bits are not sufficient.
>> 
>> Hence above change makes sense. Given the fact that logical and physical
>> apic id is basically a union, it will work even for older systems where
>> physical apic ids were 4bits only.
>> 
>> OTOH, I think down the line we can get rid of physical dest
>> field all together in struct IO_APIC_route_entry and use logical dest
>> field.
>> 
>
> Or how about making the physical_dest field 8bit too, like the logical_dest field?
> This will work for both 4bit and 8bit physical apic ids; at the same time the
> code becomes more intuitive and it is easier to know whether the IOAPIC is being
> put in physical or logical destination mode.

Exactly what I was trying to suggest.

Looking closer at the code I think it makes sense to just kill the union and
stop the discrimination between physical and logical modes and just have a
dest field in the structure.  Roughly as you were suggesting at first.

The reason we aren't bitten by this on a regular basis is the normal code
path uses logical.logical_dest in both logical and physical modes.
Which is a little confusing.

Since there really isn't a distinction to be made we should just stop
trying, which will make maintenance easier :)

Currently there are several non-common case users of physical_dest
that are probably bitten by this problem under the right
circumstances.

So I think we should just make the structure:

struct IO_APIC_route_entry {
        __u32   vector          :  8,
                delivery_mode   :  3,   /* 000: FIXED
                                         * 001: lowest prio
                                         * 111: ExtINT
                                         */
                dest_mode       :  1,   /* 0: physical, 1: logical */
                delivery_status :  1,
                polarity        :  1,
                irr             :  1,
                trigger         :  1,   /* 0: edge, 1: level */
                mask            :  1,   /* 0: enabled, 1: disabled */
                __reserved_2    : 15;

        __u32   __reserved_3    : 24,
                __dest          :  8;
} __attribute__ ((packed));

And fixup the users.  This should keep us from getting bit by this bug
in the future.  Like when people start introducing support for more
than 256 cores and the low 24bits start getting used.

Or when someone new starts working on the code and thinks that, because
the field name says logical, we are actually using the apic in logical
mode.
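
The truncation this thread is about can be reproduced in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code: `store_dest()` is a hypothetical helper that keeps only the bits a destination field of a given width can hold, and the APIC ID 0x21 used in the note below is an arbitrary example value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not kernel code: keep only the low bits that a
 * destination field of the given width can hold. */
uint8_t store_dest(uint8_t apic_id, unsigned width_bits)
{
	assert(width_bits >= 1 && width_bits <= 8);
	return (uint8_t)(apic_id & ((1u << width_bits) - 1u));
}
```

With a 4-bit field, `store_dest(0x21, 4)` keeps only the low nibble and yields 0x01, so the entry would name a different APIC; with an 8-bit field the ID survives unchanged. IDs that already fit in 4 bits behave the same either way, which is why older systems are unaffected.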

Eric

Re: [PATCH] futex null pointer timeout

2007-01-17 Thread Thomas Gleixner

On Wed, 2007-01-17 at 16:25 -0800, Daniel Walker wrote:
> The patch reworks do_futex, and futex_wait* so a NULL pointer in the timeout
> position is infinite, and anything else is evaluated as a real timeout.
> 
> Signed-Off-By: Daniel Walker <[EMAIL PROTECTED]>

Ack.

tglx



Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Josef Sipek

On Wed, Jan 17, 2007 at 04:55:49PM -0600, Dave Kleikamp wrote:
...
> diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h linux/fs/jfs/jfs_lock.h
> --- linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h  2006-11-29 15:57:37.0 -0600
> +++ linux/fs/jfs/jfs_lock.h   2007-01-17 15:30:19.0 -0600
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   *   jfs_lock.h
> @@ -42,6 +43,7 @@ do { \
>   if (cond)   \
>   break;  \
>   unlock_cmd; \
> + blk_replug_current_nested();\
>   schedule(); \
>   lock_cmd;   \
>   }   \

Is {,un}lock_cmd a macro? ...

Jeff.

-- 
Keyboard not found!
Press F1 to enter Setup

Re: query related to serial console

2007-01-17 Thread Seetharam Dharmosoth

--- Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Thu, 18 Jan 2007 04:10:17 + (GMT) Seetharam Dharmosoth wrote:
> 
> (please don't top-post)
> 
> 
> > Generally sysrq will work with a serial console, right?
> > 
> > Suppose the system is connected through a serial port to the
> > other system (i.e. a serial console); at this time we can
> > fire some set of commands through the serial console.
> > 
> > The sequence is as follows:
> > do ctrl+]
> > send brk
> > then some commands
> > 
> > My question is: can't we pass commands directly
> > to the console (without sending the brk signal)?
> > 
> > This is a feature in Solaris.
> > 
> > I am looking for it in Linux but am unable to find it.
> > 
> > Can you please help me?
> > 
> > Thanks
> > Seetharam
> 
> Hi,
> It's quite possible that I misunderstand your question,
> but anyway:
> 
> Alt-Sysrq- is a route into the kernel sysrq handler instead
> of a route into the shell that the serial console is connected to,
> so something needs to signal that condition (like a BREAK).
> 
> Or a specialized (serial) console app could know other ways of
> recognizing sysrq keys.  Or you could use /proc/sysrq-trigger:
>   echo b > /proc/sysrq-trigger
> 
> 
Hi Randy,

It's ok.
Thanks for the reply.

I have one question in this regard.
1) Once we are connected to the serial console we don't
   want to log in to the shell.
   (Without logging in to the shell we want to fire
   sysrq commands like b, r, m, etc.)

For this I am grabbing the serial console and then
typing ctrl+], which gives me the prompt
  telnet> 
Now I want to give commands like b, m, r, etc.

But it does not accept my commands until I do
telnet> send brk

Can you please explain why it behaves like this?

Thanks
Seetharam

> 
> > --- Erik Mouw <[EMAIL PROTECTED]> wrote:
> > 
> > > On Wed, Jan 17, 2007 at 11:26:54AM +, Seetharam Dharmosoth wrote:
> > > > Is Linux having 'non-break interface for serial
> > > > console' ?
> > > 
> > > No idea. Could you explain what a 'non-break
> > > interface for serial
> > > console' is?
> 
> 
> ---
> ~Randy
> 


Re: after effects of a kernel API change

2007-01-17 Thread Rajat Jain


> >
> > Is there any way volunteers like me can help in this exercise?
>
> See the /APIchanges in the Kernel Janitors TODO list
> http://kernelnewbies.org/KernelJanitors/Todo



Hi,

This is regarding the link posted above.

1) How do I make sure if some one is NOT working on any of the
mentioned bullet points? Who coordinates? On what mailing list?

2) Do any patches for the above Todo list have the chances of getting
merged into the mainstream kernel? Who approves? I suppose the
respective maintainer of the driver / subsystem getting affected?

Thanks,

Rajat

Re: [RFC 0/8] Cpuset aware writeback

2007-01-17 Thread Christoph Lameter

On Wed, 17 Jan 2007, Andrew Morton wrote:

> > The problem there is that we do a GFP_ATOMIC allocation (no allocation 
> > context) that may fail when the first page is dirtied. We must therefore 
> > be able to subsequently allocate the nodemask_t in set_page_dirty(). 
> > Otherwise the first failure will mean that there will never be a dirty 
> > map for the inode/mapping.
> 
> True.  But it's pretty simple to change __mark_inode_dirty() to fix this.

Ok I tried it but this won't work unless I also pass the page struct pointer to 
__mark_inode_dirty() since the dirty_nodes pointer could be freed 
when the inode_lock is dropped. So I cannot dereference the 
dirty_nodes pointer outside of __mark_inode_dirty. 

If I expand __mark_inode_dirty then all variations of mark_inode_dirty() 
need to be changed and we need to pass a page struct everywhere. This 
results in extensive changes.

I think I need to stick with the tree_lock. This also makes more sense 
since we modify dirty information in the address_space structure and the 
radix tree is already protected by that lock.


Re: after effects of a kernel API change

2007-01-17 Thread Ahmed S. Darwish

On Thu, Jan 18, 2007 at 09:45:04AM +0530, Daniel Rodrick wrote:
> Hi list,
> 
> Whenever there is a change in the kernel API (or a new API is
> introduced), all of the drivers that use the older API need to be
> changed (or recommended to be changed). I believe it is the
> responsibility of the person changing the kernel API, to change all
> the drivers that have found their way into the kernel code?
> 
> How does this happen? Because the person who brought the change in the
> API might not know the internals of all the drivers?
> 
> Is there any way volunteers like me can help in this exercise?

See the /APIchanges in the Kernel Janitors TODO list
http://kernelnewbies.org/KernelJanitors/Todo

Also: Documentation/stable_api_nonsense.txt

-- 
Ahmed S. Darwish
http://darwish-07.blogspot.com

Re: after effects of a kernel API change

2007-01-17 Thread Greg KH

On Thu, Jan 18, 2007 at 09:45:04AM +0530, Daniel Rodrick wrote:
> Hi list,
> 
> Whenever there is a change in the kernel API (or a new API is
> introduced), all of the drivers that use the older API need to be
> changed (or recommended to be changed). I believe it is the
> responsibility of the person changing the kernel API, to change all
> the drivers that have found their way into the kernel code?

Yes, that is the case.

> How does this happen? Because the person who brought the change in the
> API might not know the internals of all the drivers?

But they know why they made the change, so it's usually pretty obvious.
If not, they merely ask for help from the original author / maintainer
of the code, but that doesn't happen very often.

> Is there any way volunteers like me can help in this exercise?

Sure, go through the kernel building all of the different arches and all
of the modules and report what breaks due to api changes.  The -mm tree
is the best place to test this stuff out, as that is where the changes
usually occur first.

good luck,

greg k-h

Re: NFS causing oops when freeing namespace

2007-01-17 Thread Eric W. Biederman

Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 01/17, Eric W. Biederman wrote:
>>
>> Cedric Le Goater <[EMAIL PROTECTED]> writes:
>> >
>> > your first analysis was correct : exit_task_namespaces() should be moved 
>> > above exit_notify(tsk). It will require some extra fixes for nsproxy 
>> > though.
>> 
>> I think the only issue is the child_reaper and currently we only have one of
>> those.  When we really do the pid namespace we are going to have to revisit
>> this.  My gut feel says that we won't be able to exit our pid namespace until
>> the process is waited on.  So we may need to break up exit_task_namespace 
>> into
>> individual components.
>
> I agree, but please note that the child_reaper is not the only issue.

To be clear I believe the only issue keeping us from moving exit_namespaces
back where it used to be is the child reaper as that is the only part of
the pid namespace that has been implemented.  

There is more when we revisit this.
> Think
> about sub-thread which auto-reaps itself. I'd suggest to add the comment in
> do_exit() after exit_notify() to remind that the task is really dead now, it
> has no ->signal, it can't be seen in /proc/, we can't send a signal to it, 
> etc.

A very interesting case is what happens when we reparent a zombie.  I think
that needs the child reaper and it happens well after exit_namespaces is
currently being called.

In the very stupid test we need our struct pid that identifies the process until
we are reaped.  Therefore our pid namespace must continue to exist, even if we
don't keep a pointer to it in struct nsproxy.

A comment after exit_notify would certainly be useful.

Eric

Re: PATCH: Update disable_IO_APIC to use 8-bit destination field (X86_64)

2007-01-17 Thread Vivek Goyal

On Thu, Jan 18, 2007 at 09:11:53AM +0530, Vivek Goyal wrote:
> On Wed, Jan 17, 2007 at 12:08:48PM -0700, Eric W. Biederman wrote:
> > Benjamin Romer <[EMAIL PROTECTED]> writes:
> >
> > > On the Unisys ES7000/ONE system, we encountered a problem where
> > > performing a kexec reboot or dump on any cell other than cell 0 causes
> > > the system timer to stop working, resulting in a hang during timer
> > > calibration in the new kernel.
> > >
> > > We traced the problem to one line of code in disable_IO_APIC(), which
> > > needs to restore the timer's IO-APIC configuration before rebooting. The
> > > code is currently using the 4-bit physical destination field, rather
> > > than using the 8-bit logical destination field, and it cuts off the
> > > upper 4 bits of the timer's APIC ID. If we change this to use the
> > > logical destination field, the timer works and we can kexec on the upper
> > > cells. This was tested on two different cells (0 and 2) in an ES7000/ONE
> > > system.
> > >
> > > For reference, the relevant Intel xAPIC spec is kept at
> > > ftp://download.intel.com/design/chipsets/e8501/datashts/30962001.pdf,
> > > specifically on page 334.
> >
> > Looks like good bug hunting.  I will have to look but it might
> > make more sense to simply fix: struct IO_APIC_route_entry,
> > or use whatever technique we normally use to generate the io_apic
> > vectors.
> >
> > I don't recall enough off of the top of my head to recall what
> > the discrimination rule between logical and physical is but
> > I think setting the system in physical mode is a good clue :)
> 
> Hi Eric,
> 
> In physical destination mode, the destination APIC is determined by
> APIC ID and in logical destination mode, destination apics are determined
> by the configurations based on LDR and DFR registers in APIC (Depending
> on Flat mode or cluster mode).
> 
> Looks like previously one supported only 4bit apic ids if system is
> operating in physical mode and 8bit ids if IOAPIC is put in logical
> destination mode. That's why, struct IO_APIC_route_entry is containing
> 4bits for physical apic id.
> 
> http://www.intel.com/design/chipsets/datashts/290566.htm
> 
> And now newer systems have switched to 8bit apic ids in physical mode.
> That's why if somebody is crashing on a cpu whose apic id is 16 or
> higher, kexec/kdump code will fail as 4bits are not sufficient.
> 
> Hence above change makes sense. Given the fact that logical and physical
> apic id is basically a union, it will work even for older systems where
> physical apic ids were 4bits only.
> 
> OTOH, I think down the line we can get rid of physical dest
> field all together in struct IO_APIC_route_entry and use logical dest
> field.
> 

Or how about making the physical_dest field 8bit too, like the logical_dest field?
This will work for both 4bit and 8bit physical apic ids; at the same time the
code becomes more intuitive and it is easier to know whether the IOAPIC is being
put in physical or logical destination mode.

Thanks
Vivek


Re: query related to serial console

2007-01-17 Thread Randy Dunlap

On Thu, 18 Jan 2007 04:10:17 + (GMT) Seetharam Dharmosoth wrote:

(please don't top-post)


> Generally sysrq will work with a serial console, right?
> 
> Suppose the system is connected through a serial port to the
> other system (i.e. a serial console); at this time we can
> fire some set of commands through the serial console.
> 
> The sequence is as follows:
> do ctrl+]
> send brk
> then some commands
> 
> My question is: can't we pass commands directly
> to the console (without sending the brk signal)?
> 
> This is a feature in Solaris.
> 
> I am looking for it in Linux but am unable to find it.
> 
> Can you please help me?
> 
> Thanks
> Seetharam

Hi,
It's quite possible that I misunderstand your question,
but anyway:

Alt-Sysrq- is a route into the kernel sysrq handler instead
of a route into the shell that the serial console is connected to,
so something needs to signal that condition (like a BREAK).

Or a specialized (serial) console app could know other ways of
recognizing sysrq keys.  Or you could use /proc/sysrq-trigger:
echo b > /proc/sysrq-trigger
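
The echo above can also be done from a small C program, which may be handy for a dedicated console helper. This is a minimal sketch: `sysrq_write()` is a hypothetical name, and taking the path as a parameter is an assumption made so the function can be tried against an ordinary file before pointing it at /proc/sysrq-trigger (which normally requires root). Note that 'b' reboots the machine immediately; a harmless command such as 'm' (dump memory info) is a safer first test.

```c
#include <assert.h>
#include <stdio.h>

/* Write one sysrq command character to `path`.  On a real system the path
 * would be /proc/sysrq-trigger; returns 0 on success, -1 on error. */
int sysrq_write(const char *path, char cmd)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = (fputc(cmd, f) != EOF);
	if (fclose(f) != 0)	/* buffered write errors surface here */
		ok = 0;
	return ok ? 0 : -1;
}
```

Because this goes through a file rather than the tty, no BREAK is involved; it does require a logged-in process with sufficient privileges.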


> --- Erik Mouw <[EMAIL PROTECTED]> wrote:
> 
> > On Wed, Jan 17, 2007 at 11:26:54AM +, Seetharam
> > Dharmosoth wrote:
> > > Is Linux having 'non-break interface for serial
> > > console' ?
> > 
> > No idea. Could you explain what a 'non-break
> > interface for serial
> > console' is?


---
~Randy

Vectored AIO breakage for sockets and pipes ?

2007-01-17 Thread Suparna Bhattacharya


The call to aio_advance_iovec() in aio_rw_vect_retry() becomes problematic
when it comes to pipe and socket operations which internally modify/advance
the iovec themselves. As a result AIO writes to sockets fail to return
the correct result. 

I'm not sure what the best way to fix this is. One option is to always make
a copy of the iovec and pass that down. Any other thoughts ?
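
The "always make a copy" option can be illustrated in userspace. This is only a sketch of the idea, not the kernel's aio code: `consume_iovec()` and `call_with_copy()` are hypothetical helpers, with `consume_iovec()` standing in for a callee that advances the vector it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Advance an iovec array in place by `n` consumed bytes, the way a callee
 * that modifies its vector would.  Hypothetical helper for illustration. */
void consume_iovec(struct iovec *iov, int cnt, size_t n)
{
	int i;

	assert(iov != NULL && cnt >= 0);
	for (i = 0; i < cnt && n > 0; i++) {
		size_t step = n < iov[i].iov_len ? n : iov[i].iov_len;

		iov[i].iov_base = (char *)iov[i].iov_base + step;
		iov[i].iov_len -= step;
		n -= step;
	}
}

/* Hand the callee a scratch copy so the caller's vector stays untouched
 * and the caller alone decides how far to advance it afterwards. */
void call_with_copy(const struct iovec *iov, int cnt, size_t n,
		    struct iovec *scratch)
{
	memcpy(scratch, iov, (size_t)cnt * sizeof(*iov));
	consume_iovec(scratch, cnt, n);
}
```

If both the callee and the caller advance the same array, the position moves twice for one transfer; with the copy, only the caller's single advance applies. The cost is one extra array copy per call.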

Regards
Suparna

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India


Re: problems with latest smbfs changes on 2.4.34 and security backports

2007-01-17 Thread Willy Tarreau

Hi Grant !

On Thu, Jan 18, 2007 at 11:09:57AM +1100, Grant Coady wrote:
(...)
> > } else {
> >-mnt->file_mode = mnt->dir_mode = S_IRWXU | S_IRGRP | S_IXGRP |
> >-S_IROTH | S_IXOTH | S_IFREG;
> >-mnt->dir_mode = mnt->dir_mode = S_IRWXU | S_IRGRP | S_IXGRP |
> >-S_IROTH | S_IXOTH | S_IFDIR;
> >+mnt->file_mode = S_IRWXU | S_IRGRP | S_IXGRP |
> >+S_IROTH | S_IXOTH | S_IFREG | S_IFLNK;
> >+mnt->dir_mode = S_IRWXU | S_IRGRP | S_IXGRP |
> >+S_IROTH | S_IXOTH | S_IFDIR;
> > if (parse_options(mnt, raw_data))
> > goto out_bad_option;
> 
> I'm comparing 2.4.33.3 with 2.4.34, with 2.4.34 and above patch symlinks 
> to directories seen as target, nor can they be created (Operation not 
> permitted...)

Thanks very much Grant for the test. Could you try a symlink to a file ?
And while we're at it, would you like to try to add "|S_IFLNK" to
mnt->dir_mode ? If my suggestion was stupid, at least let's test it to
full extent ;-)
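
One thing that can be checked in userspace before testing: on Linux the file type is an enumerated field under S_IFMT rather than a set of independent flags, so OR-ing two S_IF* values does not mean "either type". The sketch below shows how the standard S_IS* macros classify such combinations; `mode_kind()` and the `file_kind` enum are hypothetical names, the results depend on the platform's S_IF* values, and nothing here claims this is the cause of the behaviour reported above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* ask libc to expose S_IFLNK / S_ISLNK */
#endif
#include <assert.h>
#include <sys/stat.h>

enum file_kind { KIND_REGULAR, KIND_DIR, KIND_SYMLINK, KIND_OTHER };

/* Classify the file type encoded in `mode` with the standard S_IS* tests. */
enum file_kind mode_kind(mode_t mode)
{
	assert((mode & S_IFMT) != 0);	/* expects a mode that carries a type */

	if (S_ISREG(mode))
		return KIND_REGULAR;
	if (S_ISDIR(mode))
		return KIND_DIR;
	if (S_ISLNK(mode))
		return KIND_SYMLINK;
	return KIND_OTHER;
}
```

With the Linux values, `S_IFREG | S_IFLNK` has the same bit pattern as `S_IFLNK` alone, so such a mode tests as a symlink and no longer as a regular file, while `S_IFDIR | S_IFLNK` matches none of the three tests.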

I had another idea looking at the code but since I really don't know it,
I would not like to propose random changes till this magically works. I'd
wait for Dann who understood the code.

Best regards,
Willy


after effects of a kernel API change

2007-01-17 Thread Daniel Rodrick


Hi list,

Whenever there is a change in the kernel API (or a new API is
introduced), all of the drivers that use the older API need to be
changed (or recommended to be changed). I believe it is the
responsibility of the person changing the kernel API, to change all
the drivers that have found their way into the kernel code?

How does this happen? Because the person who brought the change in the
API might not know the internals of all the drivers?

Is there any way volunteers like me can help in this exercise?

Thanks,

Dan

Re: query related to serial console

2007-01-17 Thread Seetharam Dharmosoth

Generally sysrq will work with a serial console, right?

Suppose the system is connected through a serial port to the
other system (i.e. a serial console); at this time we can
fire some set of commands through the serial console.

The sequence is as follows:
do ctrl+]
send brk
then some commands

My question is: can't we pass commands directly
to the console (without sending the brk signal)?

This is a feature in Solaris.

I am looking for it in Linux but am unable to find it.

Can you please help me?

Thanks
Seetharam

--- Erik Mouw <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 17, 2007 at 11:26:54AM +, Seetharam
> Dharmosoth wrote:
> > Is Linux having 'non-break interface for serial
> > console' ?
> 
> No idea. Could you explain what a 'non-break
> interface for serial
> console' is?
> 
> 
> Erik
> 
> -- 
> +-- Erik Mouw -- www.harddisk-recovery.com -- +31 70
> 370 12 90 --
> | Lab address: Delftechpark 26, 2628 XH, Delft, The
> Netherlands


Re: PATCH: Update disable_IO_APIC to use 8-bit destination field (X86_64)

2007-01-17 Thread Vivek Goyal

On Wed, Jan 17, 2007 at 12:08:48PM -0700, Eric W. Biederman wrote:
> Benjamin Romer <[EMAIL PROTECTED]> writes:
> 
> > On the Unisys ES7000/ONE system, we encountered a problem where
> > performing a kexec reboot or dump on any cell other than cell 0 causes
> > the system timer to stop working, resulting in a hang during timer
> > calibration in the new kernel.
> >
> > We traced the problem to one line of code in disable_IO_APIC(), which
> > needs to restore the timer's IO-APIC configuration before rebooting. The
> > code is currently using the 4-bit physical destination field, rather
> > than using the 8-bit logical destination field, and it cuts off the
> > upper 4 bits of the timer's APIC ID. If we change this to use the
> > logical destination field, the timer works and we can kexec on the upper
> > cells. This was tested on two different cells (0 and 2) in an ES7000/ONE
> > system.
> >
> > For reference, the relevant Intel xAPIC spec is kept at
> > ftp://download.intel.com/design/chipsets/e8501/datashts/30962001.pdf,
> > specifically on page 334.
> 
> Looks like good bug hunting.  I will have to look but it might
> make more sense to simply fix: struct IO_APIC_route_entry,
> or use whatever technique we normally use to generate the io_apic
> vectors.
> 
> I don't recall enough off of the top of my head to recall what
> the discrimination rule between logical and physical is but
> I think setting the system in physical mode is a good clue :)

Hi Eric,

In physical destination mode, the destination APIC is determined by its
APIC ID. In logical destination mode, the destination APICs are determined
by the configuration of the LDR and DFR registers in each APIC (depending
on flat mode or cluster mode).

It looks like older hardware supported only 4-bit APIC IDs when the system
operated in physical mode, and 8-bit IDs when the IOAPIC was put in logical
destination mode. That's why struct IO_APIC_route_entry contains only
4 bits for the physical APIC ID.

http://www.intel.com/design/chipsets/datashts/290566.htm

Newer systems have switched to 8-bit APIC IDs in physical mode.
That's why, if a crash happens on a CPU whose APIC ID is 16 or greater,
the kexec/kdump code will fail: 4 bits are not sufficient.

Hence above change makes sense. Given the fact that logical and physical
apic id is basically a union, it will work even for older systems where
physical apic ids were 4bits only.
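
The truncation can be illustrated with plain bit arithmetic. The sketch below assumes the destination field occupies bits 56-63 of the 64-bit redirection entry, with the old physical layout using only bits 56-59. The helper names are invented for the example and are not kernel functions.

```c
#include <stdint.h>

#define DEST_SHIFT 56	/* assumed position of the destination field */

/* Old physical layout: only the low 4 bits of the APIC ID are stored. */
static uint64_t encode_physical_4bit(uint8_t apic_id)
{
	return (uint64_t)(apic_id & 0x0F) << DEST_SHIFT;
}

/* 8-bit layout: the whole APIC ID is stored. */
static uint64_t encode_logical_8bit(uint8_t apic_id)
{
	return (uint64_t)apic_id << DEST_SHIFT;
}

/* Read back the full 8-bit destination field. */
static uint8_t decode_dest(uint64_t entry)
{
	return (uint8_t)(entry >> DEST_SHIFT);
}
```

For an APIC ID of 0x23 the 4-bit encoding reads back as 0x03, while the 8-bit encoding preserves 0x23. For IDs below 16 the two encodings agree, which is why older systems keep working.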

OTOH, I think down the line we can get rid of the physical dest
field altogether in struct IO_APIC_route_entry and use the logical dest
field.

Thanks
Vivek

Re: [RFC] pci_bus conversion to struct device

2007-01-17 Thread Greg KH

On Wed, Jan 17, 2007 at 07:23:52PM -0700, Matthew Wilcox wrote:
> On Wed, Jan 17, 2007 at 04:53:45PM -0800, Greg KH wrote:
> > I'm trying to clean up all the usages of struct class_device to use
> > struct device, and I ran into the pci_bus code.  Right now you create a
> > symlink called "bridge" under the /sys/class/pci_bus/:XX/ directory
> > to the pci device that is the bridge.
> 
> I recommend we just delete the pci_bus class.  I don't think it serves
> any useful purpose.  The bridge can be inferred from the sysfs hierarchy
> (not to mention lspci will tell you).  The cpuaffinity file should be
> moved from the bus to the device -- it really doesn't make any sense to
> talk about which cpu a bus is affine to, only a device.

I would like to do that, but I want to make sure that no userspace tools
are using this information.

But, as Matt Dobson is now gone off somewhere in Europe, not doing Linux
stuff anymore, he's not going to answer this.  So I'll just make up a
removal patch and let it sit in -mm for a while to see if anything
breaks.

I really only think the big NUMA boxes care about that information, if
anything.

thanks,

greg k-h

Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Dave Kleikamp

On Thu, 2007-01-18 at 10:46 +1100, Jens Axboe wrote:
> On Wed, Jan 17 2007, Dave Kleikamp wrote:
> > On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:
> >
> > > Can you try io_schedule() and verify that things just work?
> >
> > I actually did do that in the first place, but wondered if it was the
> > right thing to introduce the accounting changes that came with that.
> > I'll change it back to io_schedule() and test it again, just to make
> > sure.
> 
> It appears to be the correct change to me - you really are waiting for
> IO resources (otherwise it would not hang with the plug change), so
> doing an inc/dec of iowait around the schedule should be done.

Okay, here it is.

> > If that's the right fix, I can push it directly since it won't have any
> > dependencies on your patches.
> 
> Perfect!

It should make the next -mm.

JFS: call io_schedule() instead of schedule() to avoid deadlock

The introduction of Jens Axboe's explicit i/o plugging patches introduced a
deadlock in jfs.  This was caused by the process initiating I/O not
unplugging the queue before waiting on the commit thread.  The commit
thread itself was waiting for that I/O to complete.  Calling io_schedule()
rather than schedule() unplugs the I/O queue avoiding the deadlock, and it
appears to be the right function to call in any case.

Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>

---
commit 4aa0d230c2cfc1ac4bcf7c5466f9943cf14233a9
tree b873dce6146f4880c6c48ab53c0079566f52a60b
parent 82d5b9a7c63054a9a2cd838ffd177697f86e7e34
author Dave Kleikamp <[EMAIL PROTECTED]> Wed, 17 Jan 2007 21:18:35 -0600
committer Dave Kleikamp <[EMAIL PROTECTED]> Wed, 17 Jan 2007 21:18:35 -0600

 fs/jfs/jfs_lock.h |2 +-
 fs/jfs/jfs_metapage.c |2 +-
 fs/jfs/jfs_txnmgr.c   |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/jfs/jfs_lock.h b/fs/jfs/jfs_lock.h
index 7d78e83..df48ece 100644
--- a/fs/jfs/jfs_lock.h
+++ b/fs/jfs/jfs_lock.h
@@ -42,7 +42,7 @@ do {  \
if (cond)   \
break;  \
unlock_cmd; \
-   schedule(); \
+   io_schedule();  \
lock_cmd;   \
}   \
current->state = TASK_RUNNING;  \
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index ceaf03b..58deae0 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -56,7 +56,7 @@ static inline void __lock_metapage(struct metapage *mp)
set_current_state(TASK_UNINTERRUPTIBLE);
if (metapage_locked(mp)) {
unlock_page(mp->page);
-   schedule();
+   io_schedule();
lock_page(mp->page);
}
} while (trylock_metapage(mp));
diff --git a/fs/jfs/jfs_txnmgr.c b/fs/jfs/jfs_txnmgr.c
index d558e51..6988a10 100644
--- a/fs/jfs/jfs_txnmgr.c
+++ b/fs/jfs/jfs_txnmgr.c
@@ -135,7 +135,7 @@ static inline void TXN_SLEEP_DROP_LOCK(wait_queue_head_t * event)
add_wait_queue(event, &wait);
set_current_state(TASK_UNINTERRUPTIBLE);
TXN_UNLOCK();
-   schedule();
+   io_schedule();
current->state = TASK_RUNNING;
remove_wait_queue(event, &wait);
 }

-- 
David Kleikamp
IBM Linux Technology Center
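
The difference between the two calls, as described in the commit message above, can be modelled in userspace. This is only a sketch under the assumptions stated in the thread: that io_schedule() unplugs pending I/O and brackets the sleep with iowait accounting. None of the names below are the kernel's real functions or structures.

```c
/* Userspace model only: none of these are the kernel's real functions. */
struct model_rq {
	int nr_iowait;		/* tasks currently counted as waiting for I/O */
	int plugged;		/* I/O requests queued but not yet issued */
	int issued;		/* I/O requests handed to the device */
	int iowait_seen;	/* nr_iowait observed while "asleep" */
};

/* Stand-in for schedule(): just records what the accounting looked like. */
static void model_schedule(struct model_rq *rq)
{
	rq->iowait_seen = rq->nr_iowait;
}

/* Stand-in for io_schedule(): unplug, then sleep with iowait accounted. */
static void model_io_schedule(struct model_rq *rq)
{
	rq->issued += rq->plugged;	/* flush pending I/O before sleeping */
	rq->plugged = 0;

	rq->nr_iowait++;
	model_schedule(rq);
	rq->nr_iowait--;
}
```

With three plugged requests, model_schedule() leaves them queued, which is the situation the commit thread would wait on forever. model_io_schedule() issues them first and sleeps with the iowait counter raised.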


[RPC][PATCH 2.6.20-rc5] limit total vfs page cache

2007-01-17 Thread Aubrey Li


Here is the newest patch against 2.6.20-rc5.
==

From ad9ca9a32bdcaddce9988afbf0187bfd04685a0c Mon Sep 17 00:00:00 2001

From: Aubrey.Li <[EMAIL PROTECTED]>
Date: Thu, 18 Jan 2007 11:08:31 +0800
Subject: [PATCH] Add an interface to limit total vfs page cache.
The default percent is using 90% memory for page cache.

Signed-off-by: Aubrey.Li <[EMAIL PROTECTED]>
---
include/linux/gfp.h |1 +
include/linux/pagemap.h |2 +-
include/linux/sysctl.h  |2 ++
kernel/sysctl.c |   11 +++
mm/page_alloc.c |   17 +++--
5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 00c314a..531360e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -46,6 +46,7 @@ struct vm_area_struct;
#define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
#define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
#define __GFP_THISNODE  ((__force gfp_t)0x40000u)/* No fallback, no policies */
+#define __GFP_PAGECACHE((__force gfp_t)0x80000u) /* Is page cache allocation ? */

#define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c3e255b..890bb23 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -62,7 +62,7 @@ static inline struct page *__page_cache_

static inline struct page *page_cache_alloc(struct address_space *x)
{
-   return __page_cache_alloc(mapping_gfp_mask(x));
+   return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_PAGECACHE);
}

static inline struct page *page_cache_alloc_cold(struct address_space *x)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 81480e6..d3c9174 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -202,6 +202,7 @@ enum
VM_PANIC_ON_OOM=33, /* panic at out-of-memory */
VM_VDSO_ENABLED=34, /* map VDSO into new processes? */
VM_MIN_SLAB=35,  /* Percent pages ignored by zone reclaim */
+   VM_PAGECACHE_RATIO=36,  /* percent of RAM to use as page cache */
};


@@ -955,6 +956,7 @@ extern ctl_handler sysctl_string;
extern ctl_handler sysctl_intvec;
extern ctl_handler sysctl_jiffies;
extern ctl_handler sysctl_ms_jiffies;
+extern int sysctl_pagecache_ratio;


/*
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 600b333..92db115 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1035,6 +1035,17 @@ static ctl_table vm_table[] = {
.extra1 = &zero,
},
#endif
+   {
+   .ctl_name   = VM_PAGECACHE_RATIO,
+   .procname   = "pagecache_ratio",
+   .data   = &sysctl_pagecache_ratio,
+   .maxlen = sizeof(sysctl_pagecache_ratio),
+   .mode   = 0644,
+   .proc_handler   = &proc_dointvec_minmax,
+   .strategy   = &sysctl_intvec,
+   .extra1 = &zero,
+   .extra2 = &one_hundred,
+   },
{ .ctl_name = 0 }
};

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fc5b544..5802b39 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -82,6 +82,8 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
#endif
};

+int sysctl_pagecache_ratio = 10;
+
EXPORT_SYMBOL(totalram_pages);

static char * const zone_names[MAX_NR_ZONES] = {
@@ -895,6 +897,7 @@ failed:
#define ALLOC_HARDER0x10 /* try to alloc harder */
#define ALLOC_HIGH  0x20 /* __GFP_HIGH set */
#define ALLOC_CPUSET0x40 /* check for correct cpuset */
+#define ALLOC_PAGECACHE0x80 /* __GFP_PAGECACHE set */

#ifdef CONFIG_FAIL_PAGE_ALLOC

@@ -998,6 +1001,9 @@ int zone_watermark_ok(struct zone *z, in
if (alloc_flags & ALLOC_HARDER)
min -= min / 4;

+   if (alloc_flags & ALLOC_PAGECACHE)
+   min = min + (sysctl_pagecache_ratio * z->present_pages) / 100;
+
if (free_pages <= min + z->lowmem_reserve[classzone_idx])
return 0;
for (o = 0; o < order; o++) {
@@ -1236,8 +1242,12 @@ restart:
return NULL;
}

-   page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
-   zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET);
+   if (gfp_mask & __GFP_PAGECACHE) 
+   page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
+   zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_PAGECACHE);
+   else
+   page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
+   zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET);
if (page)
goto got_pg;

@@ -1273,6 +1283,9 @@ restart:
if (wait)
alloc_flags |= ALLOC_CPUSET;

+   if (gfp_mask & __GFP_PAGECACHE)
+
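
The watermark arithmetic added to zone_watermark_ok() in the patch above can be tried out in isolation. The sketch below is a userspace model, not the kernel function: the names are illustrative, and lowmem_reserve and the per-order loop are left out.

```c
/*
 * Model of the check the patch adds to zone_watermark_ok(): page cache
 * allocations must leave an extra (ratio % of the zone) pages free.
 * Names are illustrative; lowmem_reserve and the order loop are left out.
 */
static long pagecache_min(long min, int ratio, long present_pages)
{
	return min + (ratio * present_pages) / 100;
}

/* Returns 1 if the allocation may proceed, 0 if the zone is too low. */
static int model_watermark_ok(long free_pages, long min, int ratio,
			      long present_pages, int is_pagecache)
{
	if (is_pagecache)
		min = pagecache_min(min, ratio, present_pages);
	return free_pages > min;
}
```

With the patch's default ratio of 10, a zone of 100000 pages and a base watermark of 1000, a page cache allocation needs more than 11000 free pages, while other allocations still only need more than 1000. Reserving 10% this way is what leaves roughly 90% of the zone available to the page cache.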

Re: [RFC] pci_bus conversion to struct device

2007-01-17 Thread Matthew Wilcox

On Wed, Jan 17, 2007 at 04:53:45PM -0800, Greg KH wrote:
> I'm trying to clean up all the usages of struct class_device to use
> struct device, and I ran into the pci_bus code.  Right now you create a
> symlink called "bridge" under the /sys/class/pci_bus/:XX/ directory
> to the pci device that is the bridge.

I recommend we just delete the pci_bus class.  I don't think it serves
any useful purpose.  The bridge can be inferred from the sysfs hierarchy
(not to mention lspci will tell you).  The cpuaffinity file should be
moved from the bus to the device -- it really doesn't make any sense to
talk about which cpu a bus is affine to, only a device.

Re: block_device usage and incorrect block writes

2007-01-17 Thread Jens Axboe

On Wed, Jan 17 2007, Chris Frost wrote:
> We are working on a kernel module which uses the linux block device
> interface as part of a larger project, are seeing unexpected block
> write behavior from our usage of the noop scheduler, and were
> wondering whether anyone might have feedback on what the behavior we
> see?
> 
> We would like to send block writes such that they are written to the
> drive controller in fifo order, so we are using the noop scheduler.
> However, a small percentage (1-5 of ~50,000) of block writes end up
> with incorrect data on the disk. We have determined that for each of
> these incorrect blocks, the last write for the block was issued while
> a previous write to the block was still queued (that is, the bio end
> function had not yet been called) and that the next to last issued
> write (that is, the generic_make_request function call) for the block
> contains the data that ends up on the disk.

noop doesn't guarantee that IO will be queued with the device in the
order in which they are submitted, and it definitely doesn't guarantee
that the device will process them in the order in which they are
dispatched. noop being FIFO basically means that it will not sort
requests. You can still have reordering if one request gets merged with
another, for instance.
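
The merge case can be shown with a toy queue. This is a userspace model written for illustration, not block layer code: requests are never sorted, but a write that starts exactly where a queued request ends is back-merged into it.

```c
#include <stddef.h>

#define QUEUE_MAX 16

struct model_request {
	unsigned long sector;	/* first sector */
	unsigned long nr;	/* number of sectors */
};

struct model_queue {
	struct model_request reqs[QUEUE_MAX];	/* dispatched front to back */
	size_t count;
};

/*
 * Queue a write without sorting. If it starts exactly where an already
 * queued request ends, it is back-merged into that request; otherwise it
 * is appended. Returns 0 on success, -1 if the queue is full.
 */
static int model_submit(struct model_queue *q, unsigned long sector,
			unsigned long nr)
{
	size_t i;

	for (i = 0; i < q->count; i++) {
		if (q->reqs[i].sector + q->reqs[i].nr == sector) {
			q->reqs[i].nr += nr;
			return 0;
		}
	}
	if (q->count == QUEUE_MAX)
		return -1;
	q->reqs[q->count].sector = sector;
	q->reqs[q->count].nr = nr;
	q->count++;
	return 0;
}
```

Submitting sectors 0-7, then 100-107, then 8-15 leaves two requests in the queue: 0-15 followed by 100-107. The third write now reaches the device ahead of the second, although nothing was sorted.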

The block layer in general provides no guarantees about ordering of
requests, unless you use barriers. So if you require ordering across a
given write request, it needs to be a write barrier.

> A possibly related (and unexpected) behavior we have noticed is that
> the bio end function is not always called in the same order as our
> calls to generic_make_request(). We are not sure whether this
> indicates that the requests are being written to disk in the callback
> order, but would like to fix this if so (since we want the writes made
> in the order of our requests).

The drive could complete requests in any order it sees fit, within the
depth level of the drive. If write caching is enabled, it can reorder
writes easily.

-- 
Jens Axboe


Re: How to flush the disk write cache from userspace

2007-01-17 Thread Ricardo Correia

On Tuesday 16 January 2007 00:38, you wrote:
> As always with these things, the devil is in the details. It requires
> the device to support a ->prepare_flush() queue hook, and not all
> devices do that. It will work for IDE/SATA/SCSI, though. In some devices
> you don't want/need to do a real disk flush, it depends on the write
> cache settings, battery backing, etc.

Is there any chance that someone could implement this (I don't have the 
skills, unfortunately)? Maybe add a new ioctl() to block devices, so that it 
doesn't break any existing code?

I believe it's a very useful (and relatively simple) feature that increases 
data integrity and reliability for applications that need this functionality.

I think it must be considered that most people have disk write caches enabled 
and are using IDE, SATA or SCSI disks.

I also think there's no point in disabling disks' write caches, since it slows 
writes and decreases disks' lifetime, and because there's a better solution.

Personally, I'm not really interested in specific filesystem behaviour, since 
my application uses block devices directly (it's a filesystem itself). 
Although I think all filesystems should guarantee data integrity in the face 
of fsync() or metadata modifications, even if it costs a little performance.

Thank you.

block_device usage and incorrect block writes

2007-01-17 Thread Chris Frost

We are working on a kernel module which uses the linux block device
interface as part of a larger project, are seeing unexpected block write
behavior from our usage of the noop scheduler, and were wondering whether
anyone might have feedback on what the behavior we see?

We would like to send block writes such that they are written to the
drive controller in fifo order, so we are using the noop scheduler.
However, a small percentage (1-5 of ~50,000) of block writes
end up with incorrect data on the disk. We have determined that for each
of these incorrect blocks, the last write for the block was issued while
a previous write to the block was still queued (that is, the bio end
function had not yet been called) and that the next to last issued
write (that is, the generic_make_request function call) for the block contains
the data that ends up on the disk.

Here are more details we have noticed. The behavior appears to be
timing sensitive; multiple runs each may or may not work and
introducing slow downs (i.e. write barriers or many printks) make the
problem disappear with high certainty. About 2% of our block device
writes are writes to a block when there is still a write to that block
in the queue. About 0.3% of these 2% result in incorrect data on
the disk.

A possibly related (and unexpected) behavior we have noticed is that the
bio end function is not always called in the same order as our calls
to generic_make_request(). We are not sure whether this indicates that
the requests are being written to disk in the callback order, but would
like to fix this if so (since we want the writes made in the order of our
requests).

Below is the essence of our write code. We also make read requests, but do
not include the read code below.

struct block_device *dev;

int my_end(struct bio *bio, unsigned int done, int error)
{
if (bio->bi_size)
return 1; /* everyone else (in 2.6.12) returns in this case */
__free_page(bio_iovec_idx(bio, 0)->bv_page);
bio_iovec_idx(bio, 0)->bv_page = NULL;
bio_iovec_idx(bio, 0)->bv_len = 0;
bio_iovec_idx(bio, 0)->bv_offset = 0;
bio_put(bio);
return error;
}

void write_block(...)
{
struct bio *bio = bio_alloc(GFP_KERNEL, 1);

struct bio_vec *bv = bio_iovec_idx(bio, 0);
bv->bv_page = alloc_page(GFP_KERNEL | GFP_DMA);
memcpy(page_address(bv->bv_page), ...);
bv->bv_len = ...;
bv->bv_offset = 0;

bio->bi_idx = 0;
bio->bi_vcnt = 1;
bio->bi_sector = ...;
bio->bi_size = ...;
bio->bi_bdev = dev;
bio->bi_rw = WRITE;
bio->bi_end_io = my_end;

generic_make_request(bio);
}

void init(const char *path)
{
path_lookup(path, LOOKUP_FOLLOW, &nd);
dev = open_by_devnum(nd.dentry->d_inode->i_rdev, mode);
bd_claim(dev, claimer);
}

thanks in advance for any feedback!
-- 
Chris Frost  |  
-+--
Public PGP Key:
   Email [EMAIL PROTECTED] with the subject "retrieve pgp key"
   or visit 

Re: [RFC 0/8] Cpuset aware writeback

2007-01-17 Thread Andrew Morton

> On Wed, 17 Jan 2007 17:10:25 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Wed, 17 Jan 2007, Andrew Morton wrote:
> 
> > > The inode lock is not taken when the page is dirtied.
> > 
> > The inode_lock is taken when the address_space's first page is dirtied.  It is
> > also taken when the address_space's last dirty page is cleaned.  So the place
> > where the inode is added to and removed from sb->s_dirty is, I think, exactly
> > the place where we want to attach and detach
> > address_space.dirty_page_nodemask.
> 
> The problem there is that we do a GFP_ATOMIC allocation (no allocation 
> context) that may fail when the first page is dirtied. We must therefore 
> be able to subsequently allocate the nodemask_t in set_page_dirty(). 
> Otherwise the first failure will mean that there will never be a dirty 
> map for the inode/mapping.

True.  But it's pretty simple to change __mark_inode_dirty() to fix this.

Re: 2.6.19.1, sata_sil: sata dvd writer doesn't work

2007-01-17 Thread Tejun Heo

Harald Dunkel wrote:
> How comes that there is no such problem if I connect the drive
> via an USB SATA adapter?

Ah... right.  I forgot about that.  Scrap my analysis.  What happens is
really weird tho.

> Do you think it would be reasonable to send a bug report to Samsung,
> and see what they say? I would need some documentation about these
> MMC commands, though. Is this part of some "Red Book" standard, or
> so?

Yeap, reporting is probably a good idea.  libdvdread ppl would be
interested too.  MMC is SCSI command set standard for ODDs and can be
found at t10.org.

I don't think we can proceed with kernel debugging before gathering more
info about this problem.  Feel free to cc me when you ask people about
this problem.  I really like the dvd writer and would love to see the
problem ironed out.

Thanks.

-- 
tejun

Re: [RFC 0/8] Cpuset aware writeback

2007-01-17 Thread Christoph Lameter

On Wed, 17 Jan 2007, Andrew Morton wrote:

> > The inode lock is not taken when the page is dirtied.
> 
> The inode_lock is taken when the address_space's first page is dirtied.  It is
> also taken when the address_space's last dirty page is cleaned.  So the place
> where the inode is added to and removed from sb->s_dirty is, I think, exactly
> the place where we want to attach and detach 
> address_space.dirty_page_nodemask.

The problem there is that we do a GFP_ATOMIC allocation (no allocation 
context) that may fail when the first page is dirtied. We must therefore 
be able to subsequently allocate the nodemask_t in set_page_dirty(). 
Otherwise the first failure will mean that there will never be a dirty 
map for the inode/mapping.
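
The retry requirement can be sketched in userspace. The model below is an illustration only: the structure, the function names and the test hook are invented for this example, and a single unsigned long stands in for nodemask_t.

```c
#include <limits.h>
#include <stdlib.h>

struct model_mapping {
	unsigned long *dirty_nodes;	/* NULL until an allocation succeeds */
};

/* Test hook: when nonzero, the next allocation attempt fails once. */
static int fail_next_alloc;

static unsigned long *try_alloc_mask(void)
{
	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return NULL;
	}
	return calloc(1, sizeof(unsigned long));
}

/*
 * Called every time a page is dirtied. Allocation is retried on each call
 * while the mask is missing, so one early failure is not permanent.
 * Returns 0 if the node was recorded, -1 otherwise.
 */
static int model_set_page_dirty(struct model_mapping *m, unsigned int node)
{
	if (node >= sizeof(unsigned long) * CHAR_BIT)
		return -1;
	if (!m->dirty_nodes) {
		m->dirty_nodes = try_alloc_mask();
		if (!m->dirty_nodes)
			return -1;	/* try again on the next dirtied page */
	}
	*m->dirty_nodes |= 1UL << node;
	return 0;
}
```

If the first call fails to allocate, the mapping simply has no mask yet; the next dirtied page allocates it and records its node. Attaching the mask only when the first page is dirtied would not give that second chance.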


[RFC] pci_bus conversion to struct device

2007-01-17 Thread Greg KH

Hi Matt,

I'm trying to clean up all the usages of struct class_device to use
struct device, and I ran into the pci_bus code.  Right now you create a
symlink called "bridge" under the /sys/class/pci_bus/:XX/ directory
to the pci device that is the bridge.

This is messy to try to convert to struct device, but I have a hack of a
patch below.  I had some questions for you:
  - are there userspace tools that use the 'bridge' symlink?
  - do we really need a separate device here?  If I just create a tree
of symlinks to the pci devices that the bridges are, and add the
sysfs attributes that are currently in the class_device, to the
pci_device location, will that be acceptable?

Any thoughts you have about this would be appreciated.

thanks,

greg k-h

---
 drivers/pci/bus.c|   10 -
 drivers/pci/pci.h|2 -
 drivers/pci/probe.c  |   54 +--
 drivers/pci/remove.c |7 ++
 include/linux/pci.h  |4 +--
 5 files changed, 38 insertions(+), 39 deletions(-)

--- gregkh-2.6.orig/drivers/pci/bus.c
+++ gregkh-2.6/drivers/pci/bus.c
@@ -138,11 +138,11 @@ void __devinit pci_bus_add_devices(struc
   up_write(&pci_bus_sem);
}
pci_bus_add_devices(dev->subordinate);
-   retval = sysfs_create_link(&dev->subordinate->class_dev.kobj,
-  &dev->dev.kobj, "bridge");
-   if (retval)
-   dev_err(&dev->dev, "Error creating sysfs "
-   "bridge symlink, continuing...\n");
+// retval = sysfs_create_link(&dev->subordinate->class_dev.kobj,
+//&dev->dev.kobj, "bridge");
+// if (retval)
+// dev_err(&dev->dev, "Error creating sysfs "
+// "bridge symlink, continuing...\n");
}
}
 }
--- gregkh-2.6.orig/drivers/pci/pci.h
+++ gregkh-2.6/drivers/pci/pci.h
@@ -78,7 +78,7 @@ static inline int pci_no_d1d2(struct pci
 }
 extern int pcie_mch_quirk;
 extern struct device_attribute pci_dev_attrs[];
-extern struct class_device_attribute class_device_attr_cpuaffinity;
+extern struct device_attribute dev_attr_cpuaffinity;
 
 /**
  * pci_match_one_device - Tell if a PCI device structure has a matching
--- gregkh-2.6.orig/drivers/pci/probe.c
+++ gregkh-2.6/drivers/pci/probe.c
@@ -42,7 +42,7 @@ static void pci_create_legacy_files(stru
b->legacy_io->attr.owner = THIS_MODULE;
b->legacy_io->read = pci_read_legacy_io;
b->legacy_io->write = pci_write_legacy_io;
-   class_device_create_bin_file(&b->class_dev, b->legacy_io);
+   device_create_bin_file(&b->dev, b->legacy_io);
 
/* Allocated above after the legacy_io struct */
b->legacy_mem = b->legacy_io + 1;
@@ -51,15 +51,15 @@ static void pci_create_legacy_files(stru
b->legacy_mem->attr.mode = S_IRUSR | S_IWUSR;
b->legacy_mem->attr.owner = THIS_MODULE;
b->legacy_mem->mmap = pci_mmap_legacy_mem;
-   class_device_create_bin_file(&b->class_dev, b->legacy_mem);
+   device_create_bin_file(&b->dev, b->legacy_mem);
}
 }
 
 void pci_remove_legacy_files(struct pci_bus *b)
 {
if (b->legacy_io) {
-   class_device_remove_bin_file(&b->class_dev, b->legacy_io);
-   class_device_remove_bin_file(&b->class_dev, b->legacy_mem);
+   device_remove_bin_file(&b->dev, b->legacy_io);
+   device_remove_bin_file(&b->dev, b->legacy_mem);
kfree(b->legacy_io); /* both are allocated here */
}
 }
@@ -71,26 +71,27 @@ void pci_remove_legacy_files(struct pci_
 /*
  * PCI Bus Class Devices
  */
-static ssize_t pci_bus_show_cpuaffinity(struct class_device *class_dev,
+static ssize_t pci_bus_show_cpuaffinity(struct device *dev,
+   struct device_attribute *attr,
char *buf)
 {
int ret;
cpumask_t cpumask;
 
-   cpumask = pcibus_to_cpumask(to_pci_bus(class_dev));
+   cpumask = pcibus_to_cpumask(to_pci_bus(dev));
ret = cpumask_scnprintf(buf, PAGE_SIZE, cpumask);
if (ret < PAGE_SIZE)
buf[ret++] = '\n';
return ret;
 }
-CLASS_DEVICE_ATTR(cpuaffinity, S_IRUGO, pci_bus_show_cpuaffinity, NULL);
+DEVICE_ATTR(cpuaffinity, S_IRUGO, pci_bus_show_cpuaffinity, NULL);
 
 /*
  * PCI Bus Class
  */
-static void release_pcibus_dev(struct class_device *class_dev)
+static void release_pcibus_dev(struct device *dev)
 {
-   struct pci_bus *pci_bus = to_pci_bus(class_dev);
+   struct pci_bus *pci_bus = to_pci_bus(dev);
 
if (pci_bus->bridge)
put_device(pci_bus->bridge);
@@ -99,7 +100,7 @@ static void release_pcibus_dev(struct

[PATCH] futex null pointer timeout

2007-01-17 Thread Daniel Walker

This fix is mostly from Thomas .. 

The problem was that a futex can be called with a zero timeout (0 seconds,
0 nanoseconds) and it's a valid expired timeout. However, the current futex
in -rt assumes a zero timeout is an infinite timeout. 

Kevin Hilman found this using LTP's nptl01 test case which would
soft hang occasionally. 

The patch reworks do_futex, and futex_wait* so a NULL pointer in the timeout
position is infinite, and anything else is evaluated as a real timeout.

Signed-Off-By: Daniel Walker <[EMAIL PROTECTED]>

---
 kernel/futex.c|   14 --
 kernel/futex_compat.c |5 +++--
 2 files changed, 11 insertions(+), 8 deletions(-)

Index: linux-2.6.19/kernel/futex.c
===
--- linux-2.6.19.orig/kernel/futex.c
+++ linux-2.6.19/kernel/futex.c
@@ -1466,7 +1466,7 @@ static int futex_wait(u32 __user *uaddr,
 
current->flags &= ~PF_NOSCHED;
 
-   if (time->tv_sec == 0 && time->tv_nsec == 0)
+   if (!time)
schedule();
else {
to = 
@@ -3070,7 +3070,7 @@ static int futex_wait64(u64 __user *uadd
 
current->flags &= ~PF_NOSCHED;
 
-   if (time->tv_sec == 0 && time->tv_nsec == 0)
+   if (!time)
schedule();
else {
to = 
@@ -3560,7 +3560,7 @@ asmlinkage long
 sys_futex64(u64 __user *uaddr, int op, u64 val,
struct timespec __user *utime, u64 __user *uaddr2, u64 val3)
 {
-   struct timespec t = {.tv_sec=0, .tv_nsec=0};
+   struct timespec t, *tp = NULL;
u64 val2 = 0;
 
if (utime && (op == FUTEX_WAIT || op == FUTEX_LOCK_PI)) {
@@ -3568,6 +3568,7 @@ sys_futex64(u64 __user *uaddr, int op, u
return -EFAULT;
if (!timespec_valid(&t))
return -EINVAL;
+   tp = &t;
}
/*
 * requeue parameter in 'utime' if op == FUTEX_REQUEUE.
@@ -3576,7 +3577,7 @@ sys_futex64(u64 __user *uaddr, int op, u
|| op == FUTEX_CMP_REQUEUE_PI)
val2 = (unsigned long) utime;
 
-   return do_futex64(uaddr, op, val, &t, uaddr2, val2, val3);
+   return do_futex64(uaddr, op, val, tp, uaddr2, val2, val3);
 }
 
 #endif
@@ -3585,7 +3586,7 @@ asmlinkage long sys_futex(u32 __user *ua
  struct timespec __user *utime, u32 __user *uaddr2,
  u32 val3)
 {
-   struct timespec t = {.tv_sec=0, .tv_nsec=0};
+   struct timespec t, *tp = NULL;
u32 val2 = 0;
 
if (utime && (op == FUTEX_WAIT || op == FUTEX_LOCK_PI)) {
@@ -3593,6 +3594,7 @@ asmlinkage long sys_futex(u32 __user *ua
return -EFAULT;
if (!timespec_valid(&t))
return -EINVAL;
+   tp = &t;
}
/*
 * requeue parameter in 'utime' if op == FUTEX_REQUEUE.
@@ -3601,7 +3603,7 @@ asmlinkage long sys_futex(u32 __user *ua
|| op == FUTEX_CMP_REQUEUE_PI)
val2 = (u32) (unsigned long) utime;
 
-   return do_futex(uaddr, op, val, &t, uaddr2, val2, val3);
+   return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
 }
 
 static int futexfs_get_sb(struct file_system_type *fs_type,
Index: linux-2.6.19/kernel/futex_compat.c
===
--- linux-2.6.19.orig/kernel/futex_compat.c
+++ linux-2.6.19/kernel/futex_compat.c
@@ -141,7 +141,7 @@ asmlinkage long compat_sys_futex(u32 __u
struct compat_timespec __user *utime, u32 __user *uaddr2,
u32 val3)
 {
-   struct timespec t = {.tv_sec = 0, .tv_nsec = 0};
+   struct timespec t, *tp = NULL;
int val2 = 0;
 
if (utime && (op == FUTEX_WAIT || op == FUTEX_LOCK_PI)) {
@@ -149,10 +149,11 @@ asmlinkage long compat_sys_futex(u32 __u
return -EFAULT;
if (!timespec_valid(&t))
return -EINVAL;
+   tp = &t;
}
if (op == FUTEX_REQUEUE || op == FUTEX_CMP_REQUEUE
|| op == FUTEX_CMP_REQUEUE_PI)
val2 = (int) (unsigned long) utime;
 
-   return do_futex(uaddr, op, val, &t, uaddr2, val2, val3);
+   return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
 }
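
The distinction the patch draws between a NULL timeout pointer and a zero-valued timespec can be summarised in a small userspace sketch. The enum and the function name are invented for this illustration and do not appear in the patch.

```c
#include <stddef.h>
#include <time.h>

enum wait_kind {
	WAIT_FOREVER,	/* no timeout supplied: block until woken */
	WAIT_EXPIRED,	/* 0 s, 0 ns: a valid timeout that is already over */
	WAIT_TIMED	/* any other value: a real timeout */
};

/* Classify a timeout the way the reworked futex code is described above. */
static enum wait_kind classify_timeout(const struct timespec *ts)
{
	if (ts == NULL)
		return WAIT_FOREVER;
	if (ts->tv_sec == 0 && ts->tv_nsec == 0)
		return WAIT_EXPIRED;
	return WAIT_TIMED;
}
```

Before the fix, the second case was folded into the first, so a caller asking for an already expired timeout could sleep indefinitely.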
-- 

Re: [PATCH 2.6.20-rc5 4/4] sys_futex64 : allows 64bit futexes

2007-01-17 Thread Christoph Hellwig

On Wed, Jan 17, 2007 at 10:04:53AM +0100, Pierre Peiffer wrote:
> Hi,
> 
> This latest patch is an adaptation of the sys_futex64 syscall provided in
> -rt patch (originally written by Ingo). It allows the use of 64bit futex.

Big NACK here, we don't need yet another goddamn multiplexer.  Please
make these individual syscalls for the actual operations.

> + if (!ret) {
> + switch (cmp) {
> + case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
> + case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;

Please indent this properly; the ret = ... and the break each need to go
on a line of their own.
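
As a rough sketch of the layout being asked for (the constant values and the futex_cmp() wrapper are made up for this example and are not taken from the patch; the default branch is likewise only an assumption):

```c
/* Illustrative values only; the real constants live in the futex headers. */
enum {
	FUTEX_OP_CMP_EQ = 0,
	FUTEX_OP_CMP_NE = 1,
};

/* Hypothetical helper wrapping the comparison from the quoted hunk. */
static int futex_cmp(int cmp, int oldval, int cmparg)
{
	int ret;

	switch (cmp) {
	case FUTEX_OP_CMP_EQ:
		ret = (oldval == cmparg);
		break;
	case FUTEX_OP_CMP_NE:
		ret = (oldval != cmparg);
		break;
	default:
		ret = -1;	/* unknown comparison in this sketch */
		break;
	}
	return ret;
}
```

Each case label, assignment and break sits on its own line, one tab deeper than the label.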


Re: Weird harddisk behaviour

2007-01-17 Thread Ken Moffat

On Wed, Jan 17, 2007 at 11:09:21AM +0100, Turbo Fredriksson wrote:
> Quoting Ken Moffat <[EMAIL PROTECTED]>:
> 
> >  Certainly, fdisk from util-linux doesn't know about mac disks, and
> > I thought the same was true for cfdisk and sfdisk.  Many years ago
> > there was mac-fdisk, I think also known as pdisk, but nowadays the
> > common tool for partitioning mac disks is probably parted.
> 
> Yes. See now that 'fdisk' is a softlink to 'mac-fdisk'...
> 

 Sorry for not replying earlier, cutting the Cc: list on lkml is not
always conducive to quick replies.

 So, you were using a valid tool, but what you put in your original
mail shows garbage - something like apple_partition_ma[mamama...
followed later by some garbage which could admittedly have been UTF-8
getting trashed in the mail.  I'm on my ibook at the moment, which
has an old debian mac-fdisk on another partition.  If I chroot to
that and look at the disk I see things like

/dev/hda
        #                    type name                  length   base      ( size )  system
/dev/hda1     Apple_partition_map Apple                     63 @ 1         ( 31.5k)  Partition map
/dev/hda2         Apple_Bootstrap untitled                1954 @ 64        (977.0k)  NewWorld bootblock

 and so forth.  Notice that everything there is in legible ascii and
can be read with sensible values.  If what you actually see is
similar, then it's just a problem in the mail.  But if it isn't,
somehow the data on the disk (or the data being read from it) is
corrupt.

ĸen
-- 
das eine Mal als Tragödie, das andere Mal als Farce

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Christoph Hellwig

On Wed, Jan 17, 2007 at 11:17:37AM -0500, Alan Stern wrote:
> I'll be happy to move this over to the utrace setting, once it is merged.  
> Do you think it would be better to include the current version of kwatch 
> now or to wait for utrace?
> 
> Roland, is there a schedule for when you plan to get utrace into -mm?

Even if it goes into mainline soon we'll need a lot of time for all
architectures to catch up, so I think kwatch should definitely come first.

[PATCH -mm] sata_nv: cleanup ADMA error handling v2

2007-01-17 Thread Robert Hancock


Should apply to -mm tree or current libata-dev git tree. This version
leaves out part of a change in the first version which in retrospect
should have been left alone.

---

This cleans up a few issues with the error handling in sata_nv in ADMA
mode to make it more consistent with other NCQ-capable drivers like ahci
and sata_sil24:

-When a command failed, we would effectively set AC_ERR_DEV on the
queued command always. In the case of NCQ commands this prevents libata
from doing a log page query to determine the details of the failed
command, since it thinks we've already analyzed. Just set flags in the
port ehi->err_mask, then freeze or abort and let libata figure out what
went wrong.

-The code handled NV_ADMA_STAT_CPBERR as a "really bad error" which
caused it to set error flags on every queued command. I don't know
exactly what this flag means (no docs, grr!) but from what I can guess
from the standard ADMA spec, it just means that one or more of the CPBs
had an error, so we just need to go through and do our normal checks in
this case.

-In the error_handler function the code would always dump the state of
all the CPBs. This output seems redundant at this point since libata
already dumps the state of all active commands on errors (and it also
triggers at times when it shouldn't, like when suspending). Take this
out.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.20-rc5/drivers/ata/sata_nv.c  2007-01-12 23:18:08.0 -0600
+++ linux-2.6.20-rc5edit/drivers/ata/sata_nv.c  2007-01-17 17:22:05.0 -0600
@@ -646,53 +646,64 @@ static unsigned int nv_adma_tf_to_cpb(st
return idx;
 }
 
-static void nv_adma_check_cpb(struct ata_port *ap, int cpb_num, int force_err)
+static int nv_adma_check_cpb(struct ata_port *ap, int cpb_num, int force_err)
 {
struct nv_adma_port_priv *pp = ap->private_data;
-   int complete = 0, have_err = 0;
u8 flags = pp->cpb[cpb_num].resp_flags;
 
VPRINTK("CPB %d, flags=0x%x\n", cpb_num, flags);
 
-   if (flags & NV_CPB_RESP_DONE) {
-   VPRINTK("CPB flags done, flags=0x%x\n", flags);
-   complete = 1;
-   }
-   if (flags & NV_CPB_RESP_ATA_ERR) {
-   ata_port_printk(ap, KERN_ERR, "CPB flags ATA err, flags=0x%x\n", flags);
-   have_err = 1;
-   complete = 1;
-   }
-   if (flags & NV_CPB_RESP_CMD_ERR) {
-   ata_port_printk(ap, KERN_ERR, "CPB flags CMD err, flags=0x%x\n", flags);
-   have_err = 1;
-   complete = 1;
-   }
-   if (flags & NV_CPB_RESP_CPB_ERR) {
-   ata_port_printk(ap, KERN_ERR, "CPB flags CPB err, flags=0x%x\n", flags);
-   have_err = 1;
-   complete = 1;
+   if(unlikely((force_err ||
+flags & (NV_CPB_RESP_ATA_ERR |
+ NV_CPB_RESP_CMD_ERR |
+ NV_CPB_RESP_CPB_ERR)))) {
+   struct ata_eh_info *ehi = &ap->eh_info;
+   int freeze = 0;
+   ata_ehi_clear_desc(ehi);
+   ata_ehi_push_desc(ehi, "CPB resp_flags 0x%x", flags );
+   if(flags & NV_CPB_RESP_ATA_ERR) {
+   ata_ehi_push_desc(ehi, ": ATA error");
+   ehi->err_mask |= AC_ERR_DEV;
+   }
+   else if(flags & NV_CPB_RESP_CMD_ERR) {
+   ata_ehi_push_desc(ehi, ": CMD error");
+   ehi->err_mask |= AC_ERR_DEV;
+   }
+   else if(flags & NV_CPB_RESP_CPB_ERR) {
+   ata_ehi_push_desc(ehi, ": CPB error");
+   ehi->err_mask |= AC_ERR_SYSTEM;
+   freeze = 1;
+   }
+   else {
+   /* notifier error, but no error in CPB flags? */
+   ehi->err_mask |= AC_ERR_OTHER;
+   freeze = 1;
+   }
+   /* Kill all commands. EH will determine what actually failed. */
+   if(freeze)
+   ata_port_freeze(ap);
+   else
+   ata_port_abort(ap);
+   return 1;
}
-   if(complete || force_err)
-   {
+   
+   if (flags & NV_CPB_RESP_DONE) {
struct ata_queued_cmd *qc = ata_qc_from_tag(ap, cpb_num);
+   VPRINTK("CPB flags done, flags=0x%x\n", flags);
if(likely(qc)) {
-   u8 ata_status = 0;
-   /* Only use the ATA port status for non-NCQ commands.
+   /* Grab the ATA port status for non-NCQ commands.
   For NCQ commands the current status may have nothing to do with
   the command just completed. */
-   if(qc->tf.protocol != ATA_PROT_NCQ)
-   ata_status = readb(pp->ctl_block + (ATA_REG_STATUS * 4));
-
-

[PATCH 2.6.20-rc5 01/01] usb: Sierra Wireless auto set D0

2007-01-17 Thread Kevin Lloyd


from: Kevin Lloyd <[EMAIL PROTECTED]>

This patch ensures that the device is turned on when inserted into the system.
It also adds more VID/PIDs and matches the N_OUT_URB with the airprime driver.

Signed-off-by: Kevin Lloyd <[EMAIL PROTECTED]>

--- 


--- linux-2.6.20-rc5/drivers/usb/serial/sierra.c.orig   2007-01-15 15:17:15.0 -0800
+++ linux-2.6.20-rc5/drivers/usb/serial/sierra.c    2007-01-17 15:41:59.0 -0800
@@ -13,10 +13,9 @@
  Portions based on the option driver by Matthias Urlichs <[EMAIL PROTECTED]>
  Whom based his on the Keyspan driver by Hugh Blemings <[EMAIL PROTECTED]>

-  History:
*/

-#define DRIVER_VERSION "v.1.0.5"
+#define DRIVER_VERSION "v.1.0.6"
#define DRIVER_AUTHOR "Kevin Lloyd <[EMAIL PROTECTED]>"
#define DRIVER_DESC "USB Driver for Sierra Wireless USB modems"

@@ -31,14 +30,15 @@


static struct usb_device_id id_table [] = {
+   { USB_DEVICE(0x1199, 0x0017) }, /* Sierra Wireless EM5625 */
{ USB_DEVICE(0x1199, 0x0018) }, /* Sierra Wireless MC5720 */
+   { USB_DEVICE(0x1199, 0x0218) }, /* Sierra Wireless MC5720 */
{ USB_DEVICE(0x1199, 0x0020) }, /* Sierra Wireless MC5725 */
-   { USB_DEVICE(0x1199, 0x0017) }, /* Sierra Wireless EM5625 */
{ USB_DEVICE(0x1199, 0x0019) }, /* Sierra Wireless AirCard 595 */
-   { USB_DEVICE(0x1199, 0x0218) }, /* Sierra Wireless MC5720 */
+   { USB_DEVICE(0x1199, 0x0021) }, /* Sierra Wireless AirCard 597E */
{ USB_DEVICE(0x1199, 0x6802) }, /* Sierra Wireless MC8755 */
+   { USB_DEVICE(0x1199, 0x6804) }, /* Sierra Wireless MC8755 */
{ USB_DEVICE(0x1199, 0x6803) }, /* Sierra Wireless MC8765 */
-   { USB_DEVICE(0x1199, 0x6804) }, /* Sierra Wireless MC8755 for Europe */
{ USB_DEVICE(0x1199, 0x6812) }, /* Sierra Wireless MC8775 */
{ USB_DEVICE(0x1199, 0x6820) }, /* Sierra Wireless AirCard 875 */

@@ -55,14 +55,15 @@ static struct usb_device_id id_table_1po
};

static struct usb_device_id id_table_3port [] = {
+   { USB_DEVICE(0x1199, 0x0017) }, /* Sierra Wireless EM5625 */
{ USB_DEVICE(0x1199, 0x0018) }, /* Sierra Wireless MC5720 */
+   { USB_DEVICE(0x1199, 0x0218) }, /* Sierra Wireless MC5720 */
{ USB_DEVICE(0x1199, 0x0020) }, /* Sierra Wireless MC5725 */
-   { USB_DEVICE(0x1199, 0x0017) }, /* Sierra Wireless EM5625 */
{ USB_DEVICE(0x1199, 0x0019) }, /* Sierra Wireless AirCard 595 */
-   { USB_DEVICE(0x1199, 0x0218) }, /* Sierra Wireless MC5720 */
+   { USB_DEVICE(0x1199, 0x0021) }, /* Sierra Wireless AirCard 597E */
{ USB_DEVICE(0x1199, 0x6802) }, /* Sierra Wireless MC8755 */
+   { USB_DEVICE(0x1199, 0x6804) }, /* Sierra Wireless MC8755 */
{ USB_DEVICE(0x1199, 0x6803) }, /* Sierra Wireless MC8765 */
-   { USB_DEVICE(0x1199, 0x6804) }, /* Sierra Wireless MC8755 for Europe */
{ USB_DEVICE(0x1199, 0x6812) }, /* Sierra Wireless MC8775 */
{ USB_DEVICE(0x1199, 0x6820) }, /* Sierra Wireless AirCard 875 */
{ }
@@ -81,7 +82,7 @@ static int debug;

/* per port private data */
#define N_IN_URB    4
-#define N_OUT_URB  1
+#define N_OUT_URB  4
#define IN_BUFLEN   4096
#define OUT_BUFLEN  128

@@ -396,6 +397,8 @@ static int sierra_open(struct usb_serial
struct usb_serial *serial = port->serial;
int i, err;
struct urb *urb;
+   int result;
+   __u16 set_mode_dzero = 0x0000;

portdata = usb_get_serial_port_data(port);

@@ -442,6 +445,11 @@ static int sierra_open(struct usb_serial

port->tty->low_latency = 1;

+   /*set mode to D0 */
+   result = usb_control_msg(serial->dev,
+   usb_rcvctrlpipe(serial->dev, 0),
+   0x00,0x40,set_mode_dzero,0,NULL,0,USB_CTRL_SET_TIMEOUT);
+
sierra_send_setup(port);

return (0);



A question about break and sysrq on a serial console (2.6.19.1)

2007-01-17 Thread Brian Beattie

I'm trying to do a SYSRQ over a serial console.  As I understand it a
break will do that, but I'm not seeing the SYSRQ.  In looking at
uart_handle_break() in drivers/serial/8250.c it looks like the code will
toggle port->sysrq, rather than just setting it when the port is a
console.  I think the correct code would be to move the "port->sysrq =
0;" to follow the closing brace on the next line, or am I missing
something?

--
/*
 * We do the SysRQ and SAK checking like this...
 */
static inline int uart_handle_break(struct uart_port *port)
{
struct uart_info *info = port->info;
#ifdef SUPPORT_SYSRQ
if (port->cons && port->cons->index == port->line) {
if (!port->sysrq) {
port->sysrq = jiffies + HZ*5;
return 1;
}
port->sysrq = 0;
}
#endif
if (port->flags & UPF_SAK)
do_SAK(info->tty);
return 0;
}
-

It seems to me that this code will toggle port->sysrq.
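
To make the behaviour concrete, here is a small userspace model of the quoted logic. It is only a sketch: struct model_port, model_handle_break(), the HZ value and the jiffies variable are stand-ins invented for this example, and the SAK handling is left out.

```c
#define HZ 100				/* assumed tick rate for this model */

static unsigned long jiffies = 1000;	/* stand-in for the kernel tick counter */

/* Reduced stand-in for struct uart_port: only the fields the logic touches. */
struct model_port {
	int is_console;		/* models port->cons && port->cons->index == port->line */
	unsigned long sysrq;	/* 0, or the jiffies value at which the window expires */
};

/* Same control flow as the quoted uart_handle_break(), minus the SAK part. */
static int model_handle_break(struct model_port *port)
{
	if (port->is_console) {
		if (!port->sysrq) {
			port->sysrq = jiffies + HZ * 5;	/* arm the SysRq window */
			return 1;			/* break consumed */
		}
		port->sysrq = 0;			/* a second break disarms it */
	}
	return 0;
}
```

In this model the first break arms port->sysrq and returns 1, a second break clears it and returns 0, and a third arms it again, which is the toggling described above.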
-- 
Brian Beattie
Firmware Engineer
APCON, Inc.
[EMAIL PROTECTED]


Re: intel 82571EB gigabit fails to see link on 2.6.20-rc5 in-tree e1000 driver (regression)

2007-01-17 Thread Auke Kok


Adam Kropelin wrote:
>> Allen Parker wrote:
>>> Allen Parker wrote:
>>>> From what I've been able to gather, other Intel Pro/1000 chipsets
>>>> work fine in 2.6.20-rc5. If the e1000 guys need any assistance
>>>> testing, I'll be more than happy to volunteer myself as a guinea pig
>>>> for patches.
>>>
>>> I wasn't aware that I was supposed to post this as a regression, so
>>> changed the subject, hoping that someone will pick this up and run with
>>> it. Primary issue being that link is not seen on this chipset under
>>> 2.6.20-rc5 via in-tree e1000 driver, despite multiple cycles of ifconfig
>>> $eth up && ethtool $eth | grep link && ifconfig $eth down. Tested on 2
>>> machines with identical hardware.
>>
>> I asked Allen to report this here. I'm working on the issue as we speak
>> but for now I'll treat the link issue as a known regression once I
>> confirm it. If others have seen it then I'd like to know ASAP of course
>
> I am experiencing the no-link issue on a 82572EI single port copper
> PCI-E card. I've only tried 2.6.20-rc5, so I cannot tell if this is a
> regression or not yet. Will test older kernel soon.
>
> Can provide details/logs if you want 'em.


we've already established that Allen's issue is not due to the driver and caused by 
interrupts being mal-assigned on his system, possibly a pci subsystem bug. You also have 
a completely different board (82572EI instead of 82571EB), so I'd like to see the usual 
debugging info as well as hearing from you whether 2.6.19.any works correctly.


On top of that I posted a patch to rc5-mm yesterday that fixes a few significant bugs in 
the rc5-mm driver, so please apply that patch too before trying, so we're not wasting 
our time finding old bugs ;)


Cheers,

Auke

Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Jens Axboe

On Wed, Jan 17 2007, Dave Kleikamp wrote:
> On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:
> 
> > Can you try io_schedule() and verify that things just work?
> 
> I actually did do that in the first place, but wondered if it was the
> right thing to introduce the accounting changes that came with that.
> I'll change it back to io_schedule() and test it again, just to make
> sure.

It appears to be the correct change to me - you really are waiting for
IO resources (otherwise it would not hang with the plug change), so
doing an inc/dec of iowait around the schedule should be done.
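
Roughly, the accounting meant here can be modelled in userspace as below. This is only a sketch under stated assumptions: model_schedule() is a stub that does not sleep, the counter is a plain int rather than whatever the kernel really uses, and the replug step from the plugging patchset is left out entirely.

```c
static int nr_iowait;		/* models the count of tasks waiting on I/O */
static int seen_during_sleep;	/* what the counter read while "asleep" */

/* Stub for schedule(): records the counter instead of sleeping. */
static void model_schedule(void)
{
	seen_during_sleep = nr_iowait;
}

/* Sketch of the io_schedule() idea: account the sleep as I/O wait. */
static void model_io_schedule(void)
{
	nr_iowait++;		/* we are about to wait for I/O */
	model_schedule();
	nr_iowait--;		/* the wait is over */
}
```

While the task is "asleep" the counter reads one higher, and it is back at its old value afterwards; a plain schedule() would leave the sleep unaccounted.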

> If that's the right fix, I can push it directly since it won't have any
> dependencies on your patches.

Perfect!

-- 
Jens Axboe


Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Dave Kleikamp

On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:

> Can you try io_schedule() and verify that things just work?

I actually did do that in the first place, but wondered if it was the
right thing to introduce the accounting changes that came with that.
I'll change it back to io_schedule() and test it again, just to make
sure.

If that's the right fix, I can push it directly since it won't have any
dependencies on your patches.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center


Re: Linux 2.6.19.2 : Oops

2007-01-17 Thread Greg KH

On Wed, Jan 17, 2007 at 11:21:59AM +0100, Nicolas Bareil wrote:
> 
> Hello,
> 
> Since 2.6.19, I get the following Oops once a day, always with the same
> process, newspipe[1] which use a lot of CPU, threads and I/O.
> 
> The kernel is patched by Grsecurity. The ext3 filesystem is on a
> software RAID device (the two disks are SATA2). I tested the 
> hardware (RAM, SMART disks) but nothing seem problematic.

Can you reproduce it without the grsec patch applied?

thanks,

greg k-h

Re: intel 82571EB gigabit fails to see link on 2.6.20-rc5 in-tree e1000 driver (regression)

2007-01-17 Thread Adam Kropelin

> Allen Parker wrote:
>> Allen Parker wrote:
>>>  From what I've been able to gather, other Intel Pro/1000 chipsets 
>>> work fine in 2.6.20-rc5. If the e1000 guys need any assistance 
>>> testing, I'll be more than happy to volunteer myself as a guinea pig 
>>> for patches.
>> 
>> I wasn't aware that I was supposed to post this as a regression, so 
>> changed the subject, hoping that someone will pick this up and run with 
>> it. Primary issue being that link is not seen on this chipset under 
>> 2.6.20-rc5 via in-tree e1000 driver, despite multiple cycles of ifconfig 
>> $eth up && ethtool $eth | grep link && ifconfig $eth down. Tested on 2 
>> machines with identical hardware.
>
> I asked Allen to report this here. I'm working on the issue as we speak
> but for now I'll treat the link issue as a known regression once I 
> confirm it. If others have seen it then I'd like to know ASAP of course

I am experiencing the no-link issue on a 82572EI single port copper
PCI-E card. I've only tried 2.6.20-rc5, so I cannot tell if this is a
regression or not yet. Will test older kernel soon.

Can provide details/logs if you want 'em.

--Adam


Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

2007-01-17 Thread Nate Diller


On Wed, 17 Jan 2007, Benjamin LaHaise wrote:
> On Mon, Jan 15, 2007 at 08:25:15PM -0800, Nate Diller wrote:
>> the right thing to do from a design perspective.  Hopefully it enables
>> a new architecture that can reduce context switches in I/O completion,
>> and reduce overhead.  That's the real motive ;)
>
> And it's a broken motive.  Context switches per se are not bad, as they
> make it possible to properly schedule code in a busy system (which is
> *very* important when realtime concerns come into play).  Have a look
> at how things were done in the 2.4 aio code to see how completion would
> get done with a non-retry method, typically in interrupt context.  I had
> code that did direct I/O rather differently by sharing code with the
> read/write code paths at some point, the catch being that it was pretty
> invasive, which meant that it never got merged with the changes to handle
> writeback pressure and other work that happened during 2.5.


I'm having some trouble understanding your concern.  From my perspective,
any unnecessary context switch represents not only performance loss, but
extra complexity in the code.  In this case, I'm not suggesting that the
aio.c code causes problems, quite the opposite.  The code I'd like to change
is FS and md levels, where context switches happen because of timers,
workqueues, and worker threads.  For sync I/O, these layers could be doing
their completion work in process context, but because waiting on sync I/O is
done in layers above, they must resort to other means, even for the common
case.  The dm-crypt module is the most straightforward example.

I took a look at some 2.4.18 aio patches in kernel.org/.../bcrl/aio/, and if
I understand what you did, you were basically operating at the aops level
rather than f_ops.  I actually like that idea, it's nicer than having the
direct-io code do its work separately from the aio code.  Part of where I'm
going with this patch is a better integration between the block layer
(make_request), page layer (aops), and FS layer (f_ops), particularly in the
completion paths.  The direct-io code is an improvement over the common code
on that point, do_readahead() and friends all wait on individual pages to
become uptodate.  I'd like to bring some improvements from the directIO
architecture into use in the common case, which I hope will help
performance.

I know that might seem somewhat unrelated, but I don't think it is.  This
change goes hand in hand with using completion handlers in the aops.  That
will link together the completion callback in the bio with the aio callback,
so that the whole stack can finish its work in one context.


> That said, you can't make kiocb private without completely removing the
> ability of the rest of the kernel to complete an aio sanely from irq context.
> You need some form of i/o descriptor, and a kiocb is just that.  Adding more
> layering is just going to make things messier and slower for no real gain.


This patchset does not change how or when I/O completion happens,
aio_complete() will still get called from direct-io.c, nfs-direct.c, et al.
The iocb structure is still passed to aio_complete, just like before.  The
only difference is that the lower level code doesn't know that it's got an
iocb, all it sees is an opaque cookie.  It's more like enforcing a layer
that's already in place, and I think things got simpler rather than messier.
Whether things are slower or not remains to be seen, but I expect no
measurable changes either way with this patch.
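
As a rough illustration of the opaque-cookie idea (a userspace sketch only; struct io_cookie, struct model_iocb, lower_finish_io() and upper_complete() are names invented for this example and are not the ones used in the patchset):

```c
/* Lower layer's view: an incomplete type it can pass around but not inspect. */
struct io_cookie;

/* Completion entry point exported by the upper (aio-like) layer. */
void upper_complete(struct io_cookie *cookie, long result);

/* Lower layer: finishes the I/O and hands the cookie straight back. */
static void lower_finish_io(struct io_cookie *cookie, long bytes_done)
{
	upper_complete(cookie, bytes_done);
}

/* Upper layer only: the real descriptor hiding behind the cookie. */
struct model_iocb {
	long result;
	int completed;
};

void upper_complete(struct io_cookie *cookie, long result)
{
	/* Only the upper layer knows what the cookie really points to. */
	struct model_iocb *iocb = (struct model_iocb *)cookie;

	iocb->result = result;
	iocb->completed = 1;
}
```

The completion still receives the same descriptor as before; the lower layer simply cannot look inside it.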

I'm releasing a new version of the patch soon, it will use a new iodesc
structure to keep track of iovec state, which simplifies things further.  It
also will have a new version of the usb gadget code, and some general
cleanups.  I hope you'll take a look at it.

NATE

Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Jens Axboe

On Wed, Jan 17 2007, Dave Kleikamp wrote:
> Jens,
> Can you please take a look at this patch, and if you think it's sane,
> add it to your explicit i/o plugging patchset?  Would it make sense in
> any of these paths to use io_schedule() instead of schedule()?

I'm glad you bring that up, actually. One of the "downsides" of the new
unplugging is that it really requires anyone waiting for IO in a path
like the file system or device driver to use io_schedule() instead of
schedule() to get the blk_replug_current_nested() done to avoid
deadlocks. While it is annoying that it could introduce some deadlocks
until we get things fixed, I do consider it a correctness fix even in
the generic kernel, as you are really waiting for IO and as such should
use io_schedule() in the first place.

Perhaps I should add a WARN_ON() check for this to catch these bugs
upfront.

> I hadn't looked at your patchset until I discovered that jfs was easy to
> hang in the -mm kernel.  I think jfs may be able to add explicit
> plugging and unplugging in a couple of places, but I'd like to fix the
> hang right away and take my time with any later patches.

Can you try io_schedule() and verify that things just work?

-- 
Jens Axboe


Re: [PATCH] nfs: fix congestion control

2007-01-17 Thread Christoph Hellwig

> --- linux-2.6-git.orig/fs/inode.c 2007-01-12 08:03:47.0 +0100
> +++ linux-2.6-git/fs/inode.c  2007-01-12 08:53:26.0 +0100
> @@ -81,6 +81,7 @@ static struct hlist_head *inode_hashtabl
>   * the i_state of an inode while it is in use..
>   */
>  DEFINE_SPINLOCK(inode_lock);
> +EXPORT_SYMBOL_GPL(inode_lock);

Btw, big "no fucking way" here.  There is no chance we're going to export
this, even _GPL

Re: "obsolete" versus "deprecated", and a new config option?

2007-01-17 Thread Robert P. J. Day

On Wed, 17 Jan 2007, [EMAIL PROTECTED] wrote:

> On Wed, 17 Jan 2007 17:04:20 EST, "Robert P. J. Day" said:
>
> > > How much of the 'OBSOLETE' code should just be labelled 'BROKEN'
> > > instead?
> >
> > the stuff that's actually "broken."  :-)
>
> Right - the question is how much code qualifies as either/both, and
> which we should use when we encounter the random driver that's both
> obsolete *and* broken...

that's entirely a judgment call on the part of the code's maintainer.
if something is both obsolete and broken, then make it depend on
*both* OBSOLETE and BROKEN if you want.  no big deal.

rday

Re: PME_Turn_Off in Linux

2007-01-17 Thread Greg KH

On Wed, Jan 17, 2007 at 04:35:02PM -0600, Miller, Mike (OS Dev) wrote:
> > On Wed, Jan 17, 2007 at 10:43:14AM -0600, Miller, Mike (OS Dev) wrote:
> > > Hello,
> > > We've been seeing some nasty data corruption issues on some platforms.
> > > We've been capturing PCI-E traces looking for something nasty but we
> > > haven't found anything yet. One of the hardware guys is asking if
> > > there is a call in Linux to issue a PME_Turn_Off broadcast message.
> > >
> > > PME_Turn_Off Broadcast Message
> > > Before main component power and reference clocks are turned off, the
> > > Root Complex or Switch Downstream Port must issue a broadcast Message
> > > that instructs all agents downstream of that point within the
> > > hierarchy to cease initiation of any subsequent PM_PME Messages,
> > > effective immediately upon receipt of the PME_Turn_Off Message.
> > >
> > > This must be initiated from the root complex. Is there such a call in
> > > linux?
> >
> > The firmware that implements the PCI-E connection should do
> > this, I don't think there is anything that the Operating
> > system can do to control this, as PCI-E should be transparent
> > to the OS.
> 
> Hmmm, the hw folks tell me that "other" os'es implement that. But I
> would tend to agree that system firmware should probably be doing this.

Where would the "other" oses implement this, as they don't even know the
pci device is a pci-e port?  How can the os send a PCI-E message without
talking directly to the chipset-specific controller chip?

> > 
> > Unless this is on a PCI-E Hotplug system?  What is the 
> 
> No hotplug.

That's good :)

> > sequence of events that cause the data corruption?
> 
> Install rhel4 u4 on ia64, at the reboot prompt let the system sit idle
> for several hours or overnight. Then after rebooting the filesystems are
> totally trashed. I usually get a message that the kernel is not a valid
> compressed file format. If I try to rescue the system I cannot mount any
> filesystems. I don't have the message handy but it complains about an
> invalid Verneed record, whatever that is.

The RHEL4 kernel is pretty old as far as PCI-E goes.  Can you try this
on a kernel.org release?  2.6.19.2 would be great at the least.  If not,
you're going to have to get your support from Red Hat on this issue :(

Any kernel log messages while the machine is idle before rebooting?

What tasks are running overnight that would cause writes to the disk?

thanks,

greg k-h

Re: NFS causing oops when freeing namespace

2007-01-17 Thread Oleg Nesterov

On 01/17, Eric W. Biederman wrote:
>
> Cedric Le Goater <[EMAIL PROTECTED]> writes:
> >
> > your first analysis was correct : exit_task_namespaces() should be moved 
> > above exit_notify(tsk). It will require some extra fixes for nsproxy 
> > though.
> 
> I think the only issue is the child_reaper and currently we only have one of
> those.  When we really do the pid namespace we are going to have to revisit
> this.  My gut feel says that we won't be able to exit our pid namespace until
> the process is waited on.  So we may need to break up exit_task_namespace into
> individual components.

I agree, but please note that the child_reaper is not the only issue. Think
about sub-thread which auto-reaps itself. I'd suggest to add the comment in
do_exit() after exit_notify() to remind that the task is really dead now, it
has no ->signal, it can't be seen in /proc/, we can't send a signal to it, etc.

Oleg.


[PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Dave Kleikamp

Jens,
Can you please take a look at this patch, and if you think it's sane,
add it to your explicit i/o plugging patchset?  Would it make sense in
any of these paths to use io_schedule() instead of schedule()?

I hadn't looked at your patchset until I discovered that jfs was easy to
hang in the -mm kernel.  I think jfs may be able to add explicit
plugging and unplugging in a couple of places, but I'd like to fix the
hang right away and take my time with any later patches.

Thanks,
Shaggy

JFS: Avoid deadlock introduced by explicit I/O plugging

jfs is pretty easy to deadlock with Jens' explicit i/o plugging patchset.
Just try building a kernel.

The problem occurs when a synchronous transaction initiates some I/O, then
waits in lmGroupCommit for the transaction to be committed to the journal.
This requires action by the commit thread, which ends up waiting on a page
to complete writeback.  The commit thread did not initiate the I/O, so it
cannot unplug the io queue, and deadlock occurs.

The fix is for the first thread to call blk_replug_current_nested() before
going to sleep.  This patch also adds the call to a couple other places that
look like they need it.

Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>

diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h linux/fs/jfs/jfs_lock.h
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h  2006-11-29 15:57:37.0 -0600
+++ linux/fs/jfs/jfs_lock.h 2007-01-17 15:30:19.0 -0600
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * jfs_lock.h
@@ -42,6 +43,7 @@ do {  \
if (cond)   \
break;  \
unlock_cmd; \
+   blk_replug_current_nested();\
schedule(); \
lock_cmd;   \
}   \
diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_metapage.c linux/fs/jfs/jfs_metapage.c
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_metapage.c  2007-01-12 09:50:45.0 -0600
+++ linux/fs/jfs/jfs_metapage.c 2007-01-17 15:28:46.0 -0600
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "jfs_incore.h"
 #include "jfs_superblock.h"
 #include "jfs_filsys.h"
@@ -56,6 +57,7 @@ static inline void __lock_metapage(struc
set_current_state(TASK_UNINTERRUPTIBLE);
if (metapage_locked(mp)) {
unlock_page(mp->page);
+   blk_replug_current_nested();
schedule();
lock_page(mp->page);
}
diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_txnmgr.c linux/fs/jfs/jfs_txnmgr.c
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_txnmgr.c  2007-01-12 09:50:45.0 -0600
+++ linux/fs/jfs/jfs_txnmgr.c   2007-01-17 15:29:04.0 -0600
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "jfs_incore.h"
 #include "jfs_inode.h"
 #include "jfs_filsys.h"
@@ -135,6 +136,7 @@ static inline void TXN_SLEEP_DROP_LOCK(w
add_wait_queue(event, );
set_current_state(TASK_UNINTERRUPTIBLE);
TXN_UNLOCK();
+   blk_replug_current_nested();
schedule();
current->state = TASK_RUNNING;
remove_wait_queue(event, );

-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: "obsolete" versus "deprecated", and a new config option?

2007-01-17 Thread Valdis . Kletnieks

On Wed, 17 Jan 2007 17:04:20 EST, "Robert P. J. Day" said:

> > How much of the 'OBSOLETE' code should just be labelled 'BROKEN'
> > instead?
> 
> the stuff that's actually "broken."  :-)

Right - the question is how much code qualifies as either/both, and which
we should use when we encounter the random driver that's both obsolete
*and* broken...



[PATCH] Add new categories of DEPRECATED and OBSOLETE.

2007-01-17 Thread Robert P. J. Day


  Next to EXPERIMENTAL, add two new kernel config categories of
DEPRECATED and OBSOLETE.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

  speak now or forever ... too late.


diff --git a/init/Kconfig b/init/Kconfig
index a3f83e2..433dd30 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -29,9 +29,10 @@ config EXPERIMENTAL
  , and
   in the kernel source).

- This option will also make obsoleted drivers available. These are
- drivers that have been replaced by something else, and/or are
- scheduled to be removed in a future kernel release.
+ At the moment, this option also makes obsolete drivers available,
+ but such drivers really should be removed from the EXPERIMENTAL
+ category and added to either DEPRECATED or OBSOLETE, depending
+ on their status.

  Unless you intend to help test and develop a feature or driver that
  falls into this category, or you have a situation that requires
@@ -40,6 +41,26 @@ config EXPERIMENTAL
  you say Y here, you will be offered the choice of using features or
  drivers that are currently considered to be in the alpha-test phase.

+config DEPRECATED
+   bool "Prompt for deprecated code/drivers"
+   default y
+   ---help---
+ Code that is tagged as "deprecated" is officially still available
+ for use but will typically have already been scheduled for removal
+ at some point, so it's in your best interests to start looking for
+ an alternative.
+
+config OBSOLETE
+   bool "Prompt for obsolete code/drivers"
+   default n
+   ---help---
+ Code that is tagged as "obsolete" is officially no longer supported
+ and shouldn't play a part in any normal build, but those features
+ might still be available if you absolutely need access to them.
+
+ You are *strongly* discouraged from continuing to depend on
+ obsolete code on an ongoing, long-term basis.
+
 config BROKEN
bool


Re: NFS causing oops when freeing namespace

2007-01-17 Thread Trond Myklebust

On Wed, 2007-01-17 at 15:30 -0700, Eric W. Biederman wrote:
> Cedric Le Goater <[EMAIL PROTECTED]> writes:
> >
> > your first analysis was correct : exit_task_namespaces() should be moved 
> > above exit_notify(tsk). It will require some extra fixes for nsproxy 
> > though.
> 
> I think the only issue is the child_reaper and currently we only have one of
> those.  When we really do the pid namespace we are going to have to revisit
> this.  My gut feel says that we won't be able to exit our pid namespace until
> the process is waited on.  So we may need to break up exit_task_namespace into
> individual components.

It makes little sense, afaics, to have an interruptible sleep in
something like lockd_down() if you have no pid space or signal handling.

That isn't the only place where the process has to wait in an NFS
unmount, BTW. Things like rpc client cleanup (waiting for all the RPC
tasks to complete) also tend to lead to interruptible sleeps.

Trond


Re: O_DIRECT question

2007-01-17 Thread Bodo Eggert

On Tue, 16 Jan 2007, Arjan van de Ven wrote:
> On Tue, 2007-01-16 at 21:26 +0100, Bodo Eggert wrote:
> > Helge Hafting <[EMAIL PROTECTED]> wrote:
> > > Michael Tokarev wrote:

> > >> But seriously - what about just disallowing non-O_DIRECT opens together
> > >> with O_DIRECT ones ?
> > >>   
> > > Please do not create a new local DOS attack.
> > > I open some important file, say /etc/resolv.conf
> > > with O_DIRECT and just sit on the open handle.
> > > Now nobody else can open that file because
> > > it is "busy" with O_DIRECT ?
> > 
> > Suspend O_DIRECT access while non-O_DIRECT-fds are open, fdatasync on close?
> 
> .. then any user can impact the operation, performance and reliability
> of the database application of another user... sounds like plugging one
> hole by making a bigger hole ;)

Don't allow other users to access your raw database files then, and if
backup kicks in, pausing the database would DTRT for integrity of the
backup. For other applications, paused O_DIRECT may very well be a
problem, but I can't think of one right now.

-- 
Logic: The art of being wrong with confidence... 

RE: PME_Turn_Off in Linux

2007-01-17 Thread Miller, Mike (OS Dev)

greg k-h wrote: 

> On Wed, Jan 17, 2007 at 10:43:14AM -0600, Miller, Mike (OS Dev) wrote:
> > Hello,
> > We've been seeing some nasty data corruption issues on some platforms.
> > We've been capturing PCI-E traces looking for something nasty but we
> > haven't found anything yet. One of the hardware guys is asking if
> > there is a call in Linux to issue a PME_Turn_Off broadcast message.
> >
> > PME_Turn_Off Broadcast Message
> > Before main component power and reference clocks are turned off, the
> > Root Complex or Switch Downstream Port must issue a broadcast Message
> > that instructs all agents downstream of that point within the
> > hierarchy to cease initiation of any subsequent PM_PME Messages,
> > effective immediately upon receipt of the PME_Turn_Off Message.
> >
> > This must be initiated from the root complex. Is there such a call in
> > linux?
>
> The firmware that implements the PCI-E connection should do
> this, I don't think there is anything that the Operating
> system can do to control this, as PCI-E should be transparent
> to the OS.

Hmmm, the hw folks tell me that "other" os'es implement that. But I
would tend to agree that system firmware should probably be doing this.

> 
> Unless this is on a PCI-E Hotplug system?  What is the 

No hotplug.

> sequence of events that cause the data corruption?

Install rhel4 u4 on ia64, at the reboot prompt let the system sit idle
for several hours or overnight. Then after rebooting the filesystems are
totally trashed. I usually get a message that the kernel is not a valid
compressed file format. If I try to rescue the system I cannot mount any
filesystems. I don't have the message handy but it complains about an
invalid Verneed record, whatever that is.

I've also tried the same procedure using a dumb SAS hba. It complained
that it couldn't read the initrd image but on a second attempt it acted
like it read the initrd but the system goes out in the weeds while
booting. Not the same symptoms but I suspect there's some relationship.

I have not tried any other distros yet.

Thanks,
mikem

Re: NFS causing oops when freeing namespace

2007-01-17 Thread Eric W. Biederman

Cedric Le Goater <[EMAIL PROTECTED]> writes:
>
> your first analysis was correct : exit_task_namespaces() should be moved 
> above exit_notify(tsk). It will require some extra fixes for nsproxy 
> though.

I think the only issue is the child_reaper and currently we only have one of
those.  When we really do the pid namespace we are going to have to revisit
this.  My gut feel says that we won't be able to exit our pid namespace until
the process is waited on.  So we may need to break up exit_task_namespace into
individual components.

Eric

Re: [RFC 0/8] Cpuset aware writeback

2007-01-17 Thread Andrew Morton

> On Wed, 17 Jan 2007 11:43:42 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Tue, 16 Jan 2007, Andrew Morton wrote:
> 
> > Do what blockdevs do: limit the number of in-flight requests (Peter's
> > recent patch seems to be doing that for us) (perhaps only when PF_MEMALLOC
> > is in effect, to keep Trond happy) and implement a mempool for the NFS
> > request critical store.  Additionally:
> > 
> > - we might need to twiddle the NFS gfp_flags so it doesn't call the
> >   oom-killer on failure: just return NULL.
> > 
> > - consider going off-cpuset for critical allocations.  It's better than
> >   going oom.  A suitable implementation might be to ignore the caller's
> >   cpuset if PF_MEMALLOC.  Maybe put a WARN_ON_ONCE in there: we prefer that
> >   it not happen and we want to know when it does.
> 
> Given the intermediate  layers (network, additional gizmos (ip over xxx) 
> and the network cards) that will not be easy.

Paul has observed that it's already done.  But it seems to not be working.

> > btw, regarding the per-address_space node mask: I think we should free it
> > when the inode is clean (!mapping_tagged(PAGECACHE_TAG_DIRTY)).  Chances
> > are, the inode will be dirty for 30 seconds and in-core for hours.  We
> > might as well steal its nodemask storage and give it to the next file which
> > gets written to.  A suitable place to do all this is in
> > __mark_inode_dirty(I_DIRTY_PAGES), using inode_lock to protect
> > address_space.dirty_page_nodemask.
> 
> The inode lock is not taken when the page is dirtied.

The inode_lock is taken when the address_space's first page is dirtied.  It is
also taken when the address_space's last dirty page is cleaned.  So the place
where the inode is added to and removed from sb->s_dirty is, I think, exactly
the place where we want to attach and detach address_space.dirty_page_nodemask.

> The tree_lock
> is already taken when the mapping is dirtied and so I used that to
> avoid races adding and removing pointers to nodemasks from the address 
> space.

Re: 2.6.20-rc4-mm1 - cvs merge whoops in git-ioat.patch?

2007-01-17 Thread Andrew Morton

> On Wed, 17 Jan 2007 16:09:41 -0500 [EMAIL PROTECTED] wrote:
> commit d8238afa7eedc047b57da7ec98e98fb051fc4e85
> Author: Chris Leech <[EMAIL PROTECTED]>
> Date:   Fri Nov 17 11:37:29 2006 -0800
> 
> I/OAT: Add documentation for the tcp_dma_copybreak sysctl
> 
> Signed-off-by: Chris Leech <[EMAIL PROTECTED]>
> 
> looks fishy, like a cvs update went bad:
> 
> diff -puN Documentation/networking/ip-sysctl.txt~git-ioat Documentation/networking/ip-sysctl.txt
> --- a/Documentation/networking/ip-sysctl.txt~git-ioat
> +++ a/Documentation/networking/ip-sysctl.txt
> @@ -387,6 +387,22 @@ tcp_workaround_signed_windows - BOOLEAN
> not receive a window scaling option from them.
> Default: 0
> 
> +<<< HEAD/Documentation/networking/ip-sysctl.txt
> +===
> +tcp_slow_start_after_idle - BOOLEAN
> +   If set, provide RFC2861 behavior and time out the congestion
> 

Yeah, that's a git merge error.  I fix lots of them but didn't bother with
this one because it's just a .txt file.  It'll go away when Chris gets
around to cleaning up that tree.

Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-17 Thread Andi Kleen


> We've just verified that configuring the graphics aperture to be
> write-combining instead of write-back using an MTRR also solves the
> problem.  It appears to be a cache incoherency issue in the graphics
> aperture.

Interesting. 

Unfortunately it is also not correct. It was intentional to
mark the IOMMU half of the aperture write-back, as opposed
to uncached like the AGP half. Otherwise you get illegal cache attribute
conflicts with the memory that is being remapped, which can also cause
corruption.

The Northbridge guarantees coherency over the aperture, but
only if the caching attributes match.

To use an uncached aperture you would need to change_page_attr() every
kernel address that is mapped into the IOMMU. AGP does this, but the
frequency of mapping for the IOMMU is much higher, so it would
unfortunately be prohibitively costly.

In the past we saw corruptions from such conflicts, so this is more
than just theory. I suspect you traded an easier-to-trigger corruption for
a more subtle one.

-Andi


Re: 2.6.20-rc5 nfs+krb => oops

2007-01-17 Thread syrius . ml

Trond Myklebust <[EMAIL PROTECTED]> writes:

> On Sat, 2007-01-13 at 23:57 +0100, [EMAIL PROTECTED] wrote:
>> Hi there,
>> 
>> I've been curious enough to try 2.6.20-rc5 with nfs4/kerberos.
>> It was working fine before. I was using 2.6.18.1 on the client and
>> 2.6.20-rc3-git4 on server and today i tried 2.6.20-rc5 on both client
>> and server. (both running up to date debian/sid)
>> Trying to mount a nfs4 or nfs3 share with krb5 (did try with krb5 and
>> krb5p) produces this oops on the client side:
>> (each time I tried i got the same oops)
>> 
>> [ cut here ]
>> kernel BUG at net/sunrpc/sched.c:902!
>> [ SNIP ]
>> EIP: [] rpc_release_task+0x8f/0xc0 SS:ESP 0068:f6f21be4
>
> Does the attached patch fix it for you?

Yes, it does!
Thanks a lot.


Re: IPv6 router advertisement broken on 2.6.20-rc5

2007-01-17 Thread Aurelien Jarno

Sridhar Samudrala wrote:
> I think the following patch
> 
> [IPV6] MCAST: Fix joining all-node multicast group on device initialization.
>   http://www.spinics.net/lists/netdev/msg22663.html
> 
> that went in after 2.6.20-rc5 should fix this problem.
> 

Yep that fixes the problem. Thanks a lot.

Aurelien

-- 
  .''`.  Aurelien Jarno | GPG: 1024D/F1BCDB73
 : :' :  Debian developer   | Electrical Engineer
 `. `'   [EMAIL PROTECTED] | [EMAIL PROTECTED]
   `-people.debian.org/~aurel32 | www.aurel32.net

[PATCH/RFC 2.6.21] ehca: ehca_uverbs.c: refactor ehca_mmap() for better readability

2007-01-17 Thread Hoang-Nam Nguyen

Hello,
here is a patch for ehca_uverbs.c with the following changes:
- Rename mm_open/close() to ehca_mm_open/close() respectively
- Refactor ehca_mmap() into sub-functions ehca_mmap_cq/qp(),
which then call the new common sub-functions ehca_mmap_fw()
and ehca_mmap_queue() to register firmware memory block and
queue pages respectively
Roland, please note that I applied the previous patches to
your git tree for-2.6.21 before creating this patch. I also
realized a compile issue with the patch from Michael T. in
ehca_reqs.c regarding "return qp pointer in ib_wc". For this
I'll send another patch.
Thanks!
Nam


Signed-off-by: Hoang-Nam Nguyen <[EMAIL PROTECTED]>
---


 ehca_uverbs.c |  266 +++---
 1 file changed, 146 insertions(+), 120 deletions(-)


diff -Nurp infiniband/drivers/infiniband/hw/ehca/ehca_uverbs.c infiniband_work/drivers/infiniband/hw/ehca/ehca_uverbs.c
--- infiniband/drivers/infiniband/hw/ehca/ehca_uverbs.c 2007-01-17 21:39:01.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_uverbs.c    2007-01-17 21:17:00.0 +0100
@@ -68,7 +68,7 @@ int ehca_dealloc_ucontext(struct ib_ucon
return 0;
 }
 
-static void mm_open(struct vm_area_struct *vma)
+static void ehca_mm_open(struct vm_area_struct *vma)
 {
u32 *count = (u32*)vma->vm_private_data;
if (!count) {
@@ -84,7 +84,7 @@ static void mm_open(struct vm_area_struc
 vma->vm_start, vma->vm_end, *count);
 }
 
-static void mm_close(struct vm_area_struct *vma)
+static void ehca_mm_close(struct vm_area_struct *vma)
 {
u32 *count = (u32*)vma->vm_private_data;
if (!count) {
@@ -98,26 +98,150 @@ static void mm_close(struct vm_area_stru
 }
 
 static struct vm_operations_struct vm_ops = {
-   .open = mm_open,
-   .close = mm_close,
+   .open = ehca_mm_open,
+   .close = ehca_mm_close,
 };
 
-static int ehca_mmap_qpages(struct vm_area_struct *vma, struct ipz_queue *queue)
+static int ehca_mmap_fw(struct vm_area_struct *vma, struct h_galpas *galpas,
+   u32 *mm_count)
 {
+   int ret;
+   u64 vsize, physical;
+
+   vsize = vma->vm_end - vma->vm_start;
+   if (vsize != EHCA_PAGESIZE) {
+   ehca_gen_err("invalid vsize=%lx", vma->vm_end - vma->vm_start);
+   return -EINVAL;
+   }
+
+   physical = galpas->user.fw_handle;
+   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   ehca_gen_dbg("vsize=%lx physical=%lx", vsize, physical);
+   /* VM_IO | VM_RESERVED are set by remap_pfn_range() */
+   ret = remap_pfn_range(vma, vma->vm_start, physical >> PAGE_SHIFT,
+ vsize, vma->vm_page_prot);
+   if (unlikely(ret)) {
+   ehca_gen_err("remap_pfn_range() failed ret=%x", ret);
+   return -ENOMEM;
+   }
+
+   vma->vm_private_data = mm_count;
+   (*mm_count)++;
+   vma->vm_ops = _ops;
+
+   return 0;
+}
+
+static int ehca_mmap_queue(struct vm_area_struct *vma, struct ipz_queue *queue,
+  u32 *mm_count)
+{
+   int ret;
u64 start, ofs;
struct page *page;
-   int  rc = 0;
+
+   vma->vm_flags |= VM_RESERVED;
start = vma->vm_start;
for (ofs = 0; ofs < queue->queue_length; ofs += PAGE_SIZE) {
u64 virt_addr = (u64)ipz_qeit_calc(queue, ofs);
page = virt_to_page(virt_addr);
-   rc = vm_insert_page(vma, start, page);
-   if (unlikely(rc)) {
-   ehca_gen_err("vm_insert_page() failed rc=%x", rc);
-   return rc;
+   ret = vm_insert_page(vma, start, page);
+   if (unlikely(ret)) {
+   ehca_gen_err("vm_insert_page() failed rc=%x", ret);
+   return ret;
}
start +=  PAGE_SIZE;
}
+   vma->vm_private_data = mm_count;
+   (*mm_count)++;
+   vma->vm_ops = _ops;
+
+   return 0;
+}
+
+static int ehca_mmap_cq(struct vm_area_struct *vma, struct ehca_cq *cq,
+   u32 rsrc_type)
+{
+   int ret;
+
+   switch (rsrc_type) {
+   case 1: /* galpa fw handle */
+   ehca_dbg(cq->ib_cq.device, "cq_num=%x fw", cq->cq_number);
+   ret = ehca_mmap_fw(vma, >galpas, >mm_count_galpa);
+   if (unlikely(ret)) {
+   ehca_err(cq->ib_cq.device,
+"ehca_mmap_fw() failed rc=%x cq_num=%x",
+ret, cq->cq_number);
+   return ret;
+   }
+   break;
+
+   case 2: /* cq queue_addr */
+   ehca_dbg(cq->ib_cq.device, "cq_num=%x queue", cq->cq_number);
+   ret = ehca_mmap_queue(vma, >ipz_queue, >mm_count_queue);
+   if (unlikely(ret)) {
+   ehca_err(cq->ib_cq.device,
+

Re: [PATCH] Introduce two new maturlty levels: DEPRECATED and OBSOLETE.

2007-01-17 Thread Robert P. J. Day

On Wed, 17 Jan 2007, H. Peter Anvin wrote:

> DEPRECATED should presumably default to Y; OBSOLETE to n.

crap, now i see what you were getting at -- i forgot to assign
defaults.  i'll resubmit, but i'll wait for anyone to suggest any
better help content if they have a better idea.

rday

Re: [PATCH] Introduce two new maturlty levels: DEPRECATED and OBSOLETE.

2007-01-17 Thread Robert P. J. Day

On Wed, 17 Jan 2007, H. Peter Anvin wrote:

> DEPRECATED should presumably default to Y; OBSOLETE to n.

you presume correctly.

rday

Re: "obsolete" versus "deprecated", and a new config option?

2007-01-17 Thread Robert P. J. Day

On Wed, 17 Jan 2007, [EMAIL PROTECTED] wrote:

> On Wed, 17 Jan 2007 11:51:27 EST, "Robert P. J. Day" said:
> >
> >   in any event, what about introducing a new config variable,
> > OBSOLETE, under "Code maturity level options"?  this would seem to be
> > a quick and dirty way to prune anything that is *supposed* to be
> > obsolete from the build, to make sure you're not picking up dead code
> > by accident.
> >
> >   i think it would be useful to be able to make that kind of
> > distinction since, as the devfs writer pointed out above, the point of
> > labelling something "obsolete" is not to *discourage* someone from
> > using a feature, it's to imply that they *shouldn't* be using that
> > feature.  period.  which suggests there should be an easy, one-step
> > way to enforce that absolutely in a build.
>
> How much of the 'OBSOLETE' code should just be labelled 'BROKEN'
> instead?

the stuff that's actually "broken."  :-)

OBSOLETE is not (or at least *should not* be) equivalent to BROKEN.
"OBSOLETE" should denote code that, while it is no longer supported
and has a viable replacement, may very well still work.  and it may or
may not be slated for removal some day.  there may very well be
reasons to keep "obsolete" code in the kernel, for occasional backward
compatibility, but marking it as "obsolete" is a powerful indicator
that people should *really* try not to use it.

"BROKEN" code, OTOH, really should mean exactly that -- code that is
*known* to be broken.  that would include old code that has suffered
bit rot, but it might also include *new* code that, while it's now
part of the kernel, someone discovers a major flaw in it and no one's
got around to fixing it yet.  so even bleeding-edge code can
technically be "broken" until someone gets around to debugging it.

thoughts?

rday


Re: [PATCH 2.6.20-rc3 01/01] usb: Sierra Wireless auto set D0

2007-01-17 Thread Greg KH

On Mon, Jan 15, 2007 at 05:33:28PM -0800, Kevin Lloyd wrote:
> from: Kevin Lloyd <[EMAIL PROTECTED]>
> 
> This patch ensures that the device is turned on when inserted into the 
> system (which mostly affects the EM5725 and MC5720. It also adds more 
> VID/PIDs and matches the N_OUT_URB with the airprime driver.
> 
> Signed-off-by: Kevin Lloyd <[EMAIL PROTECTED]>
> 
> ---
> 
> --- linux-2.6.20-rc5/drivers/usb/serial/sierra.c.orig 2007-01-15 
> 15:17:15.0 -0800
> +++ linux-2.6.20-rc5/drivers/usb/serial/sierra.c  2007-01-15 
> 15:41:56.0 -0800
> @@ -14,9 +14,31 @@
>   Whom based his on the Keyspan driver by Hugh Blemings <[EMAIL PROTECTED]>
> 
>   History:
> +v.1.0.6:
> + klloyd
> + Added more devices and added Vendor Specific USB message to make sure
> + that devices are in D0 state when they start. This is very important for
> + MC5720 and EM5625 modules that go between Windows and Non-Windows 
> + machines.
> +v.1.0.5:
> + Greg KH
> + This saves over 30 lines and fixes a warning from sparse and allows
> + debugging to work dynamically like all other usb-serial drivers.
> + klloyd
> + Changed versioning to v.x.y.z
> +v.1.04:
> + klloyd
> + Adds significant throughput increase to the Sierra driver (uses multiple
> + urgs for download link). This patch also updates the current sierra.c 
> + driver so that it supports both 3-port Sierra devices and 1-port legacy
> + devices and removes Sierra's references in other related files (Kconfig
> + and airprime.c).
> +v.1.03
> + klloyd
> + Adds DTR line control support and impliments urb control.

This is not needed, nor recommended.  The changelog information in the
kernel provides this information.

> 
> -#define DRIVER_VERSION "v.1.0.5"
> +#define DRIVER_VERSION "v.1.0.6"
> #define DRIVER_AUTHOR "Kevin Lloyd <[EMAIL PROTECTED]>"
> #define DRIVER_DESC "USB Driver for Sierra Wireless USB modems"
> 
> @@ -31,14 +53,14 @@
> 
> 
> static struct usb_device_id id_table [] = {
> + { USB_DEVICE(0x1199, 0x0017) }, /* Sierra Wireless EM5625 */
>   { USB_DEVICE(0x1199, 0x0018) }, /* Sierra Wireless MC5720 */
>   { USB_DEVICE(0x1199, 0x0020) }, /* Sierra Wireless MC5725 */
> - { USB_DEVICE(0x1199, 0x0017) }, /* Sierra Wireless EM5625 */
>   { USB_DEVICE(0x1199, 0x0019) }, /* Sierra Wireless AirCard 595 */
> - { USB_DEVICE(0x1199, 0x0218) }, /* Sierra Wireless MC5720 */

You are dropping support for this device id.  Is that correct?  Are you
sure?

It was reported to me by Peter Kucmeroski and Jason Ganovsky, and you
were CCed when the patch went into the kernel that added this id.

> @@ -123,6 +145,7 @@ static int sierra_send_setup(struct usb_
>   return usb_control_msg(serial->dev,
>   usb_rcvctrlpipe(serial->dev, 0),
>   0x22,0x21,val,0,NULL,0,USB_CTRL_SET_TIMEOUT);
> +
>   }
> 
>   return 0;

Is this change really needed?  :)


> @@ -396,6 +419,8 @@ static int sierra_open(struct usb_serial
>   struct usb_serial *serial = port->serial;
>   int i, err;
>   struct urb *urb;
> + int result;
> + __u16 set_mode_dzero = 0x; //Set mode to D0

// comments are not recommended in the kernel, especially for something
as trivial as a variable name.

>   portdata = usb_get_serial_port_data(port);
> 
> @@ -442,6 +467,11 @@ static int sierra_open(struct usb_serial
> 
>   port->tty->low_latency = 1;
> 
> + //set mode to D0

Use /* */ please.

Care to resubmit this?

thanks,

greg k-h

Re: [PATCH] Introduce two new maturlty levels: DEPRECATED and OBSOLETE.

2007-01-17 Thread H. Peter Anvin


Robert P. J. Day wrote:

>   To go along with the EXPERIMENTAL kernel config status, introduce
> two new states:  DEPRECATED and OBSOLETE.
>
> Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>
>
> ---
>
>   just adding these config variables to init/Kconfig shouldn't affect
> the current build status, but it will allow developers to start to
> move over their portions of tree at their convenience.
>
>   in particular, features that are truly obsolete should be tagged as
> OBSOLETE, as opposed to EXPERIMENTAL.
>
>
> diff --git a/init/Kconfig b/init/Kconfig
> index a3f83e2..f861efd 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -29,9 +29,10 @@ config EXPERIMENTAL
>   , and
>    in the kernel source).
>
> - This option will also make obsoleted drivers available. These are
> - drivers that have been replaced by something else, and/or are
> - scheduled to be removed in a future kernel release.
> + At the moment, this option also makes obsolete drivers available,
> + but such drivers really should be removed from the EXPERIMENTAL
> + category and added to either DEPRECATED or OBSOLETE, depending
> + on their status.
>
>   Unless you intend to help test and develop a feature or driver that
>   falls into this category, or you have a situation that requires
> @@ -40,6 +41,23 @@ config EXPERIMENTAL
>   you say Y here, you will be offered the choice of using features or
>   drivers that are currently considered to be in the alpha-test phase.
>
> +config DEPRECATED
> +   bool "Prompt for deprecated code/drivers"
> +   ---help---
> + Code that has tagged as "deprecated" is officially still available
> + for use but will typically have already been scheduled for removal
> + at some point, so it's in your best interests to start looking for
> + an alternative.
> +


DEPRECATED should presumably default to Y; OBSOLETE to n.

-hpa

Re: [PATCH] Introduce two new maturlty levels: DEPRECATED and OBSOLETE.

2007-01-17 Thread H. Peter Anvin


Robert P. J. Day wrote:

>   To go along with the EXPERIMENTAL kernel config status, introduce
> two new states:  DEPRECATED and OBSOLETE.


I think this is a very good idea.  If nothing else, it gives some 
middle-of-the-roadness to the continual "to remove or not to remove" debate.


-hpa

Re: problems with latest smbfs changes on 2.4.34 and security backports

2007-01-17 Thread Willy Tarreau

Hello Santiago,

On Wed, Jan 17, 2007 at 11:00:30AM +0100, Santiago Garcia Mantinan wrote:
> Hi!
> 
> I have discovered a problem with the changes applied to smbfs in 2.4.34 and
> in the security backports like last Debian's 2.4 kernel update these changes
> seem to be made to solve CVE-2006-5871 and they have broken symbolic links
> and changed the way that special files (like devices) are seen.
> 
> For example:
> Before: symbolic links were seen as that, symbolic links and thus if you tried
> to open the file the link was followed and you ended up reading the
> destination file
> Now: symbolic links are seen as normal files (at least with a ls) but their
> length (N) is the length of the symbolic link, if you read it, you'll get the
> first N characters of the destination file. For example, on my filesystem
> bin/sh is a symlink to bash, thus it is 4 bytes long, if I do a cat bin/sh I
> get ELF (this is, the first 4 characters of the bash program, first one
> being a DEL).
> 
> Another example:
> Before: if you did a ls of a device file, like dev/zero you could see it as
> a device, if you tried opening it, it wouldn't work, but if you did a cp -a
> of that file to a local filesystem the result was a working zero device.
> Now: the devices are seen as normal files with a length of 0 bytes.
> 
> Seems to me like a mask is erasing some mode bits that shouldn't be erased.
> 
> I have carried my tests on a Debian Sarge machine always mounting the share
> using: smbmount //server/share /mnt without any other options. The tests
> were carried on a unpatched 2.4.34 comparing it to 2.4.33 and also on
> Debian's 2.4.27 comparing 2.4.27-10sarge4 vs -10sarge5. The server is a
> samba 3.0.23d and I have experienced the same behaviour with samba's
> unix extensions = yes and unix extensions = no.
> 
> I don't know what else to add, if you need any more info or want me to do
> any tests just ask for it.

Well, there is not much to add there. Thanks very much for all your tests.
This problem was not easy to fix, and Dann Frazier did a careful job at
backporting it and testing it. Unfortunately, corner cases like this may
sometimes pass through the tests.

Dann, do you still have your samba server ready to try to reproduce this
problem? Also, there are some very suspect lines in the patch:

@@ -505,8 +510,13 @@
mnt->file_mode = (oldmnt->file_mode & S_IRWXUGO) | S_IFREG;
mnt->dir_mode = (oldmnt->dir_mode & S_IRWXUGO) | S_IFDIR;
 
-   mnt->flags = (oldmnt->file_mode >> 9);
+   mnt->flags = (oldmnt->file_mode >> 9) | SMB_MOUNT_UID |
+   SMB_MOUNT_GID | SMB_MOUNT_FMODE | SMB_MOUNT_DMODE;
} else {
+   mnt->file_mode = mnt->dir_mode = S_IRWXU | S_IRGRP | S_IXGRP |
+   S_IROTH | S_IXOTH | S_IFREG;
+   mnt->dir_mode = mnt->dir_mode = S_IRWXU | S_IRGRP | S_IXGRP |
+   S_IROTH | S_IXOTH | S_IFDIR;
if (parse_options(mnt, raw_data))
goto out_bad_option;
}


See above? mnt->dir_mode is assigned three times. It still *seems* to do the
expected thing, but I wonder whether this was the initial intent.
Also, would it not be necessary to add "|S_IFLNK" to the file_mode? Maybe
what I say is stupid, but it's just a guess.
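
To make the observation concrete, here is a minimal user-space sketch of what
the two chained assignments in the else branch evaluate to. The struct
`mnt_sketch` and the `SK_` constants are hypothetical stand-ins (the constants
mirror the usual Linux octal values), so this only illustrates C's
right-to-left assignment and is not the actual smbfs code.

```c
/* Local copies of the Linux file-type constants (octal); assumed values,
 * defined here so the sketch does not depend on kernel headers. */
enum {
    SK_IFREG = 0100000,  /* regular file */
    SK_IFDIR = 0040000,  /* directory */
    SK_PERMS = 0755      /* S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH */
};

/* Hypothetical stand-in for the smbfs mount data; only the two
 * mode fields discussed above are modelled. */
struct mnt_sketch {
    unsigned int file_mode;
    unsigned int dir_mode;
};

/* Mirrors the two statements from the patch's else branch. */
static void set_defaults_as_patched(struct mnt_sketch *mnt)
{
    /* Assignment associates right to left: dir_mode receives the
     * regular-file value first, then file_mode receives the same value. */
    mnt->file_mode = mnt->dir_mode = SK_PERMS | SK_IFREG;

    /* dir_mode is then overwritten (twice) with the directory value. */
    mnt->dir_mode = mnt->dir_mode = SK_PERMS | SK_IFDIR;
}
```

Under these assumptions the end result is the expected one (file_mode is a
regular file, dir_mode a directory); the extra assignments are redundant
rather than harmful.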

Santiago, if you feel brave enough to try completely untested code, I
would suggest trying this change:

} else {
-   mnt->file_mode = mnt->dir_mode = S_IRWXU | S_IRGRP | S_IXGRP |
-   S_IROTH | S_IXOTH | S_IFREG;
-   mnt->dir_mode = mnt->dir_mode = S_IRWXU | S_IRGRP | S_IXGRP |
-   S_IROTH | S_IXOTH | S_IFDIR;
+   mnt->file_mode = S_IRWXU | S_IRGRP | S_IXGRP |
+   S_IROTH | S_IXOTH | S_IFREG | S_IFLNK;
+   mnt->dir_mode = S_IRWXU | S_IRGRP | S_IXGRP |
+   S_IROTH | S_IXOTH | S_IFDIR;
if (parse_options(mnt, raw_data))
goto out_bad_option;
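
One caveat on the "| S_IFLNK" idea, sketched below under the assumption that
the usual Linux octal values apply (S_IFMT 0170000, S_IFREG 0100000, S_IFLNK
0120000): the file type is a multi-bit field rather than a set of independent
flags, so OR-ing the two type constants yields exactly S_IFLNK. The `SK_` and
`sk_` names are local stand-ins defined only for this illustration.

```c
/* Local copies of the Linux file-type constants (octal), defined here so
 * the sketch does not depend on any particular header. */
enum {
    SK_IFMT  = 0170000,  /* mask covering the whole file-type field */
    SK_IFLNK = 0120000,  /* symbolic link */
    SK_IFREG = 0100000,  /* regular file */
    SK_IFDIR = 0040000   /* directory */
};

/* Equivalents of S_ISREG()/S_ISLNK(): compare the masked type field. */
static int sk_is_reg(unsigned int mode) { return (mode & SK_IFMT) == SK_IFREG; }
static int sk_is_lnk(unsigned int mode) { return (mode & SK_IFMT) == SK_IFLNK; }

/* Default file mode as written in the suggested change: 0755 plus both
 * type constants OR-ed together. */
static unsigned int sk_default_file_mode(void)
{
    return 0755 | SK_IFREG | SK_IFLNK;
}
```

With these values, any file that takes the default mode would test as a
symlink and not as a regular file, which is worth keeping in mind when
trying the change.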


Also, please try making symlinks to directories to see how they behave.

Thanks in advance,
Willy


Re: [PATCH] nfs: fix congestion control

2007-01-17 Thread Trond Myklebust

On Wed, 2007-01-17 at 22:52 +0100, Peter Zijlstra wrote:

> > 
> > > Index: linux-2.6-git/fs/inode.c
> > > ===
> > > --- linux-2.6-git.orig/fs/inode.c 2007-01-12 08:03:47.0 +0100
> > > +++ linux-2.6-git/fs/inode.c  2007-01-12 08:53:26.0 +0100
> > > @@ -81,6 +81,7 @@ static struct hlist_head *inode_hashtabl
> > >   * the i_state of an inode while it is in use..
> > >   */
> > >  DEFINE_SPINLOCK(inode_lock);
> > > +EXPORT_SYMBOL_GPL(inode_lock);
> > 
> > Hmmm... Commits to all NFS servers will be globally serialized via the 
> > inode_lock?
> 
> Hmm, right, thats not good indeed, I can pull the call to
> nfs_commit_list() out of that loop.

There is no reason to modify any of the commit stuff. Please just drop
that code.

Trond


Re: "obsolete" versus "deprecated", and a new config option?

2007-01-17 Thread Valdis . Kletnieks

On Wed, 17 Jan 2007 11:51:27 EST, "Robert P. J. Day" said:
>
>   in any event, what about introducing a new config variable,
> OBSOLETE, under "Code maturity level options"?  this would seem to be
> a quick and dirty way to prune anything that is *supposed* to be
> obsolete from the build, to make sure you're not picking up dead code
> by accident.
> 
>   i think it would be useful to be able to make that kind of
> distinction since, as the devfs writer pointed out above, the point of
> labelling something "obsolete" is not to *discourage* someone from
> using a feature, it's to imply that they *shouldn't* be using that
> feature.  period.  which suggests there should be an easy, one-step
> way to enforce that absolutely in a build.

How much of the 'OBSOLETE' code should just be labelled 'BROKEN' instead?



Re: [PATCH] nfs: fix congestion control

2007-01-17 Thread Peter Zijlstra

On Wed, 2007-01-17 at 12:05 -0800, Christoph Lameter wrote:
> On Tue, 16 Jan 2007, Peter Zijlstra wrote:
> 
> > The current NFS client congestion logic is severely broken, it marks the
> > backing device congested during each nfs_writepages() call and implements
> > its own waitqueue.
> 
> This is the magic bullet that Andrew is looking for to fix the NFS issues?

Dunno if it's magical, but it does solve a few issues I ran into.

> > Index: linux-2.6-git/include/linux/nfs_fs_sb.h
> > ===
> > --- linux-2.6-git.orig/include/linux/nfs_fs_sb.h  2007-01-12 08:03:47.0 +0100
> > +++ linux-2.6-git/include/linux/nfs_fs_sb.h  2007-01-12 08:53:26.0 +0100
> > @@ -82,6 +82,8 @@ struct nfs_server {
> > struct rpc_clnt *   client_acl; /* ACL RPC client handle */
> > struct nfs_iostats *io_stats;   /* I/O statistics */
> > struct backing_dev_info backing_dev_info;
> > +   atomic_t writeback;  /* number of writeback pages */
> > +   atomic_t commit; /* number of commit pages */
> > int flags;  /* various flags */
> 
> I think writeback is frequently incremented? Would it be possible to avoid
> a single global instance of an atomic_t here? In a busy NFS system 
> with lots of processors writing via NFS this may cause a hot cacheline 
> that limits write speed.

This would be per NFS mount, pretty global indeed. But no different
than other backing_dev_info's. request_queue::nr_requests suffers a
similar fate.

> Would it be possible to use NR_WRITEBACK? If not then maybe add another
> ZVC counter named NFS_NFS_WRITEBACK?

It's a per backing_dev_info thing. So using the zone counters will not
work.

> > Index: linux-2.6-git/mm/page-writeback.c
> > ===
> > --- linux-2.6-git.orig/mm/page-writeback.c  2007-01-12 08:03:47.0 +0100
> > +++ linux-2.6-git/mm/page-writeback.c   2007-01-12 08:53:26.0 +0100
> > @@ -167,6 +167,12 @@ get_dirty_limits(long *pbackground, long
> > *pdirty = dirty;
> >  }
> >  
> > +int dirty_pages_exceeded(struct address_space *mapping)
> > +{
> > +   return dirty_exceeded;
> > +}
> > +EXPORT_SYMBOL_GPL(dirty_pages_exceeded);
> > +
> 
> Export the variable instead of adding a new function? Why does it take an 
> address space parameter that is not used?
> 

Yeah, that function used to be larger.

> 
> > Index: linux-2.6-git/fs/inode.c
> > ===
> > --- linux-2.6-git.orig/fs/inode.c   2007-01-12 08:03:47.0 +0100
> > +++ linux-2.6-git/fs/inode.c2007-01-12 08:53:26.0 +0100
> > @@ -81,6 +81,7 @@ static struct hlist_head *inode_hashtabl
> >   * the i_state of an inode while it is in use..
> >   */
> >  DEFINE_SPINLOCK(inode_lock);
> > +EXPORT_SYMBOL_GPL(inode_lock);
> 
> Hmmm... Commits to all NFS servers will be globally serialized via the 
> inode_lock?

Hmm, right, that's not good indeed; I can pull the call to
nfs_commit_list() out of that loop.


Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

2007-01-17 Thread Benjamin LaHaise

On Mon, Jan 15, 2007 at 08:25:15PM -0800, Nate Diller wrote:
> the right thing to do from a design perspective.  Hopefully it enables
> a new architecture that can reduce context switches in I/O completion,
> and reduce overhead.  That's the real motive ;)

And it's a broken motive.  Context switches per se are not bad, as they 
make it possible to properly schedule code in a busy system (which is 
*very* important when realtime concerns come into play).  Have a look 
at how things were done in the 2.4 aio code to see how completion would 
get done with a non-retry method, typically in interrupt context.  I had 
code that did direct I/O rather differently by sharing code with the 
read/write code paths at some point, the catch being that it was pretty 
invasive, which meant that it never got merged with the changes to handle 
writeback pressure and other work that happened during 2.5.

That said, you can't make kiocb private without completely removing the 
ability of the rest of the kernel to complete an aio sanely from irq context.  
You need some form of i/o descriptor, and a kiocb is just that.  Adding more 
layering is just going to make things messier and slower for no real gain.

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.

Re: kernel 2.6.19.1 - Please report the result to linux-kernel to fix this permanently

2007-01-17 Thread Greg KH

On Wed, Jan 17, 2007 at 05:04:45PM +0100, taps wrote:
> Hello,
> 
> I got this text when I boot my linux :
> 
> --cut--
> 
> PCI quirk: region 1000-107f claimed by ICH4 ACPI/GPIO/TCO
> PCI quirk: region 1180-11bf claimed by ICH4 GPIO
> PCI: Ignoring BAR0-3 of IDE controller :00:1f.1
> PCI: Transparent bridge - :00:1e.0
> PCI: Bus #03 (-#06) is hidden behind transparent bridge #02 (-#02) (try 'pci=assign-busses')
> Please report the result to linux-kernel to fix this permanently
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIB._PRT]
> ACPI: PCI Interrupt Link [LNKA] (IRQs *6)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 11) *10
> 
> --cut--
> 
> Kernel 2.6.19.1 without any patch.
> Debian 4.0 Etch.
> Everything works on laptop Toshiba satellite pro L10
> Do you need any more information ?

Bernhard, is there any way we can stop printing out this message?  It
really doesn't seem to be necessary.

thanks,

greg k-h

Re: 2.6.19.1, sata_sil: sata dvd writer doesn't work

2007-01-17 Thread Harald Dunkel

Hi Tejun,

Tejun Heo wrote:
> 
> Okay, I just tested a number of dvds on x86-64 and x86.  The error
> pattern is really interesting.  It doesn't matter whether you're on
> x86-64 or x86, 2.6.18 or 2.6.20-rc5.  The problem occurs when a dvd
> which doesn't match dvd's region mask is played.
> 
> MMC command 0xa4 (READ KEY) is the one which always fails.  After the
> failure, the odd goes into strange state and usually won't respond to
> commands.  Interestingly, if you pull the power plug or reset the
> machine while the READ KEY command is in progress and then reconnect it,
> you can play the DVD after that.  I've checked this multiple times and
> no, dvdcss key caching isn't the cause, crossed checked it multiple times.
> 
> Once you played a dvd this way, the drive seems to remember the dvd and
> successfully plays it afterwards.  I've checked this multiple times
> using completely separate OS installation (one x86, the other x86-64).
> 

How come there is no such problem if I connect the drive
via a USB SATA adapter?

> This almost looks like new defense method against CSS-workaround.  Can't
> understand why the drive remembers successfully played dvds tho.
> 

I would have the option to return it (playing no DVDs is surely
a defect), but this would be a shame. It was lightning fast on writing,
a little bit noisy, though, but I was really glad to get rid of that
clumsy parallel cable.

Do you think it would be reasonable to send a bug report to Samsung,
and see what they say? I would need some documentation about these
MMC commands, though. Is this part of some "Red Book" standard, or
so?

Regards

Harri


Re: PME_Turn_Off in Linux

2007-01-17 Thread Greg KH

On Wed, Jan 17, 2007 at 10:43:14AM -0600, Miller, Mike (OS Dev) wrote:
> Hello,
> We've been seeing some nasty data corruption issues on some platforms.
> We've been capturing PCI-E traces looking for something nasty but we
> haven't found anything yet. One of the hardware guys is asking if there
> is a call in Linux to issue a PME_Turn_Off broadcast message.
>  
> PME_Turn_Off Broadcast Message
> Before main component power and reference clocks are turned off, the
> Root Complex or Switch Downstream Port must issue a broadcast Message
> that instructs all agents downstream of that point within the hierarchy
> to cease initiation of any subsequent PM_PME Messages, effective
> immediately upon receipt of the PME_Turn_Off Message.
> 
> This must be initiated from the root complex. Is there such a call in
> linux?

The firmware that implements the PCI-E connection should do this. I
don't think there is anything that the operating system can do to
control this, as PCI-E should be transparent to the OS.

Unless this is on a PCI-E Hotplug system?  What is the sequence of
events that cause the data corruption?

thanks,

greg k-h

[PATCH] Introduce two new maturity levels: DEPRECATED and OBSOLETE.

2007-01-17 Thread Robert P. J. Day


  To go along with the EXPERIMENTAL kernel config status, introduce
two new states:  DEPRECATED and OBSOLETE.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

  just adding these config variables to init/Kconfig shouldn't affect
the current build status, but it will allow developers to start to
move over their portions of tree at their convenience.

  in particular, features that are truly obsolete should be tagged as
OBSOLETE, as opposed to EXPERIMENTAL.


diff --git a/init/Kconfig b/init/Kconfig
index a3f83e2..f861efd 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -29,9 +29,10 @@ config EXPERIMENTAL
  , and
   in the kernel source).

- This option will also make obsoleted drivers available. These are
- drivers that have been replaced by something else, and/or are
- scheduled to be removed in a future kernel release.
+ At the moment, this option also makes obsolete drivers available,
+ but such drivers really should be removed from the EXPERIMENTAL
+ category and added to either DEPRECATED or OBSOLETE, depending
+ on their status.

  Unless you intend to help test and develop a feature or driver that
  falls into this category, or you have a situation that requires
@@ -40,6 +41,23 @@ config EXPERIMENTAL
  you say Y here, you will be offered the choice of using features or
  drivers that are currently considered to be in the alpha-test phase.

+config DEPRECATED
+   bool "Prompt for deprecated code/drivers"
+   ---help---
+ Code that has been tagged as "deprecated" is officially still available
+ for use but will typically have already been scheduled for removal
+ at some point, so it's in your best interests to start looking for
+ an alternative.
+
+config OBSOLETE
+   bool "Prompt for obsolete code/drivers"
+   ---help---
+ Code that has been tagged as "obsolete" is officially no longer supported
+ and shouldn't play a part in any normal build, but those features
+ might still be available if you absolutely need access to them.
+
+ It's *strongly* discouraged to continue to depend on obsolete code.
+
 config BROKEN
bool



Re: [PATCH] Fix missing include of list.h in sysfs.h

2007-01-17 Thread Greg KH

On Wed, Jan 17, 2007 at 05:51:18PM +0100, Frank Haverkamp wrote:
> Sysfs.h uses definitions (e.g. struct list_head s_sibling) from list.h
> but does not include it.
> 
> Signed-off-by: Frank Haverkamp <[EMAIL PROTECTED]>
> ---
>  include/linux/sysfs.h |1 +
>  1 file changed, 1 insertion(+)
> 
> --- ubi-2.6.git.orig/include/linux/sysfs.h
> +++ ubi-2.6.git/include/linux/sysfs.h
> @@ -11,6 +11,7 @@
>  #define _SYSFS_H_
>  
>  #include 
> +#include <linux/list.h>
>  #include 

Does this currently cause a build error on any platform for 2.6.20-rc5?

thanks,

greg k-h

Re: BUG: linux 2.6.19 unable to enable acpi

2007-01-17 Thread Matheus Izvekov


On 1/17/07, Len Brown <[EMAIL PROTECTED]> wrote:
> The code that enables ACPI mode hasn't really changed since before 2.6.12 --
> unless udelay() has changed beneath us...
> So if you are going to test an old version of Linux, you should start before
> then.
>
> Perhaps you can try this debug patch on top of 2.6.19 and send along the dmesg?
> (also, please include CONFIG_ACPI_DEBUG=y)
>
> thanks,
> -Len

Tried that, dmesg output below:

DMI 2.2 present.
ACPI: RSDP (v000 AMI   ) @ 0x000fb080
ACPI: RSDT (v001 AMIINT  0x MSFT 0x0097) @ 0x0fdf
ACPI: FADT (v001 AMIINT  0x MSFT 0x0097) @ 0x0fdf0030
ACPI: DSDT (v001SiS  620 0x1000 MSFT 0x010a) @ 0x
ACPI: PM-Timer IO Port: 0x408
Allocating PCI resources starting at 1000 (gap: 0fe0:f00f)
Detected 300.683 MHz processor.
Built 1 zonelists.  Total pages: 64501
Kernel command line: root=/dev/sda3
Initializing CPU#0
CPU 0 irqstacks, hard=c038f000 soft=c038e000
PID hash table entries: 1024 (order: 10, 4096 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 254268k/260032k available (1818k kernel code, 5268k reserved,
611k data, 160k init, 0k highmem)
virtual kernel memory layout:
   fixmap  : 0x8000 - 0xf000   (  28 kB)
   vmalloc : 0xd080 - 0x6000   ( 759 MB)
   lowmem  : 0xc000 - 0xcfdf   ( 253 MB)
 .init : 0xc0361000 - 0xc0389000   ( 160 kB)
 .data : 0xc02c6be6 - 0xc035fa28   ( 611 kB)
 .text : 0xc010 - 0xc02c6be6   (1818 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 602.00 BogoMIPS (lpj=3010033)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0080f9ff   
  
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
CPU: After all inits, caps: 0080f9ff   0040
  
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: Intel Pentium II (Klamath) stepping 03
Checking 'hlt' instruction... OK.
ACPI: Core revision 20060707
tbxface-0107 [01] load_tables   : ACPI Tables successfully acquired
Parsing all Control Methods:
Table [DSDT](id 0005) - 259 Objects with 25 Devices 99 Methods 13 Regions
ACPI Namespace successfully loaded at root c03a49f0
ACPI: setting ELCR to 8000 (from 1c00)
ACPI: FADT.acpi_enable 225
ACPI: FADT.acpi_disable 30
ACPI: smi_cmd 0x435, acpi_enable 0xe1
ACPI: retry 142
ACPI Error (hwacpi-0185): Hardware did not change modes [20060707]
ACPI Error (evxfevnt-0084): Could not transition to ACPI mode [20060707]
ACPI Warning (utxface-0154): AcpiEnable failed [20060707]
ACPI: Unable to enable ACPI

2.6.20-rc4-mm1 - cvs merge whoops in git-ioat.patch?

2007-01-17 Thread Valdis . Kletnieks

commit d8238afa7eedc047b57da7ec98e98fb051fc4e85
Author: Chris Leech <[EMAIL PROTECTED]>
Date:   Fri Nov 17 11:37:29 2006 -0800

I/OAT: Add documentation for the tcp_dma_copybreak sysctl

Signed-off-by: Chris Leech <[EMAIL PROTECTED]>

looks fishy, like a cvs update went bad:

diff -puN Documentation/networking/ip-sysctl.txt~git-ioat Documentation/networking/ip-sysctl.txt
--- a/Documentation/networking/ip-sysctl.txt~git-ioat
+++ a/Documentation/networking/ip-sysctl.txt
@@ -387,6 +387,22 @@ tcp_workaround_signed_windows - BOOLEAN
not receive a window scaling option from them.
Default: 0

+<<< HEAD/Documentation/networking/ip-sysctl.txt
+===
+tcp_slow_start_after_idle - BOOLEAN
+   If set, provide RFC2861 behavior and time out the congestion





Re: kernel cmdline: root=/dev/sdb1,/dev/sda1 "fallback"?

2007-01-17 Thread H. Peter Anvin


Tomasz Chmielewski wrote:
> All right.
> I see that initramfs is attached to the kernel itself.
>
> So it leaves me only a question: will I fit all tools into 300 kB
> (considering I'll use uClibc and busybox)?

You don't need to use busybox and have a bunch of tools.

The klibc distribution comes with "kinit", which does the equivalent to 
the kernel root-mounting code; it's in the tens of kilobytes, at least 
on x86.  If you're using ARM, you can compile it as Thumb.


-hpa


Re: NFS causing oops when freeing namespace

2007-01-17 Thread Cedric Le Goater

Oleg Nesterov wrote:
> On 01/17, Cedric Le Goater wrote:
>> Oleg Nesterov wrote:
>>> On 01/17, Daniel Hokka Zakrisson wrote:
>>>> It was the only semi-plausible explanation I could come up with. I added a
>>>> printk in do_exit right before exit_task_namespaces, where sighand was
>>>> still set, and one right before the spin_lock_irq in lockd_down, where it
>>>> had suddenly been set to NULL.
>>> I can't reproduce the problem, but
>> I did on a 2.6.20-rc4-mm1.
>>
>>> do_exit:
>>> exit_notify(tsk);
>>> exit_task_namespaces(tsk);
>>>
>>> the task could be reaped by its parent in between.
>> indeed. while it goes sleeping in lockd_down() just before it does
>>
>>  spin_lock_irq(&current->sighand->siglock);
>>
>> current->sighand is valid before interruptible_sleep_on_timeout() and
>> not after.
>>
>>> We should not use ->signal/->sighand after exit_notify().
>>>
>>> Can we move exit_task_namespaces() up?
>> yes but I moved it down because it invalidates ->nsproxy ...
> 
> Well, we can fix the symptom if we change lockd_down() to use
> lock_task_sighand(), or something like this,
> 
>   --- NFS/fs/lockd/svc.c~lockd_down   2006-11-27 21:20:11.0 +0300
>   +++ NFS/fs/lockd/svc.c  2007-01-17 22:39:47.0 +0300
>   @@ -314,6 +314,7 @@ void
>lockd_down(void)
>{
>   static int warned;
>   +   int sigpending;
>   
>   mutex_lock(_mutex);
>   if (nlmsvc_users) {
>   @@ -334,16 +335,15 @@ lockd_down(void)
>* Wait for the lockd process to exit, but since we're holding
>* the lockd semaphore, we can't wait around forever ...
>*/
>   -   clear_thread_flag(TIF_SIGPENDING);
>   +   sigpending = test_and_clear_thread_flag(TIF_SIGPENDING);
>   interruptible_sleep_on_timeout(_exit, HZ);
>   if (nlmsvc_pid) {
>   printk(KERN_WARNING
>   "lockd_down: lockd failed to exit, clearing pid\n");
>   nlmsvc_pid = 0;
>   }
>   -   spin_lock_irq(&current->sighand->siglock);
>   -   recalc_sigpending();
>   -   spin_unlock_irq(&current->sighand->siglock);
>   +   if (sigpending) /* can be wrong at this point, harmless */
>   +   set_thread_flag(TIF_SIGPENDING);
>out:
>   mutex_unlock(_mutex);
>}
> 
> but this is not good anyway.

your first analysis was correct : exit_task_namespaces() should be moved 
above exit_notify(tsk). It will require some extra fixes for nsproxy 
though.

C.

Re: prioritize PCI traffic ?

2007-01-17 Thread Hans-Peter Jansen

On Monday, 15 January 2007 15:53, Vaidyanathan Srinivasan wrote:
>
> 33Mhz 32-bit PCI bus on typical PC can do around 100MB/sec... 

Subtract roughly n * 5MB for VIA chipsets, where n is the age (1 <= n <=
4), and even more for SIS, ATI..
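
As a rough worked example of the numbers in this exchange: a 32-bit bus at
33 MHz moves at most 4 bytes per clock, or about 132 MB/s in theory, against
which the quoted ~100 MB/s is the practical figure. The sketch below encodes
that and the rule of thumb above; the function names are made up for
illustration, and reading the rule as "100 - 5n" is only one interpretation.

```c
/* Theoretical peak of a parallel bus in MB/s: one transfer of
 * width_bits per clock, with the clock given in MHz. */
static unsigned int bus_peak_mb_per_s(unsigned int clock_mhz,
                                      unsigned int width_bits)
{
    return clock_mhz * (width_bits / 8);
}

/* Rule of thumb from the reply: start from roughly 100 MB/s and
 * subtract about 5 MB/s per unit of chipset "age" n, 1 <= n <= 4.
 * Returns 0 for n outside that range. */
static unsigned int via_estimate_mb_per_s(unsigned int n)
{
    if (n < 1 || n > 4)
        return 0;
    return 100 - 5 * n;
}
```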

Pete

Re: [PATCH] nfs: fix congestion control

2007-01-17 Thread Christoph Lameter

On Tue, 16 Jan 2007, Peter Zijlstra wrote:

> The current NFS client congestion logic is severely broken, it marks the
> backing device congested during each nfs_writepages() call and implements
> its own waitqueue.

This is the magic bullet that Andrew is looking for to fix the NFS issues?

> Index: linux-2.6-git/include/linux/nfs_fs_sb.h
> ===
> --- linux-2.6-git.orig/include/linux/nfs_fs_sb.h  2007-01-12 08:03:47.0 +0100
> +++ linux-2.6-git/include/linux/nfs_fs_sb.h   2007-01-12 08:53:26.0 +0100
> @@ -82,6 +82,8 @@ struct nfs_server {
>   struct rpc_clnt *   client_acl; /* ACL RPC client handle */
>   struct nfs_iostats *io_stats;   /* I/O statistics */
>   struct backing_dev_info backing_dev_info;
> + atomic_t writeback;  /* number of writeback pages */
> + atomic_t commit; /* number of commit pages */
>   int flags;  /* various flags */

I think writeback is frequently incremented? Would it be possible to avoid
a single global instance of an atomic_t here? In a busy NFS system 
with lots of processors writing via NFS this may cause a hot cacheline 
that limits write speed.
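
The hot-cacheline concern can be illustrated with a small user-space sketch
using C11 atomics. Nothing here comes from the patch: `sharded_counter`,
`NSHARDS` and the 64-byte line size are assumptions made for the example, and
in the kernel one would reach for its own per-CPU counter helpers instead.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NSHARDS 8          /* assumed number of shards (e.g. one per CPU) */
#define CACHELINE 64       /* assumed cache line size in bytes */

/* Each shard sits on its own cache line so writers on different CPUs
 * do not bounce a single line between them. */
struct shard {
    _Alignas(CACHELINE) atomic_long value;
};

struct sharded_counter {
    struct shard shards[NSHARDS];
};

static void sharded_init(struct sharded_counter *c)
{
    for (size_t i = 0; i < NSHARDS; i++)
        atomic_init(&c->shards[i].value, 0);
}

/* Writers pick a shard by a caller-supplied id (a CPU number in practice). */
static void sharded_add(struct sharded_counter *c, unsigned int id, long delta)
{
    atomic_fetch_add_explicit(&c->shards[id % NSHARDS].value, delta,
                              memory_order_relaxed);
}

/* Readers pay for the cheap writes: the total is a sum over all shards
 * and is only approximate while writers are active. */
static long sharded_read(struct sharded_counter *c)
{
    long sum = 0;
    for (size_t i = 0; i < NSHARDS; i++)
        sum += atomic_load_explicit(&c->shards[i].value, memory_order_relaxed);
    return sum;
}
```

The trade-off is the usual one: increments stay local and cheap, while a
reader that needs the total has to touch every shard.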

Would it be possible to use NR_WRITEBACK? If not then maybe add another
ZVC counter named NFS_NFS_WRITEBACK?

> Index: linux-2.6-git/mm/page-writeback.c
> ===
> --- linux-2.6-git.orig/mm/page-writeback.c  2007-01-12 08:03:47.0 +0100
> +++ linux-2.6-git/mm/page-writeback.c 2007-01-12 08:53:26.0 +0100
> @@ -167,6 +167,12 @@ get_dirty_limits(long *pbackground, long
>   *pdirty = dirty;
>  }
>  
> +int dirty_pages_exceeded(struct address_space *mapping)
> +{
> + return dirty_exceeded;
> +}
> +EXPORT_SYMBOL_GPL(dirty_pages_exceeded);
> +

Export the variable instead of adding a new function? Why does it take an 
address space parameter that is not used?


> Index: linux-2.6-git/fs/inode.c
> ===
> --- linux-2.6-git.orig/fs/inode.c 2007-01-12 08:03:47.0 +0100
> +++ linux-2.6-git/fs/inode.c  2007-01-12 08:53:26.0 +0100
> @@ -81,6 +81,7 @@ static struct hlist_head *inode_hashtabl
>   * the i_state of an inode while it is in use..
>   */
>  DEFINE_SPINLOCK(inode_lock);
> +EXPORT_SYMBOL_GPL(inode_lock);

Hmmm... Commits to all NFS servers will be globally serialized via the 
inode_lock?


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-17 Thread Chip Coldwell


On Wed, 17 Jan 2007, Chip Coldwell wrote:
> On Wed, 17 Jan 2007, Andi Kleen wrote:
> > On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > > I agree,... it seems drastic, but this is the only really secure
> > > > solution.
> > >
> > > I'd like to hear from Andi how he feels about this?  It seems like a
> > > somewhat drastic solution in some ways given a lot of hardware doesn't
> > > seem to be affected (or maybe in those cases it's just really hard to
> > > hit, I don't know).
> >
> > AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> > although there were similar problems on VIA in the past too.
> > Unless a good workaround comes around soon I'll probably default
> > to iommu=soft on Nvidia.
>
> We've just verified that configuring the graphics aperture to be
> write-combining instead of write-back using an MTRR also solves the
> problem.  It appears to be a cache incoherency issue in the graphics
> aperture.


I take it back.  Further testing has revealed that this does not solve
the problem.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426


Re: NFS causing oops when freeing namespace

2007-01-17 Thread Oleg Nesterov

On 01/17, Cedric Le Goater wrote:
>
> Oleg Nesterov wrote:
> > On 01/17, Daniel Hokka Zakrisson wrote:
> >> It was the only semi-plausible explanation I could come up with. I added a
> >> printk in do_exit right before exit_task_namespaces, where sighand was
> >> still set, and one right before the spin_lock_irq in lockd_down, where it
> >> had suddenly been set to NULL.
> > 
> > I can't reproduce the problem, but
> 
> I did on a 2.6.20-rc4-mm1.
> 
> > do_exit:
> > exit_notify(tsk);
> > exit_task_namespaces(tsk);
> > 
> > the task could be reaped by its parent in between.
> 
> indeed. while it goes sleeping in lockd_down() just before it does
> 
>   spin_lock_irq(&current->sighand->siglock);
> 
> current->sighand is valid before interruptible_sleep_on_timeout() and
> not after.
>  
> > We should not use ->signal/->sighand after exit_notify().
> > 
> > Can we move exit_task_namespaces() up?
> 
> yes but I moved it down because it invalidates ->nsproxy ... 

Well, we can fix the symptom if we change lockd_down() to use
lock_task_sighand(), or something like this,

--- NFS/fs/lockd/svc.c~lockd_down   2006-11-27 21:20:11.0 +0300
+++ NFS/fs/lockd/svc.c  2007-01-17 22:39:47.0 +0300
@@ -314,6 +314,7 @@ void
 lockd_down(void)
 {
static int warned;
+   int sigpending;
 
mutex_lock(_mutex);
if (nlmsvc_users) {
@@ -334,16 +335,15 @@ lockd_down(void)
 * Wait for the lockd process to exit, but since we're holding
 * the lockd semaphore, we can't wait around forever ...
 */
-   clear_thread_flag(TIF_SIGPENDING);
+   sigpending = test_and_clear_thread_flag(TIF_SIGPENDING);
interruptible_sleep_on_timeout(_exit, HZ);
if (nlmsvc_pid) {
printk(KERN_WARNING
"lockd_down: lockd failed to exit, clearing pid\n");
nlmsvc_pid = 0;
}
-   spin_lock_irq(&current->sighand->siglock);
-   recalc_sigpending();
-   spin_unlock_irq(&current->sighand->siglock);
+   if (sigpending) /* can be wrong at this point, harmless */
+   set_thread_flag(TIF_SIGPENDING);
 out:
mutex_unlock(_mutex);
 }

but this is not good anyway.
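
The save-and-restore pattern in the patch can be sketched in user space with
C11 atomics. `FLAG_SIGPENDING`, `thread_flags` and the `sketch_` helpers are
hypothetical stand-ins for the kernel's thread-flag helpers; the sketch only
shows the shape of the idea, including why the restored value "can be wrong"
if the real state changed during the sleep.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_SIGPENDING 0x1u   /* hypothetical bit standing in for TIF_SIGPENDING */

static atomic_uint thread_flags;   /* stand-in for the per-thread flag word */

/* Atomically clear the bit and report whether it was set before. */
static bool sketch_test_and_clear(unsigned int bit)
{
    return (atomic_fetch_and(&thread_flags, ~bit) & bit) != 0;
}

static void sketch_set(unsigned int bit)
{
    atomic_fetch_or(&thread_flags, bit);
}

/* Placeholder wait; stands in for the timed, interruptible sleep. */
static void sketch_noop_wait(void)
{
}

/* Shape of the patched lockd_down(): remember the flag, wait with it
 * cleared, then put it back without touching any signal state. */
static void sketch_wait_without_signals(void (*wait_fn)(void))
{
    bool was_pending = sketch_test_and_clear(FLAG_SIGPENDING);

    wait_fn();

    /* Restores the old value blindly; it is not recomputed from the
     * real pending-signal state, so it may be stale. */
    if (was_pending)
        sketch_set(FLAG_SIGPENDING);
}
```

Because the restore step never dereferences anything like ->sighand, it
sidesteps the NULL pointer, at the cost of possibly reporting a stale flag.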

Oleg.


Re: BUG: linux 2.6.19 unable to enable acpi

2007-01-17 Thread Matheus Izvekov


On 1/17/07, Sunil Naidu <[EMAIL PROTECTED]> wrote:
> On 1/17/07, Matheus Izvekov <[EMAIL PROTECTED]> wrote:
> > I just tried the firmwarekit, and here are the results, attached.
> > TYVM, thats a very useful tool.
>
> I do suspect ACPI issues on my new DG965WH MOBO:-
>
> http://www.intel.com/products/motherboard/DG965WH/index.htm
>
> Tried with Linux-2.6.19.2. Anyone tested this MOBO?
>
> And, from where to download the firmwarekit?
>
> ~Akula2

It's in Arjan's sig: http://www.linuxfirmwarekit.org

Re: [RFC 0/8] Cpuset aware writeback

2007-01-17 Thread Christoph Lameter

On Tue, 16 Jan 2007, Andrew Morton wrote:

> Do what blockdevs do: limit the number of in-flight requests (Peter's
> recent patch seems to be doing that for us) (perhaps only when PF_MEMALLOC
> is in effect, to keep Trond happy) and implement a mempool for the NFS
> request critical store.  Additionally:
> 
> - we might need to twiddle the NFS gfp_flags so it doesn't call the
>   oom-killer on failure: just return NULL.
> 
> - consider going off-cpuset for critical allocations.  It's better than
>   going oom.  A suitable implementation might be to ignore the caller's
>   cpuset if PF_MEMALLOC.  Maybe put a WARN_ON_ONCE in there: we prefer that
>   it not happen and we want to know when it does.

Given the intermediate layers (the network stack, additional gizmos such
as IP over xxx, and the network cards), that will not be easy.

> btw, regarding the per-address_space node mask: I think we should free it
> when the inode is clean (!mapping_tagged(PAGECACHE_TAG_DIRTY)).  Chances
> are, the inode will be dirty for 30 seconds and in-core for hours.  We
> might as well steal its nodemask storage and give it to the next file which
> gets written to.  A suitable place to do all this is in
> __mark_inode_dirty(I_DIRTY_PAGES), using inode_lock to protect
> address_space.dirty_page_nodemask.

The inode lock is not taken when the page is dirtied. The tree_lock
is already taken when the mapping is dirtied and so I used that to
avoid races adding and removing pointers to nodemasks from the address 
space.
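
To make the locking argument concrete, here is a userspace sketch only; it is not the code from the patch. It assumes one lock per mapping plays the role of tree_lock, and `struct mapping`, `mapping_mark_dirty()`, `mapping_mark_clean()` and `mapping_node_dirty()` are hypothetical names. The nodemask pointer is only ever installed, read, or freed with that lock held:

```c
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_NODES 64

/* Hypothetical stand-in for an address_space with an optional nodemask. */
struct mapping {
	atomic_flag lock;                /* plays the role of tree_lock */
	unsigned long long *dirty_nodes; /* NULL while the mapping is clean */
};

void mapping_init(struct mapping *m)
{
	atomic_flag_clear(&m->lock);
	m->dirty_nodes = NULL;
}

void mapping_lock(struct mapping *m)
{
	while (atomic_flag_test_and_set(&m->lock))
		; /* spin until the lock is free */
}

void mapping_unlock(struct mapping *m)
{
	atomic_flag_clear(&m->lock);
}

/* Record that a page on `node` was dirtied; returns 0 on success. */
int mapping_mark_dirty(struct mapping *m, unsigned int node)
{
	int ret = 0;

	if (node >= MAX_NODES)
		return -1;
	mapping_lock(m);
	if (!m->dirty_nodes) {
		/* Allocate the mask lazily, on the first dirtying. */
		m->dirty_nodes = calloc(1, sizeof(*m->dirty_nodes));
		if (!m->dirty_nodes)
			ret = -1;
	}
	if (!ret)
		*m->dirty_nodes |= 1ULL << node;
	mapping_unlock(m);
	return ret;
}

/* Drop the mask once the mapping is clean again. */
void mapping_mark_clean(struct mapping *m)
{
	mapping_lock(m);
	free(m->dirty_nodes);
	m->dirty_nodes = NULL;
	mapping_unlock(m);
}

/* Returns nonzero if `node` is currently recorded as dirty. */
int mapping_node_dirty(struct mapping *m, unsigned int node)
{
	int dirty;

	if (node >= MAX_NODES)
		return 0;
	mapping_lock(m);
	dirty = m->dirty_nodes && ((*m->dirty_nodes >> node) & 1ULL);
	mapping_unlock(m);
	return dirty;
}
```

In the kernel an allocation under a spinlock would need more care than the calloc() here; the sketch only shows which lock guards the pointer.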

Re: [PATCH 50/59] sysctl: Move utsname sysctls to their own file

2007-01-17 Thread Eric W. Biederman

Kirill Korotaev <[EMAIL PROTECTED]> writes:

> Eric, though I personally don't care much:
> 1. I ask for not setting your authorship/copyright on the code which you just
>   copied from other places. Just doesn't look polite IMHO.

I can't claim complete ownership of the code, there was plenty of feedback
and contributions from others but the final form without a big switch
statement is mine.  I certainly can't claim the table, it has been in
that form for years.

If you notice I actually didn't say whose copyright it was :)  just
that I wrote the file.

If there are copyright claims I should include I will be happy to do that.
Mostly I was just trying to find some stupid boilerplate that would work.

> 2. I would propose to not introduce utsname_sysctl.c.
>   both files are too small and minor that I can't see much reasons splitting
>   them.

Moving this code out of sysctl.c is a major simplification of
sysctl.c.  Putting the utsname sysctls in their own file means we can
cleanly restrict the code to be compiled only when CONFIG_SYSCTL is set.

It is a necessary first step to implementing a per process /proc/sys.

It reorganizes the ipc and utsname sysctl from a terribly fragile
structure to something that is robust and easy to follow.  Code
scattered all throughout sysctl.c was just a disaster.  We had
several instances of having to fix bugs with odd combinations of
CONFIG options, simply because the other spot that needed to be touched
wasn't obvious.

So from my perspective this is an extremely worthwhile change that
will make maintenance easier and is a small first step towards
some nice future functionality.

Eric

[PATCH] nfs: Fix mismatch between encode_dent_fn and filldir_t

2007-01-17 Thread Gabriel Paubert

The 5th parameter of filldir_t function type used by vfs_readdir
was changed from ino_t to u64 in October. Unfortunately the patch
missed some files in fs/nfsd where function pointers of type
encode_dent_fn are passed around and finally cast to filldir_t.

The effect is only visible when an NFS server is run on a 32 bit
big-endian machine (it would have been visible on all 32 bit
architectures if the 6th parameter had been used). The results
are interesting: all files have an inode of 0 (unique you say?) 
from getdents(2) and even ls(1) does not find any files.

Signed-off-by: Gabriel Paubert <[EMAIL PROTECTED]>

--

P.S.: this shows that function pointer casts are evil, but I did not 
find a simpler way to fix this.
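
For readers wondering why the inode comes out as 0 only on big-endian, here is a simplified userspace model; it is not the actual calling convention of any particular ABI. It assumes the caller lays down a 64-bit value in memory and a callee with the stale 32-bit prototype reads only the first four bytes. The helper names are invented for illustration:

```c
#include <stdint.h>

/*
 * Lay out a 64-bit value the way a big-endian machine stores it, then
 * read back only the first 32-bit word, as a callee that still expects
 * a 32-bit ino_t would.
 */
uint32_t first_word_big_endian(uint64_t v)
{
	unsigned char mem[8];
	int i;

	for (i = 0; i < 8; i++)
		mem[i] = (unsigned char)(v >> (56 - 8 * i));

	return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
	       ((uint32_t)mem[2] << 8) | (uint32_t)mem[3];
}

/* Same experiment with little-endian storage. */
uint32_t first_word_little_endian(uint64_t v)
{
	unsigned char mem[8];
	int i;

	for (i = 0; i < 8; i++)
		mem[i] = (unsigned char)(v >> (8 * i));

	return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
	       ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

For any inode number below 2^32, the big-endian variant yields the high word, which is 0, while the little-endian variant happens to yield the right number. In this model the low word is left where the callee expects the next argument, which fits the remark above about the 6th parameter.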

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 277df40..20c5f4a 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -991,14 +991,14 @@ encode_entry(struct readdir_cd *ccd, const char *name,
 
 int
 nfs3svc_encode_entry(struct readdir_cd *cd, const char *name,
-int namlen, loff_t offset, ino_t ino, unsigned int d_type)
+int namlen, loff_t offset, u64 ino, unsigned int d_type)
 {
return encode_entry(cd, name, namlen, offset, ino, d_type, 0);
 }
 
 int
 nfs3svc_encode_entry_plus(struct readdir_cd *cd, const char *name,
-			  int namlen, loff_t offset, ino_t ino, unsigned int d_type)
+			  int namlen, loff_t offset, u64 ino, unsigned int d_type)
 {
return encode_entry(cd, name, namlen, offset, ino, d_type, 1);
 }
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index f5243f9..244406e 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -463,7 +463,7 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p,
 
 int
 nfssvc_encode_entry(struct readdir_cd *ccd, const char *name,
-   int namlen, loff_t offset, ino_t ino, unsigned int d_type)
+   int namlen, loff_t offset, u64 ino, unsigned int d_type)
 {
 	struct nfsd_readdirres *cd = container_of(ccd, struct nfsd_readdirres, common);
__be32  *p = cd->buffer;
diff --git a/include/linux/nfsd/nfsd.h b/include/linux/nfsd/nfsd.h
index 0727774..3ba5141 100644
--- a/include/linux/nfsd/nfsd.h
+++ b/include/linux/nfsd/nfsd.h
@@ -53,7 +53,7 @@ struct readdir_cd {
__be32  err;/* 0, nfserr, or nfserr_eof */
 };
 typedef int(*encode_dent_fn)(struct readdir_cd *, const char *,
-			     int, loff_t, ino_t, unsigned int);
+   int, loff_t, u64, unsigned int);
 typedef int (*nfsd_dirop_t)(struct inode *, struct dentry *, int, int);
 
 extern struct svc_program  nfsd_program;
diff --git a/include/linux/nfsd/xdr.h b/include/linux/nfsd/xdr.h
index 877192d..328520c 100644
--- a/include/linux/nfsd/xdr.h
+++ b/include/linux/nfsd/xdr.h
@@ -166,7 +166,7 @@ int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *, struct nfsd_statfsres *
 int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *, struct nfsd_readdirres *);
 
 int nfssvc_encode_entry(struct readdir_cd *, const char *name,
-			int namlen, loff_t offset, ino_t ino, unsigned int);
+			int namlen, loff_t offset, u64 ino, unsigned int);
 
 int nfssvc_release_fhandle(struct svc_rqst *, __be32 *, struct nfsd_fhandle *);
 
diff --git a/include/linux/nfsd/xdr3.h b/include/linux/nfsd/xdr3.h
index 7996386..bc8b171 100644
--- a/include/linux/nfsd/xdr3.h
+++ b/include/linux/nfsd/xdr3.h
@@ -332,10 +332,10 @@ int nfs3svc_release_fhandle(struct svc_rqst *, __be32 *,
 int nfs3svc_release_fhandle2(struct svc_rqst *, __be32 *,
struct nfsd3_fhandle_pair *);
 int nfs3svc_encode_entry(struct readdir_cd *, const char *name,
-   int namlen, loff_t offset, ino_t ino,
+   int namlen, loff_t offset, u64 ino,
unsigned int);
 int nfs3svc_encode_entry_plus(struct readdir_cd *, const char *name,
-   int namlen, loff_t offset, ino_t ino,
+   int namlen, loff_t offset, u64 ino,
unsigned int);
 /* Helper functions for NFSv3 ACL code */
 __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p,

Re: patch pci-rework-documentation-pci.txt.patch added to gregkh-2.6 tree

2007-01-17 Thread Greg KH

On Wed, Jan 17, 2007 at 10:09:08AM +0059, Jiri Slaby wrote:
> [EMAIL PROTECTED] wrote:
> [...]
> > +Tips on when/where to use the above attributes:
> > +   o The module_init()/module_exit() functions (and all
> > + initialization functions called _only_ from these)
> > + should be marked __init/__exit.
> > +
> > +   o Do not mark the struct pci_driver.
> > +
> > +   o The ID table array should be marked __devinitdata.
> 
> Is that correct? It panics somewhere IIRC?

If it's marked __initdata it can panic if we try to access the data when
adding a new PCI device after the system is up and running (pci hotplug
or dynamic pci ids support.)  That's why it needs to be either no
marking or __devinitdata (which resolves to nothing if CONFIG_HOTPLUG is
enabled.)
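
Roughly, the conditional marking looks like the sketch below. This is a userspace model written from memory of the include/linux/init.h pattern, not a copy of it; `CONFIG_HOTPLUG` is forced on here, and the `STRINGIFY` helpers and `devinitdata_expansion()` exist only to show what the macro expands to:

```c
/* Pretend the kernel was configured with hotplug support. */
#define CONFIG_HOTPLUG 1

/* Data placed here is thrown away once boot-time init is done. */
#define __initdata __attribute__((__section__(".init.data")))

#ifdef CONFIG_HOTPLUG
#define __devinitdata            /* expands to nothing: data is kept */
#else
#define __devinitdata __initdata /* no hotplug: safe to discard */
#endif

#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Returns the text that __devinitdata expands to in this configuration. */
const char *devinitdata_expansion(void)
{
	return STRINGIFY(__devinitdata);
}
```

With hotplug enabled the function returns an empty string, i.e. the ID table stays in ordinary data and can still be read when a device shows up later.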

thanks,

greg k-h

Re: [PATCH] Introduce and use get_task_mnt_ns()

2007-01-17 Thread Serge E. Hallyn

Quoting Alexey Dobriyan ([EMAIL PROTECTED]):
> Apply after "[PATCH] Fix NULL ->nsproxy dereference in /proc/*/mounts".
> 
> Similar to get_task_mm(): get a reference to task's mnt namespace if any.
> Suggested by Pavel Emelianov.
> 
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Yeah, that's nicer, thanks.

Acked-by: Serge Hallyn <[EMAIL PROTECTED]>


> ---
> 
>  fs/proc/base.c  |   15 ++-
>  include/linux/nsproxy.h |1 +
>  kernel/nsproxy.c|   14 ++
>  3 files changed, 17 insertions(+), 13 deletions(-)
> 
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -370,13 +370,7 @@ static int mounts_open(struct inode *ino
>   int ret = -EINVAL;
>  
>   if (task) {
> - task_lock(task);
> - if (task->nsproxy) {
> - ns = task->nsproxy->mnt_ns;
> - if (ns)
> - get_mnt_ns(ns);
> - }
> - task_unlock(task);
> + ns = get_task_mnt_ns(task);
>   put_task_struct(task);
>   }
>  
> @@ -443,12 +437,7 @@ static int mountstats_open(struct inode 
>   struct task_struct *task = get_proc_task(inode);
>  
>   if (task) {
> - task_lock(task);
> - if (task->nsproxy)
> - mnt_ns = task->nsproxy->mnt_ns;
> - if (mnt_ns)
> - get_mnt_ns(mnt_ns);
> - task_unlock(task);
> + mnt_ns = get_task_mnt_ns(task);
>   put_task_struct(task);
>   }
>  
> --- a/include/linux/nsproxy.h
> +++ b/include/linux/nsproxy.h
> @@ -35,6 +35,7 @@ struct nsproxy *dup_namespaces(struct ns
>  int copy_namespaces(int flags, struct task_struct *tsk);
>  void get_task_namespaces(struct task_struct *tsk);
>  void free_nsproxy(struct nsproxy *ns);
> +struct mnt_namespace * get_task_mnt_ns(struct task_struct *tsk);
>  
>  static inline void put_nsproxy(struct nsproxy *ns)
>  {
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -147,3 +147,17 @@ void free_nsproxy(struct nsproxy *ns)
>   put_pid_ns(ns->pid_ns);
>   kfree(ns);
>  }
> +
> +struct mnt_namespace * get_task_mnt_ns(struct task_struct *tsk)
> +{
> + struct mnt_namespace *mnt_ns = NULL;
> +
> + task_lock(tsk);
> + if (tsk->nsproxy)
> + mnt_ns = tsk->nsproxy->mnt_ns;
> + if (mnt_ns)
> + get_mnt_ns(mnt_ns);
> + task_unlock(tsk);
> +
> + return mnt_ns;
> +}
> 

Re: NFS causing oops when freeing namespace

2007-01-17 Thread Cedric Le Goater

Oleg Nesterov wrote:
> On 01/17, Daniel Hokka Zakrisson wrote:
 Call Trace:
  [] _spin_lock_irqsave+0x20/0x90
  [] lockd_down+0x125/0x190
  [] nfs_free_server+0x6d/0xd0
  [] nfs_kill_super+0xc/0x20
  [] deactivate_super+0x7d/0xa0
  [] release_mounts+0x6e/0x80
  [] __put_mnt_ns+0x66/0x80
  [] free_nsproxy+0x5e/0x60
  [] do_exit+0x791/0x810
  [] do_group_exit+0x26/0x70
  [] sysenter_past_esp+0x5f/0x85
  [] rpc_wake_up+0x3/0x70
>> It was the only semi-plausible explanation I could come up with. I added a
>> printk in do_exit right before exit_task_namespaces, where sighand was
>> still set, and one right before the spin_lock_irq in lockd_down, where it
>> had suddenly been set to NULL.
> 
> I can't reproduce the problem, but

I did on a 2.6.20-rc4-mm1.

>   do_exit:
>   exit_notify(tsk);
>   exit_task_namespaces(tsk);
> 
> the task could be reaped by its parent in between.

indeed. while it goes sleeping in lockd_down() just before it does

	spin_lock_irq(&current->sighand->siglock);

current->sighand is valid before interruptible_sleep_on_timeout() and
not after.
 
> We should not use ->signal/->sighand after exit_notify().
> 
> Can we move exit_task_namespaces() up?

yes but I moved it down because it invalidates ->nsproxy ... 

C.


Re: PATCH: Update disable_IO_APIC to use 8-bit destination field (X86_64)

2007-01-17 Thread Eric W. Biederman

Benjamin Romer <[EMAIL PROTECTED]> writes:

> On the Unisys ES7000/ONE system, we encountered a problem where
> performing a kexec reboot or dump on any cell other than cell 0 causes
> the system timer to stop working, resulting in a hang during timer
> calibration in the new kernel. 
>
> We traced the problem to one line of code in disable_IO_APIC(), which
> needs to restore the timer's IO-APIC configuration before rebooting. The
> code is currently using the 4-bit physical destination field, rather
> than using the 8-bit logical destination field, and it cuts off the
> upper 4 bits of the timer's APIC ID. If we change this to use the
> logical destination field, the timer works and we can kexec on the upper
> cells. This was tested on two different cells (0 and 2) in an ES7000/ONE
> system.
>
> For reference, the relevant Intel xAPIC spec is kept at
> ftp://download.intel.com/design/chipsets/e8501/datashts/30962001.pdf,
> specifically on page 334.

Looks like good bug hunting.  I will have to look but it might
make more sense to simply fix: struct IO_APIC_route_entry,
or use whatever technique we normally use to generate the io_apic
vectors.

I don't remember off the top of my head what the rule is for
choosing between logical and physical, but I think setting the
system in physical mode is a good clue :)
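
The truncation described in the report is easy to see in isolation. A tiny sketch, assuming nothing more than the field widths quoted above (4-bit physical destination, 8-bit logical destination); the function names and the APIC ID values in the note below are made up for illustration:

```c
#include <stdint.h>

/* What survives when an APIC ID is written through a 4-bit field. */
uint8_t dest_physical_4bit(uint8_t apic_id)
{
	return apic_id & 0x0f;	/* upper 4 bits are cut off */
}

/* What survives when the same ID is written through an 8-bit field. */
uint8_t dest_logical_8bit(uint8_t apic_id)
{
	return apic_id;		/* all 8 bits are kept */
}
```

An ID such as 0x21 comes out as 0x01 through the 4-bit field, so the timer interrupt would be routed to the wrong destination; IDs below 0x10 are unaffected, which would fit the problem not showing up on cell 0.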

Eric

Re: [PATCH 0/59] Cleanup sysctl

2007-01-17 Thread Eric W. Biederman

Kirill Korotaev <[EMAIL PROTECTED]> writes:

> Eric, really good job!
>
> Patches: 1-13, 15-24, 26-32, 34-44, 46-49, 52-55, 57 (all except below)
> Acked-By: Kirill Korotaev <[EMAIL PROTECTED]>
>
> 14/59 - minor (extra space)
> 25/59 - minor note
> 33/59 - not sorted sysctl IDs
> 45/59 - typo
> 50/59 - copyright/file note
> 51/59 - copyright/file name/kconfig option notes
>
> 56,58,59/59 - will review tomorrow
>
> another issue I have to think over is removal of de->owner.
> Alexey Dobriyan recently sent patches fixing /proc <-> modules refcounting.
> I guess w/o these patches your changes are not safe if proc_handler or strategy
> are functions from the module.

sysctl uses the logic in use_table/unuse_table to keep it safe from module
removal while it is in use, and it does so in the generic code, in either
do_rw_proc or do_sysctl.  This definitely works on the sys_sysctl path
and it appears to work in the do_rw_proc case; things are a little trickier
there, so someone may have missed a race somewhere.  In my rewrite of proc
it works exactly like the binary case so we are good there.
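
The pinning idea can be sketched as follows. This is a single-threaded illustration of the general pattern, not the kernel's use_table/unuse_table code (the real thing has to do this under a lock and wait for the last user); `struct table_header`, `table_use()`, `table_unuse()` and `table_start_unregister()` are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical per-table bookkeeping. */
struct table_header {
	int used;		/* number of callers currently inside handlers */
	bool unregistering;	/* set once removal has started */
};

/* Pin the table before calling into its handlers; fails if it is going away. */
bool table_use(struct table_header *h)
{
	if (h->unregistering)
		return false;
	h->used++;
	return true;
}

/* Drop the pin taken by table_use(). */
void table_unuse(struct table_header *h)
{
	h->used--;
}

/*
 * Begin removal: new users are refused from now on.  Returns true if no
 * users remain, i.e. it is already safe to free the table (and for the
 * module that owns the handlers to go away).
 */
bool table_start_unregister(struct table_header *h)
{
	h->unregistering = true;
	return h->used == 0;
}
```

Because the generic code takes the pin around every handler call, the handlers themselves do not need a module reference, which is why de->owner is not needed on this path.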

It is certainly the intention of the sysctl implementation that users
should not have to set de->owner.  So if there is a problem with 
removing de->owner it is a bug in the sysctl implementation not in
the code where it was removed.

Normal proc users definitely have to set de->owner to be safe, but sysctl has
always been its own thing, with different rules.

Eric

Re: NFS causing oops when freeing namespace

2007-01-17 Thread Oleg Nesterov

On 01/17, Daniel Hokka Zakrisson wrote:
>
> >> Call Trace:
> >>  [] _spin_lock_irqsave+0x20/0x90
> >>  [] lockd_down+0x125/0x190
> >>  [] nfs_free_server+0x6d/0xd0
> >>  [] nfs_kill_super+0xc/0x20
> >>  [] deactivate_super+0x7d/0xa0
> >>  [] release_mounts+0x6e/0x80
> >>  [] __put_mnt_ns+0x66/0x80
> >>  [] free_nsproxy+0x5e/0x60
> >>  [] do_exit+0x791/0x810
> >>  [] do_group_exit+0x26/0x70
> >>  [] sysenter_past_esp+0x5f/0x85
> >>  [] rpc_wake_up+0x3/0x70
>
> It was the only semi-plausible explanation I could come up with. I added a
> printk in do_exit right before exit_task_namespaces, where sighand was
> still set, and one right before the spin_lock_irq in lockd_down, where it
> had suddenly been set to NULL.

I can't reproduce the problem, but

do_exit:
exit_notify(tsk);
exit_task_namespaces(tsk);

the task could be reaped by its parent in between.

We should not use ->signal/->sighand after exit_notify().

Can we move exit_task_namespaces() up?

Oleg.


Re: "obsolete" versus "deprecated", and a new config option?

2007-01-17 Thread Robert P. J. Day

On Wed, 17 Jan 2007, Bill Davidsen wrote:

> Robert P. J. Day wrote:
> >   a couple random thoughts on the notion of obsolescence and
> > deprecation.
>
>   [...horrible example deleted...]
>
> >   so is that ioctl obsolete or deprecated?  those aren't the same
> > things, a good distinction being drawn here by someone discussing
> > devfs:
> >
> > http://kerneltrap.org/node/1893
> >
> > "Devfs is deprecated.  This means it's still available but you
> > should consider moving to other options when available.  Obsolete
> > means it shouldn't be used.  Some 2.6 docs have confused these two
> > terms WRT devfs."
> >
> >   yes, and that confusion continues to this day, when a single
> > feature is described as both deprecated and obsolete.  not good.
> > (also, i'm guessing that anything that's "obsolete" might deserve
> > a default of "n" rather than "y", but that's just me.  :-)
>
> Agree on that. I would hope "obsolete" means there's a newer way
> which should provide the functionality (** help should say where
> that is **) while depreciated should mean "we decided this was a bad
> solution" or something like that.

in simpler terms, "deprecated" (note correct spelling :-) should mean
"it's still available and you can use it but you should seriously
think of moving up soon 'cuz this is going to disappear some day,"
while "obsolete" should mean, "it's dead, jim."

> >   in any event, what about introducing a new config variable,
> > OBSOLETE, under "Code maturity level options"?  this would seem to
> > be a quick and dirty way to prune anything that is *supposed* to
> > be obsolete from the build, to make sure you're not picking up
> > dead code by accident.
>
> If you're doing that, why not four variables, for incomplete,
> experimental, obsolete and depreciated? Unfortunately doing any more
> detailed nomenclature would be a LOT of work!

i wouldn't go that far.  using deprecated code is still technically
fine, but using obsolete code should be something that raises a red
flag of some kind.  i would just somehow mark the OBSOLETE stuff.  in
fact, some kernel config options already do something like this, such
as in drivers/mtd/chips/Kconfig:

config MTD_OBSOLETE_CHIPS
depends on MTD
bool "Older (theoretically obsoleted now) drivers for non-CFI chips"
help
  ... yadda yadda yadda ...

config MTD_AMDSTD
tristate "AMD compatible flash chip support (non-CFI)"
depends on MTD && MTD_OBSOLETE_CHIPS && BROKEN
...

and there's plenty of places in the Kconfig files that label features
as obsolete.  i just want the ability to switch all that stuff off
with one mouse click and see what happens.

rday

Re: [PATCH] Fix compile warnings in r8169

2007-01-17 Thread Francois Romieu

Bernhard Walle <[EMAIL PROTECTED]> :
> --- linux-2.6.20-rc4.orig/drivers/net/r8169.c 2007-01-07 06:45:51.0 +0100
> +++ linux-2.6.20-rc4/drivers/net/r8169.c  2007-01-17 11:39:13.792309228 +0100
[...]
> @@ -2227,7 +2227,7 @@ static int rtl8169_xmit_frags(struct rtl
>  {
>   struct skb_shared_info *info = skb_shinfo(skb);
>   unsigned int cur_frag, entry;
> - struct TxDesc *txd;
> + struct TxDesc *txd = NULL;
>  
>   entry = tp->cur_tx;
>   for (cur_frag = 0; cur_frag < info->nr_frags; cur_frag++) {

The driver is right. This change does not make the driver easier to
maintain, nor does it add any incentive to fix the compiler.

-- 
Ueimor
