Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-02 Thread Nish Aravamudan
On 8/1/07, Zan Lynx <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-08-01 at 08:52 -0700, Nish Aravamudan wrote:
> > On 7/31/07, Zan Lynx <[EMAIL PROTECTED]> wrote:
> > > On Tue, 2007-07-31 at 15:02 -0700, Randy Dunlap wrote:
> > > > On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
> > > >
> > > > > I was playing with huge pages and libhugetlbfs.  Small programs like
> > > > > "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> > > > > system slowly stops running.  One interesting thing is the "ps" command,
> > > > > it gets stuck like this:
> > > >
> > > > Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
> > >
> > > D'oh!  I mean 2.6.23-rc1-mm1, the 22 was a typo.  Cut & paste to be
> > > sure:
> > > Linux zephyr 2.6.23-rc1-mm1 #1 SMP PREEMPT Wed Jul 25 17:33:04 MDT 2007
> > > x86_64 AMD Athlon(tm) 64 Processor 3400+ AuthenticAMD GNU/Linux
> >
> > Just to confirm, still happens with -mm2?
>
> No, it does not seem to.  Evolution runs OK.  ps, top, pmap all work
> fine.

Interesting.

> However, a couple of other things happened.  Could be unrelated or only
> loosely related.
>
> Evolution launches spamd (spamassassin) to filter junk mail.  spamd died
> and I have this in dmesg to show for it:
>
> VM: killing process spamd
>
> spamd would have inherited the libhugetlbfs.so environment variables.
> There are no other clues as to why it died though.

Interesting. Any chance spamd can be run manually with those env
variables, but with HUGETLB_VERBOSE=99 (and/or in gdb) to see what
happens to it?
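
A minimal sketch of that rerun, assuming the same variables as the "huge"
wrapper script quoted elsewhere in this thread, with only HUGETLB_VERBOSE
raised. The helper names (hugetlb_env, run_with_hugetlb) are made up for
illustration and are not part of libhugetlbfs.

```python
import os
import subprocess

# Settings copied from the "huge" wrapper script quoted in this thread.
HUGE_ENV = {
    "LD_PRELOAD": "libhugetlbfs.so",
    "HUGETLB_MORECORE": "yes",
    "HUGETLB_PATH": "/mnt/huge",
}


def hugetlb_env(verbose=99, base=None):
    """Return a copy of the environment with the libhugetlbfs settings added."""
    env = dict(os.environ if base is None else base)
    env.update(HUGE_ENV)
    env["HUGETLB_VERBOSE"] = str(verbose)
    return env


def run_with_hugetlb(argv, verbose=99):
    """Run argv (for example ["spamd"]) with that environment; return its exit status."""
    return subprocess.run(argv, env=hugetlb_env(verbose)).returncode
```

Running the daemon in the foreground this way should put whatever
libhugetlbfs prints on the terminal rather than losing it under Evolution.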

> Also, immediately after launching evolution with libhugetlbfs, I got
> that USB bug where the mouse starts creating keyboard input.  I got some
> of these in dmesg:
> keyboard.c: can't emulate rawmode for keycode 240
>
> That could be pure coincidence, although I had been using the system
> almost all day before that, and it hadn't happened.

Had you started evolution w/o libhugetlbfs at all before that?

It does seem like that would be coincidence.

Thanks,
Nish
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-01 Thread Zan Lynx
On Wed, 2007-08-01 at 08:52 -0700, Nish Aravamudan wrote:
> On 7/31/07, Zan Lynx <[EMAIL PROTECTED]> wrote:
> > On Tue, 2007-07-31 at 15:02 -0700, Randy Dunlap wrote:
> > > On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
> > >
> > > > I was playing with huge pages and libhugetlbfs.  Small programs like
> > > > "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> > > > system slowly stops running.  One interesting thing is the "ps" command,
> > > > it gets stuck like this:
> > >
> > > Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
> >
> > D'oh!  I mean 2.6.23-rc1-mm1, the 22 was a typo.  Cut & paste to be
> > sure:
> > Linux zephyr 2.6.23-rc1-mm1 #1 SMP PREEMPT Wed Jul 25 17:33:04 MDT 2007
> > x86_64 AMD Athlon(tm) 64 Processor 3400+ AuthenticAMD GNU/Linux
> 
> Just to confirm, still happens with -mm2?

No, it does not seem to.  Evolution runs OK.  ps, top, pmap all work
fine.

However, a couple of other things happened.  Could be unrelated or only
loosely related.

Evolution launches spamd (spamassassin) to filter junk mail.  spamd died
and I have this in dmesg to show for it:

VM: killing process spamd

spamd would have inherited the libhugetlbfs.so environment variables.
There are no other clues as to why it died though.

Also, immediately after launching evolution with libhugetlbfs, I got
that USB bug where the mouse starts creating keyboard input.  I got some
of these in dmesg:
keyboard.c: can't emulate rawmode for keycode 240

That could be pure coincidence, although I had been using the system
almost all day before that, and it hadn't happened.  
-- 
Zan Lynx <[EMAIL PROTECTED]>




Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-01 Thread Nish Aravamudan
On 7/31/07, Zan Lynx <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-07-31 at 15:02 -0700, Randy Dunlap wrote:
> > On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
> >
> > > I was playing with huge pages and libhugetlbfs.  Small programs like
> > > "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> > > system slowly stops running.  One interesting thing is the "ps" command,
> > > it gets stuck like this:
> >
> > Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
>
> D'oh!  I mean 2.6.23-rc1-mm1, the 22 was a typo.  Cut & paste to be
> sure:
> Linux zephyr 2.6.23-rc1-mm1 #1 SMP PREEMPT Wed Jul 25 17:33:04 MDT 2007
> x86_64 AMD Athlon(tm) 64 Processor 3400+ AuthenticAMD GNU/Linux

Also, are we at all sure this isn't a reiser4 issue? I assume you're
able to use Evolution w/o libhuge on rc1-mm1 ok? Any chance to remove
reiser4 from the picture? Have you been using libhuge this way
regularly? Any chance you know it worked ok with some recent kernel
(say 2.6.23-rc1?).

Thanks,
Nish


Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-01 Thread Nish Aravamudan
On 7/31/07, Zan Lynx <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-07-31 at 15:02 -0700, Randy Dunlap wrote:
> > On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
> >
> > > I was playing with huge pages and libhugetlbfs.  Small programs like
> > > "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> > > system slowly stops running.  One interesting thing is the "ps" command,
> > > it gets stuck like this:
> >
> > Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
>
> D'oh!  I mean 2.6.23-rc1-mm1, the 22 was a typo.  Cut & paste to be
> sure:
> Linux zephyr 2.6.23-rc1-mm1 #1 SMP PREEMPT Wed Jul 25 17:33:04 MDT 2007
> x86_64 AMD Athlon(tm) 64 Processor 3400+ AuthenticAMD GNU/Linux

Just to confirm, still happens with -mm2?

Thanks,
Nish


Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-01 Thread Nish Aravamudan
On 8/1/07, Randy Dunlap <[EMAIL PROTECTED]> wrote:
> Nish Aravamudan wrote:
> > On 7/31/07, Randy Dunlap <[EMAIL PROTECTED]> wrote:
> >> On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
> >>
> >>> I was playing with huge pages and libhugetlbfs.  Small programs like
> >>> "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> >>> system slowly stops running.  One interesting thing is the "ps" command,
> >>> it gets stuck like this:
> >> Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
> >>
> >> There was a hugepage problem fixed very recently, in 2.6.23-rc1 IIRC.
> >
> > Actually fixed just after 2.6.23-rc1:
> >
> > git describe 5ab3ee7b1cd5c91eb2272764f9d7d1fe4749681e
> > v2.6.23-rc1-14-g5ab3ee7
>
> Looks to me like Andrew included Ken's patch in his rc1-mm1 anyway,
> so that shouldn't be the issue.  Or did I not read mm/hugetlb.c correctly?

Yeah you're right, the -mm tree has that bug fixed.

Thanks,
Nish


Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-01 Thread Randy Dunlap

Nish Aravamudan wrote:
> On 7/31/07, Randy Dunlap <[EMAIL PROTECTED]> wrote:
>> On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
>>
>>> I was playing with huge pages and libhugetlbfs.  Small programs like
>>> "ls" work fine.  I tried running Evolution through libhugetlbfs and the
>>> system slowly stops running.  One interesting thing is the "ps" command,
>>> it gets stuck like this:
>> Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
>>
>> There was a hugepage problem fixed very recently, in 2.6.23-rc1 IIRC.
>
> Actually fixed just after 2.6.23-rc1:
>
> git describe 5ab3ee7b1cd5c91eb2272764f9d7d1fe4749681e
> v2.6.23-rc1-14-g5ab3ee7

Looks to me like Andrew included Ken's patch in his rc1-mm1 anyway,
so that shouldn't be the issue.  Or did I not read mm/hugetlb.c correctly?

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-01 Thread Nish Aravamudan
On 7/31/07, Zan Lynx <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-07-31 at 15:02 -0700, Randy Dunlap wrote:
> > On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
> >
> > > I was playing with huge pages and libhugetlbfs.  Small programs like
> > > "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> > > system slowly stops running.  One interesting thing is the "ps" command,
> > > it gets stuck like this:
> >
> > Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
>
> D'oh!  I mean 2.6.23-rc1-mm1, the 22 was a typo.  Cut & paste to be
> sure:
> Linux zephyr 2.6.23-rc1-mm1 #1 SMP PREEMPT Wed Jul 25 17:33:04 MDT 2007
> x86_64 AMD Athlon(tm) 64 Processor 3400+ AuthenticAMD GNU/Linux

Hrm -- if you kill Evolution does the system come back? Or is it
unkillable/unusable? I guess you were able to run ps at the same time.

What is Evolution doing (sysrq+t)? For that matter, what was the
output from libhuge?
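
As a lighter userspace complement to sysrq+t, tasks stuck in uninterruptible
sleep (state D, like the hung "ps") can be listed from /proc. This is only a
sketch: the function names are made up, it looks at processes rather than
individual threads, and it assumes /proc/<pid>/stat itself stays readable on
the affected system.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def blocked_tasks(proc="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Unlike sysrq+t this gives no stack traces, so it only answers "which tasks
are stuck", not "where".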

Thanks,
Nish


Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-01 Thread Nish Aravamudan
On 7/31/07, Randy Dunlap <[EMAIL PROTECTED]> wrote:
> On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
>
> > I was playing with huge pages and libhugetlbfs.  Small programs like
> > "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> > system slowly stops running.  One interesting thing is the "ps" command,
> > it gets stuck like this:
>
> Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
>
> There was a hugepage problem fixed very recently, in 2.6.23-rc1 IIRC.

Actually fixed just after 2.6.23-rc1:

git describe 5ab3ee7b1cd5c91eb2272764f9d7d1fe4749681e
v2.6.23-rc1-14-g5ab3ee7
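
For anyone not used to reading that output: git describe prints
<tag>-<commits since tag>-g<abbreviated hash>, so the fix is the 14th commit
after the v2.6.23-rc1 tag. A small sketch of decoding it (parse_describe is a
made-up helper, not a git interface):

```python
import re

# git describe prints <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Return (tag, commits_since_tag, abbreviated_hash)."""
    m = DESCRIBE_RE.match(text.strip())
    if m is None:
        raise ValueError("not in <tag>-<n>-g<hash> form: %r" % text)
    return m.group("tag"), int(m.group("count")), m.group("sha")
```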

Thanks,
Nish


Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-07-31 Thread Zan Lynx
On Tue, 2007-07-31 at 15:02 -0700, Randy Dunlap wrote:
> On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
> 
> > I was playing with huge pages and libhugetlbfs.  Small programs like
> > "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> > system slowly stops running.  One interesting thing is the "ps" command,
> > it gets stuck like this:
> 
> Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?

D'oh!  I mean 2.6.23-rc1-mm1, the 22 was a typo.  Cut & paste to be
sure:
Linux zephyr 2.6.23-rc1-mm1 #1 SMP PREEMPT Wed Jul 25 17:33:04 MDT 2007
x86_64 AMD Athlon(tm) 64 Processor 3400+ AuthenticAMD GNU/Linux

-- 
Zan Lynx <[EMAIL PROTECTED]>




Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-07-31 Thread Randy Dunlap
On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:

> I was playing with huge pages and libhugetlbfs.  Small programs like
> "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> system slowly stops running.  One interesting thing is the "ps" command,
> it gets stuck like this:

Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?

There was a hugepage problem fixed very recently, in 2.6.23-rc1 IIRC.


> ps            D 81001e57ed40 0 103558 103483
>  81001f061dc8 0096 81003d8586e8 81001cbadc00
>  0006 80537009 0030 807ff700
>  807ff700 807ff700 807ff700 807ff700
> Call Trace:
>  [] _spin_unlock+0x29/0x50
>  [] __down_read+0x75/0xaf
>  [] access_process_vm+0x49/0x190
>  [] proc_pid_cmdline+0xa3/0x130
>  [] proc_info_read+0xba/0x100
>  [] vfs_read+0xc5/0x180
>  [] sys_read+0x53/0x90
>  [] system_call+0x7e/0x83
> 
> and nothing will touch it after that.
> 
> Here's my kernel command line:
> root=/dev/sda2 rootfstype=reiser4 rootflags=no_write_barrier ro
> i8042.nomux elevator=cfq resume=/dev/sda3 panic=5 nmi_watchdog=2,panic
> debug hugepages=32
> 
> Here's the "huge" script I was using to run programs:
> #!/bin/sh
> export LD_PRELOAD=libhugetlbfs.so
> export HUGETLB_MORECORE=yes
> export HUGETLB_PATH=/mnt/huge
> export HUGETLB_VERBOSE=1
> exec "$@"
> 
> I don't have any more info than that at the moment but I could reproduce
> it with whatever, on request.
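
Since the command line above asks for hugepages=32, one quick sanity check is
whether the pool was actually reserved and how much of it is in use, which
can be read back from the HugePages_* lines of /proc/meminfo. A minimal
sketch (hugepage_counters is a made-up helper name):

```python
def hugepage_counters(meminfo_text):
    """Extract the HugePages_* counters from the text of /proc/meminfo."""
    counters = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("HugePages_"):
            counters[key] = int(value.split()[0])
    return counters


def read_hugepage_counters(path="/proc/meminfo"):
    """Read the counters from a live system."""
    with open(path) as f:
        return hugepage_counters(f.read())
```

Comparing HugePages_Total and HugePages_Free before and after starting
Evolution would show whether the pool is being exhausted.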

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.22-rc1-mm1

2007-06-11 Thread Andy Whitcroft
Andy Whitcroft wrote:
> H. Peter Anvin wrote:
>> Andy Whitcroft wrote:
>>>> It definitely sounds like a memory clobber of some sort.
>>>>
>>>> Usual suspects, in addition to the input/output buffers you already
>>>> looked at, would be the heap and the stack.  Finding where the stack
>>>> pointer lives would be my first, instinctive guess.
>>> The stack seems to be where it should be and seems to stay pretty much
>>> in the same place as it should.  The checks I added for the heap also
>>> show it staying within bounds.  I've tried making the stack and the heap
>>> 64k to no effect.
>>>
>>> Moving the kernel to other places in memory seems to kill the decode
>>> completely during gunzip(), which may be a hint, I am not sure.
>>>
>>> This thing is trying to ruin my mind.
>>>
>> Yours and mine both.  Seems like *something* is clobbering memory, but
>> what and why is a mystery.  The fact that putting the kernel at a higher
>> point in memory changes the outcome is a good indication that this
>> clobber is at a relatively high address.
>>
>> How much RAM does this machine have?
> 
> This is a 12GB machine.  3 NUMA nodes.
> 
> I checked out the location of the IDT and GDT and both seem sane, in the
> 9 range below the kernel destination.
> 
> I also note that on another machine of this type (one node only in that
> case) some of the "did work" cases do not work.  Also, when I applied some
> of my patches on top, "working" cases stopped working.  So whatever
> it is is definitely related to the shape of the kernel to be loaded.
> Very confusing.

Ok, in fact when the kernel is moved elsewhere in the address space it
will decode properly.  There was a check in there for not loading at the
right address which was catching me out ... as errors do not show up as
we have no serial support.  Doh.

Once I had gotten this decoding at other addresses I simply tried moving
the base address for the kernel elsewhere.  I am able to successfully
boot the kernel at 16MB and 256MB.  This seems like something outside
the decoder scribbling.

I would not normally recommend moving the base address of the kernel.
However, this problem at least so far has only shown up on the NUMA-Q
platform which is at best described as a very small volume
sub-architecture.  There are areas in which it differs from mainstream
BIOS and we are no longer able to get details of these differences.

We actually have no proof as yet this is or is not a NUMA-Q specific
problem.  For instance these machines tend to run less modules and more
builtin stuff than the average due to an owner dislike of modules.  So
we could have a lurking kernel size issue or similar.

I am therefore proposing to change the base address for NUMA-Q only (patch
to follow this email), and that we remain aware of the issue and on the
lookout for similar breakage on mainstream x86 platforms.  At least with
this patch we can get wider testing on the rest of the kernel.

-apw


Re: 2.6.22-rc1-mm1

2007-06-07 Thread Andy Whitcroft
H. Peter Anvin wrote:
> Andy Whitcroft wrote:
>>> It definitely sounds like a memory clobber of some sort.
>>>
>>> Usual suspects, in addition to the input/output buffers you already
>>> looked at, would be the heap and the stack.  Finding where the stack
>>> pointer lives would be my first, instinctive guess.
>> The stack seems to be where it should be and seems to stay pretty much
>> in the same place as it should.  The checks I added for the heap also
>> show it staying within bounds.  I've tried making the stack and the heap
>> 64k to no effect.
>>
>> Moving the kernel to other places in memory seems to kill the decode
>> completely during gunzip(), which may be a hint, I am not sure.
>>
>> This thing is trying to ruin my mind.
>>
> 
> Yours and mine both.  Seems like *something* is clobbering memory, but
> what and why is a mystery.  The fact that putting the kernel at a higher
> point in memory changes the outcome is a good indication that this
> clobber is at a relatively high address.
> 
> How much RAM does this machine have?

This is a 12GB machine.  3 NUMA nodes.

I checked out the location of the IDT and GDT and both seem sane, in the
9 range below the kernel destination.

I also note that on another machine of this type (one node only in that
case) some of the "did work" cases do not work.  Also, when I applied some
of my patches on top, "working" cases stopped working.  So whatever
it is is definitely related to the shape of the kernel to be loaded.
Very confusing.

-apw


Re: 2.6.22-rc1-mm1

2007-06-05 Thread H. Peter Anvin
Andy Whitcroft wrote:
>>>
>> It definitely sounds like a memory clobber of some sort.
>>
>> Usual suspects, in addition to the input/output buffers you already
>> looked at, would be the heap and the stack.  Finding where the stack
>> pointer lives would be my first, instinctive guess.
> 
> The stack seems to be where it should be and seems to stay pretty much
> in the same place as it should.  The checks I added for the heap also
> show it staying within bounds.  I've tried making the stack and the heap
> 64k to no effect.
> 
> Moving the kernel to other places in memory seems to kill the decode
> completely during gunzip(), which may be a hint, I am not sure.
> 
> This thing is trying to ruin my mind.
> 

Yours and mine both.  Seems like *something* is clobbering memory, but
what and why is a mystery.  The fact that putting the kernel at a higher
point in memory changes the outcome is a good indication that this
clobber is at a relatively high address.

How much RAM does this machine have?

-hpa


Re: 2.6.22-rc1-mm1

2007-06-05 Thread Andy Whitcroft
H. Peter Anvin wrote:
> Andy Whitcroft wrote:
>> I think that my debugging says that newsetup got the compressed kernel
>> and decompressor into memory ok and execution passed to it normally.
>> But I cannot figure out where the corruption is coming from.  I tried
>> annotating the gzip decompressor to see if the input and output buffers
>> were overlapping at any time and that debug said no (unsure how reliable
>> that is).  And yet at some point the output image is munched up.
>>
>> One last piece of information.  The decompressor also always seems to
>> get to the end of the input stream in exactly the right place without
>> reporting any kind of error, that is with exactly 8 bytes left over for
>> the length and crc checks.  Which given the context sensitive nature of
>> the algorithm tends to imply the input stream was ok for the whole
>> duration of the decompress.  Yet the output stream is badly broken.
>>
>> Anyone got any wacky suggestions ...
>>
> 
> It definitely sounds like a memory clobber of some sort.
> 
> Usual suspects, in addition to the input/output buffers you already
> looked at, would be the heap and the stack.  Finding where the stack
> pointer lives would be my first, instinctive guess.

The stack seems to be where it should be and seems to stay pretty much
in the same place as it should.  The checks I added for the heap also
show it staying within bounds.  I've tried making the stack and the heap
64k to no effect.

Moving the kernel to other places in memory seems to kill the decode
completely during gunzip(), which may be a hint, I am not sure.

This thing is trying to ruin my mind.

-apw


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-06-04 Thread Pavel Machek
On Thu 2007-05-31 22:46:11, Len Brown wrote:
> On Monday 21 May 2007 08:11, Pavel Machek wrote:
> > On Thu 2007-05-17 18:42:43, Len Brown wrote:
> > > > Something similar happened to me on XE3, yes.
> > > > 
> > > > (Actual values were different; BIOS specified critical temperature at
> > > > cca 95C, but hw killed the power at cca 83C. Setting critical trip
> > > > point at 80C made the problem go away.)
> > > 
> > > Great, please file a bug and include the acpidump from the XE3
> > > > and we'll fix it, rather than supporting a bogus (manual) workaround for it.
> > 
> > > It has been a few years since I had that XE3 machine.
> > > 
> > > > Of course if your system is running at 80*C and the hardware shuts
> > > > off at 83*C, you may have a broken fan, or one clogged with dust...
> > > 
> > > It _did_ have a broken fan. It also had broken trip points.
> 
> Thanks for clarifying this, Pavel.
> If you come upon an XE3 where Linux-2.6.22 doesn't work as well
> as Windows, please let me know.

"work as well as windows" is not good enough goal as far as I'm
concerned. Please don't break working setups.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-06-04 Thread Pavel Machek
On Mon 2007-06-04 11:02:01, Stefan Seyfried wrote:
> On Thu, May 17, 2007 at 06:35:48PM -0400, Len Brown wrote:
>  
> > Yes, SuSE enables polling mode by default, but that is just
> > distro specific "value add" that should eventually be fixed.
> 
> I will do that for openSUSE FACTORY.

Well, I still believe the right solution is to enable polling mode as soon
as trip points are written (and to ignore BIOS updates from then
on). That gets trip point writing into a functional state.
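
To make the proposed policy concrete, here is a toy model of it. It is purely
illustrative and not kernel code: the class and method names are invented,
and the temperatures are the XE3 figures mentioned elsewhere in this thread.

```python
class ThermalZone:
    """Toy model: once the user writes a trip point, poll and ignore the BIOS."""

    def __init__(self, trips):
        self.trips = dict(trips)      # e.g. {"critical": 95}
        self.user_override = False
        self.polling = False

    def write_trip(self, name, celsius):
        """A user write pins the trip points and switches the zone to polling."""
        self.trips[name] = celsius
        self.user_override = True
        self.polling = True

    def bios_update(self, trips):
        """Apply a BIOS trip point update unless the user has overridden them."""
        if self.user_override:
            return False
        self.trips.update(trips)
        return True

    def should_shut_down(self, celsius):
        """Compare a polled temperature against the current critical trip point."""
        return celsius >= self.trips["critical"]
```

With the critical point written as 80, a later BIOS update back to 95 is
dropped, so a polled reading of 83 still triggers the shutdown path.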

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-06-04 Thread Stefan Seyfried
On Thu, May 17, 2007 at 06:35:48PM -0400, Len Brown wrote:
 
> Yes, SuSE enables polling mode by default, but that is just
> distro specific "value add" that should eventually be fixed.

I will do that for openSUSE FACTORY.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices|  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-06-04 Thread Stefan Seyfried
On Tue, May 22, 2007 at 11:06:36AM +0200, Pavel Machek wrote:
 
> We need to ignore trip point updates from BIOS, and we need to poll
> thermals when the user overrides trip points. That's expected. Plus I've
> yet to see a platform actually updating the trip points.

On the Thinkpad 600, whenever a trip point is crossed, all trip points are updated.
I think they implemented hysteresis that way.
ISTR that the hp nx5000 did something similar, but I might be wrong on this one.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices|  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: 2.6.22-rc1-mm1

2007-06-01 Thread H. Peter Anvin
Andy Whitcroft wrote:
> 
> I think that my debugging says that newsetup got the compressed kernel
> and decompressor into memory ok and execution passed to it normally.
> But I cannot figure out where the corruption is coming from.  I tried
> annotating the gzip decompressor to see if the input and output buffers
> were overlapping at any time and that debug said no (unsure how reliable
> that is).  And yet at some point the output image is munched up.
> 
> One last piece of information.  The decompressor also always seems to
> get to the end of the input stream in exactly the right place without
> reporting any kind of error, that is with exactly 8 bytes left over for
> the length and crc checks.  Which given the context sensitive nature of
> the algorithm tends to imply the input stream was ok for the whole
> duration of the decompress.  Yet the output stream is badly broken.
> 
> Anyone got any wacky suggestions ...
> 

It definitely sounds like a memory clobber of some sort.

Usual suspects, in addition to the input/output buffers you already
looked at, would be the heap and the stack.  Finding where the stack
pointer lives would be my first, instinctive guess.

-hpa


Re: 2.6.22-rc1-mm1

2007-06-01 Thread Andy Whitcroft
Andy Whitcroft wrote:
> Mel Gorman wrote:
>> On Wed, 16 May 2007, H. Peter Anvin wrote:
>>
>>> Correction, does *this patch* do it for you?
>>>
>> With these two patches in combination, previously failing machines
>> elm3b6 (x86_64 on test.kernel.org) and a modern x86 built a kernel and
>> booted correctly.
>>
>> elm3b132 and elm3b133 (x86 numaq on test.kernel.org) built with these
>> patches but silently fail on boot with no output via earlyprintk.
>> According to test.kernel.org, this failure occurs with git-newsetup
>> reverted so it is a separate problem.
> 
> Ok, I've been following up on this failure on elm3b132/3.  I moved
> forward to v2.6.22-rc2-mm1 and that also fails.  I ran a bisection on
> the git-newsetup patch as in -mm and basically it came down to the
> first patch, ie. any and all of this tree stops the boot.
> 
> I just tried reproducing git-newsetup boot failures with the latest
> version of your tree (369f16fdd423d79640c4390915e6ab71189cb497) and that
> also fails.
> 
> "Fails" in this context means a hard boot failure after loading the kernel
> and before anything is printed.  I also added a printf to the top of main()
> in boot/main.c and it doesn't come out, not that I really know if that
> means it got there or not.
> 
> Any suggestions how to debug this puppy?

Thanks to Peter for all his encouragement off list.

I cannot claim to have sorted this one out; I do, however, understand why
my experiences and Mel's did not seem consistent.  Basically I am getting
inconsistent results with different machines.

I started my debug on a machine where 2.6.22-rc2 worked and
2.6.22-rc2+newsetup did not.  I debugged the latter and managed to
prove that it was in fact getting all the way to the kernel
decompressor, and then crashing hard.  The gzip image in memory was
intact and yet it did not decompress correctly: roughly the first 60% of
the output was intact, the remainder was damaged.

Suspecting that this was an "uncompress in place" overlap problem I
moved the compressed kernel way up out of the way and this then booted
successfully.  Experimenting, I was able to get it to boot by increasing
the overlap 'gap' from 32KB to 64KB.  I was able to use the same
patch to boot 2.6.22-rc2-mm1 on the same problem machine.  However,
this same overlap change did not fix another similar machine (the one in
the TKO grid).

I think that my debugging says that newsetup got the compressed kernel
and decompressor into memory ok and execution passed to it normally.
But I cannot figure out where the corruption is coming from.  I tried
annotating the gzip decompressor to see if the input and output buffers
were overlapping at any time and that debug said no (unsure how reliable
that is).  And yet at some point the output image is munched up.
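
For reference, the overlap condition being checked can be modelled in a few
lines. This is a simplified sketch under stated assumptions, not the actual
annotation: it assumes the compressed input sits at some offset inside the
region the output is written to, and that each decompressor step first reads
some bytes and then writes some. The function name and the step trace are
hypothetical.

```python
def first_overrun(in_start, steps, out_start=0):
    """Return the index of the first step whose output passes the read pointer.

    steps is a sequence of (consumed, produced) byte counts, one pair per
    decompressor step.  Returns None if the write pointer never gets ahead
    of the read pointer, i.e. no unread input is ever overwritten.
    """
    read_ptr, write_ptr = in_start, out_start
    for i, (consumed, produced) in enumerate(steps):
        read_ptr += consumed
        write_ptr += produced
        if write_ptr > read_ptr:
            return i
    return None
```

In this model, enlarging the gap corresponds to raising in_start; a clean
result here would point away from the decompressor overwriting its own input
and toward something else scribbling on the output.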

One last piece of information.  The decompressor also always seems to
get to the end of the input stream in exactly the right place without
reporting any kind of error, that is with exactly 8 bytes left over for
the length and crc checks.  Which given the context sensitive nature of
the algorithm tends to imply the input stream was ok for the whole
duration of the decompress.  Yet the output stream is badly broken.
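
The 8 bytes in question are the gzip trailer: a CRC32 of the uncompressed
data followed by its length modulo 2^32, both little-endian. A short sketch
of checking a decompressed image against that trailer follows; the helper
names are made up, and it uses the Python standard library rather than the
kernel's own decompressor.

```python
import gzip
import struct
import zlib


def gzip_trailer(blob):
    """Return (crc32, isize) from the last 8 bytes of a gzip member."""
    return struct.unpack("<II", blob[-8:])


def output_matches_trailer(blob, output):
    """True if output has the CRC32 and length recorded in blob's trailer."""
    crc, isize = gzip_trailer(blob)
    same_crc = (zlib.crc32(output) & 0xFFFFFFFF) == crc
    same_len = (len(output) & 0xFFFFFFFF) == isize
    return same_crc and same_len
```

An input stream that parses cleanly up to the trailer while the output fails
this comparison would match the symptom described above: good input, damaged
output.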

Anyone got any wacky suggestions ...

-apw


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-31 Thread Len Brown
On Monday 21 May 2007 08:11, Pavel Machek wrote:
> On Thu 2007-05-17 18:42:43, Len Brown wrote:
> > > Something similar happened to me on XE3, yes.
> > > 
> > > (Actual values were different; BIOS specified critical temperature at
> > > cca 95C, but hw killed the power at cca 83C. Setting critical trip
> > > point at 80C made the problem go away.)
> > 
> > Great, please file a bug and include the acpidump from the XE3
> > and we'll fix it, rather than supporting a bogus (manual) workaround for it.
> 
> It has been a few years since I had that XE3 machine.
> 
> > Of course if your system is running at 80*C and the hardware shuts
> > off at 83*C, you may have a broken fan, or one clogged with dust...
> 
> It _did_ have a broken fan. It also had broken trip points.

Thanks for clarifying this, Pavel.
If you come upon an XE3 where Linux-2.6.22 doesn't work as well
as Windows, please let me know.

Given that the justification for this ill-conceived workaround
seems to have diminished to the memory of broken hardware,
it is clear that we should stay the course of removing it
so that it doesn't further confuse future users.

If SuSE violently disagrees with me, you are certainly empowered
to restore the workaround in your distribution starting at 2.6.22
as part of your value add.  However, given its history of confusing
users, it seems that it might increase your support burden rather
than decrease it.

-Len


Re: 2.6.22-rc1-mm1

2007-05-29 Thread Andy Whitcroft
Mel Gorman wrote:
> On Wed, 16 May 2007, H. Peter Anvin wrote:
> 
>> Correction, does *this patch* do it for you?
>>
> 
> With these two patches in combination, previously failing machines
> elm3b6 (x86_64 on test.kernel.org) and a modern x86 built a kernel and
> booted correctly.
> 
> elm3b132 and elm3b133 (x86 numaq on test.kernel.org) built with these
> patches but silently fail on boot with no output via earlyprintk.
> According to test.kernel.org, this failure occurs with git-newsetup
> reverted so it is a separate problem.

Ok, I've been following up on this failure on elm3b132/3.  I moved
forward to v2.6.22-rc2-mm1 and that also fails.  I ran a bisection on
the git-newsetup patch as in -mm and basically it came down to the
first patch, i.e. any and all of this tree stops the boot.

I just tried reproducing git-newsetup boot failures with the latest
version of your tree (369f16fdd423d79640c4390915e6ab71189cb497) and that
also fails.

"Fails" in this context means a hard boot failure after loading the
kernel and before anything is printed.  I also added a printf to the top
of main() in boot/main.c and it doesn't come out, not that I really know
if that means it got there or not.

Any suggestions how to debug this puppy?

-apw


Re: [2.6.22-rc1-mm1] ehci-hcd - BUG: scheduling while atomic: rmmod/0x00000001/4568

2007-05-29 Thread Andrew Morton
On Tue, 29 May 2007 10:14:35 -0500 <[EMAIL PROTECTED]> wrote:

> 
> Sorry about that.  Would it be helpful if I verified that and sent it in
> signed off?
> 

Yes please.  The question in my mind was "did it add a race": say, the
notifier chain gets called by some external source after we've gone and
reset the device?


> 
> -----Original Message-----
> From: Andrew Morton [mailto:[EMAIL PROTECTED] 
> Sent: Friday, May 25, 2007 5:00 PM
> To: Greg KH
> Cc: Mattia Dongili; Linux Kernel Mailing List; Hayes, Stuart; David
> Brownell; [EMAIL PROTECTED]
> Subject: Re: [2.6.22-rc1-mm1] ehci-hcd - BUG: scheduling while atomic:
> rmmod/0x00000001/4568
> 
> On Fri, 25 May 2007 14:40:05 -0700 Greg KH <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, May 21, 2007 at 11:44:37AM +0900, Mattia Dongili wrote:
> > > Hello,
> > > 
> > > with gregkh-usb-usb-ehci-cpufreq-fix.patch removing ehci-hcd causes 
> > > the following BUG:
> > 
> > Thanks for letting me know.
> > 
> > Stuart, any help here?
> 
> pretty obvious.  cpufreq_unregister_notifier() sleeps, and that patch
> causes it to be called under spinlock.
> 
> Something like this...
> 
> --- a/drivers/usb/host/ehci-hcd.c~fix-gregkh-usb-usb-ehci-cpufreq-fix
> +++ a/drivers/usb/host/ehci-hcd.c
> @@ -452,14 +452,14 @@ static void ehci_stop (struct usb_hcd *h
>   if (HC_IS_RUNNING (hcd->state))
>   ehci_quiesce (ehci);
>  
> -#ifdef CONFIG_CPU_FREQ
> - cpufreq_unregister_notifier(&ehci->cpufreq_transition,
> - CPUFREQ_TRANSITION_NOTIFIER);
> -#endif
>   ehci_reset (ehci);
>   ehci_writel(ehci, 0, &ehci->regs->intr_enable);
>   spin_unlock_irq(&ehci->lock);
>  
> +#ifdef CONFIG_CPU_FREQ
> + cpufreq_unregister_notifier(&ehci->cpufreq_transition,
> + CPUFREQ_TRANSITION_NOTIFIER);
> +#endif
>   /* let companion controllers work when we aren't */
>   ehci_writel(ehci, 0, &ehci->regs->configured_flag);
>  
> _


RE: [2.6.22-rc1-mm1] ehci-hcd - BUG: scheduling while atomic: rmmod/0x00000001/4568

2007-05-29 Thread Stuart_Hayes

Sorry about that.  Would it be helpful if I verified that and sent it in
signed off?

Thanks
Stuart

-----Original Message-----
From: Andrew Morton [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 25, 2007 5:00 PM
To: Greg KH
Cc: Mattia Dongili; Linux Kernel Mailing List; Hayes, Stuart; David
Brownell; [EMAIL PROTECTED]
Subject: Re: [2.6.22-rc1-mm1] ehci-hcd - BUG: scheduling while atomic:
rmmod/0x00000001/4568

On Fri, 25 May 2007 14:40:05 -0700 Greg KH <[EMAIL PROTECTED]> wrote:

> On Mon, May 21, 2007 at 11:44:37AM +0900, Mattia Dongili wrote:
> > Hello,
> > 
> > with gregkh-usb-usb-ehci-cpufreq-fix.patch removing ehci-hcd causes 
> > the following BUG:
> 
> Thanks for letting me know.
> 
> Stuart, any help here?

pretty obvious.  cpufreq_unregister_notifier() sleeps, and that patch
causes it to be called under spinlock.

Something like this...

--- a/drivers/usb/host/ehci-hcd.c~fix-gregkh-usb-usb-ehci-cpufreq-fix
+++ a/drivers/usb/host/ehci-hcd.c
@@ -452,14 +452,14 @@ static void ehci_stop (struct usb_hcd *h
 	if (HC_IS_RUNNING (hcd->state))
 		ehci_quiesce (ehci);
 
-#ifdef CONFIG_CPU_FREQ
-	cpufreq_unregister_notifier(&ehci->cpufreq_transition,
-				    CPUFREQ_TRANSITION_NOTIFIER);
-#endif
 	ehci_reset (ehci);
 	ehci_writel(ehci, 0, &ehci->regs->intr_enable);
 	spin_unlock_irq(&ehci->lock);
 
+#ifdef CONFIG_CPU_FREQ
+	cpufreq_unregister_notifier(&ehci->cpufreq_transition,
+				    CPUFREQ_TRANSITION_NOTIFIER);
+#endif
 	/* let companion controllers work when we aren't */
 	ehci_writel(ehci, 0, &ehci->regs->configured_flag);
 
_


Re: 2.6.22-rc1-mm1: evm BUG when reading sysfs file

2007-05-29 Thread Reiner Sailer
[EMAIL PROTECTED] (Joseph Fannin) wrote on 05/26/2007 02:29:07 AM:

> On Fri, May 25, 2007 at 10:28:22PM -0400, Reiner Sailer wrote:
> > On Tue, 22 May 2007 03:25:48 -0400
> > [EMAIL PROTECTED] (Joseph Fannin) wrote:
> >
> 
> > > I've been getting this since 2.6.21-rc7-mm1:
> 

> >
> > Joseph,
> >
> > thank you for posting this problem. I cannot reconstruct it on my machine.
> >
> > Could you tell us which kernel configuration you used (drivers/char/tpm
> > and security settings) ?
> > Does it disappear when IMA is disabled in the kernel config?
> 
> I've found that disabling CONFIG_SYSFS_DEPRECATED makes the
> BUG message go away; maybe that's what you're missing?
> 
> I've also attached my .config -- but it has lots of stuff turned
> on, so it may be faster to try flipping CONFIG_SYSFS_DEPRECATED on a
> slimmer config, if you'd like.
> 
> Disabling IMA doesn't change the message; it's still there.
> 
> Thanks!
> --
> Joseph Fannin
> [EMAIL PROTECTED]
> 

Thank you very much for these helpful details. We are working on a solution.

Best
Reiner


Re: 2.6.22-rc1-mm1 Implementing fan/thermal control in userspace - Was: [cannot change thermal trip points]

2007-05-28 Thread Pavel Machek
On Mon 2007-05-28 13:50:36, Matthew Garrett wrote:
> On Mon, May 28, 2007 at 12:58:51PM +0200, Pavel Machek wrote:
> 
> > It would happily occur under Windows. You just needed to load machine
> > in a way that cpu stayed ~80C.
> 
> So replace the DSDT. All the problems get solved that way.

We are in the middle of a stable series, and Len's patch breaks existing
setups without prior warning. That's a "no-no". Of course I could
replace the DSDT. I could also throw that machine out of the window.

I'm not sure what we are arguing about here; that patch is broken.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1 Implementing fan/thermal control in userspace - Was: [cannot change thermal trip points]

2007-05-28 Thread Matthew Garrett
On Mon, May 28, 2007 at 12:58:51PM +0200, Pavel Machek wrote:

> It would happily occur under Windows. You just needed to load machine
> in a way that cpu stayed ~80C.

So replace the DSDT. All the problems get solved that way.
-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.22-rc1-mm1 Implementing fan/thermal control in userspace - Was: [cannot change thermal trip points]

2007-05-28 Thread Pavel Machek
Hi!

> > > Because, as Len has pointed out, you end up with two different ideas 
> > > about what the trip points are - the kernel's and the hardware's. That 
> > > works fine until some event in the firmware either forcibly 
> > > resynchronises the two or makes assumptions about the spec-compliance of 
> > > the interpreter.
> > 
> > ...and suggested workaround is to drive fans directly from userspace,
> > which not only violates the specs and has all the problems with
> > desynchronized state, but ALSO FAILS TO WORK IN PRACTICE.
> 
> I don't think that's obviously true. 11.3.2 of the 3.0 spec states:

> "A package consisting of references to all active cooling devices that 
> should be engaged when the associated active cooling threshold (_ACx) is 
> exceeded." 

We'd need:

a) a way to tell ACPI not to control the fans any more

b) an in-kernel watchdog, so that ACPI takes over fan control again if
the OOM killer takes out the userspace daemon

c) a way to control passive cooling from userspace.

Not something doable for 2.6.22.


> > > The interface would need to be more complicated than that if you wanted 
> > > to be able to implement hysteresis, and there's the potential for 
> > > hardware damage if parameters are set inappropriately. Even then, 
> > > there's no easy way of programmatically determining whether it would work 
> > > on any given hardware.
> > 
> > Not sure why you try to scare people with 'hardware damage'. HP XE3
> > bios already _was_ damaging hardware (it cooked the hard drive using
> > cpu as a heater), and no acpi magic can damage correctly working
> > machine.
> 
> Given that this presumably didn't occur under Windows, I think it would
> be significantly better to figure out why and then fix that. 

It would happily occur under Windows. You just needed to load the machine
in such a way that the CPU stayed at ~80C.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1 Implementing fan/thermal control in userspace - Was: [cannot change thermal trip points]

2007-05-27 Thread Matthew Garrett
On Fri, May 25, 2007 at 06:38:15AM +, Pavel Machek wrote:
> Hi!
> > Because, as Len has pointed out, you end up with two different ideas 
> > about what the trip points are - the kernel's and the hardware's. That 
> > works fine until some event in the firmware either forcibly 
> > resynchronises the two or makes assumptions about the spec-compliance of 
> > the interpreter.
> 
> ...and suggested workaround is to drive fans directly from userspace,
> which not only violates the specs and has all the problems with
> desynchronized state, but ALSO FAILS TO WORK IN PRACTICE.

I don't think that's obviously true. 11.3.2 of the 3.0 spec states:

"A package consisting of references to all active cooling devices that 
should be engaged when the associated active cooling threshold (_ACx) is 
exceeded." 

(referring to _ALx objects).

> > The interface would need to be more complicated than that if you wanted 
> > to be able to implement hysteresis, and there's the potential for 
> > hardware damage if parameters are set inappropriately. Even then, 
> > there's no easy way of programmatically determining whether it would work 
> > on any given hardware.
> 
> Not sure why you try to scare people with 'hardware damage'. HP XE3
> bios already _was_ damaging hardware (it cooked the hard drive using
> cpu as a heater), and no acpi magic can damage correctly working
> machine.

Given that this presumably didn't occur under Windows, I think it would 
be significantly better to figure out why and then fix that. 
Alternatively, if the firmware tables are actually genuinely broken in a 
way that's impossible to repair, you can replace the table. That has the 
advantage that there's no risk of the platform and the OS becoming 
confused.

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.22-rc1-mm1 Implementing fan/thermal control in userspace - Was: [cannot change thermal trip points]

2007-05-27 Thread Pavel Machek
Hi!

> 
> > I doubt it is impossible, would you mind sharing your knowledge why you
> > think it is impossible or point to some related discussion, pls.
> 
> Because, as Len has pointed out, you end up with two different ideas 
> about what the trip points are - the kernel's and the hardware's. That 
> works fine until some event in the firmware either forcibly 
> resynchronises the two or makes assumptions about the spec-compliance of 
> the interpreter.

...and suggested workaround is to drive fans directly from userspace,
which not only violates the specs and has all the problems with
desynchronized state, but ALSO FAILS TO WORK IN PRACTICE.

> > I could imagine an implementation for this, that e.g. critical...active9
> > get module parameters. BIOS updates for trip points get ignored as soon
> > as one is set and you can only decrease a value. Nothing bad can happen
> > and it will make some people happy (yes it's hacky, violates the specs
> > and so on..., but some more people have a working machine). Will this
> > (or similar) get accepted?
> 
> The interface would need to be more complicated than that if you wanted 
> to be able to implement hysteresis, and there's the potential for 
> hardware damage if parameters are set inappropriately. Even then, 
> there's no easy way of programmatically determining whether it would work 
> on any given hardware.

Not sure why you try to scare people with 'hardware damage'. HP XE3
bios already _was_ damaging hardware (it cooked the hard drive using
cpu as a heater), and no acpi magic can damage correctly working
machine.
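
To make the two ideas in the quoted text concrete, here is a rough
userspace C sketch of (a) a trip-point override that can only lower the
value and (b) on/off fan control with hysteresis.  Nothing here is an
existing kernel interface: struct fan_ctl, fan_update() and
trip_set_override() are hypothetical names, and temperatures are plain
integers in degrees C.

```c
/* Hypothetical state for one active-cooling trip point. */
struct fan_ctl {
	int trip_c;	/* temperature at which the fan switches on */
	int hyst_c;	/* how far below trip_c we must fall to switch off */
	int fan_on;	/* current fan state: 0 = off, 1 = on */
};

/*
 * Accept a user-supplied trip point only if it lowers the current one,
 * as proposed in the quoted text.  Returns 0 if accepted, -1 otherwise.
 */
static int trip_set_override(int *trip_c, int requested_c)
{
	if (requested_c >= *trip_c)
		return -1;
	*trip_c = requested_c;
	return 0;
}

/*
 * Hysteresis: switch on at trip_c, switch off only once the temperature
 * has dropped to trip_c - hyst_c, so the fan does not flap around the
 * threshold.  Returns the new fan state.
 */
static int fan_update(struct fan_ctl *f, int temp_c)
{
	if (!f->fan_on && temp_c >= f->trip_c)
		f->fan_on = 1;
	else if (f->fan_on && temp_c <= f->trip_c - f->hyst_c)
		f->fan_on = 0;
	return f->fan_on;
}
```

With trip_c = 70 and hyst_c = 5, the fan comes on at 70C and stays on
until the reading falls to 65C.  This says nothing about whether the
firmware agrees with the overridden value, which is the objection raised
above.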
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1: evm BUG when reading sysfs file

2007-05-26 Thread Joseph Fannin
On Fri, May 25, 2007 at 10:28:22PM -0400, Reiner Sailer wrote:
> On Tue, 22 May 2007 03:25:48 -0400
> [EMAIL PROTECTED] (Joseph Fannin) wrote:
>

> > I've been getting this since 2.6.21-rc7-mm1:

> > [2.379310] BUG: unable to handle kernel paging request at virtual 
> > address 4400d340
> > [2.379491]  printing eip:
> > [2.379573] c021c978
> > [2.379656] *pdpt = 0353c001
> > [2.379739] *pde = 
> > [2.379824] Oops:  [#1]
> > [2.379906] PREEMPT SMP
> > [2.380059] Modules linked in: thermal processor dm_mod
> > [2.380288] CPU:0
> > [2.380289] EIP:0060:[]Not tainted VLI
> > [2.380291] EFLAGS: 00010297   (2.6.22-rc1-mm1 #2)
> > [2.380547] EIP is at vsnprintf+0x448/0x5d0
> > [2.380633] eax: 4400d340   ebx: c348f034   ecx: 4400d340   edx: fffe
> > [2.380721] esi: c03e0100   edi: 4400d340   ebp: c357ecc0   esp: c357ec68
> > [2.380810] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> > [2.380898] Process udevtrigger (pid: 686, ti=c357e000 task=c1876df0 
> > task.ti=c357e000)
> > [2.380987] Stack: c348f014 0fec c03e1c60 c03e3cec c357eccc c0499b88 
> > c357ece0 c0282513
> > [2.381428]c348f014 0fec 3cb70fcb c348f034   
> >  
> > [2.381867] fffe c03e017c c357ed18 0034 c0494a20 
> > c357ece0 c021cb9f
> > [2.382305] Call Trace:
> > [2.382470]  [] sprintf+0x1f/0x30
> > [2.382594]  [] show_uevent+0xed/0x130
> > [2.382720]  [] dev_attr_show+0x23/0x30
> > [2.382843]  [] sysfs_read_file+0x97/0x140
> > [2.382968]  [] vfs_read+0xaf/0x180
> > [2.383096]  [] kernel_read+0x3a/0x50
> > [2.383221]  [] evm_calc_hash+0x11c/0x240
> > [2.383347]  [] evm_file_free+0xb9/0x330
> > [2.383470]  [] __fput+0xba/0x180
> > [2.383593]  [] fput+0x22/0x40
> > [2.383715]  [] filp_close+0x47/0x70
> > [2.383839]  [] sys_close+0x69/0xc0
> > [2.383965]  [] syscall_call+0x7/0xb
> > [2.384092]  [] 0xb7ebd0a7
> > [2.384212]  ===
> > [2.384295] INFO: lockdep is turned off.
> > [2.384379] Code: 21 fd ff ff c6 03 25 e9 19 fd ff ff 8d 4f 04 b8
> > 3b a2 3d c0 8b 55 e4 89 4d 08 8b 3f 81 ff ff 0f 00 00 0f 46 f8 89 f9
> > 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 89 c6 8b 45 e0
> > f6 45
> > [2.386787] EIP: [] vsnprintf+0x448/0x5d0 SS:ESP
> > 0068:c357ec68

>
> Joseph,
>
> thank you for posting this problem. I cannot reconstruct it on my machine.
>
> Could you tell us which kernel configuration you used (drivers/char/tpm
> and security settings) ?
> Does it disappear when IMA is disabled in the kernel config?

I've found that disabling CONFIG_SYSFS_DEPRECATED makes the
BUG message go away; maybe that's what you're missing?

I've also attached my .config -- but it has lots of stuff turned
on, so it may be faster to try flipping CONFIG_SYSFS_DEPRECATED on a
slimmer config, if you'd like.

Disabling IMA doesn't change the message; it's still there.

Thanks!
--
Joseph Fannin
[EMAIL PROTECTED]



Re: 2.6.22-rc1-mm1: evm BUG when reading sysfs file

2007-05-25 Thread Reiner Sailer
On Tue, 22 May 2007 03:25:48 -0400
[EMAIL PROTECTED] (Joseph Fannin) wrote:

> 
> I've been getting this since 2.6.21-rc7-mm1:
> 
> [2.379310] BUG: unable to handle kernel paging request at virtual
> address 4400d340
> [2.379491]  printing eip:
> [2.379573] c021c978
> [2.379656] *pdpt = 0353c001
> [2.379739] *pde = 
> [2.379824] Oops:  [#1]
> [2.379906] PREEMPT SMP
> [2.380059] Modules linked in: thermal processor dm_mod
> [2.380288] CPU:0
> [2.380289] EIP:0060:[]Not tainted VLI
> [2.380291] EFLAGS: 00010297   (2.6.22-rc1-mm1 #2)
> [2.380547] EIP is at vsnprintf+0x448/0x5d0
> [2.380633] eax: 4400d340   ebx: c348f034   ecx: 4400d340   edx: fffe
> [2.380721] esi: c03e0100   edi: 4400d340   ebp: c357ecc0   esp: c357ec68
> [2.380810] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [2.380898] Process udevtrigger (pid: 686, ti=c357e000 task=c1876df0
> task.ti=c357e000)
> [2.380987] Stack: c348f014 0fec c03e1c60 c03e3cec c357eccc c0499b88
> c357ece0 c0282513
> [2.381428]c348f014 0fec 3cb70fcb c348f034
> [2.381867] fffe c03e017c c357ed18 0034 c0494a20
> c357ece0 c021cb9f
> [2.382305] Call Trace:
> [2.382470]  [] sprintf+0x1f/0x30
> [2.382594]  [] show_uevent+0xed/0x130
> [2.382720]  [] dev_attr_show+0x23/0x30
> [2.382843]  [] sysfs_read_file+0x97/0x140
> [2.382968]  [] vfs_read+0xaf/0x180
> [2.383096]  [] kernel_read+0x3a/0x50
> [2.383221]  [] evm_calc_hash+0x11c/0x240
> [2.383347]  [] evm_file_free+0xb9/0x330
> [2.383470]  [] __fput+0xba/0x180
> [2.383593]  [] fput+0x22/0x40
> [2.383715]  [] filp_close+0x47/0x70
> [2.383839]  [] sys_close+0x69/0xc0
> [2.383965]  [] syscall_call+0x7/0xb
> [2.384092]  [] 0xb7ebd0a7
> [2.384212]  ===
> [2.384295] INFO: lockdep is turned off.
> [2.384379] Code: 21 fd ff ff c6 03 25 e9 19 fd ff ff 8d 4f 04 b8
> 3b a2 3d c0 8b 55 e4 89 4d 08 8b 3f 81 ff ff 0f 00 00 0f 46 f8 89 f9
> 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 89 c6 8b 45 e0
> f6 45
> [2.386787] EIP: [] vsnprintf+0x448/0x5d0 SS:ESP
> 0068:c357ec68
> 
> This comes a bit after IMA bails out successfully, if that's relevant:
> 
> [1.708761] ima (ima_init): No TPM chip found(rc = -19), activating
> TPM-bypass!

Joseph,

thank you for posting this problem. I cannot reconstruct it on my machine.

Could you tell us which kernel configuration you used (drivers/char/tpm 
and security settings) ?
Does it disappear when IMA is disabled in the kernel config?

Reiner


Re: [2.6.22-rc1-mm1] ehci-hcd - BUG: scheduling while atomic: rmmod/0x00000001/4568

2007-05-25 Thread Andrew Morton
On Fri, 25 May 2007 14:40:05 -0700 Greg KH <[EMAIL PROTECTED]> wrote:

> On Mon, May 21, 2007 at 11:44:37AM +0900, Mattia Dongili wrote:
> > Hello,
> > 
> > with gregkh-usb-usb-ehci-cpufreq-fix.patch removing ehci-hcd causes the
> > following BUG:
> 
> Thanks for letting me know.
> 
> Stuart, any help here?

pretty obvious.  cpufreq_unregister_notifier() sleeps, and that patch
causes it to be called under spinlock.

Something like this...

--- a/drivers/usb/host/ehci-hcd.c~fix-gregkh-usb-usb-ehci-cpufreq-fix
+++ a/drivers/usb/host/ehci-hcd.c
@@ -452,14 +452,14 @@ static void ehci_stop (struct usb_hcd *h
 	if (HC_IS_RUNNING (hcd->state))
 		ehci_quiesce (ehci);
 
-#ifdef CONFIG_CPU_FREQ
-	cpufreq_unregister_notifier(&ehci->cpufreq_transition,
-				    CPUFREQ_TRANSITION_NOTIFIER);
-#endif
 	ehci_reset (ehci);
 	ehci_writel(ehci, 0, &ehci->regs->intr_enable);
 	spin_unlock_irq(&ehci->lock);
 
+#ifdef CONFIG_CPU_FREQ
+	cpufreq_unregister_notifier(&ehci->cpufreq_transition,
+				    CPUFREQ_TRANSITION_NOTIFIER);
+#endif
 	/* let companion controllers work when we aren't */
 	ehci_writel(ehci, 0, &ehci->regs->configured_flag);
 
_
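
The ordering problem can be modelled in a few lines of userspace C.
This is a toy model, not kernel code: model_spin_lock(),
model_spin_unlock() and model_blocking_unregister() are invented
stand-ins for the spinlock and for a call that may sleep, and a counter
plays the role of the "scheduling while atomic" check.

```c
static int atomic_depth;	/* > 0 while a modelled spinlock is held */
static int sleep_violations;	/* blocking calls made in atomic context */

static void model_spin_lock(void)   { atomic_depth++; }
static void model_spin_unlock(void) { atomic_depth--; }

/* Stand-in for a call that may block, e.g. unregistering a notifier. */
static void model_blocking_unregister(void)
{
	if (atomic_depth > 0)
		sleep_violations++;	/* would be "scheduling while atomic" */
}

/* Ordering before the fix: the blocking call runs under the lock. */
static int stop_buggy(void)
{
	sleep_violations = 0;
	model_spin_lock();
	model_blocking_unregister();
	model_spin_unlock();
	return sleep_violations;
}

/* Ordering after the fix: drop the lock first, then block. */
static int stop_fixed(void)
{
	sleep_violations = 0;
	model_spin_lock();
	/* hardware reset and interrupt disable would happen here */
	model_spin_unlock();
	model_blocking_unregister();
	return sleep_violations;
}
```

In this model stop_buggy() reports one violation and stop_fixed()
reports none.  The model does not say whether moving the call opens a
window in which the notifier can still fire after the reset.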



Re: [2.6.22-rc1-mm1] ehci-hcd - BUG: scheduling while atomic: rmmod/0x00000001/4568

2007-05-25 Thread Greg KH
On Mon, May 21, 2007 at 11:44:37AM +0900, Mattia Dongili wrote:
> Hello,
> 
> with gregkh-usb-usb-ehci-cpufreq-fix.patch removing ehci-hcd causes the
> following BUG:

Thanks for letting me know.

Stuart, any help here?

thanks,

greg k-h



> [  459.800033] BUG: scheduling while atomic: rmmod/0x00000001/4568
> [  459.800045]  [] dump_trace+0x63/0x1ec
> [  459.800055]  [] show_trace_log_lvl+0x1a/0x30
> [  459.800066]  [] show_trace+0x12/0x14
> [  459.800099]  [] dump_stack+0x16/0x18
> [  459.800135]  [] __sched_text_start+0x56/0x7db
> [  459.800142]  [] wait_for_completion+0x65/0x9b
> [  459.800147]  [] synchronize_rcu+0x2d/0x33
> [  459.800154]  [] synchronize_srcu+0x23/0x5f
> [  459.800160]  [] srcu_notifier_chain_unregister+0x43/0x4d
> [  459.800185]  [] cpufreq_unregister_notifier+0x22/0x32
> [  459.800203]  [] ehci_stop+0x4f/0xb7 [ehci_hcd]
> [  459.800248]  [] usb_remove_hcd+0x97/0xd7 [usbcore]
> [  459.800280]  [] usb_hcd_pci_remove+0x18/0x6a [usbcore]
> [  459.800317]  [] pci_device_remove+0x1c/0x3d
> [  459.800324]  [] __device_release_driver+0x74/0x90
> [  459.800332]  [] driver_detach+0x81/0xc2
> [  459.800337]  [] bus_remove_driver+0x5d/0x7c
> [  459.800342]  [] driver_unregister+0xb/0xd
> [  459.800347]  [] pci_unregister_driver+0x13/0x65
> [  459.800351]  [] ehci_hcd_cleanup+0x10/0x12 [ehci_hcd]
> [  459.800360]  [] sys_delete_module+0x187/0x1ae
> [  459.800367]  [] sysenter_past_esp+0x5f/0x85
> [  459.800373]  [] 0xe410
> [  459.800384]  ===
> 
> static void ehci_stop (struct usb_hcd *hcd)
> {
> 	...
> 	spin_lock_irq(&ehci->lock);
> 	if (HC_IS_RUNNING (hcd->state))
> 		ehci_quiesce (ehci);
> 
> #ifdef CONFIG_CPU_FREQ
> 	cpufreq_unregister_notifier(&ehci->cpufreq_transition,
> 				    CPUFREQ_TRANSITION_NOTIFIER);
> #endif
> 
> -- 
> mattia
> :wq!


Re: 2.6.22-rc1-mm1: evm BUG when reading sysfs file

2007-05-25 Thread Mimi Zohar
Andrew Morton <[EMAIL PROTECTED]> wrote on 05/22/2007 05:23:05 PM:

> On Tue, 22 May 2007 03:25:48 -0400
> [EMAIL PROTECTED] (Joseph Fannin) wrote:
> 
> > This comes a bit after IMA bails out successfully, if that's relevant:
...
> > 
> > [1.708761] ima (ima_init): No TPM chip found(rc = -19), activating
> > TPM-bypass!
> 
> OK, thanks.  Does the crash go away if you disable IMA, SLIM, etc in .config?
> 
> I think I'll drop all those patches, actually - they don't seem to be going
> anywhere.

You are absolutely right, we have been stalled on EVM/IMA/SLIM while trying
to figure out the mtime and revocation issues. In retrospect, we tried to
submit too much complex code all at once.

We will resubmit in small functional pieces as the technical issues are
resolved, starting with the LIM API and hooks, which are independent of
the mtime and revocation issues.

Mimi Zohar




Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-25 Thread Alan Cox
> > To set it up the user will have to know the parameters and have typed
> > them into the BIOS (if it even has an option for it). I see no problem
> 
> Sorry, see no problem which way?  

Forcing the user to provide the geometry. Historically that driver dealt
with the main disks the user had. Today its only use is specialist
recovery work. Anyone recovering a disk has to get the geometry data into
the BIOS (if the BIOS even allows it - many now don't) and will therefore
know it for hd= arguments as well.
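
As an illustration of what supplying the geometry by hand amounts to,
here is a small userspace C sketch that parses an argument of the form
hd=<cyl>,<head>,<sect>.  It is not the driver's own parser:
parse_hd_arg() and struct hd_geom_arg are hypothetical names, and the
three-number format is assumed from the usual description of the option.

```c
#include <stdio.h>

/* Hypothetical container for a manually supplied disk geometry. */
struct hd_geom_arg {
	unsigned int cyl;
	unsigned int head;
	unsigned int sect;
};

/*
 * Parse "hd=<cyl>,<head>,<sect>".  Returns 0 on success, -1 if the
 * string is malformed, has trailing characters, or contains a zero
 * field (a geometry with zero of anything cannot be valid).
 */
static int parse_hd_arg(const char *arg, struct hd_geom_arg *g)
{
	unsigned int c, h, s;
	char extra;

	/* The trailing %c only matches if junk follows the third number. */
	if (sscanf(arg, "hd=%u,%u,%u%c", &c, &h, &s, &extra) != 3)
		return -1;
	if (c == 0 || h == 0 || s == 0)
		return -1;

	g->cyl = c;
	g->head = h;
	g->sect = s;
	return 0;
}
```

For example, "hd=615,4,17" yields cyl 615, head 4, sect 17, while
"hd=615,4" is rejected.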

Alan


Re: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-24 Thread Cornelia Huck
On Thu, 24 May 2007 15:11:08 -0700,
"Williams, Dan J" <[EMAIL PROTECTED]> wrote:


> --- a/async_tx/async_memcpy.c
> +++ b/async_tx/async_memcpy.c
> @@ -56,6 +56,7 @@ async_memcpy(struct page *dest, struct page *src,
> unsigned int dest_offset,
>   int_en) : NULL;
>  
>   if (tx) { /* run the memcpy asynchronously */
> + #ifdef CONFIG_HAS_DMA
>   dma_addr_t dma_addr;
>   enum dma_data_direction dir;

Can you factor out the async stuff into a function so you can use the
#ifdefs to define different functions rather than put them in the middle
of a complex function?

(Maybe you should rather use #ifdef CONFIG_DMA_ENGINE, since the async
part is not needed for !DMA_ENGINE either.)
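
A sketch of the shape being suggested, in plain userspace C so it can be
compiled on its own.  The names are hypothetical: EXAMPLE_DMA_ENGINE
stands in for the real config symbol, try_offload_copy() for the
factored-out async path, and example_memcpy() for the caller.  The point
is only that the #ifdef selects between two definitions of one helper,
and the caller contains no conditionals.

```c
#include <stddef.h>
#include <string.h>

/*
 * EXAMPLE_DMA_ENGINE is a made-up switch for this sketch; build with
 * -DEXAMPLE_DMA_ENGINE to select the offload variant.
 */
#ifdef EXAMPLE_DMA_ENGINE
/* Offload variant: returns 1 if the copy was handed off. */
static int try_offload_copy(void *dest, const void *src, size_t len)
{
	/* A real version would map the pages and submit a descriptor;
	 * here the copy is simply done in place to keep the model small. */
	memcpy(dest, src, len);
	return 1;
}
#else
/* Stub variant: no offload available, always ask for the fallback. */
static inline int try_offload_copy(void *dest, const void *src, size_t len)
{
	(void)dest;
	(void)src;
	(void)len;
	return 0;
}
#endif

/* The caller stays free of #ifdefs: try the offload, else copy inline. */
static void example_memcpy(void *dest, const void *src, size_t len)
{
	if (!try_offload_copy(dest, src, len))
		memcpy(dest, src, len);	/* synchronous fallback */
}
```

With the switch undefined, the stub compiles away and only the
synchronous copy remains.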


Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-24 Thread H. Peter Anvin
Alan Cox wrote:
>> The question I'm asking is: do you think it's better to remove this from
>> hd.c, or do you think it's better to add back boot-code BIOS
>> detection (and take the risk of poking an ST-506 disk with legacy data
>> with parameters which may belong to another disk -- keep in mind this
>> can permanently damage an ST-506 disk)?
> 
> To set it up the user will have to know the parameters and have typed
> them into the BIOS (if it even has an option for it). I see no problem

Sorry, see no problem which way?  My concern here is with getting
incorrect data, not getting no data.  The BIOS probe amounts to pulling
data out of two tables (INT 0x41/0x46, corresponding to BIOS drives 0x80
and 0x81 -- the EDD 1.1 spec is quite specific that if implemented they
follow the BIOS drive numbers, not the ATA port addresses), and hoping
that they actually match the drives that hd.c uses.  That scares me,
since we're talking about old legacy data here.
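
For readers who have not seen these tables: each of the two vectors
points at a 16-byte parameter block.  The sketch below decodes one such
block in userspace C.  The field offsets are an assumption taken from
the way the legacy driver has traditionally read them (cylinders at 0,
heads at 2, write precompensation at 5, control byte at 8, landing zone
at 12, sectors per track at 14) and should be checked against the source
before being relied on; the struct and function names are made up.

```c
#include <stdint.h>

/* Hypothetical decoded form of one 16-byte legacy parameter block. */
struct legacy_hd_params {
	uint16_t cyl;	/* cylinders */
	uint8_t  head;	/* heads */
	uint16_t wpcom;	/* write precompensation cylinder */
	uint8_t  ctl;	/* control byte */
	uint16_t lzone;	/* landing zone cylinder */
	uint8_t  sect;	/* sectors per track */
};

/* Read a 16-bit little-endian value. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode one block; offsets are assumptions, see the note above. */
static void decode_param_block(const uint8_t tbl[16],
			       struct legacy_hd_params *out)
{
	out->cyl   = get_le16(tbl + 0);
	out->head  = tbl[2];
	out->wpcom = get_le16(tbl + 5);
	out->ctl   = tbl[8];
	out->lzone = get_le16(tbl + 12);
	out->sect  = tbl[14];
}
```

Nothing in the block itself says which physical drive it describes,
which is the mismatch risk described above.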

I'm not concerned with what's easy, I'm concerned with what's the right
thing to do.

-hpa



Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-24 Thread Alan Cox
> The question I'm asking is: do you think it's better to remove this from
> hd.c, or do you think it's better to add back boot-code BIOS
> detection (and take the risk of poking an ST-506 disk with legacy data
> with parameters which may belong to another disk -- keep in mind this
> can permanently damage an ST-506 disk)?

To set it up the user will have to know the parameters and have typed
them into the BIOS (if it even has an option for it). I see no problem



Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-24 Thread H. Peter Anvin
Alan Cox wrote:
> I believe the technical description for the comment is "bullshit" 8)
> 
> Almost all MFM controllers and RLL controllers will only run at the
> standard primary and secondary ATA address.

Yes, but that doesn't (necessarily) apply to the controller that is
likely to be the primary controller in a modern system.

The whole point is that what the BIOS considers primary isn't
necessarily tied to the standard ATA addresses anymore, with SATA
controllers being primary.

The question I'm asking is: do you think it's better to remove this from
hd.c, or do you think it's better to add back boot-code BIOS
detection (and take the risk of poking an ST-506 disk with legacy data
with parameters which may belong to another disk -- keep in mind this
can permanently damage an ST-506 disk)?

> Given the intended use of the driver today I don't see a big problem in
> requiring "hd=" although you have to question the point of this boot code
> rewrite when it seems primarily to be removing features 

I've been trying to remove features that are obsolete and/or broken.  I
don't have access to this particular ancient hardware, nor any system
that can even host them.   It's very easy to add the stuff back in the
boot code; it's a much more tricky/annoying question if one *should* do
so.  That's part of a rewrite/cleanup.

-hpa


Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-24 Thread Alan Cox
> > It thus needs fixing not removing.
> > 
> 
> Opinions, anyone (especially Alan):
> 
> http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-newsetup.git;a=commitdiff;h=369f16fdd423d79640c4390915e6ab71189cb497

I believe the technical description for the comment is "bullshit" 8)

Almost all MFM controllers and RLL controllers will only run at the
standard primary and secondary ATA address.

Given the intended use of the driver today I don't see a big problem in
requiring "hd=" although you have to question the point of this boot code
rewrite when it seems primarily to be removing features 

Alan


Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-24 Thread H. Peter Anvin
Alan Cox wrote:
> 
> hd.c can drive MFM and RLL disks and drivers/ide cannot. Although it
> really wants burying further down the config tree the ability to read MFM
> and RLL disks when recovering ancient data is useful and people do
> actually use this driver now and then rescuing stuff like twenty year old
> datasets.
> 
> It thus needs fixing not removing.
> 

Opinions, anyone (especially Alan):

http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-newsetup.git;a=commitdiff;h=369f16fdd423d79640c4390915e6ab71189cb497

-hpa


RE: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-24 Thread Williams, Dan J
> From: Cornelia Huck [mailto:[EMAIL PROTECTED]
> On Wed, 23 May 2007 10:05:39 +0200,
> Martin Schwidefsky <[EMAIL PROTECTED]> wrote:
> 
> > We are trying to get rid of dma-mapping.h, see the last change to the
> > file with commit 411f0f3edc141a582190d3605cadd1d993abb6df. I don't think
> > we should reintroduce dma related definitions but split the async_tx in a
> > way that allows to compile it on an architecture with CONFIG_NO_DMA=y
> > (yes I know that is harder than to just add the dma stubs).

Not harder, it's just a question of which is "uglier", but given the
direction taken with CONFIG_HAS_DMA it now appears more appropriate to
change async_tx.

> > You've said that there is a software implementation if there is no dma
> > engine present. This software implementation should be independent of
> > dma-mapping.h. Without having looked at the code, isn't it possible to
> > isolate that software implementation into its own C file? That would be
> > the only one that gets compiled for s390.
> 
> Taking a quick look at the async_*.c stuff, the functions in question
> basically seem to be of the form
> 
> check_if_we_can_do_it_async();
> if (async_ok) {
>   /* do async stuff */
>   /* that's where the dma mapping creeps in */
> } else {
>   /* do it sync */
>   /* seems fine for us */
> }
> 
> So you should be able to factor out (say) async_memset_{sync,async}()
> and put it into async_memset_{sync,async}.c. async_memset() would then
> be
> 
> async_memset()
> {
> #ifdef CONFIG_HAS_DMA
>   if (check_if_we_can_do_it_async())
>     return async_memset_async();
> #endif
>   return async_memset_sync();
> }
> 
> Kconfig could then do
> 
> config ASYNC_MEMSET
>   default m
>   tristate "async_memset support"
>   select ASYNC_MEMSET_ASYNC if HAS_DMA
> 
> config ASYNC_MEMSET_ASYNC
>   depends on HAS_DMA
>   tristate "async_memset async via dma support"
> 
> Thoughts?

I took your approach of encasing the HAS_DMA dependent portion of the
code in #ifdef CONFIG_HAS_DMA, and I dropped the dma-mapping-stub patch.
I let the compiler do the factoring out for me by making
async_tx_find_channel become the following when CONFIG_DMA_ENGINE=n:

static inline struct dma_chan *
async_tx_find_channel(struct dma_async_tx_descriptor *depend_tx,
enum dma_transaction_type tx_type)
{
return NULL;
}
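
A minimal userspace sketch of why such a stub is enough, assuming nothing
about the real async_tx code: when the channel lookup is a constant NULL,
the asynchronous branch becomes dead code and only the synchronous path
remains. The names find_channel, copy_buf, struct chan and HAVE_DMA_ENGINE
are hypothetical stand-ins, not kernel API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct dma_chan; not the kernel type. */
struct chan { int id; };

#ifdef HAVE_DMA_ENGINE            /* assumption: plays the role of CONFIG_DMA_ENGINE */
struct chan *find_channel(void);  /* a real lookup would be defined elsewhere */
#else
/* With no engine configured the lookup is a constant NULL, so the
 * compiler can discard the asynchronous branch in copy_buf(). */
static inline struct chan *find_channel(void)
{
	return NULL;
}
#endif

/* Returns 1 if the copy was handed to a channel, 0 if done synchronously. */
static int copy_buf(void *dst, const void *src, size_t len)
{
	struct chan *c = find_channel();

	if (c) {
		/* asynchronous path: would map buffers and submit a descriptor */
		return 1;
	}
	memcpy(dst, src, len);	/* synchronous fallback */
	return 0;
}
```

Built without HAVE_DMA_ENGINE, copy_buf() always takes the memcpy() path,
which is the behaviour wanted on an architecture without DMA support.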

---

diff --git a/async_tx/async_memcpy.c b/async_tx/async_memcpy.c
index 7896ba8..547976e 100644
--- a/async_tx/async_memcpy.c
+++ b/async_tx/async_memcpy.c
@@ -56,6 +56,7 @@ async_memcpy(struct page *dest, struct page *src, unsigned int dest_offset,
int_en) : NULL;
 
if (tx) { /* run the memcpy asynchronously */
+   #ifdef CONFIG_HAS_DMA
dma_addr_t dma_addr;
enum dma_data_direction dir;
 
@@ -75,6 +76,9 @@ async_memcpy(struct page *dest, struct page *src, unsigned int dest_offset,
 
async_tx_submit(chan, tx, flags, depend_tx, callback,
callback_param);
+   #else
+   BUG();
+   #endif /* CONFIG_HAS_DMA */
} else { /* run the memcpy synchronously */
void *dest_buf, *src_buf;
pr_debug("%s: (sync) len: %zu\n", __FUNCTION__, len);
diff --git a/async_tx/async_memset.c b/async_tx/async_memset.c
index 736c7c2..9166a27 100644
--- a/async_tx/async_memset.c
+++ b/async_tx/async_memset.c
@@ -55,6 +55,7 @@ async_memset(struct page *dest, int val, unsigned int offset,
int_en) : NULL;
 
if (tx) { /* run the memset asynchronously */
+   #ifdef CONFIG_HAS_DMA
dma_addr_t dma_addr;
enum dma_data_direction dir;
 
@@ -67,6 +68,9 @@ async_memset(struct page *dest, int val, unsigned int offset,
 
async_tx_submit(chan, tx, flags, depend_tx, callback,
callback_param);
+   #else
+   BUG();
+   #endif /* CONFIG_HAS_DMA */
} else { /* run the memset synchronously */
void *dest_buf;
pr_debug("%s: (sync) len: %zu\n", __FUNCTION__, len);
diff --git a/async_tx/async_xor.c b/async_tx/async_xor.c
index 37ae5fc..5e4bc29 100644
--- a/async_tx/async_xor.c
+++ b/async_tx/async_xor.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 
+#ifdef CONFIG_HAS_DMA
 static void
 do_async_xor(struct dma_async_tx_descriptor *tx, struct dma_device
*device,
struct dma_chan *chan, struct page *dest, struct page
**src_list,
@@ -62,6 +63,17 @@ do_async_xor(struct dma_async_tx_descriptor *tx, struct dma_device *device,
async_tx_submit(chan, tx, flags, depend_tx, callback,
callback_param);
 }
+#else
+static void
+do_async_xor(struct dma_async_tx_descriptor *tx, struct dma_device
*device,
+   struct dma_chan *chan, struct page *dest, struct page
**src_list,
+   unsigned int offset, unsigned int src_cnt, size_t len,
+   enum async_tx_

Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-24 Thread H. Peter Anvin
Alan Cox wrote:
>>> hd.c:(.init.text+0x44a7d): undefined reference to `drive_info'
>>> hd.c:(.init.text+0x44a89): undefined reference to `drive_info'
>>> hd.c:(.init.text+0x44a95): undefined reference to `drive_info'
>>> hd.c:(.init.text+0x44aa1): undefined reference to `drive_info'
>>> hd.c:(.init.text+0x44aad): undefined reference to `drive_info'
>>> drivers/built-in.o:hd.c:(.init.text+0x44ab9): more undefined references to 
>>> `drive_info' follow
>>> make[1]: *** [.tmp_vmlinux1] Error 1
>>>
>>> <--  snip  -->
>>>
>>> Considering the fact that we have two more recent drivers with the same 
>>> functionality, it might be an option to simply remove this driver...
>> Care to send a patch?
> 
> hd.c can drive MFM and RLL disks and drivers/ide cannot. Although it
> really wants burying further down the config tree the ability to read MFM
> and RLL disks when recovering ancient data is useful and people do
> actually use this driver now and then rescuing stuff like twenty year old
> datasets.
> 
> It thus needs fixing not removing.

Why is this driver parked in drivers/ide/legacy when the companion
driver, xd.c, is in drivers/block (where hd.c used to be at one point,
too)?  Especially so since it's not really for IDE, but for ST-506.

HOWEVER, the code that fails above hard-assumes that the ST-506 disks
that you have are your primary system drives, which is obviously a wrong
assumption -- ST-506 drives were obsolete quite a while before Linux
existed[1].

xd.c, on the other hand, seems to actually go out and query the hardware
directly.  I guess this is understandable, since this controller would
never have been primary.

If hd.c is pure legacy, which it obviously is, should we remove the code
to assume the BIOS settings are the MFM/RLL settings (i.e. the __i386__
clause), and instead do something more like the __arm__ clause which
means that "if you really want to use this you have to specify the
parameters manually"?
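
A rough sketch of what "specify the parameters manually" could look like,
assuming a geometry string of the form cyl,head,sect. The format, the
struct hd_params type and the parse_hd_arg function are assumptions made
for illustration; this is not the parsing code in hd.c.

```c
#include <stdio.h>

/* Hypothetical container for a manually supplied geometry. */
struct hd_params {
	unsigned int cyl;
	unsigned int head;
	unsigned int sect;
};

/* Parse "cyl,head,sect".  Returns 0 on success, -1 on malformed input. */
static int parse_hd_arg(const char *s, struct hd_params *p)
{
	unsigned int c, h, n;
	char trailing;

	/* Exactly three numbers must convert; a fourth conversion means
	 * there was trailing garbage after the sector count. */
	if (sscanf(s, "%u,%u,%u%c", &c, &h, &n, &trailing) != 3)
		return -1;
	if (c == 0 || h == 0 || n == 0)
		return -1;	/* a zero geometry value cannot be valid */

	p->cyl = c;
	p->head = h;
	p->sect = n;
	return 0;
}
```

The point of requiring all three values is that the driver never has to
guess, so it never pokes a disk with parameters that belong to another one.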

-hpa


[1] The 386-16 that I had access to at Northwestern, which with 0.59
BogoMIPS was the slowest Linux system in existence until Linux was
ported to other architectures, might have had an ST-506 drive, but I'm
not sure.


Re: 2.6.22-rc1-mm1 Implementing fan/thermal control in userspace - Was: [cannot change thermal trip points]

2007-05-24 Thread Thomas Renninger
On Thu, 2007-05-24 at 15:36 +0100, Matthew Garrett wrote:
> On Thu, May 24, 2007 at 04:16:53PM +0200, Thomas Renninger wrote:
> 
> > I doubt it is impossible, would you mind sharing your knowledge why you
> > think it is impossible or point to some related discussion, pls.
> 
> Because, as Len has pointed out, you end up with two different ideas 
> about what the trip points are - the kernel's and the hardware's. That 
> works fine until some event in the firmware either forcibly 
> resynchronises the two or makes assumptions about the spec-compliance of 
> the interpreter.

Not sure what exactly you'd like to do in userspace, maybe you can be a
bit more precise here:
  a) Doing the whole thermal management in userspace: reading temp,
     writing fan and cpufreq_max_freq, shutting down the machine,...
  b) Working around fans not switching on by double checking
     fan/temperature in a userspace daemon and trying to finally trigger
     the switch by writing to /proc/acpi/fan/state (or corresponding
     /sys,..)

IMO we need some kind of fan watchdog like Henrique described
recently, maybe this could be put in userspace, not sure.
Currently the fan can easily run out of sync if the fan state is
changed behind the OS's back.


> > Yes, trip points are overridden by BIOS on HPs and what is the problem?
> > The workaround won't work for them, but it still does on others
> > (mainly on ThinkPads which have passive tp at about 89 C and critical on
> > 91 C).
> 
> You don't know whether the workaround will work or not
Hmm, I don't get the point. If it works it's great, if not you have a
problem anyway and can at least test a workaround.
>  until you've 
> performed a full audit of the platform firmware, which is going to 
> potentially change between BIOS versions. It's entirely legal for the 
> firmware to behave in this way, and even beneficial under various 
> circumstances.
But that's exactly what all these workarounds are for. You pass them if
you have a buggy BIOS. You wait for new BIOSes and hope that you can get
rid of the workaround...

> > I could imagine an implementation for this, that e.g. critical...active9
> > get module parameters. BIOS updates for trip points get ignored as soon
> > as one is set and you can only decrease a value. Nothing bad can happen
> > and it will make some people happy (yes it's hacky, violates the specs
> > and so on..., but some more people have a working machine). Will this
> > (or similar) get accepted?
> 
> The interface would need to be more complicated than that if you wanted 
> to be able to implement hysteresis, and there's the potential for 
> hardware damage if parameters are set inappropriately. Even then, 
> there's no easy way of programmatically determining whether it would work 
> on any given hardware.

The fact that three people complained rather quickly about a patch in
rc1-mm1 suggests that this workaround is needed. I personally advised two
guys to use it with their ThinkPads in the summer and they are happy with
it.

I'd also like to have this extended a bit: to be able to modify just the
passive trip point.
IMO this is a very powerful feature allowing people a fanless system as
long as they have a cpufreq capable processor.

The idea of having this in userspace is interesting. But, as said, it is
rather complicated to implement. The hysteresis implementation for passive
cooling works fine in the kernel and is field tested, it should get used.
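
To make the hysteresis point concrete, here is a generic two-threshold
sketch, assuming temperatures in tenths of a degree C. It is not the
kernel's passive cooling algorithm; struct passive_state and
passive_update are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical state for one passive trip point (tenths of a degree C). */
struct passive_state {
	int trip;		/* start throttling at or above this */
	int hysteresis;		/* stop only once this far below the trip */
	bool throttled;
};

/* Feed one temperature sample; returns whether throttling is active. */
static bool passive_update(struct passive_state *s, int temp)
{
	if (!s->throttled && temp >= s->trip)
		s->throttled = true;
	else if (s->throttled && temp <= s->trip - s->hysteresis)
		s->throttled = false;

	/* Between the two thresholds the previous state is kept, which
	 * avoids flapping when the temperature hovers near the trip. */
	return s->throttled;
}
```

With a trip of 89.0 C and a hysteresis of 3.0 C, throttling starts at
89.0 C and is released only once the temperature is back at 86.0 C or less.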

The problem with the ACPI spec is that it's rather complicated. As far as
I can tell, this is mainly so from a BIOS developer's point of view.
Therefore it's rather seldom picked up by BIOS vendors.
However for the kernel it's easy (to fake, to do) and it's working fine,
so why not make use of it?

IMO we should even provide a passive trip point (initially unused) when
none is defined by the BIOS.

I agree that it's hard to find the temperature at which the fan does not
kick in automatically. But it's then really easy for everyone to:
  - get a fanless system
  - work around critical shutdowns
and all this is safe with respect to HW damage.

IMO this is an area where we can easily behave better than M$ does.

Maybe my first mails were a bit offensive, don't know, we should get
this back to an objective discussion.

I'd especially like to have some comments from Len, before doing any work
for nothing (or before giving up):
   - Would such a passive trip point override be acceptable in any way
     (be it in userspace, kernel space or in whatever form -> to be
     discussed)?
   - Would such a workaround as I described in my mail before be
     acceptable?
   - If done in userspace, what should it look like exactly?

Thanks,

   Thomas



Re: 2.6.22-rc1-mm1 Implementing fan/thermal control in userspace - Was: [cannot change thermal trip points]

2007-05-24 Thread Matthew Garrett
On Thu, May 24, 2007 at 04:16:53PM +0200, Thomas Renninger wrote:

> I doubt it is impossible, would you mind sharing your knowledge why you
> think it is impossible or point to some related discussion, pls.

Because, as Len has pointed out, you end up with two different ideas 
about what the trip points are - the kernel's and the hardware's. That 
works fine until some event in the firmware either forcibly 
resynchronises the two or makes assumptions about the spec-compliance of 
the interpreter.

> Yes, trip points are overridden by BIOS on HPs and what is the problem?
> The workaround won't work for them, but it still does on others
> (mainly on ThinkPads which have passive tp at about 89 C and critical on
> 91 C).

You don't know whether the workaround will work or not until you've 
performed a full audit of the platform firmware, which is going to 
potentially change between BIOS versions. It's entirely legal for the 
firmware to behave in this way, and even beneficial under various 
circumstances.

> I could imagine an implementation for this, that e.g. critical...active9
> get module parameters. BIOS updates for trip points get ignored as soon
> as one is set and you can only decrease a value. Nothing bad can happen
> and it will make some people happy (yes it's hacky, violates the specs
> and so on..., but some more people have a working machine). Will this
> (or similar) get accepted?

The interface would need to be more complicated than that if you wanted 
to be able to implement hysteresis, and there's the potential for 
hardware damage if parameters are set inappropriately. Even then, 
there's no easy way of programmatically determining whether it would work 
on any given hardware.

> It's even more impossible to get ACPI working correctly for all machines
> and all subsystems, these little workarounds can help some people to at
> least use their machine or get some parts working better.

It's fairly clearly not impossible, given that there exists at least one 
OS that these machines work with.

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.22-rc1-mm1 Implementing fan/thermal control in userspace - Was: [cannot change thermal trip points]

2007-05-24 Thread Thomas Renninger
Stripping some CCs; the acpi and kernel lists should be enough for this
one to go to...

On Tue, 2007-05-22 at 01:31 +0100, Matthew Garrett wrote:
> On Tue, May 22, 2007 at 12:42:00AM +0200, Pavel Machek wrote:
> > On Mon 2007-05-21 14:45:53, Matthew Garrett wrote:
> > > So don't do it badly. The advantage of doing so is that you can make it 
> > > work properly, which you can't by putting it in the kernel.
> > 
> > You want stuff like critical shutdowns to work even if userspace is
> > dead.
> 
> I don't think anyone suggested putting the critical shutdown control in 
> userspace. The kernel already handles that fine.
> 
> > I do not think you can control passive cooling adequately from 
> > userspace, and you can certainly not prevent kernel from slowing 
> > machine down too soon.
> 
> Given the choice between something impossible and something difficult, 
> I'm inclined towards picking the difficult one.

I doubt it is impossible, would you mind sharing your knowledge why you
think it is impossible or point to some related discussion, pls.

Does this mean that checking temperature against trip points and adjusting
fan and cpufreq should be done in a hal module?
What stage is this in: rfc, development, already in some git tree?

Yes, trip points are overridden by BIOS on HPs and what is the problem?
The workaround won't work for them, but it still does on others
(mainly on ThinkPads which have passive tp at about 89 C and critical on
91 C).

I could imagine an implementation for this, that e.g. critical...active9
get module parameters. BIOS updates for trip points get ignored as soon
as one is set and you can only decrease a value. Nothing bad can happen
and it will make some people happy (yes it's hacky, violates the specs
and so on..., but some more people have a working machine). Will this
(or similar) get accepted?
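
The override rule described above can be sketched as follows. This is only
an illustration of the two stated rules (a value may only be lowered, and
BIOS updates are ignored once an override is set); struct trip_point and
the trip_* functions are hypothetical and do not exist in the ACPI thermal
driver.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one trip point (tenths of a degree C). */
struct trip_point {
	int bios_value;		/* last value reported by the firmware */
	int override;		/* user value, meaningful only if overridden */
	bool overridden;
};

/* The value the kernel would act on. */
static int trip_effective(const struct trip_point *t)
{
	return t->overridden ? t->override : t->bios_value;
}

/* Accept an override only if it lowers the effective value. */
static bool trip_set_override(struct trip_point *t, int requested)
{
	if (requested >= trip_effective(t))
		return false;	/* raising a trip point is refused */
	t->override = requested;
	t->overridden = true;
	return true;
}

/* Firmware update: recorded, but not used while an override is set. */
static void trip_bios_update(struct trip_point *t, int value)
{
	t->bios_value = value;
}
```

Note that this sketch says nothing about the objection raised elsewhere in
the thread, namely that the firmware may keep acting on its own idea of
the trip points.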

It's even more impossible to get ACPI working correctly for all machines
and all subsystems, these little workarounds can help some people to at
least use their machine or get some parts working better.

   Thomas



Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-24 Thread Alan Cox
> > hd.c:(.init.text+0x44a7d): undefined reference to `drive_info'
> > hd.c:(.init.text+0x44a89): undefined reference to `drive_info'
> > hd.c:(.init.text+0x44a95): undefined reference to `drive_info'
> > hd.c:(.init.text+0x44aa1): undefined reference to `drive_info'
> > hd.c:(.init.text+0x44aad): undefined reference to `drive_info'
> > drivers/built-in.o:hd.c:(.init.text+0x44ab9): more undefined references to 
> > `drive_info' follow
> > make[1]: *** [.tmp_vmlinux1] Error 1
> > 
> > <--  snip  -->
> > 
> > Considering the fact that we have two more recent drivers with the same 
> > functionality, it might be an option to simply remove this driver...
> 
> Care to send a patch?

hd.c can drive MFM and RLL disks and drivers/ide cannot. Although it
really wants burying further down the config tree the ability to read MFM
and RLL disks when recovering ancient data is useful and people do
actually use this driver now and then rescuing stuff like twenty year old
datasets.

It thus needs fixing not removing.


Alan


Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread Steven French
If srvTcp->tsk is NULL then the thread (cifs_demultiplex_thread) is 
getting ready to exit and kthread_stop would not be needed.

It would probably be possible to recode this so we don't need to call 
kthread_stop at all (send_sig is apparently required to wake up this 
thread when blocked in certain places in the tcp stack - and in 
combination with the existing flags might be good enough) - but I don't 
know if it would make it simpler.  Fortunately there is no race: a few 
lines after srvTcp->tsk is set to zero by the cifs_demultiplex_thread, it 
will sleep briefly before exiting (kthread_stop won't be called on a 
thread that does not exist).

That section of code in fs/cifs/connect.c now looks like:

        if (srvTcp->tsk) {
                struct task_struct *tsk;
                /* If we could verify that kthread_stop would
                   always wake up processes blocked in
                   tcp in recv_mesg then we could remove the
                   send_sig call */
                send_sig(SIGKILL,srvTcp->tsk,1);
                tsk = srvTcp->tsk;
                if(tsk)
                        kthread_stop(srvTcp->tsk);
        }
}
/* If find_unc succeeded then rc == 0 so we can not end */
if (tcon)  /* up accidently freeing someone elses tcon struct */
        tconInfoFree(tcon);
if (existingCifsSes == NULL) {
        if (pSesInfo) {
                if ((pSesInfo->server) &&
                    (pSesInfo->status == CifsGood)) {
                        int temp_rc;
                        temp_rc = CIFSSMBLogoff(xid, pSesInfo);
                        /* if the socketUseCount is now zero */
                        if ((temp_rc == -ESHUTDOWN) &&
                            (pSesInfo->server) &&
                            (pSesInfo->server->tsk)) {
                                struct task_struct *tsk;
                                send_sig(SIGKILL,pSesInfo->server->tsk,1);
                                tsk = pSesInfo->server->tsk;
                                if(tsk)
                                        kthread_stop(tsk);



Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com



"young dave" <[EMAIL PROTECTED]> 
05/23/2007 08:05 PM

To: Steven French/Austin/[EMAIL PROTECTED]
cc: "Andrew Morton" <[EMAIL PROTECTED]>, David Kleikamp/Austin/[EMAIL PROTECTED],
    "Linux Kernel Mailing List", Shirish S Pargaonkar/Austin/[EMAIL PROTECTED]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

Hi,

I have one question about this: after srvTcp->tsk is set to NULL
(the thread may still be there, mightn't it?), does the kthread still
need to be stopped by calling kthread_stop()? If so, then the
task_struct should be saved before send_sig, as in my patch:

if (srvTcp->tsk) {
+   struct task_struct * tsk = srvTcp->tsk;
   send_sig(SIGKILL,srvTcp->tsk,1);
-   kthread_stop(srvTcp->tsk);
+   kthread_stop(tsk);

Regards
dave




Re: 2.6.22-rc1-mm1: IDE compile error

2007-05-23 Thread Bartlomiej Zolnierkiewicz

Hi,

On Wednesday 16 May 2007, Adrian Bunk wrote:
> On Tue, May 15, 2007 at 08:19:14PM -0700, Andrew Morton wrote:
> >...
> > - Added an i386 early-startup development tree, as git-newsetup.patch ("H. 
> >   Peter Anvin" <[EMAIL PROTECTED]>)
> >...
> > Changes since 2.6.21-mm2:
> >...
> >  git-newsetup.patch
> >...
> >  git trees
> >...
> 
> This breaks the compilation of the oldest of our IDE disk drivers:
> 
> <--  snip  -->
> 
> ...
>   LD  .tmp_vmlinux1
> drivers/built-in.o: In function `hd_init':
> hd.c:(.init.text+0x44a7d): undefined reference to `drive_info'
> hd.c:(.init.text+0x44a89): undefined reference to `drive_info'
> hd.c:(.init.text+0x44a95): undefined reference to `drive_info'
> hd.c:(.init.text+0x44aa1): undefined reference to `drive_info'
> hd.c:(.init.text+0x44aad): undefined reference to `drive_info'
> drivers/built-in.o:hd.c:(.init.text+0x44ab9): more undefined references to 
> `drive_info' follow
> make[1]: *** [.tmp_vmlinux1] Error 1
> 
> <--  snip  -->
> 
> Considering the fact that we have two more recent drivers with the same 
> functionality, it might be an option to simply remove this driver...

Care to send a patch?

Thanks,
Bart


Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread young dave

Hi,

I have one question about this: after srvTcp->tsk is set to NULL
(the thread may still be there, mightn't it?), does the kthread still
need to be stopped by calling kthread_stop()? If so, then the
task_struct should be saved before send_sig, as in my patch:

   if (srvTcp->tsk) {
+   struct task_struct * tsk = srvTcp->tsk;
  send_sig(SIGKILL,srvTcp->tsk,1);
-   kthread_stop(srvTcp->tsk);
+   kthread_stop(tsk);

Regards
dave


Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread Steven French
> This can end up running kthread_stop() against an already-exited task.

I don't think so, since cifs_demultiplex_thread waits sufficiently long
before exit but after setting srvTcp->tsk to zero (the wait is immediately
after waking up any processes that may be blocked on requests on this
socket, to give file requests time to exit from the cifs vfs).  As long as
this (mount) process is scheduled within 1.25 seconds it should be ok,
although it is more complicated than I would like (that is why this thread
was the last one in cifs to switch to the kthread API).

I wish there were an obvious way to do this, perhaps without using
kthread_stop at all for this thread (since that by itself does not seem to
work for threads blocked in various socket calls).


--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2069,8 +2069,12 @@ cifs_mount(struct super_block *sb, struct cifs_sb_info *cifs_sb,
         srvTcp->tcpStatus = CifsExiting;
         spin_unlock(&GlobalMid_Lock);
         if (srvTcp->tsk) {
                 struct task_struct *tsk;
                 send_sig(SIGKILL,srvTcp->tsk,1);
-                kthread_stop(srvTcp->tsk);
+                /* srvTcp->tsk can be zeroed at any time */
+                tsk = srvTcp->tsk;
+                if (tsk)
+                        kthread_stop(tsk);
         }


I don't think so 


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com


Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread Andrew Morton
On Wed, 23 May 2007 08:28:47 -0500 Steven French <[EMAIL PROTECTED]> wrote:

> Yes - this patch looks better.
> 
> I also am not sure whether the send_sig is still necessary to wake up a 
> thread blocked in tcp recv_msg (only do a wake_up_process vs. doing a 
> send_sig(SIGKILL) )
> 
> Unless someone knows for sure whether the send_sig is redundant, I would 
> like to merge Shaggy's version of the patch
> 
> 
> "young dave" <[EMAIL PROTECTED]> wrote on 05/23/2007 03:37:04 AM:
> 
> > Hi,
> > Sorry for the wrong patch in my last post.
> > 
> > How about save the tsk then call kthread_stop like this:
> > 
> > diff -udr linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
> > --- linux/fs/cifs/connect.c 2007-05-23 10:59:13.0 +
> > +++ linux.new/fs/cifs/connect.c 2007-05-23 16:33:54.0 +
> > @@ -2069,8 +2069,9 @@
> > srvTcp->tcpStatus = CifsExiting;
> > spin_unlock(&GlobalMid_Lock);
> > if (srvTcp->tsk) {
> > +   struct task_struct * tsk = srvTcp->tsk;
> > send_sig(SIGKILL,srvTcp->tsk,1);
> > -   kthread_stop(srvTcp->tsk);
> > +   kthread_stop(tsk);
> > }
> > }
> >  /* If find_unc succeeded then rc == 0 so we can not end 
> */
> > 
> > Regards
> > dave
> 
> Shaggy's suggested patch seems slightly better:
> 
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 216fb62..b6e2158 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -2069,8 +2069,12 @@ cifs_mount(struct super_block *sb, struct 
> cifs_sb_info *cifs_sb,
>  srvTcp->tcpStatus = 
> CifsExiting;
>  spin_unlock(&GlobalMid_Lock);
>  if (srvTcp->tsk) {
> +struct 
> task_struct *tsk;
>  send_sig(SIGKILL,srvTcp->tsk,1);
> - kthread_stop(srvTcp->tsk);
> +/* 
> srvTcp->tsk can be zeroed at any time */
> +tsk = 
> srvTcp->tsk;
> +if (tsk)
> +  kthread_stop(tsk);
>  }
>  }
>   /* If find_unc succeeded then rc == 0 so 
> we can not end */

The wordwrapping made that extraordinarily hard to read.  Repairing...

--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2069,8 +2069,12 @@ cifs_mount(struct super_block *sb, struct cifs_sb_info *cifs_sb,
srvTcp->tcpStatus = CifsExiting;
spin_unlock(&GlobalMid_Lock);
if (srvTcp->tsk) {
struct  task_struct *tsk;
send_sig(SIGKILL,srvTcp->tsk,1);
-   kthread_stop(srvTcp->tsk);
+   /*  srvTcp->tsk can be zeroed at any time */
+   tsk = srvTcp->tsk;
+   if (tsk)
+   kthread_stop(tsk);
}
}
/* If find_unc succeeded then rc == 0 so we can not end */


This can end up running kthread_stop() against an already-exited task.
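
A simplified, single-threaded sketch of the pattern under discussion,
with hypothetical types (struct worker, struct server) standing in for
the task and the CIFS server structure. It shows what the local copy
buys: the pointer that is tested is the pointer that is used, so a NULL
is never passed on. It does not keep the object alive, so it does not by
itself address the concern above about a task that has already exited;
real concurrent code would also need a properly ordered read of the
shared pointer and some guarantee about the object's lifetime.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's task_struct or CIFS types. */
struct worker { int stop_requested; };
struct server { struct worker *tsk; };	/* worker clears tsk while exiting */

/* Returns 1 if a stop was requested, 0 if the worker had already detached. */
static int server_stop_worker(struct server *srv)
{
	/* Read the shared pointer once.  Testing srv->tsk and then reading
	 * it again for the call leaves a window in which it can become
	 * NULL between the test and the use. */
	struct worker *w = srv->tsk;

	if (!w)
		return 0;
	w->stop_requested = 1;
	return 1;
}
```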



Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread Steve French

This is what I now have in the cifs git tree.  (The only minor change is
that I have since fixed the missing space after the if.)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 216fb62..f6963d1 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2069,8 +2069,15 @@ cifs_mount(struct super_block *sb, struct cifs_sb_info *cifs_sb,
   srvTcp->tcpStatus = CifsExiting;
   spin_unlock(&GlobalMid_Lock);
   if (srvTcp->tsk) {
+   struct task_struct *tsk;
+   /* If we could verify that kthread_stop would
+  always wake up processes blocked in
+  tcp in recv_mesg then we could remove the
+  send_sig call */
   send_sig(SIGKILL,srvTcp->tsk,1);
-   kthread_stop(srvTcp->tsk);
+   tsk = srvTcp->tsk;
+   if(tsk)
+   kthread_stop(srvTcp->tsk);
   }
   }
/* If find_unc succeeded then rc == 0 so we can not end */
@@ -2085,8 +2092,11 @@ cifs_mount(struct super_block *sb, struct cifs_sb_info *cifs_sb,
   /* if the socketUseCount is now zero */
   if ((temp_rc == -ESHUTDOWN) &&
      (pSesInfo->server) && (pSesInfo->server->tsk)) {
+   struct task_struct *tsk;
    send_sig(SIGKILL,pSesInfo->server->tsk,1);
-   kthread_stop(pSesInfo->server->tsk);
+   tsk = pSesInfo->server->tsk;
+   if(tsk)
+   kthread_stop(tsk);
   }
   } else
   cFYI(1, ("No session or bad tcon"));


--
Thanks,

Steve


Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread Steven French
I don't think it is racy against thread startup since server->tsk is not 
filled in until after the demultiplex thread does allow_signal.

I looked more at each of the three send_sig calls which precede the three 
places we do kthread_stop on this thread.   Without the three send_sig 
calls (e.g. in the umount path) umount takes 7 more seconds (presumably 
because the socket does not wake up as quickly) - so at first glance it 
looks like we still need a way of waking up this thread when it is stuck 
in a socket - and send_sig is the obvious way to do it.
I will merge Shaggy's version (similar to Dave Young's) into the cifs-2.6 
tree now.


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com



From: Andrew Morton <[EMAIL PROTECTED]>
Date: 05/22/2007 09:22 PM
To: "young dave" <[EMAIL PROTECTED]>
Cc: "Linux Kernel Mailing List", Steven French/Austin/[EMAIL PROTECTED]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

On Wed, 23 May 2007 00:50:13 + "young dave" 
<[EMAIL PROTECTED]> wrote:

> Hi,
> when I use mount -t cifs , the kernel oops, seems break at
> kthread_stop, I'm not sure.
> 
> But if I add the CONFIG_CIFS_DEBUG2=y to config file, rebuild kernel,
> then the oops disappeared.
> 
> Below is the oops message:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual
> address 0008
>  printing eip:
> c012e910
> *pde = 
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: cifs smbfs radeon drm ipv6 snd_seq_dummy
> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> snd_mixer_oss capability commoncap e100 mii psmouse sg evdev serio_raw
> snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc intel_agp
> agpgart i2c_i801 pcspkr
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00210246   (2.6.22-rc1-mm1 #3)
> EIP is at kthread_stop+0x10/0x90
> eax: c051bde0   ebx:    ecx: c1fba000   edx: c1fef040
> esi:    edi: 0064   ebp: c2a36c80   esp: c1fbbd58
> ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
> Process mount.cifs (pid: 3955, ti=c1fba000 task=c2b38540 task.ti=c1fba000)
> Stack: c1fef040 ff90 ff90 f8a7a328 c285a504 f8a9a9fb 0083 00cf
>00dc 000b c2b38540 c2af5740 c292c540   c285a4c0
> c411b400 c3a4f500 c3cec200 c1fef052 c291c1e0 c1fef037 c291c940
> Call Trace:
>  [] cifs_mount+0xbe8/0xf10 [cifs]
>  [] idr_get_new_above_int+0x3e/0x50
>  [] cifs_read_super+0x4e/0x160 [cifs]
>  [] set_anon_super+0x0/0xd0
>  [] cifs_get_sb+0x60/0xd0 [cifs]
>  [] vfs_kern_mount+0x91/0x130
>  [] permit_mount+0x28/0xa0
>  [] do_new_mount+0x8a/0x140
>  [] do_mount+0x25e/0x280
>  [] schedule+0x2e0/0x680
>  [] exact_copy_from_user+0x32/0x70
>  [] copy_mount_options+0x5a/0xc0
>  [] sys_mount+0x79/0xc0
>  [] syscall_call+0x7/0xb
>  ===
> Code: 88 d1 d3 e0 89 43 5c 83 c4 18 5b c3 eb 0d 90 90 90 90 90 90 90
> 90 90 90 90 90 90 53 83 ec 08 89 c3 b8 e0 bd 51 c0 e8 90 26 31 00 
> 43 08 31 c9 b8 f0 c1 58 c0 89 0d ec c1 58 c0 e8 3b 01 00 00
> EIP: [] kthread_stop+0x10/0x90 SS:ESP 0068:c1fbbd58
> 

I assume cifs_demultiplex_thread() took the SIGKILL, zeroed server->tsk
then exited.  Then, cifs_mount() did a kthread_stop() on the now-NULL
pointer.

I don't see a non-racy way of fixing this as the code stands at present. 
This:

--- a/fs/cifs/connect.c~cifs-oops-fix
+++ a/fs/cifs/connect.c
@@ -2086,7 +2086,6 @@ cifs_mount(struct super_block *sb, struc
  if ((temp_rc == -ESHUTDOWN) &&
 (pSesInfo->server) && (pSesInfo->server->tsk)) {
 send_sig(SIGKILL,pSesInfo->server->tsk,1);
-kthread_stop(pSesInfo->server->tsk);
  }
 } else
  cFYI(1, ("No session or bad tcon"));
_

has a decent chance of fixing it.  But it's now racy against thread
*startup*: if we send SIGKILL to that task before it has done its
allow_signal(), it will presumably never get shut down.

Steve, can we just pull all the signal stuff out of there and use the
kthread machinery alone?





Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread Steven French
Yes - this patch looks better.

I am also not sure whether the send_sig is still necessary to wake up a 
thread blocked in tcp recv_msg (i.e. whether doing only a wake_up_process 
would be enough, versus doing a send_sig(SIGKILL)).

Unless someone knows for sure whether the send_sig is redundant, I would 
like to merge Shaggy's version of the patch.


"young dave" <[EMAIL PROTECTED]> wrote on 05/23/2007 03:37:04 AM:

> Hi,
> Sorry for the wrong patch in my last post.
> 
> How about save the tsk then call kthread_stop like this:
> 
> diff -udr linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
> --- linux/fs/cifs/connect.c 2007-05-23 10:59:13.0 +
> +++ linux.new/fs/cifs/connect.c 2007-05-23 16:33:54.0 +
> @@ -2069,8 +2069,9 @@
> srvTcp->tcpStatus = CifsExiting;
> spin_unlock(&GlobalMid_Lock);
> if (srvTcp->tsk) {
> +   struct task_struct * tsk = srvTcp->tsk;
> send_sig(SIGKILL,srvTcp->tsk,1);
> -   kthread_stop(srvTcp->tsk);
> +   kthread_stop(tsk);
> }
> }
>  /* If find_unc succeeded then rc == 0 so we can not end */
> 
> Regards
> dave

Shaggy's suggested patch seems slightly better:

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 216fb62..b6e2158 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2069,8 +2069,12 @@ cifs_mount(struct super_block *sb, struct cifs_sb_info *cifs_sb,
     srvTcp->tcpStatus = CifsExiting;
     spin_unlock(&GlobalMid_Lock);
     if (srvTcp->tsk) {
+        struct task_struct *tsk;
         send_sig(SIGKILL,srvTcp->tsk,1);
-        kthread_stop(srvTcp->tsk);
+        /* srvTcp->tsk can be zeroed at any time */
+        tsk = srvTcp->tsk;
+        if (tsk)
+            kthread_stop(tsk);
     }
 }
  /* If find_unc succeeded then rc == 0 so we can not end */

Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com


Re: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-23 Thread Martin Schwidefsky
On Wed, 2007-05-23 at 10:46 +0200, Cornelia Huck wrote:
> Taking a quick look at the async_*.c stuff, the functions in question
> basically seem to be of the form
> 
> check_if_we_can_do_it_async();
> if (async_ok) {
> /* do async stuff */
> /* that's where the dma mapping creeps in */
> } else {
> /* do it sync */
> /* seems fine for us */
> }

Hmm, on what does the async_ok depend? Is that a runtime check that is
done once or is it something more complicated like the availability of a
dma slot? If it is a simple runtime check then there should be an
operations structure that has indirect function pointers for the
different async_memset_{sync,async}() functions. Instead of doing the
async_ok check just call the function. That would save an if as well.
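
A rough userspace sketch of the operations-structure idea, assuming the choice really is a one-time runtime check. The names struct async_ops, select_ops() and do_memset() are hypothetical and not taken from the async_tx code, and the "async" variant is only a stand-in that forwards to memset():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operations table: one function pointer per operation. */
struct async_ops {
	void *(*memset_op)(void *dst, int val, size_t len);
	const char *name;
};

/* Synchronous fallback: plain CPU memset, no DMA API involved. */
static void *memset_sync(void *dst, int val, size_t len)
{
	return memset(dst, val, len);
}

/* Stand-in for a DMA-offloaded variant; here it only forwards to memset. */
static void *memset_async(void *dst, int val, size_t len)
{
	return memset(dst, val, len);
}

static const struct async_ops sync_ops = { memset_sync, "sync" };
static const struct async_ops dma_ops  = { memset_async, "async" };

/* The availability check runs once, when the table is chosen. */
static const struct async_ops *select_ops(int dma_available)
{
	return dma_available ? &dma_ops : &sync_ops;
}

/* Callers no longer test a flag; they call through the pointer. */
static void *do_memset(const struct async_ops *ops, void *dst, int val,
		       size_t len)
{
	assert(ops != NULL && ops->memset_op != NULL);
	return ops->memset_op(dst, val, len);
}
```

With this shape, a build without DMA support would only need the file that provides sync_ops.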

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.




Re: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-23 Thread Cornelia Huck
On Wed, 23 May 2007 10:05:39 +0200,
Martin Schwidefsky <[EMAIL PROTECTED]> wrote:

> We are trying to get rid of dma-mapping.h, see the last change to the
> file with commit 411f0f3edc141a582190d3605cadd1d993abb6df. I don't think
> we should reintroduce dma related definition but split the async_tx in a
> way that allows to compile it on an architecture with CONFIG_NO_DMA=y
> (yes I know that is harder that to just add the dma stubs).
> You've said that there is a software implementation if there is no dma
> engine present. This software implementation should be independent of
> dma-mapping.h. Without having looked at the code, isn't it possible to
> isolate that software implementation into its own C file? That would be
> the only one that gets compiled for s390.

Taking a quick look at the async_*.c stuff, the functions in question
basically seem to be of the form

check_if_we_can_do_it_async();
if (async_ok) {
/* do async stuff */
/* that's where the dma mapping creeps in */
} else {
/* do it sync */
/* seems fine for us */
}

So you should be able to factor out (say) async_memset_{sync,async}()
and put it into async_memset_{sync,async}.c. async_memset() would then
be

async_memset()
{
#ifdef CONFIG_HAS_DMA
	if (check_if_we_can_do_it_async())
		return async_memset_async();
#endif
	return async_memset_sync();
}

Kconfig could then do

config ASYNC_MEMSET
	default m
	tristate "async_memset support"
	select ASYNC_MEMSET_ASYNC if HAS_DMA

config ASYNC_MEMSET_ASYNC
	depends on HAS_DMA
	tristate "async_memset async via dma support"

Thoughts?


Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread young dave

Hi,
Sorry for the wrong patch in my last post.

How about saving the tsk and then calling kthread_stop, like this:

diff -udr linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
--- linux/fs/cifs/connect.c 2007-05-23 10:59:13.0 +
+++ linux.new/fs/cifs/connect.c 2007-05-23 16:33:54.0 +
@@ -2069,8 +2069,9 @@
   srvTcp->tcpStatus = CifsExiting;
   spin_unlock(&GlobalMid_Lock);
   if (srvTcp->tsk) {
+   struct task_struct * tsk = srvTcp->tsk;
   send_sig(SIGKILL,srvTcp->tsk,1);
-   kthread_stop(srvTcp->tsk);
+   kthread_stop(tsk);
   }
   }
/* If find_unc succeeded then rc == 0 so we can not end */

Regards
dave


RE: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-23 Thread Martin Schwidefsky
On Tue, 2007-05-22 at 17:25 -0700, Williams, Dan J wrote:
> The approach I have taken is to add the missing definitions to
> include/asm-s390/dma-mapping.h [ a non-outlook-mangled version of the
> patch is pushed out in my rebased git tree ].  I was not able to fully
> compile-test this change as the three s390-cross-toolchains I tried
> each

We are trying to get rid of dma-mapping.h, see the last change to the
file with commit 411f0f3edc141a582190d3605cadd1d993abb6df. I don't think
we should reintroduce dma-related definitions, but rather split async_tx in a
way that allows it to be compiled on an architecture with CONFIG_NO_DMA=y
(yes, I know that is harder than just adding the dma stubs).
You've said that there is a software implementation if there is no dma
engine present. This software implementation should be independent of
dma-mapping.h. Without having looked at the code, isn't it possible to
isolate that software implementation into its own C file? That would be
the only one that gets compiled for s390.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.




Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-23 Thread young dave

Hi,


> Yeah, that's racy: once we've sent the signal, the kernel thread can write
> NULL to srvTcp->tsk at any time.

Yes, here is another patch:

diff -ur linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
--- linux/fs/cifs/connect.c 2007-05-23 10:59:13.0 +
+++ linux.new/fs/cifs/connect.c 2007-05-23 15:16:11.0 +
@@ -650,6 +650,7 @@

   spin_lock(&GlobalMid_Lock);
   server->tcpStatus = CifsExiting;
+   kthread_stop(server->tsk);
   server->tsk = NULL;
   /* check if we have blocked requests that need to free */
   /* Note that cifs_max_pending is normally 50, but
@@ -2070,7 +2071,6 @@
   spin_unlock(&GlobalMid_Lock);
   if (srvTcp->tsk) {
   send_sig(SIGKILL,srvTcp->tsk,1);
-   kthread_stop(srvTcp->tsk);
   }
   }
/* If find_unc succeeded then rc == 0 so we can not end */

Regards
dave


Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-22 Thread Andrew Morton
On Wed, 23 May 2007 02:59:07 + "young dave" <[EMAIL PROTECTED]> wrote:

> Hi,
> maybe we can add if sentence before kthread_stop.
> 
> diff -ur linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
> --- linux/fs/cifs/connect.c 2007-05-23 10:59:13.0 +
> +++ linux.new/fs/cifs/connect.c 2007-05-23 10:58:39.0 +
> @@ -2070,7 +2070,8 @@
> spin_unlock(&GlobalMid_Lock);
> if (srvTcp->tsk) {
> send_sig(SIGKILL,srvTcp->tsk,1);
> -   kthread_stop(srvTcp->tsk);
> +   if(srvTcp->tsk)
> +   kthread_stop(srvTcp->tsk);
> }
> }

Yeah, that's racy: once we've sent the signal, the kernel thread can write
NULL to srvTcp->tsk at any time.
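
To illustrate the window: checking the shared pointer and then reading it a second time for the call leaves a gap in which the other thread can clear it. A userspace sketch with hypothetical names (struct server, struct worker, worker_stop()) that reads the pointer exactly once into a local and only uses the local afterwards. Note this only avoids passing NULL; on its own it says nothing about whether the object behind the pointer is still alive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct worker {
	int stopped;
};

struct server {
	/* Shared pointer that the worker side may set to NULL at any time. */
	_Atomic(struct worker *) tsk;
};

/* Stand-in for a stop routine that dereferences its argument. */
static void worker_stop(struct worker *w)
{
	w->stopped = 1;
}

/* Returns 1 if a worker was stopped, 0 if the pointer was already NULL. */
static int server_stop_worker(struct server *srv)
{
	struct worker *w;

	assert(srv != NULL);
	w = atomic_load(&srv->tsk);	/* single read of the shared pointer */
	if (w == NULL)
		return 0;
	worker_stop(w);			/* uses the snapshot, never srv->tsk */
	return 1;
}
```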



Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-22 Thread young dave

Hi,
maybe we can add an if statement before kthread_stop.

diff -ur linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
--- linux/fs/cifs/connect.c 2007-05-23 10:59:13.0 +
+++ linux.new/fs/cifs/connect.c 2007-05-23 10:58:39.0 +
@@ -2070,7 +2070,8 @@
   spin_unlock(&GlobalMid_Lock);
   if (srvTcp->tsk) {
   send_sig(SIGKILL,srvTcp->tsk,1);
-   kthread_stop(srvTcp->tsk);
+   if(srvTcp->tsk)
+   kthread_stop(srvTcp->tsk);
   }
   }


Regards
dave


Re: 2.6.22-rc1-mm1 cifs_mount oops

2007-05-22 Thread Andrew Morton
On Wed, 23 May 2007 00:50:13 + "young dave" <[EMAIL PROTECTED]> wrote:

> Hi,
> when I use mount -t cifs , the kernel oops, seems break at
> kthread_stop, I'm not sure.
> 
> But if I add the CONFIG_CIFS_DEBUG2=y to config file, rebuild kernel,
> then the oops disappeared.
> 
> Below is the oops message:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual
> address 0008
>  printing eip:
> c012e910
> *pde = 
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: cifs smbfs radeon drm ipv6 snd_seq_dummy
> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> snd_mixer_oss capability commoncap e100 mii psmouse sg evdev serio_raw
> snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc intel_agp
> agpgart i2c_i801 pcspkr
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00210246   (2.6.22-rc1-mm1 #3)
> EIP is at kthread_stop+0x10/0x90
> eax: c051bde0   ebx:    ecx: c1fba000   edx: c1fef040
> esi:    edi: 0064   ebp: c2a36c80   esp: c1fbbd58
> ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
> Process mount.cifs (pid: 3955, ti=c1fba000 task=c2b38540 task.ti=c1fba000)
> Stack: c1fef040 ff90 ff90 f8a7a328 c285a504 f8a9a9fb 0083 00cf
>00dc 000b c2b38540 c2af5740 c292c540   c285a4c0
> c411b400 c3a4f500 c3cec200 c1fef052 c291c1e0 c1fef037 c291c940
> Call Trace:
>  [] cifs_mount+0xbe8/0xf10 [cifs]
>  [] idr_get_new_above_int+0x3e/0x50
>  [] cifs_read_super+0x4e/0x160 [cifs]
>  [] set_anon_super+0x0/0xd0
>  [] cifs_get_sb+0x60/0xd0 [cifs]
>  [] vfs_kern_mount+0x91/0x130
>  [] permit_mount+0x28/0xa0
>  [] do_new_mount+0x8a/0x140
>  [] do_mount+0x25e/0x280
>  [] schedule+0x2e0/0x680
>  [] exact_copy_from_user+0x32/0x70
>  [] copy_mount_options+0x5a/0xc0
>  [] sys_mount+0x79/0xc0
>  [] syscall_call+0x7/0xb
>  ===
> Code: 88 d1 d3 e0 89 43 5c 83 c4 18 5b c3 eb 0d 90 90 90 90 90 90 90
> 90 90 90 90 90 90 53 83 ec 08 89 c3 b8 e0 bd 51 c0 e8 90 26 31 00 
> 43 08 31 c9 b8 f0 c1 58 c0 89 0d ec c1 58 c0 e8 3b 01 00 00
> EIP: [] kthread_stop+0x10/0x90 SS:ESP 0068:c1fbbd58
> 

I assume cifs_demultiplex_thread() took the SIGKILL, zeroed server->tsk
then exited.  Then, cifs_mount() did a kthread_stop() on the now-NULL
pointer.
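
The fault address of 0008 in the oops fits that reading: touching a member that sits 8 bytes into a structure, through a NULL structure pointer, faults at address 8. A small sketch of the arithmetic, using a made-up structure with 32-bit members (the layout is an assumption for illustration, not the real task_struct):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two 4-byte members followed by a counter. */
struct task_like {
	uint32_t state;
	uint32_t stack;
	uint32_t usage;		/* lands at byte offset 8 */
};

static_assert(offsetof(struct task_like, usage) == 8,
	      "usage is expected at offset 8 in this sketch");

/* Address that would be touched when accessing ->usage from 'base'. */
static uintptr_t usage_address(uintptr_t base)
{
	return base + offsetof(struct task_like, usage);
}
```

With base equal to 0, usage_address() yields 8, the same value as the reported faulting address.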

I don't see a non-racy way of fixing this as the code stands at present. 
This:

--- a/fs/cifs/connect.c~cifs-oops-fix
+++ a/fs/cifs/connect.c
@@ -2086,7 +2086,6 @@ cifs_mount(struct super_block *sb, struc
  if ((temp_rc == -ESHUTDOWN) &&
     (pSesInfo->server) && (pSesInfo->server->tsk)) {
  send_sig(SIGKILL,pSesInfo->server->tsk,1);
- kthread_stop(pSesInfo->server->tsk);
  }
 } else
  cFYI(1, ("No session or bad tcon"));
_

has a decent chance of fixing it.  But it's now racy against thread
*startup*: if we send SIGKILL to that task before it has done its
allow_signal(), it will presumably never get shut down.

Steve, can we just pull all the signal stuff out of there and use the
kthread machinery alone?
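
For comparison, a userspace analogue of a stop-flag-only design, sketched with pthreads rather than the kernel kthread API. All names here (struct stoppable, stoppable_start(), stoppable_stop()) are hypothetical. The stopping side owns the thread handle and the worker never clears it, so there is no shared pointer to race on. What the sketch cannot show is a worker blocked in a socket receive: such a thread would not look at the flag until something wakes it, which is the reason given in this thread for keeping send_sig.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

struct stoppable {
	pthread_t thread;
	atomic_int should_stop;	/* plays the role of a "please stop" flag */
	atomic_int iterations;
};

static void *stoppable_main(void *arg)
{
	struct stoppable *s = arg;

	/* Poll the flag between units of work; no signals involved. */
	while (!atomic_load(&s->should_stop)) {
		atomic_fetch_add(&s->iterations, 1);
		sched_yield();
	}
	return NULL;
}

/* Returns 0 on success, like pthread_create(). */
static int stoppable_start(struct stoppable *s)
{
	assert(s != NULL);
	atomic_init(&s->should_stop, 0);
	atomic_init(&s->iterations, 0);
	return pthread_create(&s->thread, NULL, stoppable_main, s);
}

/* Set the flag, then wait until the thread has really exited. */
static int stoppable_stop(struct stoppable *s)
{
	atomic_store(&s->should_stop, 1);
	return pthread_join(s->thread, NULL);
}
```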



Re: 2.6.22-rc1-mm1

2007-05-22 Thread young dave

Hi,
I have tried the patch, and it works.
Could you explain it for me? Thanks very much.

Regards
dave

2007/5/22, H. Peter Anvin <[EMAIL PROTECTED]>:

Could you try the attached patch for me?

-hpa

diff --git a/arch/i386/boot/edd.c b/arch/i386/boot/edd.c
index 84a0302..9697a56 100644
--- a/arch/i386/boot/edd.c
+++ b/arch/i386/boot/edd.c
@@ -47,8 +47,9 @@ static int read_sector(u8 devno, u64 lba, void *buf)
si = (size_t)&dapa;
dx = devno;
asm("pushfl; stc; int $0x13; setc %%al; popfl"
-   : "+a" (ax), "+S" (si), "+d" (devno)
-   : : "ebx", "ecx", "edi");
+   : "+a" (ax), "+S" (si), "+d" (dx)
+   : "m" (dapa)
+   : "ebx", "ecx", "edi", "memory");

if (!(u8)ax)
return 0;   /* OK */
@@ -59,7 +60,7 @@ static int read_sector(u8 devno, u64 lba, void *buf)
bx = (size_t)buf;
asm("pushfl; stc; int $0x13; setc %%al; popfl"
: "+a" (ax), "+c" (cx), "+d" (dx), "+b" (bx)
-   : : "esi", "edi");
+   : : "esi", "edi", "memory");

return -(u8)ax; /* 0 or -1 */
 }
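
As a partial illustration of what the added constraints express (GCC/Clang extended asm; a generic sketch, not the boot code itself): when an asm statement only receives the address of a structure in a register, the compiler does not know the asm reads the structure, and may move or drop the stores that fill it in. An "m" input names the object as read by the asm, and a "memory" clobber says the asm may also read or write memory the compiler cannot see. The asm body below is deliberately empty, and struct packet and sum_via_pointer() are made-up names:

```c
#include <assert.h>

struct packet {
	int a;
	int b;
};

static int sum_via_pointer(void)
{
	struct packet pkt;
	const struct packet *p = &pkt;

	pkt.a = 40;
	pkt.b = 2;

	/*
	 * Empty asm standing in for code that reads pkt through the
	 * address held in p.
	 *   "+r"(p)    the pointer is passed in a register and may change
	 *   "m"(pkt)   the asm reads pkt, so the stores above must happen
	 *   "memory"   the asm may touch other memory as well
	 */
	__asm__ __volatile__("" : "+r"(p) : "m"(pkt) : "memory");

	assert(p == &pkt);	/* the empty asm left the register untouched */
	return p->a + p->b;
}
```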





RE: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-22 Thread Williams, Dan J
> From: Cornelia Huck [mailto:[EMAIL PROTECTED]
> On Fri, 18 May 2007 09:30:09 -0700,
> "Williams, Dan J" <[EMAIL PROTECTED]> wrote:
> 
> > When CONFIG_DMA_ENGINE=n async_tx_find_channel takes the form:
> > ... async_tx_find_channel( ... )
> > {
> > return NULL;
> > }
> >
> > So in the S390 case the entire asynchronous path will be compiled away.
> 
> Unfortunately, do_async_xor() (and others) is not ifdef'ed and contains
> dma_map_page(), which led to the compile failure...

The approach I have taken is to add the missing definitions to
include/asm-s390/dma-mapping.h [ a non-outlook-mangled version of the
patch is pushed out in my rebased git tree ].  I was not able to fully
compile-test this change as the three s390-cross-toolchains I tried each
died early in the kernel build process.  The most common error was:
"s390-unknown-linux-gnu-ld: unrecognised emulation mode: elf64_s390"

---

s390: add dma mapping api stub definitions for async_tx

From: Dan Williams <[EMAIL PROTECTED]>

The asynchronous path in async_tx is meant to be compiled away on platforms
like s390 with CONFIG_DMA_ENGINE=n.  However, it is difficult to compile
something away if it does not compile in the first place.  This patch adds
the missing dma api definitions as BUG() stubs.

Cc: Cornelia Huck <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 include/asm-s390/dma-mapping.h |   78

 1 files changed, 78 insertions(+), 0 deletions(-)

diff --git a/include/asm-s390/dma-mapping.h b/include/asm-s390/dma-mapping.h
index 3f8c12f..33a3c82 100644
--- a/include/asm-s390/dma-mapping.h
+++ b/include/asm-s390/dma-mapping.h
@@ -4,9 +4,87 @@
  *  S390 version
  *
  *  This file exists so that #include  doesn't break anything.
+ *  It also includes stub definitions of the API so common code like async_tx
+ *  can compile.
  */
 
 #ifndef _ASM_DMA_MAPPING_H
 #define _ASM_DMA_MAPPING_H
 
+#include 
+#include 
+
+static inline dma_addr_t
+dma_map_single(struct device *dev, void *cpu_addr, size_t size,
+   enum dma_data_direction dir)
+{
+   BUG();
+   return 0;
+}
+
+static inline dma_addr_t
+dma_map_page(struct device *dev, struct page *page,
+unsigned long offset, size_t size,
+enum dma_data_direction dir)
+{
+   BUG();
+   return 0;
+}
+
+static inline void
+dma_unmap_single(struct device *dev, dma_addr_t handle, size_t size,
+   enum dma_data_direction dir)
+{
+   BUG();
+}
+
+static inline void
+dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+  enum dma_data_direction dir)
+{
+   BUG();
+}
+
+static inline int
+dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+   enum dma_data_direction dir)
+{
+   BUG();
+   return 0;
+}
+
+static inline void
+dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+enum dma_data_direction dir)
+{
+   BUG();
+}
+
+static inline void
+dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle, size_t size,
+   enum dma_data_direction dir)
+{
+   BUG();
+}
+
+static inline void
+dma_sync_single_for_device(struct device *dev, dma_addr_t handle, size_t size,
+  enum dma_data_direction dir)
+{
+   BUG();
+}
+
+static inline void
+dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents,
+   enum dma_data_direction dir)
+{
+   BUG();
+}
+
+static inline void
+dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents,
+  enum dma_data_direction dir)
+{
+   BUG();
+}
 #endif /* _ASM_DMA_MAPPING_H */


Re: [xfs-masters] Re: 2.6.22-rc1-mm1

2007-05-22 Thread Nathan Scott
On Tue, 2007-05-22 at 20:44 +1000, David Chinner wrote:
> 
> > xfs_buf_associate_memory is a mess.  My original plan was to get rid of
> > it, but I kept that out to keep that patchset small and easily reviewable,
> > but it seems like that was a mistake.  My plan is the following:
> > 
> >  - xlog_bread and thus the whole buffer I/O path grows an iooffset
> >    parameter that specifies at which offset into the buffer we start
> >    the actual I/O.  That gets rid of all the xfs_buf_associate_memory
> >    memory uses in the log recovery code
> 
> Perhaps a new field in the xfs_buf structure - that way call paths
> don't need to grow extra parameters and potentially increase

That'd be unfortunate - there are very few iclog buffers relative to
every other metadata buffer, so growing the struct for all of those
too would not be ideal (I remember Steve going on pagebuf shrinking
exercises in the distant past, to fit more of em in memory at once,
I can't remember what benchmark in particular he was using though).

cheers.

-- 
Nathan



Re: 2.6.22-rc1-mm1: evm BUG when reading sysfs file

2007-05-22 Thread Andrew Morton
On Tue, 22 May 2007 03:25:48 -0400
[EMAIL PROTECTED] (Joseph Fannin) wrote:

> On Tue, May 15, 2007 at 08:19:14PM -0700, Andrew Morton wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/
> 
> I've been getting this since 2.6.21-rc7-mm1:
> 
> [2.379310] BUG: unable to handle kernel paging request at virtual address 4400d340
> [2.379491]  printing eip:
> [2.379573] c021c978
> [2.379656] *pdpt = 0353c001
> [2.379739] *pde = 
> [2.379824] Oops:  [#1]
> [2.379906] PREEMPT SMP
> [2.380059] Modules linked in: thermal processor dm_mod
> [2.380288] CPU:0
> [2.380289] EIP:0060:[]Not tainted VLI
> [2.380291] EFLAGS: 00010297   (2.6.22-rc1-mm1 #2)
> [2.380547] EIP is at vsnprintf+0x448/0x5d0
> [2.380633] eax: 4400d340   ebx: c348f034   ecx: 4400d340   edx: fffe
> [2.380721] esi: c03e0100   edi: 4400d340   ebp: c357ecc0   esp: c357ec68
> [2.380810] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [2.380898] Process udevtrigger (pid: 686, ti=c357e000 task=c1876df0 task.ti=c357e000)
> [2.380987] Stack: c348f014 0fec c03e1c60 c03e3cec c357eccc c0499b88 c357ece0 c0282513
> [2.381428]c348f014 0fec 3cb70fcb c348f034
> [2.381867] fffe c03e017c c357ed18 0034 c0494a20 c357ece0 c021cb9f
> [2.382305] Call Trace:
> [2.382470]  [] sprintf+0x1f/0x30
> [2.382594]  [] show_uevent+0xed/0x130
> [2.382720]  [] dev_attr_show+0x23/0x30
> [2.382843]  [] sysfs_read_file+0x97/0x140
> [2.382968]  [] vfs_read+0xaf/0x180
> [2.383096]  [] kernel_read+0x3a/0x50
> [2.383221]  [] evm_calc_hash+0x11c/0x240
> [2.383347]  [] evm_file_free+0xb9/0x330
> [2.383470]  [] __fput+0xba/0x180
> [2.383593]  [] fput+0x22/0x40
> [2.383715]  [] filp_close+0x47/0x70
> [2.383839]  [] sys_close+0x69/0xc0
> [2.383965]  [] syscall_call+0x7/0xb
> [2.384092]  [] 0xb7ebd0a7
> [2.384212]  ===
> [2.384295] INFO: lockdep is turned off.
> [2.384379] Code: 21 fd ff ff c6 03 25 e9 19 fd ff ff 8d 4f 04 b8
> 3b a2 3d c0 8b 55 e4 89 4d 08 8b 3f 81 ff ff 0f 00 00 0f 46 f8 89 f9
> 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 89 c6 8b 45 e0
> f6 45
> [2.386787] EIP: [] vsnprintf+0x448/0x5d0 SS:ESP 0068:c357ec68
> 
> This comes a bit after IMA bails out successfully, if that's relevant:
> 
> [1.708761] ima (ima_init): No TPM chip found(rc = -19), activating
> TPM-bypass!

OK, thanks.  Does the crash go away if you disable IMA, SLIM, etc in .config?

I think I'll drop all those patches, actually - they don't seem to be going
anywhere.



Re: 2.6.22-rc1-mm1: evm BUG when reading sysfs file

2007-05-22 Thread Joseph Fannin
On Tue, May 15, 2007 at 08:19:14PM -0700, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/

I've been getting this since 2.6.21-rc7-mm1:

[2.379310] BUG: unable to handle kernel paging request at virtual address 4400d340
[2.379491]  printing eip:
[2.379573] c021c978
[2.379656] *pdpt = 0353c001
[2.379739] *pde = 
[2.379824] Oops:  [#1]
[2.379906] PREEMPT SMP
[2.380059] Modules linked in: thermal processor dm_mod
[2.380288] CPU:0
[2.380289] EIP:0060:[]Not tainted VLI
[2.380291] EFLAGS: 00010297   (2.6.22-rc1-mm1 #2)
[2.380547] EIP is at vsnprintf+0x448/0x5d0
[2.380633] eax: 4400d340   ebx: c348f034   ecx: 4400d340   edx: fffe
[2.380721] esi: c03e0100   edi: 4400d340   ebp: c357ecc0   esp: c357ec68
[2.380810] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[2.380898] Process udevtrigger (pid: 686, ti=c357e000 task=c1876df0 task.ti=c357e000)
[2.380987] Stack: c348f014 0fec c03e1c60 c03e3cec c357eccc c0499b88 c357ece0 c0282513
[2.381428]c348f014 0fec 3cb70fcb c348f034
[2.381867] fffe c03e017c c357ed18 0034 c0494a20 c357ece0 c021cb9f
[2.382305] Call Trace:
[2.382470]  [] sprintf+0x1f/0x30
[2.382594]  [] show_uevent+0xed/0x130
[2.382720]  [] dev_attr_show+0x23/0x30
[2.382843]  [] sysfs_read_file+0x97/0x140
[2.382968]  [] vfs_read+0xaf/0x180
[2.383096]  [] kernel_read+0x3a/0x50
[2.383221]  [] evm_calc_hash+0x11c/0x240
[2.383347]  [] evm_file_free+0xb9/0x330
[2.383470]  [] __fput+0xba/0x180
[2.383593]  [] fput+0x22/0x40
[2.383715]  [] filp_close+0x47/0x70
[2.383839]  [] sys_close+0x69/0xc0
[2.383965]  [] syscall_call+0x7/0xb
[2.384092]  [] 0xb7ebd0a7
[2.384212]  ===
[2.384295] INFO: lockdep is turned off.
[2.384379] Code: 21 fd ff ff c6 03 25 e9 19 fd ff ff 8d 4f 04 b8
3b a2 3d c0 8b 55 e4 89 4d 08 8b 3f 81 ff ff 0f 00 00 0f 46 f8 89 f9
89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 89 c6 8b 45 e0
f6 45
[2.386787] EIP: [] vsnprintf+0x448/0x5d0 SS:ESP 0068:c357ec68

This comes a bit after IMA bails out successfully, if that's relevant:

[1.708761] ima (ima_init): No TPM chip found(rc = -19), activating
TPM-bypass!

--
Joseph Fannin
[EMAIL PROTECTED]



Re: [xfs-masters] Re: 2.6.22-rc1-mm1

2007-05-22 Thread Michal Piotrowski

Hi David,

On 21/05/07, David Chinner <[EMAIL PROTECTED]> wrote:

> On Fri, May 18, 2007 at 12:11:14PM +1000, David Chinner wrote:
> > On Thu, May 17, 2007 at 10:05:11PM +0200, Michal Piotrowski wrote:
> > > I applied your patch and I get another oops
> > >
> > > [  261.491499] XFS mounting filesystem loop0
> > > [  261.501641] Ending clean XFS mount for filesystem: loop0
> > > [  261.507698] SELinux: initialized (dev loop0, type xfs), uses xattr
> > > [  261.567441] XFS mounting filesystem loop0
> > > [  261.573931] allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
> > > [  261.582935] xfs_buf_get_noaddr: failed to map pages
> > > [  261.592478] Ending clean XFS mount for filesystem: loop0
> > > [  261.618543] SELinux: initialized (dev loop0, type xfs), uses xattr
> > > [  261.691563] XFS mounting filesystem loop0
> > > [  261.698927] allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
> > >
> > >   interesting
> >
> > Yeah, looks like a vmalloc leak is occurring. I haven't noticed
> > it before because:
> >
> > VmallocTotal: 137427898368 kB
> > VmallocUsed:   3128272 kB
> > VmallocChunk: 137424770048 kB
> >
> > It takes a long time to leak enough vmapped space to run out on ia64...
> >
> > That tends to imply we have a mapped buffer being leaked somewhere.
> > Interestingly, I don't see a memory leak so we must be freeing the
> > memory associated with the buffer, just not unmapping it first. Not
> > sure how that can happen yet.
> > .
> >
> > Looks like we're leaking 272kB of vmalloc space on each mount/unmount
> > cycle. I'm trying to track this down now
>
> I've found what is going on here - kmem_alloc() is decidedly more
> forgiving than manually built page arrays and vmap/vunmap. Prior
> to this change we wouldn't have even leaked memory
>
> Christoph - this is an interaction with xfs_buf_associate_memory();
> I'm not sure what it is doing is at all safe now that it never gets
> passed kmem_alloc()d memory - it works for the log recovery case
> because we use it in pairs - once to shorten the buffer and then once
> to put it back the way it was.
>
> But that doesn't work for the log buffers (we never return them to their
> original state) and the log wrap case looks to work mostly by accident
> now (and could possibly lead to double freeing pages)
>
> It seems that what we really need with the new code is a xfs_buf_clone()
> operation followed by trimming the range to what the secondary I/O needs
> to span. This would work for the log buffer case as well. Your thoughts?
>
> In the meantime, the following patch appears to fix the leak.


After a few minutes of mount/umount cycle everything seems to be ok,
problem fixed.

Thanks!



> Cheers,
>
> Dave.


Regards,
Michal

--
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)


Re: [xfs-masters] Re: 2.6.22-rc1-mm1

2007-05-22 Thread Christoph Hellwig
On Tue, May 22, 2007 at 08:44:30PM +1000, David Chinner wrote:
> Perhaps a new field in the xfs_buf structure - that way call paths
> don't need to grow extra parameters and potentially increase
> stack usage. The read path tends to be at the top of the stack
> when it gets blown in the writeback path

I have some patches to unwind the buffer I/O path; it's a little
too complicated for historical reasons.

> >    the offset in xlog_sync as well.
> 
> I don't want to have to introduce a mempool just for one xfs_buf per
> filesystem, so this would need to be able to take a xfs_buf (log->l_xbuf)
> that it clones to

Yes.  Note that we currently do a non-mempooled allocation for the page
array, which this would cure as well.


Re: [xfs-masters] Re: 2.6.22-rc1-mm1

2007-05-22 Thread David Chinner
On Mon, May 21, 2007 at 12:23:21PM +0200, Christoph Hellwig wrote:
> On Mon, May 21, 2007 at 08:11:42PM +1000, David Chinner wrote:
> > Christoph - this is an interaction with xfs_buf_associate_memory();
> > I'm not sure what it is doing is at all safe now that it never gets
> > passed kmem_alloc()d memory - it works for the log recovery case
> > because we use it in pairs - once to shorten the buffer and then once
> > to put it back the way it was.
> > 
> > But that doesn't work for the log buffers (we never return them to their
> > original state) and the log wrap case looks to work mostly by accident
> > now (and could possibly lead to double freeing pages)
> > 
> > It seems that what we really need with the new code is a xfs_buf_clone()
> > operation followed by trimming the range to what the secondary I/O needs
> > to span. This would work for the log buffer case as well. Your thoughts?
> 
> xfs_buf_associate_memory is a mess.  My original plan was to get rid of
> it, but I kept that out to keep that patchset small and easily reviewable,
> but it seems like that was a mistake.  My plan is the following:
> 
>  - xlog_bread and thus the whole buffer I/O path grows an iooffset
>    parameter that specifies at which offset into the buffer we start
>    the actual I/O.  That gets rid of all the xfs_buf_associate_memory
>    uses in the log recovery code

Perhaps a new field in the xfs_buf structure - that way call paths
don't need to grow extra parameters and potentially increase
stack usage. The read path tends to be at the top of the stack
when it gets blown in the writeback path

>  - add a buffer clone operation as suggested by you above, and use
>    the offset in xlog_sync as well.

I don't want to have to introduce a mempool just for one xfs_buf per
filesystem, so this would need to be able to take a xfs_buf (log->l_xbuf)
that it clones to 

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-22 Thread Maciej Rutecki
Matthew Garrett wrote:
.
> 
> Try any recent HP bios.
> 

Yes...

HP nx 6310, BIOS versions:
F.06: cpufreq works; MCFG BIOS error in dmesg (PCI: BIOS Bug: MCFG area
at f800 is not E820-reserved)
F.08: as above, plus cpufreq is broken
F.09: removes these errors, but reboot takes too long (removing the
psmouse module doesn't help); some people report this (I didn't test it)
F.0B: suspend to RAM is broken; after suspend to disk the keyboard doesn't work
F.0D: I don't have the heart to test it...

-- 
Maciej Rutecki
http://www.maciek.unixy.pl




Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-22 Thread Goulven Guillard
On 05/22/2007 11:16 AM, Matthew Garrett wrote:
> On Tue, May 22, 2007 at 11:06:36AM +0200, Pavel Machek wrote:
> 
>> We need to ignore trip point updates from BIOS, and we need to poll
>> thermals when user overrides trip points. That's expected. Plus I've
>> yet to see platform actually updating the trip points.
> 
> Try any recent HP bios.
> 

man cron... ;-)





-- 
~~
   |Oo|   The ice floe is melting!!! Adopt a penguin...
  /|\/|\
   |__|=> http://doc.ubuntu-fr.org/
   ^__^
~~~|  |~~~










Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-22 Thread Matthew Garrett
On Tue, May 22, 2007 at 11:06:36AM +0200, Pavel Machek wrote:

> We need to ignore trip point updates from BIOS, and we need to poll
> thermals when user overrides trip points. That's expected. Plus I've
> yet to see platform actually updating the trip points.

Try any recent HP bios.

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-22 Thread Pavel Machek
Hi!

> > > So don't do it badly. The advantage of doing so is that you can make it 
> > > work properly, which you can't by putting it in the kernel.
> > 
> > You want stuff like critical shutdowns to work even if userspace is
> > dead.
> 
> I don't think anyone suggested putting the critical shutdown control in 
> userspace. The kernel already handles that fine.

No it does not. That is what this thread is about.

(On old xe3, critical trip point set by BIOS is ~95C, but machine dies
by hw safeguard at ~83C. Workaround is to lower critical trip point to
80C or so. Len broke this.)

> Imagine the following situation:
> 
> 1) Platform sets critical shutdown trip point to 85C
> 2) Userspace sets critical shutdown trip point to 95C
> 3) Temperature reaches 90C
> 4) Platform forces reevaluation of trip points
> 5) Entire invasion fleet is lost
> 
> How do you avoid that? Disable the ability for the platform to set trip 
> points? You're breaking the spec and potentially causing hardware 

We need to ignore trip point updates from BIOS, and we need to poll
thermals when user overrides trip points. That's expected. Plus I've
yet to see platform actually updating the trip points.

Speaking about hw damage... The broken BIOS on xe3 definitely caused
damage to its harddrive, so... we are preventing hw damage here.

(Plus, Len's patch broke the user-kernel interface in the stable series,
without warning).
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1

2007-05-21 Thread young dave

Hi,

> This implies a miscompile somewhere, *or* that your bios stomps on
> registers that gcc expects to be preserved, and adding printf's disturbs
> the register allocation sufficiently.


I think it may be caused by gcc optimization, so I added volatile to the
read_sector inline assembly; with that change the kernel boots successfully.

Please check this patch:

diff -ur linux/arch/i386/boot/edd.c linux.new/arch/i386/boot/edd.c
--- linux/arch/i386/boot/edd.c  2007-05-22 10:08:59.0 +
+++ linux.new/arch/i386/boot/edd.c  2007-05-22 10:06:24.0 +
@@ -47,7 +47,7 @@
   ax = 0x4200;/* Extended Read */
   si = (size_t)&dapa;
   dx = devno;
-   asm ("pushfl; stc; int $0x13; setc %%al; popfl"
+   asm volatile("pushfl; stc; int $0x13; setc %%al; popfl"
   : "+a" (ax), "+S" (si), "+d" (devno)
   : : "ebx", "ecx", "edi");

@@ -58,7 +58,7 @@
   cx = 0x0001;/* Sector 0-0-1 */
   dx = devno;
   bx = (size_t)buf;
-   asm ("pushfl; stc; int $0x13; setc %%al; popfl"
+   asm volatile("pushfl; stc; int $0x13; setc %%al; popfl"
   : "+a" (ax), "+c" (cx), "+d" (dx), "+b" (bx)
   : : "esi", "edi");
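
For readers following the patch: as documented for GCC, an asm statement without the volatile qualifier may be deleted or moved by the optimizer when its outputs look unused, while asm volatile tells the compiler to keep it. The sketch below only illustrates that qualifier with an empty, portable asm body; the function names are made up for illustration and it does not reproduce the int $0x13 call from edd.c. Note that volatile does nothing about a BIOS clobbering registers, which is the other possibility raised in this thread.

```c
#include <stdint.h>

/*
 * Illustration only: hypothetical helpers, not code from edd.c.
 * The asm body is empty, so this builds on any GCC/clang target.
 */

/* Plain asm: the optimizer is allowed to drop or move this statement
 * if it decides the output operand is not needed. */
static inline uint32_t pass_through_plain(uint32_t x)
{
    __asm__("" : "+r"(x));
    return x;
}

/* asm volatile: the statement must be kept, even if the compiler
 * believes its outputs are unused. */
static inline uint32_t pass_through_volatile(uint32_t x)
{
    __asm__ __volatile__("" : "+r"(x));
    return x;
}
```

Both helpers return their argument unchanged; the only difference is what the optimizer is permitted to do with the asm statement.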

Regards
dave


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Matthew Garrett
On Tue, May 22, 2007 at 12:42:00AM +0200, Pavel Machek wrote:
> On Mon 2007-05-21 14:45:53, Matthew Garrett wrote:
> > So don't do it badly. The advantage of doing so is that you can make it 
> > work properly, which you can't by putting it in the kernel.
> 
> You want stuff like critical shutdowns to work even if userspace is
> dead.

I don't think anyone suggested putting the critical shutdown control in 
userspace. The kernel already handles that fine.

> I do not think you can control passive cooling adequately from 
> userspace, and you can certainly not prevent kernel from slowing 
> machine down too soon.

Given the choice between something impossible and something difficult, 
I'm inclined towards picking the difficult one.

> Plus, this is actually nasty user-visible change, and a regression
> from 2.6.21. I am not sure why we are even debating this; user-kernel
> interface changed without warning. Patch should be simply reverted.

In http://lkml.org/lkml/2007/1/27/93 you were more than happy to break 
an interface even though it could be fixed in an (ugly) way that made it 
work again. Here, there's no way to fix this properly - the platform 
will quite happily do things based on what it believes the trip points 
should be, and one of those things may be to alter the trip points. 
Imagine the following situation:

1) Platform sets critical shutdown trip point to 85C
2) Userspace sets critical shutdown trip point to 95C
3) Temperature reaches 90C
4) Platform forces reevaluation of trip points
5) Entire invasion fleet is lost

How do you avoid that? Disable the ability for the platform to set trip 
points? You're breaking the spec and potentially causing hardware 
damage. If you have specific hardware that requires specific spec 
breakage, then a better approach would probably be to quirk the kernel 
to rectify it. On the other hand, if it works with the Other Leading OS, 
we ought to be able to just fix the problem properly.
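
The five-step scenario above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the struct and function names are hypothetical and have nothing to do with the real ACPI thermal driver, and "reevaluation" is modelled as simply copying the platform value over whatever userspace wrote.

```c
#include <stdbool.h>

/* Hypothetical model of one thermal zone; not the ACPI thermal driver. */
struct zone {
    int platform_critical_c; /* critical trip point the platform reports */
    int active_critical_c;   /* critical trip point currently acted on   */
};

/* Step 1: the platform sets the critical trip point. */
static inline void zone_init(struct zone *z, int platform_c)
{
    z->platform_critical_c = platform_c;
    z->active_critical_c = platform_c;
}

/* Step 2: userspace overrides the value in use. */
static inline void user_override(struct zone *z, int user_c)
{
    z->active_critical_c = user_c;
}

/* Step 4: the platform forces a reevaluation, wiping out the override. */
static inline void platform_reevaluate(struct zone *z)
{
    z->active_critical_c = z->platform_critical_c;
}

/* True when the current temperature is at or above the active trip point. */
static inline bool critical_shutdown(const struct zone *z, int temp_c)
{
    return temp_c >= z->active_critical_c;
}
```

With a platform value of 85C and a user override of 95C, a reading of 90C is harmless until platform_reevaluate() runs; after that the same reading is already past the critical trip point.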
-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Pavel Machek
On Mon 2007-05-21 14:45:53, Matthew Garrett wrote:
> On Mon, May 21, 2007 at 03:40:46PM +0200, Pavel Machek wrote:
> > On Mon 2007-05-21 14:36:08, Matthew Garrett wrote:
> > > On Mon, May 21, 2007 at 03:29:48PM +0200, Pavel Machek wrote:
> > > > Significantly more correct? It forces you to do all the thermal
> > > > management in userspace!
> > > 
> > > Why's that a problem? 
> > 
> > Duplicating all the kernel logic in userspace, badly?
> 
> So don't do it badly. The advantage of doing so is that you can make it 
> work properly, which you can't by putting it in the kernel.

You want stuff like critical shutdowns to work even if userspace is
dead.

I do not think you can control passive cooling adequately from
userspace, and you can certainly not prevent kernel from slowing
machine down too soon.

Plus, this is actually nasty user-visible change, and a regression
from 2.6.21. I am not sure why we are even debating this; user-kernel
interface changed without warning. Patch should be simply reverted.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1

2007-05-21 Thread H. Peter Anvin
young dave wrote:
> Hi,
> 
> The kernel started booting and stopped in edd.c:read_sector.
> 
> I added debug messages around the two inline assembly statements and
> recompiled the kernel. Now a strange thing happened: the kernel boots
> directly, but the printf messages can't be seen because they scroll by
> too quickly.
> 
> can we use printk in boot code?

Well, it's spelt "printf", but same thing.  (Since it doesn't take a
logging priority, it seems better to name it printf.)

This implies a miscompile somewhere, *or* that your bios stomps on
registers that gcc expects to be preserved, and adding printf's disturbs
the register allocation sufficiently.

Could you send me the arch/i386/boot/setup.elf file from the original,
failed, build?

Thanks.

-hpa


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Matthew Garrett
On Mon, May 21, 2007 at 03:40:46PM +0200, Pavel Machek wrote:
> On Mon 2007-05-21 14:36:08, Matthew Garrett wrote:
> > On Mon, May 21, 2007 at 03:29:48PM +0200, Pavel Machek wrote:
> > > Significantly more correct? It forces you to do all the thermal
> > > management in userspace!
> > 
> > Why's that a problem? 
> 
> Duplicating all the kernel logic in userspace, badly?

So don't do it badly. The advantage of doing so is that you can make it 
work properly, which you can't by putting it in the kernel.

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Pavel Machek
On Mon 2007-05-21 14:36:08, Matthew Garrett wrote:
> On Mon, May 21, 2007 at 03:29:48PM +0200, Pavel Machek wrote:
> > > > No. Manually turning off fans is even worse hack.
> > > 
> > > It's significantly more correct.
> > 
> > Significantly more correct? It forces you to do all the thermal
> > management in userspace!
> 
> Why's that a problem? 

Duplicating all the kernel logic in userspace, badly?
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Matthew Garrett
On Mon, May 21, 2007 at 03:29:48PM +0200, Pavel Machek wrote:
> > > No. Manually turning off fans is even worse hack.
> > 
> > It's significantly more correct.
> 
> Significantly more correct? It forces you to do all the thermal
> management in userspace!

Why's that a problem? Overriding the hardware policy has to be done 
somewhere, and doing it in userspace is no more dangerous than 
kernelspace.

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Matthew Garrett
On Mon, May 21, 2007 at 02:10:48PM +0200, Pavel Machek wrote:

> > nope, the OS can't reliably override the processor passive trip point.
> > That is what _SCP and cooling_mode are for.
> 
> Yes, it is reliable if you turn on thermal polling.

As Len says, the system can force a reevaluation of the trip points at 
any time which will wipe out the local settings. Either you ignore the 
spec and the notifications (potentially risking misbehaving hardware) or 
you end up in a perpetual race.

> > if you want to change the state of the fans,
> > then poke /proc/acpi/fan/ directly.
> 
> Heh, you suggest this? It is even less functional than current
> solution -- which works okay as long as you keep thermal polling
> working.

If there are problems with the fan behaviour, why don't we fix them?

> > For folks with the reverse problem -- active cooling where the
> > fans kick in earlier than they'd like, they should just turn off
> > the fans via /proc/acpi/fan and not mess with the trip points at
> > all.
> 
> No. Manually turning off fans is even worse hack.

It's significantly more correct.

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Pavel Machek
Hi!

> > > For folks with the reverse problem -- active cooling where the
> > > fans kick in earlier than they'd like, they should just turn off
> > > the fans via /proc/acpi/fan and not mess with the trip points at
> > > all.
> > 
> > No. Manually turning off fans is even worse hack.
> 
> It's significantly more correct.

Significantly more correct? It forces you to do all the thermal
management in userspace!
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Pavel Machek
On Thu 2007-05-17 18:42:43, Len Brown wrote:
> > Something similar happened to me on XE3, yes.
> > 
> > (Actual values were different; BIOS specified critical temperature at
> > cca 95C, but hw killed the power at cca 83C. Setting critical trip
> > point at 80C made the problem go away.)
> 
> Great, please file a bug and include the acpidump from the XE3
> and we'll fix it, rather than supporting a bogus (manual) workaround for it.

It has been a few years since I last had that XE3 machine.

> Of course if your system is running at 80*C and the hardware shuts
> off at 83*C, you may have a broken fan, or one clogged with dust...

It _did_ have broken fan. It also had broken trip points.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Pavel Machek
Hi!

> > > No, writing trip-points is neither a fix, nor it is reasonable.
> > > It is a workaround at best, and it is a dangerous and mis-leading hack.
> > Yes it is a workaround for critical ACPI bugs like that or similar:
> > https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/22336
> 
> Thanks for pointing that out -- it is a great example
> of how powerful mis-information can be.
> 
> The fact that the trip-points are writable has obscured,
> rather than clarified, the actual causes of the failures.
> No less than 4 people in that bug report declared that
> cleaning the dust out of their fan fixed the root cause.
> A bunch more said that the issues went away when they 
> stopped using ubuntu's user-space power save daemon.
> 
> There are a couple more with broken active fan control --
> which also gets obscured rather than clarified by
> over-riding trip points.
> 
> And finally, there are probably some with clean fans
> that are working properly, but are thermally challenged
> systems.  I'll venture that Windows is NOT modifying or disabling
> the critical trip point to work around this issue.
> I'll venture that their thermal throttling is working
> and ours may not be.
> 
> perhaps it was the recently fixed mod_timer() bug in thermal.c,
> or perhaps it is one that we don't know about yet...
> 
> > It's also convenient to e.g. lower passive trip point to avoid fan
> > noise.
> 
> nope, the OS can't reliably override the processor passive trip point.
> That is what _SCP and cooling_mode are for.

Yes, it is reliable if you turn on thermal polling.

> The reason is that the BIOS can send us a trip-point changed event at
> any time, the kernel will evaluate _PSV, and wipe out the modified OS
> version.
> 
> if you want to change the state of the fans,
> then poke /proc/acpi/fan/ directly.

Heh, you suggest this? It is even less functional than current
solution -- which works okay as long as you keep thermal polling
working.

> > It's there for a long time, why is this "a dangerous and mis-leading
> > hack." now?
> 
> It has been dangerous and misleading since the day it went in.
> If the user doesn't enable polling, then they are effectively
> writing random numbers that have absolutely no effect on
> the operation of the system, and hiding the numbers that
> do control the operation of the system.

You are misstating the situation. With thermal polling, it is pretty
much okay, and it is certainly better than "ride fans manually" hack
you suggested.

> For folks with the reverse problem -- active cooling where the
> fans kick in earlier than they'd like, they should just turn off
> the fans via /proc/acpi/fan and not mess with the trip points at
> all.

No. Manually turning off fans is even worse hack.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-21 Thread Thomas Renninger
On Sun, 2007-05-20 at 23:50 -0400, Len Brown wrote:
> On Saturday 19 May 2007 15:56, Thomas Renninger wrote:
> > On Thu, 2007-05-17 at 15:17 -0400, Len Brown wrote:
> > > On Thursday 17 May 2007 05:23, Pavel Machek wrote:
> > > 
> > > > > ACPI: thermal trip points are read-only
> > > > 
> > > > What was the rationale? Can we get this one reverted? 
> > > > 
> > > > Some machines (HP omnibook xe3) have broken trip points -- too high --
> > > > so machine will overheat and trigger hw shutdown before starting
> > > > passive cooling.
> > > > 
> > > > That's really broken, and write to trip points is reasonable way to
> > > > 'fix' that. (I'd understand if you only ever let trip points to
> > > > decrease... but otoh root should be able to shoot himself)
> > > 
> > > No, writing trip-points is neither a fix, nor it is reasonable.
> > > It is a workaround at best, and it is a dangerous and mis-leading hack.
> > Yes it is a workaround for critical ACPI bugs like that or similar:
> > https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/22336
> 
> Thanks for pointing that out -- it is a great example
> of how powerful mis-information can be.
> 
> The fact that the trip-points are writable has obscured,
> rather than clarified, the actual causes of the failures.
> No less than 4 people in that bug report declared that
> cleaning the dust out of their fan fixed the root cause.
> A bunch more said that the issues went away when they 
> stopped using ubuntu's user-space power save daemon.
> 
> There are a couple more with broken active fan control --
> which also gets obscured rather than clarified by
> over-riding trip points.
> 
> And finally, there are probably some with clean fans
> that are working properly, but are thermally challenged
> systems.  I'll venture that Windows is NOT modifying or disabling
> the critical trip point to work around this issue.
> I'll venture that their thermal throttling is working
> and ours may not be.
> 
> perhaps it was the recently fixed mod_timer() bug in thermal.c,
> or perhaps it is one that we don't know about yet...
> 
Whatever it was, it's in a final Ubuntu dist and the trip point interface
could help some people to still be able to use it.

ACPI is very machine specific. 100 machines may work well and QA might
overlook the 101st, where critical shutdowns or whatever happen.
Such workarounds are really helpful then.

Same for ignoring _PPC and for thermal polling (the latter is always on in
our distro; I bet a lot of machines would break if it were disabled, and
just ripping out the ability to set it is really not a solution).

One big challenge in the ACPI subsystem (kernel or userspace) is to find
BIOS implementations that are at the limit of the specs or that violate
the specs, and to try to work around them.
We are not in the position of M$ (at least in the desktop/laptop
segment) yet.
BIOS developers won't follow our implementations, and IMO we should go the
other way and provide more workarounds. If nobody needs them, all the
better.

> > It's also convenient to e.g. lower passive trip point to avoid fan
> > noise.
> 
> nope, the OS can't reliably override the processor passive trip point.
> That is what _SCP and cooling_mode are for.
> 
> The reason is that the BIOS can send us a trip-point changed event at
> any time, the kernel will evaluate _PSV, and wipe out the modified OS
> version.
> 
> if you want to change the state of the fans,
> then poke /proc/acpi/fan/ directly.
> This will have effect until the next trip point
> changes its state.

> 
> > Some people are used to it, I already wanted to write a little userspace
> > prog to use them as it is really easy to fake cooling_mode (trip points
> > are modified by BIOS) and eliminate fan noise and other things by e.g.
> > reducing passive or whatever trip point.
> 
> please save this effort for a non-ACPI system.
> 
> > This is at least a major sysfs interface change, has this been discussed
> > somewhere before or declared deprecated?
> 
> it went out on linux-acpi, but I don't recall any discussion about it.
> 
> > It's there for a long time, why is this "a dangerous and mis-leading
> > hack." now?
> 
> It has been dangerous and misleading since the day it went in.
> If the user doesn't enable polling, then they are effectively
> writing random numbers that have absolutely no effect on
> the operation of the system, and hiding the numbers that
> do control the operation of the system.
> 
> > I'd suggest reverting this, and I can come up with something like an
> > "only allow lower values than BIOS provides" patch if the current
> > implementation is considered dangerous.
> 
> That simply will not address the issue.
> Indeed, all the entries in the ubuntu bug report are about hitting
> the critical temperature and having a critical shutdown when
> it isn't wanted.  These people want to RAISE the critical shutdown
> trip-point.  Their cooling problems must be fixed -- raising critical
> trip points causes the

Re: [xfs-masters] Re: 2.6.22-rc1-mm1

2007-05-21 Thread Christoph Hellwig
On Mon, May 21, 2007 at 08:11:42PM +1000, David Chinner wrote:
> Christoph - this is an interaction with xfs_buf_associate_memory();
> I'm not sure what it is doing is at all safe now that it never gets
> passed kmem_alloc()d memory - it works for the log recovery case
> because we use it in pairs - once to shorten the buffer and then once
> to put it back the way it was.
> 
> But that doesn't work for the log buffers (we never return them to their
> original state) and the log wrap case looks to work mostly by accident
> now (and could possibly lead to double freeing pages)
> 
> It seems that what we really need with the new code is a xfs_buf_clone()
> operation followed by trimming the range to what the secondary I/O needs
> to span. This would work for the log buffer case as well. Your thoughts?

xfs_buf_associate_memory is a mess.  My original plan was to get rid of
it, but I kept that out to keep that patchset small and easily reviewable,
but it seems like that was a mistake.  My plan is the following:

 - xlog_bread and thus the whole buffer I/O path grows an iooffset
   parameter that specifies at which offset into the buffer we start
   the actual I/O.  That gets rid of all the xfs_buf_associate_memory
   uses in the log recovery code
 - add a buffer clone operation as suggested by you above, and use
   the offset in xlog_sync as well.

Until then your patch below looks fine.
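
A rough userspace sketch of the "clone, then restrict the I/O range" idea from the plan above, to make the shape of the proposal concrete. Everything here is hypothetical: struct io_buf and its helpers are invented for illustration and are not the xfs_buf API. The point is only that the clone shares the original's memory and carries its own offset and count, so the original never has to be modified and later put back.

```c
#include <stddef.h>

/* Hypothetical buffer descriptor; not struct xfs_buf. */
struct io_buf {
    char   *base;        /* start of the mapped memory               */
    size_t  size;        /* total size of the mapping                */
    size_t  io_offset;   /* offset into the mapping where I/O starts */
    size_t  io_count;    /* number of bytes the I/O spans            */
    int     owns_memory; /* only the original may free the memory    */
};

/*
 * Fill in dst as a clone of src that covers [offset, offset + count).
 * The clone shares src's memory and must never free it.
 * Returns 0 on success, -1 if the range does not fit in the mapping.
 */
static inline int io_buf_clone_range(const struct io_buf *src,
                                     struct io_buf *dst,
                                     size_t offset, size_t count)
{
    if (offset > src->size || count > src->size - offset)
        return -1;

    *dst = *src;
    dst->io_offset = offset;
    dst->io_count = count;
    dst->owns_memory = 0;
    return 0;
}

/* Address at which the I/O for this descriptor begins. */
static inline char *io_buf_io_start(const struct io_buf *b)
{
    return b->base + b->io_offset;
}
```

Because the original descriptor is never touched, there is no "shorten it, then restore it" pairing to get wrong, which is the failure mode described for the log buffers.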
   


Re: [xfs-masters] Re: 2.6.22-rc1-mm1

2007-05-21 Thread David Chinner
On Fri, May 18, 2007 at 12:11:14PM +1000, David Chinner wrote:
> On Thu, May 17, 2007 at 10:05:11PM +0200, Michal Piotrowski wrote:
> > I applied your patch and I get another oops
> > 
> > [  261.491499] XFS mounting filesystem loop0
> > [  261.501641] Ending clean XFS mount for filesystem: loop0
> > [  261.507698] SELinux: initialized (dev loop0, type xfs), uses xattr
> > [  261.567441] XFS mounting filesystem loop0
> > [  261.573931] allocation failed: out of vmalloc space - use vmalloc= 
> > to increase size.
> > [  261.582935] xfs_buf_get_noaddr: failed to map pages
> > [  261.592478] Ending clean XFS mount for filesystem: loop0
> > [  261.618543] SELinux: initialized (dev loop0, type xfs), uses xattr
> > [  261.691563] XFS mounting filesystem loop0
> > [  261.698927] allocation failed: out of vmalloc space - use vmalloc= 
> > to increase size.
> >   
> >   interesting
> 
> Yeah, looks like a vmalloc leak is occurring. I haven't noticed
> it before because:
> 
> VmallocTotal: 137427898368 kB
> VmallocUsed:   3128272 kB
> VmallocChunk: 137424770048 kB
> 
> It takes a long time to leak enough vmapped space to run out on ia64...
> 
> That tends to imply we have a mapped buffer being leaked somewhere.
> Interestingly, I don't see a memory leak so we must be freeing the
> memory associated with the buffer, just not unmapping it first. Not
> sure how that can happen yet.
.
> 
> Looks like we're leaking 272kB of vmalloc space on each mount/unmount
> cycle. I'm trying to track this down now

I've found what is going on here - kmem_alloc() is decidedly more
forgiving than manually built page arrays and vmap/vunmap. Prior
to this change we wouldn't have even leaked memory
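
To put the 272kB-per-cycle figure in perspective, here is a back-of-the-envelope helper. It is a sketch only: the function name is made up, and it assumes the leak is a constant amount per mount/unmount cycle.

```c
#include <stdint.h>

/*
 * Estimate how many mount/unmount cycles fit before the vmalloc area
 * is exhausted, assuming a constant leak per cycle. All values in kB.
 * Returns UINT64_MAX when nothing leaks, 0 when no space is left.
 */
static inline uint64_t cycles_until_exhausted(uint64_t total_kb,
                                              uint64_t used_kb,
                                              uint64_t leak_kb_per_cycle)
{
    if (leak_kb_per_cycle == 0)
        return UINT64_MAX;
    if (used_kb >= total_kb)
        return 0;
    return (total_kb - used_kb) / leak_kb_per_cycle;
}
```

Fed the ia64 numbers quoted above (VmallocTotal 137427898368 kB, VmallocUsed 3128272 kB, 272 kB per cycle), this comes out at roughly 505 million cycles, which fits with the leak going unnoticed there. With a 128 MB vmalloc area (an assumed figure for a 32-bit machine, not one reported in this thread) it is only a few hundred cycles.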

Christoph - this is an interaction with xfs_buf_associate_memory();
I'm not sure what it is doing is at all safe now that it never gets
passed kmem_alloc()d memory - it works for the log recovery case
because we use it in pairs - once to shorten the buffer and then once
to put it back the way it was.

But that doesn't work for the log buffers (we never return them to their
original state) and the log wrap case looks to work mostly by accident
now (and could possibly lead to double freeing pages)

It seems that what we really need with the new code is a xfs_buf_clone()
operation followed by trimming the range to what the secondary I/O needs
to span. This would work for the log buffer case as well. Your thoughts?

In the meantime, the following patch appears to fix the leak.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/xfs_log.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-05-21 19:51:18.0 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_log.c  2007-05-21 19:57:30.960084657 +1000
@@ -1457,7 +1457,7 @@ xlog_sync(xlog_t  *log,
 	} else {
 		iclog->ic_bwritecnt = 1;
 	}
-	XFS_BUF_SET_PTR(bp, (xfs_caddr_t) &(iclog->ic_header), count);
+	XFS_BUF_SET_COUNT(bp, count);
 	XFS_BUF_SET_FSPRIVATE(bp, iclog);	/* save for later */
 	XFS_BUF_ZEROFLAGS(bp);
 	XFS_BUF_BUSY(bp);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-21 Thread Williams, Dan J
> From: Cornelia Huck [mailto:[EMAIL PROTECTED]
> On Fri, 18 May 2007 09:30:09 -0700,
> "Williams, Dan J" <[EMAIL PROTECTED]> wrote:
> 
> > When CONFIG_DMA_ENGINE=n async_tx_find_channel takes the form:
> > ... async_tx_find_channel( ... )
> > {
> > return NULL;
> > }
> >
> > So in the S390 case the entire asynchronous path will be compiled
> > away.
> 
> Unfortunately, do_async_xor() (and others) is not ifdef'ed and
> contains dma_map_page(), which led to the compile failure...

Sorry, I did not realize dma_map_page did not exist on s390.  I am
building an s390 cross compiler so I can clean up these errors.
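
The pattern under discussion can be illustrated with a small userspace
sketch. HAVE_DMA_ENGINE is a hypothetical stand-in for CONFIG_DMA_ENGINE,
and find_channel(), map_page(), submit_async_xor() and do_xor() are
invented names that only mirror the roles of async_tx_find_channel(),
dma_map_page() and do_async_xor(); this is not the actual async_tx code.
The point it tries to show is that a stub returning NULL makes the
asynchronous branch dead, but the compiler still has to parse and
type-check that branch, so a call to a function with no declaration on
the platform can still break the build unless it is also guarded.

```c
#include <assert.h>
#include <stddef.h>

/*
 * HAVE_DMA_ENGINE is a hypothetical stand-in for CONFIG_DMA_ENGINE.
 * It is left undefined here, which models the s390 configuration.
 */

struct chan;	/* opaque channel handle, only meaningful with an engine */

#ifdef HAVE_DMA_ENGINE
struct chan *find_channel(void);
unsigned long map_page(void *page);
int submit_async_xor(struct chan *chan, unsigned long dest,
		     unsigned long src, size_t len);
#else
/* Stub: with no engine there is never a channel to offload to. */
static inline struct chan *find_channel(void)
{
	return NULL;
}
#endif

/* Synchronous fallback: dest ^= src, byte by byte. */
static void xor_sync(unsigned char *dest, const unsigned char *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dest[i] ^= src[i];
}

/*
 * Returns 1 if the work was offloaded, 0 if it ran synchronously.
 * map_page() has no declaration when HAVE_DMA_ENGINE is undefined, so
 * the call sits inside the #ifdef rather than relying on "if (chan)"
 * being always false.
 */
static int do_xor(unsigned char *dest, const unsigned char *src, size_t len)
{
	struct chan *chan = find_channel();

	assert(dest != NULL && src != NULL);
#ifdef HAVE_DMA_ENGINE
	if (chan) {
		submit_async_xor(chan, map_page(dest),
				 map_page((void *)src), len);
		return 1;
	}
#else
	(void)chan;	/* always NULL in this configuration */
#endif
	xor_sync(dest, src, len);
	return 0;
}
```

An alternative to the #ifdef inside the function would be to provide
no-op stubs for the mapping calls on platforms that lack them; which of
the two the real fix used is not stated in this thread.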


Re: 2.6.22-rc1-mm1

2007-05-21 Thread young dave

Hi,

The kernel starts booting and stops in edd.c:read_sector.

I added debug messages around the two inline assembly statements and
recompiled the kernel. Now a strange thing happens: the kernel boots
straight through, but the printf messages can't be seen because they
scroll past too quickly.

Can we use printk in boot code?

The change to edd.c:
--- edd.c.bak   2007-05-21 14:38:34.0 +
+++ edd.c   2007-05-21 15:58:02.0 +
@@ -47,9 +47,11 @@ static int read_sector(u8 devno, u64 lba
   ax = 0x4200;/* Extended Read */
   si = (size_t)&dapa;
   dx = devno;
+   printf("before first inline\n");
   asm("pushfl; stc; int $0x13; setc %%al; popfl"
   : "+a" (ax), "+S" (si), "+d" (devno)
   : : "ebx", "ecx", "edi");
+   printf("after first inline\n");

   if (!(u8)ax)
   return 0;   /* OK */
@@ -58,9 +60,11 @@ static int read_sector(u8 devno, u64 lba
   cx = 0x0001;/* Sector 0-0-1 */
   dx = devno;
   bx = (size_t)buf;
+   printf("before second inline\n");
   asm("pushfl; stc; int $0x13; setc %%al; popfl"
   : "+a" (ax), "+c" (cx), "+d" (dx), "+b" (bx)
   : : "esi", "edi");
+   printf("after second inline\n");

   return -(u8)ax; /* 0 or -1 */
}



Regards
dave


Re: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-21 Thread Cornelia Huck
On Fri, 18 May 2007 09:30:09 -0700,
"Williams, Dan J" <[EMAIL PROTECTED]> wrote:

> When CONFIG_DMA_ENGINE=n async_tx_find_channel takes the form:
> ... async_tx_find_channel( ... )
> {
>   return NULL;
> }
> 
> So in the S390 case the entire asynchronous path will be compiled away.

Unfortunately, do_async_xor() (and others) is not ifdef'ed and contains
dma_map_page(), which led to the compile failure...


Re: 2.6.22-rc1-mm1

2007-05-20 Thread young dave

Hi,

> Could you put printf's in the setup code (especially
> arch/i386/boot/main.c) to see how far it runs before it dies?
>
> -hpa


I added some debug info to main.c; the result is that the kernel stops
in query_edd().

Then I used the kernel argument edd=off, and the kernel booted happily.

I will read edd.c to see what happened. Do you have any suggestions?


Re: 2.6.22-rc1-mm1

2007-05-20 Thread H. Peter Anvin
young dave wrote:
> Hi,
> 
> I tried the vga option , and the selection menu appeared, then I
> select 0(80x25) and nothing happened.
> 

OK.

Could you put printf's in the setup code (especially
arch/i386/boot/main.c) to see how far it runs before it dies?

-hpa


Re: 2.6.22-rc1-mm1

2007-05-20 Thread young dave

Hi,

I tried the vga option, and the selection menu appeared. Then I
selected 0 (80x25) and nothing happened.

2007/5/21, H. Peter Anvin <[EMAIL PROTECTED]>:

> young dave wrote:
> > Hi,
> > My cpu is Intel(R) Pentium(R) D CPU 2.80GHz, below are the lspci
> > output and kernel
>
> Could you please try booting with "vga=ask", and see if you get the
> video mode selection menu?
>
> -hpa




Re: [2.6.22-rc1-mm1] section mismatch.

2007-05-20 Thread Sam Ravnborg
On Mon, May 21, 2007 at 07:01:48AM +0400, Dan Kruchinin wrote:
> Hi.
> 
> Section mismatch:
> --
> WARNING: init/built-in.o - Section mismatch: reference to .init.text:
> from .text between 'rest_init' (at offset 0x11e) and 'try_name'
> WARNING: arch/i386/mach-generic/built-in.o - Section mismatch: reference
> to .init.text: from .data between 'apic_bigsmp' (at offset 0xc4) and
> 'cpu.5773'
> WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference
> to .init.text:amd_init_mtrr from .text between 'mtrr_bp_init' (at offset
> 0xe3ea) and 'ipi_handler'
> WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference
> to .init.text:cyrix_init_mtrr from .text between 'mtrr_bp_init' (at
> offset 0xe3ef) and 'ipi_handler'
> WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference
> to .init.text:centaur_init_mtrr from .text between 'mtrr_bp_init' (at
> offset 0xe3f4) and 'ipi_handler'
> --

Patches for all these are queued up for next -mm.
But thank you for reporting anyway.

Sam


Re: 2.6.22-rc1-mm1

2007-05-20 Thread H. Peter Anvin
young dave wrote:
> Hi,
> My cpu is Intel(R) Pentium(R) D CPU 2.80GHz, below are the lspci
> output and kernel

Could you please try booting with "vga=ask", and see if you get the
video mode selection menu?

-hpa


Re: 2.6.22-rc1-mm1 [cannot change thermal trip points]

2007-05-20 Thread Len Brown
On Saturday 19 May 2007 15:56, Thomas Renninger wrote:
> On Thu, 2007-05-17 at 15:17 -0400, Len Brown wrote:
> > On Thursday 17 May 2007 05:23, Pavel Machek wrote:
> > 
> > > > ACPI: thermal trip points are read-only
> > > 
> > > What was the rationale? Can we get this one reverted? 
> > > 
> > > Some machines (HP omnibook xe3) have broken trip points -- too high --
> > > so machine will overheat and trigger hw shutdown before starting
> > > passive cooling.
> > > 
> > > That's really broken, and write to trip points is reasonable way to
> > > 'fix' that. (I'd understand if you only ever let trip points to
> > > decrease... but otoh root should be able to shoot himself)
> > 
> > No, writing trip-points is neither a fix, nor it is reasonable.
> > It is a workaround at best, and it is a dangerous and mis-leading hack.
> Yes it is a workaround for critical ACPI bugs like that or similar:
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/22336

Thanks for pointing that out -- it is a great example
of how powerful mis-information can be.

The fact that the trip-points are writable has obscured,
rather than clarified, the actual causes of the failures.
No less than 4 people in that bug report declared that
cleaning the dust out of their fan fixed the root cause.
A bunch more said that the issues went away when they 
stopped using ubuntu's user-space power save daemon.

There are a couple more with broken active fan control --
which also gets obscured rather than clarified by
over-riding trip points.

And finally, there are probably some with clean fans
that are working properly, but are thermally challenged
systems.  I'll venture that Windows is NOT modifying or disabling
the critical trip point to work around this issue.
I'll venture that their thermal throttling is working
and ours may not be.

perhaps it was the recently fixed mod_timer() bug in thermal.c,
or perhaps it is one that we don't know about yet...

> It's also convenient to e.g. lower passive trip point to avoid fan
> noise.

nope, the OS can't reliably override the processor passive trip point.
That is what _SCP and cooling_mode are for.

The reason is that the BIOS can send us a trip-point changed event at any time,
the kernel will evaluate _PSV, and wipe out the modified OS version.
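
The sequence described here can be pictured with a toy model. The sketch
below is plain userspace C with invented names (thermal_zone_model,
user_write_psv, firmware_trip_points_changed); it does not reflect the
real ACPI thermal driver, only the ordering of events: a user write
touches the OS copy alone, and the next firmware notification replaces
that copy with whatever _PSV reports.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model; none of these names are real ACPI or kernel interfaces. */
struct thermal_zone_model {
	int firmware_psv;	/* value the firmware would report for _PSV */
	int os_psv;		/* the OS's cached copy, which a user could overwrite */
};

/* Models a write to the trip-point file: only the OS copy changes. */
static void user_write_psv(struct thermal_zone_model *tz, int value)
{
	assert(tz != NULL);
	tz->os_psv = value;
}

/* Models a "trip points changed" event: the OS re-reads _PSV. */
static void firmware_trip_points_changed(struct thermal_zone_model *tz,
					 int new_psv)
{
	assert(tz != NULL);
	tz->firmware_psv = new_psv;
	tz->os_psv = tz->firmware_psv;	/* the user's override is lost here */
}
```

In this model, writing 60 over a firmware value of 85 changes only
os_psv, and a later notification carrying 80 leaves both fields at 80.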

if you want to change the state of the fans,
then poke /proc/acpi/fan/ directly.
This will have effect until the next trip point
changes its state.

> Some people are used to it, I already wanted to write a little userspace
> prog to use them as it is really easy to fake cooling_mode (trip points
> are modified by BIOS) and eliminate fan noise and other things by e.g.
> reducing passive or whatever trip point.

please save this effort for a non-ACPI system.

> This is at least a major sysfs interface change, has this been discussed
> somewhere before or declared deprecated?

it went out on linux-acpi, but I don't recall any discussion about it.

> It's there for a long time, why is this "a dangerous and mis-leading
> hack." now?

It has been dangerous and misleading since the day it went in.
If the user doesn't enable polling, then they are effectively
writing random numbers that have absolutely no effect on
the operation of the system, and hiding the numbers that
do control the operation of the system.

> I'd suggest to revert this and I can come with something like "only
> allow lower values
> than BIOS provides" patch if the current implementation is considered
> dangerous.

That simply will not address the issue.
Indeed, all the entries in the ubuntu bug report are about hitting
the critical temperature and having a critical shutdown when
it isn't wanted.  These people want to RAISE the critical shutdown
trip-point.  Their cooling problems must be fixed -- raising critical
trip points causes them instead to be ignored.

For folks with the reverse problem -- active cooling where the
fans kick in earlier than they'd like, they should just turn off
the fans via /proc/acpi/fan and not mess with the trip points at all.
If they make a mistake, they will be forgiven when the system
reaches the next trip point and turns the fan back on.

thanks,
-Len


> > The OS has no capability to actually change the ACPI trip points
> > that are used by the BIOS.  Changing the OS copy of them
> > to make the user think that trip events will actually
> > happen when the temperature crosses the OS copy is crazy.
> > 
> > If there are systems with broken thermals and the
> > ACPI thermal control needs an over-ride to turn
> > on the fan, then that is fine -- but using
> > fake trip-points and giving the user the impression
> > that they are real is not viable.
> 
> 


Re: [2.6.22-rc1-mm1] vaio laptop (SZ72B) immediately resumes after STR

2007-05-20 Thread Mattia Dongili
On Sun, May 20, 2007 at 06:22:23PM -0700, David Brownell wrote:
> On Sunday 20 May 2007, Mattia Dongili wrote:
> > 
> > $ cat /proc/acpi/wakeup
> > Device  S-state  Status     Sysfs node
> > PWRB    S4       *enabled
> > S1F0    S4       disabled
> > S1F1    S4       disabled
> > S1F2    S4       disabled
> > S1F3    S4       disabled
> > S1F4    S4       disabled
> > S1F5    S4       disabled
> > S1F6    S4       disabled
> > S1F7    S4       disabled
> > TLAN    S3       disabled   pci::07:00.0
> > DLAN    S3       disabled
> > S6F0    S4       disabled
> > S6F1    S4       disabled
> > S6F2    S4       disabled
> > S6F3    S4       disabled
> > S6F4    S4       disabled
> > S6F5    S4       disabled
> > S6F6    S4       disabled
> > S6F7    S4       disabled
> > USB1    S3       disabled   pci::00:1d.0
> > USB2    S3       disabled   pci::00:1d.1
> > USB3    S3       disabled   pci::00:1d.2
> > USB4    S3       disabled   pci::00:1d.3
> > USB7    S3       disabled   pci::00:1d.7
> > SLT0    S4       disabled
> > LANC    S3       disabled
> > EC0     S5       disabled
> 
> That's strangely busy ... what ARE all those devices?  :)

the S[16]F* are tons of acpi devices... don't know what they are, they
are attached to the PCI-E port
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02)
if you're interested the DSDT is here:
http://www.linux.it/~malattia/sony-laptop/DSDT.sz72b.type3.dsl

> But only the PCI ones -- or certain devices connected to USB
> root hubs -- could be affected by that patch.
> 
> So another experiment you could do, if you want faster info
> than "git bisect" can provide, is building drivers for those
> PCI devices as modules (ehci-hcd, uhci-hcd, sky2) and then
> finding which one causes the trouble by removing them before
> STR.

it's ehci-hcd! and apart from the fact that removing it causes a BUG in
cpufreq, the system now stays correctly asleep when suspended.
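
When repeating this kind of experiment, it can help to filter a
/proc/acpi/wakeup listing like the one quoted earlier for the devices
that are armed for wakeup. Below is a minimal sketch, assuming the column
layout shown in that listing (device name first, then a status word
containing "enabled" or "disabled"); wakeup_line_enabled() is a
hypothetical helper, not an existing interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Given one line of /proc/acpi/wakeup-style output, copy the device name
 * into dev (at most 15 characters) and return 1 if the device is enabled
 * for wakeup, 0 if it is not, and -1 for the header or an empty line.
 * Hypothetical helper that assumes the column layout quoted in this thread.
 */
static int wakeup_line_enabled(const char *line, char dev[16])
{
	assert(line != NULL && dev != NULL);
	if (sscanf(line, "%15s", dev) != 1 || strcmp(dev, "Device") == 0)
		return -1;
	/* "disabled" does not contain "enabled", so a substring test is enough. */
	return strstr(line, "enabled") != NULL;
}
```

Called on each line read with fgets(), this would report only PWRB as
enabled in the listing above.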

...
> > > My suspicion, based on the dmesg and seeing what drivers actually
> > > try to enable wakeup, would be the 'sky2' driver.  The other two
> > 
> > FWIW the sky2 is never functional upon resume, I need to ifdown, rmmod,
> > modprobe and ifup again to get some networking...
> 
> Try "rmmod sky2" *before* suspend, to see if that matters.
> 
> Also "rmmod uhci-hcd", which will keep USB from doing anything
> with that biometric thingie.
> 
> I suspect one or the other of those will be the issue.

very close :)
-- 
mattia
:wq!