Re: [PATCH] Prevent IDE boot ops on NUMA system in mainline

2008-02-11 Thread Linus Torvalds


On Mon, 11 Feb 2008, Linus Torvalds wrote:
 
 So we should probably make pcibus_to_node() be an inline function for that 
 case

Or, we could just do the ugliest patch ever, namely

-#define pcibus_to_node(node)   (-1)
+#define pcibus_to_node(node)   ((int)(long)(node),-1)

Wow. It's so ugly it's almost wraps around and comes out the other sides 
and looks pretty.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Prevent IDE boot ops on NUMA system in mainline

2008-02-11 Thread Linus Torvalds


On Mon, 11 Feb 2008, Andi Kleen wrote:
 
 Without this patch a Opteron test system here oopses at boot with currentg 
 git. 
 
 Calling to_pci_dev() on a NULL pointer gives a negative value so the 
 following NULL 
 pointer check never triggers and then an illegal address is referenced. Check 
 the 
 unadjusted original device pointer for NULL instead.
 
 Signed-off-by: Andi Kleen [EMAIL PROTECTED]
 
 Index: linux/include/linux/ide.h
 ===
 --- linux.orig/include/linux/ide.h
 +++ linux/include/linux/ide.h
 @@ -1294,7 +1294,7 @@ static inline void ide_dump_identify(u8 
  static inline int hwif_to_node(ide_hwif_t *hwif)
  {
   struct pci_dev *dev = to_pci_dev(hwif-dev);
 - return dev ? pcibus_to_node(dev-bus) : -1;
 + return hwif-dev ? pcibus_to_node(dev-bus) : -1;
  }

Ok, I applied this, but it causes a *lot* of noise about unused variable 
'dev' because pcibus_to_node() is defined to be -1 when you don't do any 
strange NUMA thing (so that dev-bus usage thing is never even seen by 
the compiler.

So we should probably make pcibus_to_node() be an inline function for that 
case, or just make that thing be

return hwif-dev ?
pcibus_to_node(to_pci_dev(hwif-dev)-bus)
   :
-1;

or something. Because now things are really noisy.

Preferences, anybody? Bartlomiej?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] IDE updates part #4

2008-02-05 Thread Linus Torvalds


On Sat, 2 Feb 2008, Bartlomiej Zolnierkiewicz wrote:
 
 * next part of IDE probing code re-organization saga
   (that would be me)

This seems to cause very irritating and bogus messages for me:

Probing IDE interface ide0...
Probing IDE interface ide1...
ide2: I/O resource 0x0-0x7 not free.
ide2: ports already in use, skipping probe
ide3: I/O resource 0x0-0x7 not free.
ide3: ports already in use, skipping probe
ide4: I/O resource 0x0-0x7 not free.
ide4: ports already in use, skipping probe
ide5: I/O resource 0x0-0x7 not free.
ide5: ports already in use, skipping probe
ide6: I/O resource 0x0-0x7 not free.
ide6: ports already in use, skipping probe
ide7: I/O resource 0x0-0x7 not free.
ide7: ports already in use, skipping probe
ide8: I/O resource 0x0-0x7 not free.
ide8: ports already in use, skipping probe
ide9: I/O resource 0x0-0x7 not free.
ide9: ports already in use, skipping probe

and that's just totally bogus. It shouldn't even request that region, 
since it's not been allocated!

So that ide_device_add_all() is missing some checks. Should it check the 
probe[] array like ideprobe_init() used to, or what?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug: get EXT3-fs error Allocating block in system zone

2007-12-10 Thread Linus Torvalds


On Mon, 10 Dec 2007, Marco Gatti wrote:
 
 I didn't compile completly.
 
 drivers/scsi/scsi_lib.c:1565:1: error: unterminated #else

Heh. That #else should be an #endif, of course.

It is a bit strange that it still tries to do IO to high memory. Either 
the whole 64 bit capability thing in AHCI is broken, or the bounce 
buffering doesn't work right. Or maybe you tried the iommu=off without 
the original patch that tried to turn off 64-bit DMA?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux 2.6.24-rc4

2007-12-04 Thread Linus Torvalds


On Tue, 4 Dec 2007, Maciej Rutecki wrote:

 ata1: SATA max UDMA/133 irq_stat 0x00400040, connection status changed irq 220
 ata2: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xe8585180 irq 220
 ata3: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xe8585200 irq 220
 ata4: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xe8585280 irq 220
 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 failed (Emask=0x1 Stat=0x51 Err=0x04)
 ata1: failed to recover some devices, retrying in 5 secs
 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 failed (Emask=0x1 Stat=0x51 Err=0x04)
 ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5)
 ata1.00: revalidation failed (errno=1)
 ata1: failed to recover some devices, retrying in 5 secs
 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata1.00: configured for UDMA/133
 
 dmesg, lspci, hdparm:
 http://www.unixy.pl/maciek/download/kernel/2.6.24-rc4/

Can you make a real bug report, which includes things like whether this 
ever worked without that timeout, and if so, when it last worked and what 
the dmesg was then?

(Also, it's almost totally pointless to make me the primary To: target for 
something like this. Jeff Garzik hopefully picks this up from the 
linux-ide list, but I'm adding him to the Cc too)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] Add and use media change notification

2007-11-04 Thread Linus Torvalds


On Sun, 4 Nov 2007, Jeff Garzik wrote:
 
 The end to CD-ROM polling...  newer SATA ATAPI hardware will emit
 'asynchronous notification' events when media is changed.  This adds
 support.

I *really* didn't want to pull this.

Not only is it after the -rc1 period, but I also think you pushed this ina 
really offensive manner, and I don't like how you and James have made this 
whole thing personal. You guys need to sort it out, and there is no way 
you can blame James for all your troubles, since I've heard the same kind 
of complaints about every single maintainer out there (including you) 
about some driver or other infrastructure issue.

So I'm unhappy about pulling this.

That said, I did finally decide to just pull it. Partly because the 
concept apparently has been in -mm for a while anyway (even if the final 
patch has not - but the patch itself isn't that large, I'd worry more 
about thngs like certain hardware simply not doing things right), but 
mostly because I hate it when others hold up driver features, and I 
decided that in this case this really is something that is better done 
earlier rather than later, to get exposure as soon as possible.

But I really think you need to lay off James, and help him rather than 
make every single complaint a big flame-war!

Please, Jeff?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] libata fixes

2007-10-30 Thread Linus Torvalds


On Tue, 30 Oct 2007, Jeff Garzik wrote:
 
 Mikael Pettersson (2):
   sata_promise: ASIC PRD table bug workaround, take 2
   sata_promise: cleanups

You and Mikael need to sort out the way you send/accept patches. 

Both of these commits had stuff like this:

Signed-off-by: Mikael Pettersson [EMAIL PROTECTED]
--
Changes since previous version:
* use new PDC_MAX_PRD constant to initialise sg_tablesize

 drivers/ata/sata_promise.c |   87 
++---
 1 files changed, 83 insertions(+), 4 deletions(-)
Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

which seems to be because Mikael uses two dashes instead of three to 
separate his real message from the stuff you have.

So either you need to teach Mikael to use the proper separators, or you 
need to edit these things down to be something readable instead of keeping 
the extraneous commentary around...

Pulled, but I'm hoping for cleaner commit messages in the future..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] libata fixes

2007-10-30 Thread Linus Torvalds


On Tue, 30 Oct 2007, Jeff Garzik wrote:
 
 Can we change git-am to accept two dashes as well as three?  :)
 
 It seems pretty common, not just with Mikael but several others who send
 patches to me.

Well, git-am actually used to be a lot less strict about the dashes, and 
we've made it *more* strict rather than less, because the more of these 
breaks we accept, the more likely it is that something that was intended 
to be part of the message gets thrown out.. So I'll say that I'm a bit 
nervous about extending it again.

The reason for the three dashes is actually that that is what a *diff* 
starts with. So if you look at what closes a description as far as 
git-am is concerned, they are currently all things that are likely to 
start a patch: Index:  or diff - or --- filename, and that last 
case was then extended to be manual break even without the filename 
information.

See git/builtin-mailinfo.c: patchbreak().

But you could try to sell it to Junio. He's the maintainer, and while I 
care about some other things and will argue violently against them, when 
it comes to something like this, Junio is the guy to go to.

That said, I really think you could just try to educate the people you 
work with. Maybe they just never even realized that three dashes is what 
you're supposed to use!

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression: commit ide: constify struct ide_port_info causes breakage

2007-10-26 Thread Linus Torvalds


On Fri, 26 Oct 2007, Russell King wrote:

 commit 8562043606430185cad26d085d46adcc7ad67fd1 is broken, causing:
 
   CC  drivers/ide/pci/cmd64x.o
   CC  drivers/ide/pci/hpt366.o
 drivers/ide/pci/hpt366.c:1428: error: hpt366_chipsets causes a section type 
 conflict
 
 and therefore should be reverted.
 
 The problem arises because hpt366 has other data marked with __devinitdata,
 so the compiler tries to define the initdata section with read-write
 attributes at one point, and read-only attributes when encountering the
 const-but-devinitdata declaration:

Sad. This is not too uncommon, but yeah, it's really sad that we cannot 
use const (as a compiler type safety marker) together with section 
definitions (to get it put in special init sections).

Bartlomiej - I don't think we need to revert the whole commit, but all 
those static const xyz __devinitdata = { .. things probably do need to 
have the const removed, even if it would be otherwise correct ;(

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [1/2] 2.6.23-rc3: known regressions with patches

2007-09-01 Thread Linus Torvalds


On Wed, 29 Aug 2007, Michal Piotrowski wrote:
 
 Clocks  time
 
 Subject : double hpet clocksource  hard freeze
 References  : http://lkml.org/lkml/2007/8/23/257
 Last known good : ?
 Submitter   : Paolo Ornati [EMAIL PROTECTED]
 Caused-By   : Tony Luck [EMAIL PROTECTED]
   commit 0aa366f351d044703e25c8425e508170e80d83b1
 Handled-By  : John Stultz [EMAIL PROTECTED]
   Tony Luck [EMAIL PROTECTED]
 Patch   : http://lkml.org/lkml/2007/8/23/285
 Status  : patch available

Should be solved by 3b2b64fd311c92f2137eb7cee7025794cd854057, although 
differently from the the patch suggested above. So the regression should 
be gone a of -rc5.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pata_hpt37x: Fix 2.6.22 clock PLL regression

2007-07-24 Thread Linus Torvalds


On Tue, 24 Jul 2007, Alan Cox wrote:
 
   Just one version of Linux ago
   The PLL code broke - oh no!
   But set the right mode
   And fix up the code
   Makes the PLL timing sync go

Alan, I'm getting a bit worried about you.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches 1/2] warnings: attack valid cases spotted by warnings

2007-07-18 Thread Linus Torvalds


On Wed, 18 Jul 2007, Roland Dreier wrote:
 
 BTW, I noticed one interesting thing while starting on this cleanup.
 I wanted to make sure that the generated code didn't change with the
 first step, and I actually discovered that the patch below seems to
 make the generated code *better*, maybe because some gcc alias
 analysis doesn't get as paranoid without the void *:

Absolutely.

The way to get pretty much any compiler to generate better code is:
 - code it simply
 - don't have tons of variables with overlapping lifetime
 - use limited-scope variables (ie don't have the variables at the 
   outermost scope, declare them in the smallest scope you can)

and trying to split things up into functions helps all of these. 

In fact, you can often get better code even when the functions aren't even 
inlined, because of the simpler code issue. Gcc sometimes tries to be 
too clever with its CSE etc, and generates really nasty complex code with 
lots of register spills, just because it keeps stuff live rather than just 
regenerating them.

So inlining a function doesn't even make it faster, all the time. You want 
to inline when 

 - the function is so small that the call is literally a big part of it, 
   and it doesn't even need more than a couple of registers, so the 
   calling convention has more register pressure than inlining the 
   function itself would have.

OR

 - the callers tend to have constant arguments that can benefit from 
   constant folding inside the function

but inlining in many other circumstances actually generates *worse* code 
and just makes debugging harder and the I$ footprint bigger.

 And here's the patch itself -- I think this is a reasonable size step
 to break things up into.  I assume that this is the basic form of the
 cleanup that you're proposing?

Yes, this looks good. Doing these kinds of things for the various other 
things is likely to make the code more readable, and as you already found 
out, the generated code doesn't tend to be worse either. It might not 
_always_ be a win size and performance-wise, but it can be, and so 
readability should generally always be the #1 goal, because quite often it 
actually does help the compiler too.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches 1/2] warnings: attack valid cases spotted by warnings

2007-07-17 Thread Linus Torvalds


On Tue, 17 Jul 2007, Andrew Morton wrote:
 
 (rofl, look at that mess: it was utterly impractical, unrealistic and
 stupid for gcc to go and UTFify the compiler output.  Sigh.  LANG=C, guys).

Yeah, I absolutely *detest* how gcc does idiotic quoting just because you 
happen to be in a UTF-8 locale. There's no reason for it what-so-ever, and 
I don't understand the mindset of whoever did that crap.

They should use a nice ASCII tick-mark ('), not some fancy quotation 
marks.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches 1/2] warnings: attack valid cases spotted by warnings

2007-07-17 Thread Linus Torvalds


On Tue, 17 Jul 2007, Roland Dreier wrote:

 So setting a variable to something meaningless (guaranteeing that a 
 garbage value is used in case of a bug) just to shut up a warning makes 
 no sense -- it's no safer than leaving the code as is.  

Wrong.

It's safer for two reasons:
 - now everybody will see the *same* behaviour
 - the meaningless value is guaranteed to not be a security leak

but the whole shut up bogus warnings is the best reason.

So it *is* safer than leaving the code as-is.

Of course, usually the best approach is to rewrite the code to be simpler, 
so that even gcc sees that something is obviously initialized. Sadly, 
people seldom do the right thing, and sometimes gcc just blows incredibly 
hard.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches 1/2] warnings: attack valid cases spotted by warnings

2007-07-17 Thread Linus Torvalds


On Tue, 17 Jul 2007, Roland Dreier wrote:

 I think this patch (on top of the previous one) actually makes the
 code clearer

Quite frankly, calling this making the code clearer is a bit ridiculous.

That code still is absolute *crap* from a readability angle. It doesn't 
follow any sane coding standards, and certainly not the most important 
ones (keep the function small, don't have tons of local variables), 
and it has absolutely ridiculously ugly casts that get repeated over and 
over and over again.

Quite frankly, I don't quite understand where you get those enormous balls 
you have, that you can then talk about how ugly it is to just add a = 0 
that shuts up a compiler warning. That's the _least_ ugly part of the 
whole damn function!

So rather than sending out that idiotic patch, look at that code for five 
seconds, and ponder whether it really needs to be that ugly.

Here's a few things that you could *really* do to make it somewhat more 
readable:

 - make that whole switch() statement from hell another function 
   entirely, and have it return the size of the thing, so that you don't 
   need to have

wqe += xxx
size += xxx / 16;

   repeated fifty times (and so that it's also obvious that the  
   always matches).

 - make each switch case actually call a small function with the argument 
   cast to the right pointer type, so that you need *one* cast per case, 
   rather than a handful.

End result? More readable source code, with functions that are 20 lines 
long (or less), rather than 200 lines of spagetti-coding.

And you know what? That's actually more important than 16 bytes of object 
code, although looking at the size of the infiniband code, I *seriously* 
doubt any infiniband person has ever cared about object code size in their 
life. That thing is not for weak machines or stomachs.

The warnign (and fixing it up) is the _least_ of the problems in that 
code, methinks.

Anyway, here's a totally untested cleanup that compiles but probably 
doesn't work, because I didn't check that I did the right thing with all 
the pointer arithmetic (ie when I change wqe to a real structure pointer 
instead of just a void *, maybe I left some pointer arithmetic around 
that expected it to work as a byte pointer, but now really works on the 
whole structure size instead).

So this patch is NOT meant to be applied, but it is meant to teach people 
how things like this should be done. They should *not* be one big function 
with lots of case statements. They should be lots of small functions!

Linus
---
 drivers/infiniband/hw/mthca/mthca_qp.c |  236 ---
 1 files changed, 122 insertions(+), 114 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c 
b/drivers/infiniband/hw/mthca/mthca_qp.c
index 11f1d99..74da9bc 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1578,6 +1578,113 @@ static inline int mthca_wq_overflow(struct mthca_wq 
*wq, int nreq,
return cur + nreq = wq-max;
 }
 
+static int handle_next_seg(struct ib_send_wr *wr, struct mthca_next_seg * wqe)
+{
+   wqe-nda_op = 0;
+   wqe-ee_nds = 0;
+   wqe-flags =((wr-send_flags  IB_SEND_SIGNALED) ?
+cpu_to_be32(MTHCA_NEXT_CQ_UPDATE) : 0) |
+   ((wr-send_flags  IB_SEND_SOLICITED) ?
+cpu_to_be32(MTHCA_NEXT_SOLICIT) : 0)   |
+   cpu_to_be32(1);
+
+   if (wr-opcode == IB_WR_SEND_WITH_IMM ||
+   wr-opcode == IB_WR_RDMA_WRITE_WITH_IMM)
+   wqe-imm = wr-imm_data;
+
+   return sizeof(struct mthca_next_seg);
+}
+
+static int handle_raddr_seg(struct mthca_dev *dev, struct mthca_qp *qp, struct 
ib_send_wr *wr,
+   struct mthca_raddr_seg *wqe, int ind)
+{
+   switch (qp-transport) {
+   case RC:
+   switch (wr-opcode) {
+   case IB_WR_ATOMIC_CMP_AND_SWP:
+   case IB_WR_ATOMIC_FETCH_AND_ADD: {
+   struct mthca_atomic_seg *atomic;
+
+   wqe-raddr = cpu_to_be64(wr-wr.atomic.remote_addr);
+   wqe-rkey = cpu_to_be32(wr-wr.atomic.rkey);
+   wqe-reserved = 0;
+
+   atomic = (struct mthca_atomic_seg *) (wqe+1);
+
+   if (wr-opcode == IB_WR_ATOMIC_CMP_AND_SWP) {
+   atomic-swap_add = 
cpu_to_be64(wr-wr.atomic.swap);
+   atomic-compare = 
cpu_to_be64(wr-wr.atomic.compare_add);
+   } else {
+   atomic-swap_add = 
cpu_to_be64(wr-wr.atomic.compare_add);
+   atomic-compare = 0;
+   }
+
+   return sizeof (struct mthca_raddr_seg) +
+   sizeof (struct mthca_atomic_seg);
+   }
+
+   case IB_WR_RDMA_WRITE:
+   case 

Re: [git patches 1/2] warnings: attack valid cases spotted by warnings

2007-07-17 Thread Linus Torvalds


On Tue, 17 Jul 2007, Roland Dreier wrote:
 
   Anyway, here's a totally untested cleanup that compiles but probably 
   doesn't work, because I didn't check that I did the right thing with all 
   the pointer arithmetic (ie when I change wqe to a real structure pointer 
   instead of just a void *, maybe I left some pointer arithmetic around 
   that expected it to work as a byte pointer, but now really works on the 
   whole structure size instead).
 
 Given that you took the time to do this, I'll get the patch into a
 working state and apply it.  And maybe split it into reviewable chunks
 while I'm at it ;)

Hey, I appreciate it, but I really do have to warn you that I did this all 
blind, and just meant for it to be a I think this kind of direction is 
more productive thing. I'm not going to guarantee that it works at all.

I spent more time than I really wanted to on actually making sure the end 
result even compiles (quite often, I'm perfectly happy to just send out 
pseudo-code to indicate what I think should be done), but maybe I 
shouldn't have done that, just so that nobody thinks that the patch is 
necessarily going to *work*.

In other words, it's a totally mindless cleanup and re-factorization. It 
*may* work, but quite frankly, I did it just for that one email, and spent 
the absolutly minimum possible time thinking about it. As a result, I 
really think it needs some serious looking over.

I also think that my new handle_raddr_seg() function could itself be 
split up into subfunctions along the switch() subcases.

So not only wasn't it meant to be guaranteed correct, but it wasn't even 
meant to be seen as the best possible final situation: it could be - and 
should be - factored out even more (and the same goes for a lot of the 
other functions in that file that I didn't really look at, just glance and 
notice that they have some of the same problem patterns).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [1/2] 2.6.22-rc7: known regressions

2007-07-05 Thread Linus Torvalds


On Wed, 4 Jul 2007, David Woodhouse wrote:
 
 Oh, and here's another one for you. My Bluetooth mouse just stopped
 working and hidd is deadlocked...

Looks like it is stuck on hidp_session_sem.

Nothing after 2.6.21 seems to have even touched that semaphore usage, and 
in fact there's not a whole lot of changes to the hidp code at all (and 
none of them look even remotely interesting). 

So I suspect it's something lower down in the bluetooth stack, or it's a 
long-standing problem that you are somehow able to trigger more easily 
now. Is it consistent?

Can you showo the traces for the _other_ processes that are in bluetooth 
functions? Because there should be other processes there, holding that 
hidp_session_sem rwsem.

[ Alternatively, there is some process that doesn't release it in an error 
  case, but that is definitely not a regression if so: the changes to 
  net/bluetooth/hidp/core.c since 2.6.21 really are trivial. ]

IOW, more info needed, I think.

Linus

---
 hidd  D 1FE27798  5940  1695  1 (NOTLB)
 Call Trace:
 [ef3ddb70] [0004] 0x4 (unreliable)
 [ef3ddc30] [c0008e7c] __switch_to+0x50/0x68
 [ef3ddc50] [c02d5998] schedule+0x3cc/0x480
 [ef3ddc80] [c0137a20] rwsem_down_failed_common+0x1c4/0x1f4
 [ef3ddcb0] [c02d7454] rwsem_down_write_failed+0x28/0x40
 [ef3ddce0] [c004ff60] down_write+0x50/0x64
 [ef3ddd00] [f27f2068] hidp_add_connection+0x168/0x75c [hidp]
 [ef3ddd40] [f27f2e44] hidp_sock_ioctl+0x140/0x414 [hidp]
 [ef3ddeb0] [c024da6c] sock_ioctl+0x248/0x284
 [ef3dded0] [c00ab02c] do_ioctl+0x38/0x84
 [ef3ddee0] [c00ab448] vfs_ioctl+0x3d0/0x404
 [ef3ddf10] [c00ab4e4] sys_ioctl+0x68/0x98
 
 -- 
 dwmw2
 
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [1/2] 2.6.22-rc7: known regressions

2007-07-05 Thread Linus Torvalds

Looks input-related..

On Thu, 5 Jul 2007, David Woodhouse wrote:
 
 Hm, it's not something new. It's an oops I saw occasionally in 2.6.21-rc
 too, whenever we had CONFIG_SYSFS_DEPRECATED set.
 
  Unable to handle kernel paging request for data at address 0x6b6b6b6b

Ok, that 0x6b is obviously the kfree() poisoning, ie it looks like a 
use-after-free problem with a pointer being loaded from a structure that 
had been free'd-

And the trace seems to be (ignore the unreliable one):

  NIP [c001870c] strlen+0x4/0x18
  LR [c0134fec] kobject_get_path+0x34/0xc4
  Call Trace:
  [eed5be90] [c01d5124] class_uevent+0xac/0x1bc
  [eed5bed0] [c01357e4] kobject_uevent_env+0x23c/0x460
  [eed5bf20] [c01d485c] class_device_del+0x178/0x1a0
  [eed5bf40] [c01d489c] class_device_unregister+0x18/0x30
  [eed5bf60] [c021f820] input_unregister_device+0xf4/0x130
  [eed5bf70] [c0242f4c] hidinput_disconnect+0x2c/0x60
  [eed5bf90] [f27f2bac] hidp_session+0x550/0x584 [hidp]
  [eed5bff0] [c0013e28] kernel_thread+0x44/0x60

Where we have a few missing functions due to inlining, ie the real 
sequence seems to be:

class_device_del -
  kobject_uevent_env -
class_uevent -
  kobject_get_path -
get_kobj_path_length -
  parent = kobj;
  do {
strlen(parent-k_name /* kobject_name(parent) */);
parent = parent-parent;
  } while (parent);

so either the kobj or one of it's parents had already been freed when it 
was unregistered due to the disconnect.

I'm not seeing any reference counting or other protection for the device 
(input) on hid-inputs list. But I don't know the code. Dmitry? Jiri?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] libata fixes

2007-07-03 Thread Linus Torvalds


On Tue, 3 Jul 2007, Alan Cox wrote:

  Chuck Ebbert (1):
pata_ali: fix UDMA settings
 
 Could you please fix your git tree to have the proper credits for patches
 you pull from bugzilla. 
 
 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=242472

I don't think it is Jeff who needs to fix anything: it seems that it is 
Chuck Ebbert you should talk to. He's the one that dropped the 
attribution. Probably because of some insane system he uses (he has a 
comment in that bugzilla about patch from comment #14 is in CVS now..

CVS? What kind if insane setup do you have there at Red Hat?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] libata fixes

2007-06-27 Thread Linus Torvalds


On Wed, 27 Jun 2007, Jeff Garzik wrote:
 
 Such would be a diagnostic that would trigger on valid SCSI commands, when the
 user is doing nothing wrong and the system can indeed complete the command
 just fine.  Additionally, this is moving us in the direction of what the IDE
 driver has apparently been doing.

Indeed. At least the IDE CD-ROM driver does

if ((rq-data_len  15) || (addr  mask))
info-dma = 0;

where the mask is the dma_alignment mask. So it requires the length to 
be a multiple of 16, and also requires a certain alignment of the data 
(which defaults to 32 bytes for some reason I cannot for the life of me 
remember).

The generic BIO layer does that DMA alignment check too when mapping user 
pages.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: Linux v2.6.22-rc3

2007-06-09 Thread Linus Torvalds


On Thu, 7 Jun 2007, Jeff Garzik wrote:

 Ack'ing the sata_promise change was easy, but with this one it would
 be nice to wait a bit before changing the core probe code that [now]
 every ATA setup goes through, based on a single bug report.

So what's the resolution? Right now this is apparently the reasong for 
two of our current regressions, and we need to solve it.

Just wait is not an option. The options are:
 - fix things
 - revert everything that caused the regressions

and quite frankly, I don't think the second alternative is palatable to 
anybody, including you.

 The values assist in detecting ghost devices (same device appearing
 on both master and slave) and TF register malfunctions, and I would
 appreciate not breaking _that_ so late in 2.6.22-rc for a single
 report.  Thankfully we have -some- ghost device prevention code
 elsewhere, but this is part of it.

Apparently this isn't even true. As far as I can tell, the old code used 
to wait for lbal to be 1, but if it never became one, it would just 
silently time out and not *do* anything about it. Later commands would 
just work, and the device would go its merry ways.

IOW, the new code is *crap*. The old code was arguably not wonderful 
either, but because of its nature, the old code not working didn't 
actually matter all that much.

So the biggest change (and the one that broke peoples machines) is that 
the code that you claim matters, *never* apparently did matter, and now 
that we've made it matter (by returning an error, and aborting the SRST 
sequence as a failure), people are complaining.

I really think we should apply Tejun's patch. Add a delay there, by all 
means if you are nervous: but no, it shouldn't be 150ms. This is the 
_post_ reset handling, we've already done the 150ms wait for the reset!

In fact, if you look at ata_bus_post_reset(), you'll notice that for 
port0, we just do the ata_wait_ready() call. Tejun's patch just made us 
do the same (logical) thing for port1 too!  If you think it breaks due to 
ome timeout issue, then the port0 thing was already broken.

(And maybe we could make the 

ap-ops-dev_select(ap, 1);
rc = ata_wait_ready(ap, deadline);

instead be a

ata_dev_select(ap, 1, 1, 1);

and you'd wait even more? But that's actually a bigger change than Tejun's 
thing, and it uses ata_wait_idle() rather than ata_wait_ready(), and 
doesn't return any status.. I dunno. But right now, we have several known 
broken things, that apparently weren't broken before!)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libata: always use polling SETXFER

2007-06-03 Thread Linus Torvalds


On Sun, 3 Jun 2007, Jeff Garzik wrote:
 
 If we have another -rc, I would probably be OK with pushing it for 2.6.22,
 otherwise I would prefer to wait for 2.6.23.

We'll definitely have another -rc. I'll push -rc4 tonight, and while I'm 
hoping that we'll have resolved a number of the regressions, I can 
guarantee an -rc5 and I'd be very surprised if we don't have an -rc6 too.

(In fact, -rc6 tends to be what I consider the sweet spot for when I start 
thinking that I'm ready for a release).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux v2.6.22-rc3

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Jeff Garzik wrote:
 
 I'm about to dive into some heads-down RHEL backporting (whee), so I cannot
 look at the code in depth this weekend, but here are my basic thoughts:
 
 * We knew there would be fallout from the new reset-sequence code, and this is
 clearly in that category.
 
 * It worked before #reset-seq merge AFAICT, which implies the old method of
 probing -- which included SRST -- worked.

Well, I don't think it really worked before. It apparently always had a 
bad 30-second timeout (probably because the reset just didn't work at 
all). It's just that the old code didn't care, and since the identify then 
worked, it was all good.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux v2.6.22-rc3

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Jeff Garzik wrote:
 
 With these old PATA devices, device reset is six of one, half-dozen of the
 other.  Using SRST is the only way to kick some ATAPI devices into working:
 http://suif.stanford.edu/~csapuntz/blackmagic.html#reset

Well, wouldn't it be a good thing to
 1) if BUSY/DRQ is set even before you try the problem, obviously skip the 
two polite cases, and go to #4
 2) try to just do an IDENTIFY 
 3) if that doesn't work, do a HOST RESET and then try again
 4) if that doesn't work, do the full SRST

(or some variation of the above).

That at least has the potential to get you six working devices, and half a 
dozen other working devices, for a total of 12. With no unnecessary resets 
or long timeouts!

(Btw, the 150ms wait after reset is really nasty. A few of those, and 
we're wasting seconds during bootup. Why the hell does it do that, when 
the old driver - and the spec - said 2ms?)

 I'm mainly interested in hearing feedback from Fedora 7 damage, before making
 a major decision about the probing code.  If this is a single dain bramaged
 device, we should avoid punishing the majority.  But if this is a trend, it
 warrants careful reconsideration.

I thought the default in Fedora was to use the PATA driver first? If so, 
you're going to miss a lot of cases that just work, because they use the 
old driver.

At least that was what my situation was on the P4 that I recently 
complained about the set transfer mode problem: it used to use the old 
PATA drivers that didn't have the issue, so I never even _knew_ the new 
libata-based code had problems.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux v2.6.22-rc3

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Dave Jones wrote:
 
 There are no old drivers in F7 and beyond.
 
 # CONFIG_IDE is not set

Ahh, that's certainly going to root out the issues. Now let's hope that 
people install it..

(Fedora seems to make it hard on purpose to upgrade between major 
revisions, but maybe that's just me not reading the docs as usual ;)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux v2.6.22-rc3

2007-05-31 Thread Linus Torvalds


On Fri, 1 Jun 2007, Tejun Heo wrote:

 Gregor's cdrom has broken SRST support which is extremely rare and
 broken.

Well, the concept is neither rare nor arguably all that broken. 

The paper standards floating around in the industry are so much toilet 
paper. The only standard that seems to really matter is what Windows has 
traditionally done. We may not like it, but there it is...

This bites us more when it comes to the real el-cheapo stuff, notably when 
it comes to various USB mass storage things (which have some random 
USB-flash controller cobbled together by a senile llama on crack), and is 
almost unheard of for anythign that is server-grade, but when it comes 
to no-brand random devices, it really does tend to be the case that the 
only testing they ever had was using Windows.

And hardware is seldom any different from software: if it wasn't tested, 
it probably doesn't work.

So it would be good if somebody knew what the Windows ID/startup sequence 
was/is. I think we figured it out by trial-and-error for the USB mass 
storage stuff. But it tends to boil down to: don't do things that aren't 
absolutely required (for SCSI, it was things like not asking for mode 
pages that weren't absolutely required, because some devices wouldn't 
support it, and would simply lock up if you did so!)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux v2.6.22-rc3

2007-05-29 Thread Linus Torvalds


On Tue, 29 May 2007, Tejun Heo wrote:
 
 Aieee, so the drive doesn't like the new SRST sequence.

It would appear that the old code largely ignored the SRST error entirely, 
no?

If we *used* to do (in ata_bus_post_reset()):

if (dev0)
ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);

and you changed that to actually care about the return value:

if (dev0) {
rc = ata_wait_ready(ap, deadline);
if (rc  rc != -ENODEV)
return rc;
}

(in _two_ places). That change also changed the same post_reset handling 
in a totally _different_ way: it used to do ata_busy_sleep() twice, now it 
still does it twice, but it does it with the same timeout value, so if 
the first one times out, then the second one won't be given any timeout AT 
ALL!

And to make matters worse: the first timeout seems to be for ANOTHER PORT 
ENTIRELY! So you seem to break port 1 even if the timeout happened on port 
0, as far as I can read that sequence. 

So I think your ata_bus_post_reset() changes are rather suspect. The fact 
that you don't change the timeout, and use the same deadline for two 
different ports (and for multiple commands to the same port, afaik), seems 
rather suspect. The old code also didn't care about failures in certain 
phases of the reset sequence, and it appears that it did so for good 
reason.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux v2.6.22-rc3

2007-05-27 Thread Linus Torvalds


On Sun, 27 May 2007, Gregor Jasny wrote:

 2007/5/27, Jeff Garzik [EMAIL PROTECTED]:
  Does the attached patch fix things?
 
 No it doesn't. Note that I'm using ata_piix.

The commit message for that thing is badly worded. It definitely hits 
ata_piix too, and it concerns a lot more than just the LITE-ON CD-ROM 
drives (ie that patch fixes a bug for me on a PIIX system I have access 
to, with a TDK drive).

But if it doesn't help for you, you have some other issue (which is not 
surprising - yours wasn't a SETFXSR error, and I don't think it would have 
worked very well before either).

So since it apparently _did_ work for you before, can you bisect it?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] IDE fixes

2007-05-23 Thread Linus Torvalds


On Thu, 24 May 2007, Bartlomiej Zolnierkiewicz wrote:
 diff --git a/drivers/ide/pci/serverworks.c b/drivers/ide/pci/serverworks.c
 index 6234f80..47bcd91 100644
 --- a/drivers/ide/pci/serverworks.c
 +++ b/drivers/ide/pci/serverworks.c
 @@ -173,7 +179,7 @@ dma_pio:
  ((dma_stat(1(5+unit)))==(1(5+unit {
   u8 dmaspeed = dma_timing;
  
 - dma_timing = ~0xFF;
 + dma_timing = ~0xFFU;

This is just crap.

The old code was _also_ crap, but the new code just is worse. 

What's the point of this, really?

dma_timing is a 8-bit value, so the above is just a *really* stupid and 
bad way of saying

dma_timing = 0;

and whoever wrote that code is just terminally confused.

I pulled, but that driver is CRAP. Please don't add new crap blindly like 
that. 

Andrew, that was your change, it appears. Tssk.

Please, we do NOT fix compiler warnings without understanding the code! 
That's a sure way to just introduce _new_ bugs, rather than fix old ones. 
So please, please, please, realize that the compiler is _stupid_, and 
fixing warnings without understanding the code is bad.

In this case, anybody who actually spends 5 seconds looking at the code 
should have realized that the warning is just another way of saying that 
the author of the code was on some bad drugs, and the warnings WERE BETTER 
OFF REMAINING! Because that code _should_ have warnings. Big fat warnings 
about incompetence!

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] IDE fixes

2007-05-23 Thread Linus Torvalds


On Wed, 23 May 2007, Linus Torvalds wrote:
 
 So please, please, please, realize that the compiler is _stupid_, and 
 fixing warnings without understanding the code is bad.

Btw, this is fundamental. If you don't need to understand the code, then 
the compiler could have just fixed it for you, and there was no need to 
warn.

So compiler warnings have two cases:

 - the compiler is being a complete a**hole, and the warning should exist. 
   It happens.
 - it can be fixed, but only by understanding what the code wants to do.

In no case is it ok to just shut up the warning. 

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2/3] 2.6.22-rc1: known regressions v2

2007-05-16 Thread Linus Torvalds


On Wed, 16 May 2007, Christoph Lameter wrote:

 On Wed, 16 May 2007, Michal Piotrowski wrote:
 
  Memory management
  
  Subject: kernel BUG at include/linux/slub_def.h:88 kmalloc_index()
  References : http://bugzilla.kernel.org/show_bug.cgi?id=8476
  Submitter  : Cherwin R. Nooitmeer [EMAIL PROTECTED]
  Status : Unknown
 
 
 This a kmalloc(0) that needs fixing.

Well, needs fixing is a bit strong.

It's a scary message for something we've always handled, and that we still 
handle fine, we just complain about it.

So we'll probably just turn the message off for 2.6.22, but in the 
meantime, we leave it on and try to fix as many of these as we can be 
bothered to.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add suspend/resume for HPET

2007-03-31 Thread Linus Torvalds


On Sat, 31 Mar 2007, Thomas Gleixner wrote:
 
 While I agree in principle with the patch, I'm a bit uncomfortable. The
 sys device suspend / resume ordering is not guaranteed and relies on the
 registering order.

Well, this is why we probably should try to get away from the system 
device issue, exactly because system devices are totally outside the 
normal ordering and only have a random linear order.

If the clocksources were actually in the device tree, you'd get all the 
normal guarantees about hierarchical ordering..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add suspend/resume for HPET

2007-03-31 Thread Linus Torvalds


On Sat, 31 Mar 2007, Thomas Gleixner wrote:
 
 Right, but clock - sources/events need to be extremly late suspended and
 early resumed. How can we ensure this ?

Make them be at the top of the device tree by adding them early. That's 
the whole point of the device tree after all - we have an ordering that is 
enforced by its topology, and that we sort by when things were added.

Right now the way things work is (iirc - somebody like Greg should 
double-check me) is that we add all devices to the power list at 
device_add() time by traversing the devices fromt he root all the way out, 
and doing a device_add() which does a device_pm_add(), which in turn adds 
it to the power-management list - so that the list is always topologically 
sorted.

So the only thing that needs to be done is to make sure that we add the 
timer devices early during bootup - something we have to do *anyway*. If a 
device is added early in bootup, that automatically means that it will be 
suspended late, and resumed early - because we maintain that order all the 
way through..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add suspend/resume for HPET

2007-03-31 Thread Linus Torvalds


On Sat, 31 Mar 2007, Maxim Levitsky wrote:
 
 So maybe I was right afrer all,
 Maybe it is better to add a suspend/resume hook to each clock source and call 
 it from timekeeping_resume() ?

Umm.. WHy not make the device tree look like this:

-- clocksource -- +-- HPET
|
+-- TSC
|
+-- i8259
|
+-- lapic timer
|
.. whatever else

and use the struct device that we *have* for this? The whole struct 
device is literally designed to do this, and to be embedded into whatever 
bigger structures you have that describes higher-level behaviour. Ie you'd 
put a struct device inside the struct clocksource.

That thingalready *has* the suspend/resume hooks, and it will mean that 
people will see the clocks in the device tree rather than have a new 
notion.


Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions

2007-03-29 Thread Linus Torvalds


On Thu, 29 Mar 2007, Maxim wrote:

 On Thursday 29 March 2007 07:08:58 Linus Torvalds wrote:
  
  (Or, better yet, shouldn't we set boot_hpet_disable when we decide not 
  to use the HPET, and set hpet_virt_address to NULL?)
 
 This is done here
 
 out_nohpet:
   iounmap(hpet_virt_address);
   hpet_virt_address = NULL;

No, that only clears hpet_virt_address, and thus makes all subsequent 
hpet_readl() and hpet_writel() calls oops.

But it doesn't actually *tell* anybody that the HPET is disabled, so if 
you later on do

if (is_hpet_capable()) {
time = hpet_readl(..);
..

you will just Oops!

So as far as I can see, even with your latest patch, if hpet_enable() 
fails (and triggers the goto out_nohpet cases), you'll just oops 
immediately when you try to suspend/resume the HPET.

THAT was what I meant - when we clear hpet_virt_address, we should also 
tell all potential subsequent users that the HPET is not there!

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Add suspend/resume for HPET

2007-03-29 Thread Linus Torvalds


On Thu, 29 Mar 2007, Maxim Levitsky wrote:

 Subject: Add suspend/resume for HPET

 This adds support of suspend/resume on i386 for HPET

 Signed-off-by: Maxim Levitsky [EMAIL PROTECTED]
 
 ---
  arch/i386/kernel/hpet.c |   68 
 +++

Btw, what about arch/x86_64/kernel/hpet.c?

That thing seems totally broken. Lookie here:

  arch/x86_64/kernel/hpet.c:irqreturn_t hpet_rtc_interrupt(int irq, void 
*dev_id, struct pt_regs *regs)
  drivers/char/rtc.c:extern irqreturn_t hpet_rtc_interrupt(int irq, void 
*dev_id);

anybody see a problem? The x86-64 version doesn't seem to be very well 
maintained. Is there some fundamental reason why this file isn't shared 
across architectures?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions

2007-03-29 Thread Linus Torvalds


On Thu, 29 Mar 2007, Maxim Levitsky wrote:

   I agree, and as you said I did exactly that:

Gaah. I'm blind. Sorry. Your patch did indeed do exactly that, I somehow 
overlooked it.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] libata fixes

2007-03-28 Thread Linus Torvalds


On Wed, 28 Mar 2007, Jeff Garzik wrote:

 Jeff Garzik wrote:
  This disables libata ACPI, among other things.
 
 If a -rc6 is possible, that would be quite nice...

Heh. I don't think -rc6 is possible - it's inevitable. We have too 
much fallout from the timer changes still outstanding. It looks people 
finally figured out a big issue with the HPET timer, and that hopefully 
resolves most of the remaining timer-related regressions, but yes, we most 
definitely _will_ have an -rc6.

Andrew, what's your feeling apart from the timer fallout?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3/6] 2.6.21-rc4: known regressions

2007-03-28 Thread Linus Torvalds


On Wed, 28 Mar 2007, Maxim wrote:
 
   Now I don't have a clue how to set those bits if only HPET is used as 
 clock source because now clocksources
   don't have _any_ resume hook.

One thing that drives me wild about that clocksource resume thing is 
that it seems to think that clocksources are somehow different from any 
other system devices..

Why isn't the HPET considered a device, and has it's own *device* 
suspend and resume? Why do we seem to think that only set_mode() 
etc should wake up clock sources?

It's a *device*, dammit. It should save and resume like one (probably as a 
system device). The set_mode() etc stuff is at a completely different 
(higher) conceptual level.

Thomas? It does seem like Maxim has hit the nail on the head (at least 
partly) on the HPET timer resume problems..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-pm] [3/6] 2.6.21-rc4: known regressions

2007-03-28 Thread Linus Torvalds


On Wed, 28 Mar 2007, David Brownell wrote:

 On Wednesday 28 March 2007 9:38 am, Linus Torvalds wrote:
 
  It's a *device*, dammit. It should save and resume like one (probably as a 
  system device). The set_mode() etc stuff is at a completely different 
  (higher) conceptual level.
 
 Agreed, except about probably as a system device.
 
 Last I checked, there was no good reason to use sysdev suspend()/resume()
 rather than platform_device suspend_late()/early_resume().  Which more
 or less means no good reason to use sysdev in new code...

I won't disagree - it might well be much nicer to just show it in the 
real device tree. I'm not 100% sure where in the tree it would go, 
though. It should probably be inside the root entry, before any of the 
PCI buses. It's generally what we've used those system device things 
for, but I agree that it would be better to just make system devices show 
up early on the regular device list than it is to have them be special 
cases.

Bit I think that's a separate (and fairly small) issue compared to the 
don't use the clocksource infrastructure as a make-believe suspend/resume 
mechanism problem that Maxim's patch had.

(Maxim, don't take that the wrong way - I think your analysis and patch 
were great, I just think another organization would be better)

 Also, making HPET use the legacy mode seems like a step backwards.

I don't think that's actually legacy in any sense but the interrupt 
delivery, where the legacy mode bit is not so much that the HPET itself 
is legacy but that it *replaces* legacy devices.

But I may have misunderstood the thing. I'm an old fart, so I know the old 
timers much better than I know the new ones ;). Somebody feel free to hit 
me with the clue-2x4.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions

2007-03-28 Thread Linus Torvalds


On Thu, 29 Mar 2007, Maxim wrote:

   I am sending here a patch that as was discussed here adds hpet to list 
 of system devices
   and adds suspend/resume hooks this way.
   I tested it and it works fine.

Ok, it certainly looks better, but it *also* looks like it just assumes 
the HPET is there. Which would work in testing _with_ a HPET, but would 
likely break on hardware without one, no?

Shouldn't there be at least something like a

if (!is_hpet_capable())
return 0;

at the top of that init routine? I'd also expect that you'd need to check 
that hpet_virt_address is valid or something?

(Or, better yet, shouldn't we set boot_hpet_disable when we decide not 
to use the HPET, and set hpet_virt_address to NULL?)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ATA ACPI (was Re: Linux 2.6.21-rc5)

2007-03-27 Thread Linus Torvalds


On Tue, 27 Mar 2007, Jeff Garzik wrote:
 
 FWIW, I'm still leaning towards disabling libata ACPI support by default for
 2.6.21.

Hey, I'm not going to argue against anything that says disable ACPI. Of 
*course* it should be disabled if there aren't thousands of machines that 
are in user hands that actually need it (and none that regress).

Anybody want to send me a patch?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] IDE fixes

2007-03-17 Thread Linus Torvalds


On Sat, 17 Mar 2007, Bartlomiej Zolnierkiewicz wrote:
 
 to receive the following updates:
 
  b/drivers/ide/Kconfig   |   48 -
  b/drivers/ide/Makefile  |1 
  b/drivers/ide/arm/icside.c  |   13 
  b/drivers/ide/ide-dma.c |2 
  b/drivers/ide/ide.c |4 
  b/drivers/ide/mips/au1xxx-ide.c |3 
  b/drivers/ide/pci/Makefile  |1 
  b/drivers/ide/pci/cmd64x.c  |   45 -
  b/drivers/ide/pci/jmicron.c |   29 
  b/drivers/ide/pci/scc_pata.c|  858 
 
  b/drivers/ide/setup-pci.c   |5 
  b/include/asm-mips/mach-au1x00/au1xxx_ide.h |   34 -
  drivers/ide/ppc/scc_pata.c  |  858 
 
  13 files changed, 915 insertions(+), 986 deletions(-)

Please use

git diff -M --stat --summary 

to generate the diffstat (where the -M is the important part). Your 
scc_pata.c change was a pure rename, but because you didn't ask for 
rename detection, it got shown as a file create/delete pair. It *should* 
have looked like this:

 drivers/ide/Kconfig   |   48 +
 drivers/ide/Makefile  |1 -
 drivers/ide/arm/icside.c  |   13 +---
 drivers/ide/ide-dma.c |2 +-
 drivers/ide/ide.c |4 --
 drivers/ide/mips/au1xxx-ide.c |3 +-
 drivers/ide/pci/Makefile  |1 +
 drivers/ide/pci/cmd64x.c  |   45 ---
 drivers/ide/pci/jmicron.c |   29 ++---
 drivers/ide/{ppc = pci}/scc_pata.c   |0
 drivers/ide/setup-pci.c   |5 ---
 include/asm-mips/mach-au1x00/au1xxx_ide.h |   34 
 12 files changed, 57 insertions(+), 128 deletions(-)
 rename drivers/ide/{ppc = pci}/scc_pata.c (100%)

which is a lot more accurate than what you get by feeding a diff through 
diffstat that has no clue about renames (look at the summary for number 
of lines changed ;)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [git patches] libata fixes

2007-03-11 Thread Linus Torvalds


On Sun, 11 Mar 2007, Paul Rolland wrote:
 
 Nope... I tried several patches from Tejun, and also some that Jeff posted
 to linux-ide, but no luck. The only way to have this DVD-RW working is to
 use irqpoll on the command line...

So it has *never* worked? That's what I'm trying to see - you had a 
before and after dmesg in one of your posts, and the before one 
looked fine (as if it was working) because it didn't have the error 
messages.

So I'm just trying to figure out where the regression started...

 To complete, here are some more output from the machine : 

What happens if you disable MSI entirely? Use pci=nomsi on the command 
line.

The

ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x104)

messages happen for you on the controller that claims MSI:

ata2: SATA max UDMA/133 cmd 0xc208a980 ctl 0x 
bmdma 0x irq 504

and quite frankly, we've had lots of bugs with MSI, both in hardware and 
in software.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [git patches] libata fixes

2007-03-11 Thread Linus Torvalds


On Sun, 11 Mar 2007, Paul Rolland wrote:

 My machine is having two problems : the one you are describing above,
 which is due to a SIL controler being connected to one port of the ICH7
 (at least, it seems to), and probing it goes  timeout, but nothing is
 connected on it.

Ok, so that's just a message irritation, not actually bothersome 
otherwise?

 The second problem is a Jmicron363 controler that is failing to detect
 the DVD-RW that is connected, unless I use the irqpoll option as Tejun has
 suggested.

.. and this one has never worked without irqpoll?

 But, as you suggest it, I'm adding pci=nomsi to the command line
 rebooting... no change for this part of the problem.
 
 OK, the /proc/interrupt for this config, and the dmesg attached.
 
 3 [23:22] [EMAIL PROTECTED]:~ cat /proc/interrupts 
CPU0   CPU1   
   0: 297549  0   IO-APIC-edge  timer
   1:  7  0   IO-APIC-edge  i8042
   4: 13  0   IO-APIC-edge  serial
   6:  5  0   IO-APIC-edge  floppy
   8:  1  0   IO-APIC-edge  rtc
   9:  0  0   IO-APIC-fasteoi   acpi
  12:126  0   IO-APIC-edge  i8042
  14:   8313  0   IO-APIC-edge  libata
  15:  0  0   IO-APIC-edge  libata
  16:  0  0   IO-APIC-fasteoi   eth1, libata

So it's the irq16 one that is the Jmicron controller and just isn't 
getting any interrupts?

Since all the other interrupts work (and MSI worked for other 
controllers), I don't think it's interrupt-routing related. Especially as 
MSI shouldn't even care about things like that.

And since it all works when irqpoll is used, that implies that the 
*only* thing that is broken is literally irq delivery.

Is there possibly some jmicron-specific enable interrupts bit? 

 PS : I'd like to try 2.6.21-rc3, but it seems that this is breaking my
 config : disk naming is no more the same, and I end up with a panic
 Warning: unable to open an initial console
 though i've been compiling with the same .config I was using for 2.6.21-rc2

Gaah. Can you get a log through serial console or netconsole to see what 
changed?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: and try remove another quirk on this computers Re: [3/6] 2.6.21-rc2: known regressions

2007-03-09 Thread Linus Torvalds


On Sat, 10 Mar 2007, Sergio Monteiro Basto wrote:
 
 With this quirk I got this oops on hibernate (but computer still
 working) 

Well, strictly speaking it's a warning, not an oops per se. 

What happens is that the quirk wants to do an ioremap_nocache(), which 
allocates memory, and that happens very early during initialization when 
interrupts are disabled.

And you're really not supposed to allocate memory, except using 
GFP_ATOMIC. But we've always been lax about that during early boot, so we 
have stuff that does. And resume ends up doing a lot of the same things 
early boot does, and shows issues like this.

So the quirk is probably still a good idea, and the warning message is 
just that - a very scary warning message, but not an indicator that 
anything is seriously screwed up for you.

(It is an indication of a real bug, though, even though it's harmless in 
practice in this case)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libata: Cable detection fixes

2007-03-01 Thread Linus Torvalds


On Thu, 1 Mar 2007, Alan wrote:

 The patch switches the identify order so that we can do the drive side
 detection correctly. 
 
 Secondly we add a -cable_detect() method called after the identify
 sequence which allows a host to do host side detection at this point
 should it wish, or to modify the results of the drive side identify.

Alan, sign-offs?

Jeff, should I just expect to get these things through you?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] libata updates (mostly fixes)

2007-02-15 Thread Linus Torvalds


On Thu, 15 Feb 2007, Jeff Garzik wrote:
 
 diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
 index db185f3..d51f0f1 100644
 --- a/arch/ia64/Kconfig
 +++ b/arch/ia64/Kconfig
 @@ -22,6 +22,7 @@ config IA64
  
  config 64BIT
   bool
 + select ATA_NONSTANDARD if ATA
   default y

Ok, this is just _strange_.

Tying ATA_NONSTANDARD into ia64 by tying it to the 64BIT config variable 
may work (well, I _assume_ it does), but it's just psychedelic.

How about adding a separate config entry like

config IA64_ATA
bool
depends on ATA
select ATA_NONSTANDARD
default y

which kind of makes sense when you squint just the right way..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] libata updates (mostly fixes)

2007-02-15 Thread Linus Torvalds


On Thu, 15 Feb 2007, Linus Torvalds wrote:
 
 Ok, this is just _strange_.

Btw, I did pull, but I still think we shouldn't do those kinds of strange 
Kconfig file games.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

2007-01-29 Thread Linus Torvalds


On Mon, 29 Jan 2007, Mike Galbraith wrote:
 
 The extremely unlikely winner is:
 
 29b08d2bae854f66d3cfd5f57aaf2e7c2c7fce32 is first bad commit

Yeah, that's not going to be it. You probably had a bad kernel there 
somewhere that you called good. 

Git bisect is wonderful for figuring out (reasonably quickly) where 
problems are, but exactly because it zeroes in on the bug so quickly, if 
you give it faulty data, it zeroes in on something *else* very quickly and 
without any kind of sense. 

There's just no redundancy there, so one small bit error in the input, and 
you get a totally wrong end result ;)

I'm trying to narrow it down myself. It all *seemed* to work with the 
commit I suspected initially (the support larger block pc requests one).

But yes, I haven't actually tried to burn anything either. I also just 
started up nero and looked whether I'd get the error messages in dmesg.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

2007-01-29 Thread Linus Torvalds


On Mon, 29 Jan 2007, Linus Torvalds wrote:
 
 That said, I'm making progress with my bisection. 16 revisions left to 
 test after this, and three of those sixteen are
 
   Remove unnecessary blk_queue_bounce in SG_IO
   fix SG_IO bio leak
   remove blk_queue_activity_fn

I've now bisected to the point where those are the only commits left, so 
it's definitely one of the three.

Two more reboots and I should know exactly which one broke nero.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

2007-01-29 Thread Linus Torvalds


On Mon, 29 Jan 2007, Linus Torvalds wrote:
 
 Two more reboots and I should know exactly which one broke nero.

This one.

However, the scary thing is that I think the patch really is correct, and 
I wonder if nero has some strange work-around for an older bug.. Although 
I don't see how you could even have that, since afaik, the behaviour 
before the fix was literally just a leak that a user process shouldn't be 
able to see.

Very strange. Will add some debugging printk's.

Linus


commit 77d172ce2719b5ad2dc0637452c8871d9cba344c
Author: FUJITA Tomonori [EMAIL PROTECTED]
Date:   Mon Dec 11 10:01:34 2006 +0100

[PATCH] fix SG_IO bio leak

This patch fixes bio leaks in SG_IO. rq-bio can be changed after io
completion, so we need to reset rq-bio before calling blk_rq_unmap_user()

http://marc.theaimsgroup.com/?l=linux-kernelm=116570666807983w=2

Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
Signed-off-by: Jens Axboe [EMAIL PROTECTED]
---
 block/scsi_ioctl.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index b3e2107..045cabd 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -228,6 +228,7 @@ static int sg_io(struct file *file, request_queue_t *q,
struct request *rq;
char sense[SCSI_SENSE_BUFFERSIZE];
unsigned char cmd[BLK_MAX_CDB];
+   struct bio *bio;
 
if (hdr-interface_id != 'S')
return -EINVAL;
@@ -308,6 +309,7 @@ static int sg_io(struct file *file, request_queue_t *q,
if (ret)
goto out;
 
+   bio = rq-bio;
rq-retries = 0;
 
start_time = jiffies;
@@ -338,6 +340,7 @@ static int sg_io(struct file *file, request_queue_t *q,
hdr-sb_len_wr = len;
}
 
+   rq-bio = bio;
if (blk_rq_unmap_user(rq))
ret = -EFAULT;
 
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

2007-01-29 Thread Linus Torvalds

Uwe, others, does this patch fix your problem?

It will have a few printk's that it spews out, but if it fixes your 
problem, at least we know a bit more.

Linus
---
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 2528a0c..f0ff151 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,8 +333,13 @@ static int sg_io(struct file *file, request_queue_t *q,
hdr-sb_len_wr = len;
}
 
-   if (blk_rq_unmap_user(bio))
+   if (rq-bio != bio)
+   printk(rq-bio = %p, bio = %p\n, rq-bio, bio);
+
+   if (blk_rq_unmap_user(rq-bio)) {
+   printk(blk_rq_unmap_user failed!\n);
ret = -EFAULT;
+   }
 
/* may not have succeeded, but output values written to control
 * structure (struct sg_io_hdr).  */
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

2007-01-29 Thread Linus Torvalds


On Mon, 29 Jan 2007, Mike Christie wrote:
 
 rq-bio is NULL here, so no data is coped back to userspace and it seems
 nero just stops trying to talk to the drive after this.

Well, except that's what we used to do in 2.6.19 too. So what changed?

 Because nero just gives up, no more commands are sent and we do not get
 flooded with status errors like before so it sort of looks like it
 solves the problem but it doesn't - at least that is what is happening here.

Yeah, I can't burn with Nero either, although I'll also say that I've 
never even tried before, so my leet Nero skillz are nonexistant.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

2007-01-29 Thread Linus Torvalds


On Mon, 29 Jan 2007, Mike Christie wrote:
 
 Ok. here is a fix with the overflow check sg.c has. Patch
 was made against Linus's tree and tested with nero.
 
 Userspace does not send us jiffies. Use msecs_to_jiffies
 and check for overflow like sg.c
 
 Signed-off-by: Mike Christie [EMAIL PROTECTED]

Thanks. Fixed the ',' that Andrew pointed out, committed and pushed out.

What a stupid bug. And I even _looked_ at that patch, since there weren't 
that many that changed anything in this area, and it looked just subtly 
right enough that no warning bells ever went off.

Thanks to everybody, but especially Mike for showing us the errors of our 
ways.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

2007-01-28 Thread Linus Torvalds


[ Added Jeff, Jens and Mike Christie to Cc. I would _guess_ this is 
  associated with the larger block pc request stuff: Mike, Jens? James B 
  added for good luck.

  It apparently started happening somewhere between 2.6.19 and 2.6.20-rc2, 
  and doing a

gitk v2.6.19..v2.6.20-rc2 block/scsi_ioctl.c drivers/ide/

  I don't see anything else that really looks all that suspicious.. Unless 
  maybe it's that Fix SG_IO leak. Jeff added because of the hddtemp 
  issue, but I think that was effectively SATA-only, so probably isn't 
  relevant.

  Damn, I just realized that Jens is in the middle of his vacation for a 
  week..

  Mike, can you please look at this and check? ]

On Mon, 29 Jan 2007, Mike Galbraith wrote:
 
 FWIW, I just tried it with 2.6.20-rc6, and can confirm.  Once nero is
 run, the kernel never gives up retrying whatever command failed, so I
 get...
 
 [ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete 
 DataRequest }
 [ 4362.981475] ide: failed opcode was: unknown
 [ 4362.986183] hdd: drive not ready for command

Ok, I tried the demo version you can download at

http://www.nero.com/eng/nerolinux-prog.html

and yes, nerolinux seems broken. I've never used it before, but it 
triggers:

hda: irq timeout: status=0xc0 { Busy }
ide: failed opcode was: unknown

and after that there is indeed an endless stream of:

hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command

Which eventially switches to

hda: status error: status=0x59 { DriveReady SeekComplete DataRequest 
Error }
hda: status error: error=0x40 { LastFailedSense=0x04 }
ide: failed opcode was: unknown
hda: drive not ready for command

However, it appears to be rather hard to debug, with nerolinux being some 
closed black box. Does anybody who knows nero know if there is some way to 
get debug information out of it to see what it tried to do?

Can somebody try to bisect this?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

2007-01-28 Thread Linus Torvalds


On Mon, 29 Jan 2007, Mike Galbraith wrote:
 
 Unfortunately, I'm git impaired.  I am rummaging as we speak though.

Ok, I'm personally heading to bed, but it rally should be as simple as

 - get the git tree in the first place
 - do

git bisect good v2.6.19
git bisect bad v2.6.20-rc2
.. it will pick a point for you to try ..
.. compile, boot, test ..

git bisect {good|bad} depending on results

 - until (found)

(Of course, you should check that -rc2 really is bad to make sure. I think 
that's what Uwe reported, though. And I don't think we've done anything 
after -rc2 that could impact this, so I don't doubt it).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scary Intel SATA problem: frozen

2006-11-29 Thread Linus Torvalds


On Wed, 29 Nov 2006, Tejun Heo wrote:
 
 You pushed your box really hard and the kernel can't get the memory it wants.
 Not really relevant to SATA problem.

And it's not even really a bug - the caller is supposed to be ok with it. 
It's a warning message that the kernel spits out just because we've had 
problems in the past with callers that did _not_ handle an allocation 
error gracefully, so the warnign is spit out to (a) let us know something 
happened and (b) if there's a subsequent oops due to dereferencing a NULL 
pointer, it becomes easier to pinpoint what the sequence of events was.

So it's an atomic allocation that happens on the receive path in the 
network when you've run out of pages (because you're getting enough 
network traffic that earlier receives have used up all buffers, and so 
much disk IO that we haven't had time to clean any new pages yet), and 
getting an allocation failure there really is normal, it's just very 
unusual.

So that particular dump _looks_ scary, but it happens to be totally a 
non-issue unless something else happens afterwards to imply that the 
caller had trouble with the allocation failure.

It's also a sign of trouble if you can trigger it _easily_. It should be 
something that only triggers under very high load and under unusual 
circumstances.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Scary Intel SATA errors..

2006-11-28 Thread Linus Torvalds

Jeff, what does this mean:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x21)
ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1: soft resetting port
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1: soft resetting port
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1: soft resetting port
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata1: EH complete

followed by various IO errors:

sd 0:0:0:0: SCSI error: return code = 0x0004
end_request: I/O error, dev sda, sector 335093
sd 0:0:0:0: SCSI error: return code = 0x0004
end_request: I/O error, dev sda, sector 896217445
Buffer I/O error on device dm-0, logical block 112001027
lost page write due to I/O error on dm-0
..

nasty, nasty, nasty.

This is with ata_piix on a Intel i965 motherboard (everything Intel)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Scary Intel SATA problem: frozen

2006-11-28 Thread Linus Torvalds

[ You may or may not have gotten my previous email. The kernel stayed 
  working, but due to the IO errors the filesystem got re-mounted 
  read-only, and I'm not sure that the email I sent out in that state 
  actually ever made it out. I suspect it didn't. ]

Jeff,
 I just had a scary thing on my nice new Intel i965 box (all Intel 
chipsets apart from some strange Marvell IDE interface that I'm not using 
and that no driver even detected, and a TI firewire thing that I'm 
similarly not using).

The machine basically froze for about a minute or so (well, things worked 
surprisingly well, considering that apparently no disk IO happened - I 
initially thought it was just firefox that had frozen up, since my mail 
session seemed to be fine), and after it came back the filesystem was 
mounted read-only and nothing really worked any more..

I have no idea what status 0xD0 means: it looks like ATA_BUSY + ATA_DRDY + 
bit#4, but what is bit#4?

And clearly, the soft-reset isn't doing squat.

Ideas?

Linus


Boot-time messages:

libata version 2.00 loaded.
..
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with 
idebus=xx
Probing IDE interface ide0...
Probing IDE interface ide1...
ide-floppy driver 0.99.newide
ata_piix :00:1f.2: version 2.00ac6
ata_piix :00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt :00:1f.2[A] - GSI 19 (level, low) - IRQ 19
PCI: Setting latency timer of device :00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19
ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19
scsi0 : ata_piix
ata1.00: ATA-7, max UDMA/133, 976773168 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ata_piix
ata2.00: ATAPI, max UDMA/66
ata2.00: configured for UDMA/66
scsi 0:0:0:0: Direct-Access ATA  WDC WD5000YS-01M 07.0 PQ: 0 
ANSI: 5
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2
sd 0:0:0:0: Attached scsi disk sda
sd 0:0:0:0: Attached scsi generic sg0 type 0
scsi 1:0:0:0: CD-ROMPLEXTOR  DVDR   PX-755A   1.04 PQ: 0 
ANSI: 5
sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 1:0:0:0: Attached scsi CD-ROM sr0
sr 1:0:0:0: Attached scsi generic sg1 type 5
ata_piix :00:1f.5: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt :00:1f.5[A] - GSI 19 (level, low) - IRQ 19
PCI: Setting latency timer of device :00:1f.5 to 64
ata3: SATA max UDMA/133 cmd 0x2138 ctl 0x2176 bmdma 0x20F0 irq 19
ata4: SATA max UDMA/133 cmd 0x2130 ctl 0x2172 bmdma 0x20F8 irq 19
scsi2 : ata_piix
ATA: abnormal status 0x7F on port 0x213F
scsi3 : ata_piix
ATA: abnormal status 0x7F on port 0x2137

Problem starts:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x21)
ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1: soft resetting port
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1: soft resetting port
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal status 0xD0 on port 0x214F
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1: soft resetting port
ATA: abnormal status 0xD0 on port 0x214F
ATA: abnormal 

Re: Scary Intel SATA problem: frozen

2006-11-28 Thread Linus Torvalds


On Tue, 28 Nov 2006, Alan wrote:

 On Tue, 28 Nov 2006 09:31:51 -0800 (PST)
 Linus Torvalds [EMAIL PROTECTED] wrote:
 
   I just had a scary thing on my nice new Intel i965 box (all Intel 
  chipsets apart from some strange Marvell IDE interface that I'm not using 
  and that no driver even detected, and a TI firewire thing that I'm 
 
 Mr Morton has the Marvell libata driver in his tree waiting to head your
 way.

Well, I don't actually personally want it (I have nothing connected to it, 
nor any intention of connecting anything in the future), I just want my 
bog-standard PIIX driver to not do the scary things to me.

Mommy, mommy, the IDE messages/behaviour is scaring me!

I just mentioned the Marvell chip because apart from those two (unused) 
chips, the box is absolutely and utterly bog-standard Intel-everything. 
The i965 may still be somewhat unusual right now, but that's going to 
change, and if there's something strange going on, we should try to fix it 
asap.

It could be a one-off thing (knock wood), but on the other hand, I've only 
been using this machine for a couple of weeks now, and I can't remember 
seeing anything even remotely similar on my other machines (including the 
earlier-generation i945 SATA setup that I've had a lot longer). So I worry 
that it's something i965-specific, and that will be a _very_ common 
chipset soon enough.

One data-point that may or may not be relevant: the afore-mentioned i945 
machine that I've had longer is otherwise reasonably similar, but the DVD 
drive on that one is in legacy mode. Not that I see why it should matter 
(the problem happened on the harddisk, not the DVD)...

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scary Intel SATA problem: frozen

2006-11-28 Thread Linus Torvalds


On Tue, 28 Nov 2006, Jeff Garzik wrote:
 
 Does jgarzik/libata-dev.git#upstream (don't pull, just test) work for you?

Well, since I can't really test, I don't know. This problem has happened 
just once in the couple of weeks I've used that machine, and I wasn't even 
doing anything strange when it triggered (no heavy IO, no special 
programs, no nothing - I was literally just reading email and I think 
trying to browse over to news.com or something..)

So I was more hoping that you'd say that it's a known issue, and already 
fixed, or that the status bits would give you some clue and make you say 
Ahh, we don't handle that case. I have nothing to test. The thing 
seems to work, and I have no known way to trigger the problem...

 I'm pretty sure this is already fixed, by the polling IDENTIFY for ata_piix
 patchset.

Hmm. That sounds like it should just affect the bootup identification, 
which has always worked fine for me. Would it fix the softreset too?

Anyway, I can certainly try yout current upstream branch, but as 
mentioned, the standard kernel works fine for me generally, so I don't 
really know what I can offer (except if upstream simply doesn't work at 
all, in which case I'll certainly let you know ;)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scary Intel SATA problem: frozen

2006-11-28 Thread Linus Torvalds


On Tue, 28 Nov 2006, Jonas Lundgren wrote:
 
 Example of what I mean with crappy performance:
 dd if=/dev/zero of=test232 bs=1M count=100; time sync
 100+0 records in
 100+0 records out
 104857600 bytes (105 MB) copied, 0.130424 s, 804 MB/s
 real 0m21.104s

Ok, that's definitely not the same thing I see.

I get

real0m2.673s

for the time sync part of your example, so the chipset can definitely do 
better than your 4-5 MB/s.

And your read performance seems fine. Strange.

I suspect it's related to your RAID usage. I've only got a single disk in 
my system. Maybe there is something problematic in sending commands to 
alternating SATA ports on the same controller with the i965 thing?

The switching between SATA ports thing migt actually be a clue, because 
while I've had this thing for a few weeks, I only used the DVD drive for 
the first time the day before yesterday, and didn't actually even have the 
SCSI CD-ROM support compiled in until then (copied a config from another 
machine that had the DVD-rom on the legacy side, so it used the more 
common IDE-CD thing).

So maybe these _are_ related somehow, and my problem showed up because I 
actually had concurrent access to my DVD drive (some KDE media daemon 
checking to see if I inserted a music CD or something?). Jeff, Tejun, is 
there any reason to believe that the two channels on a PIIX ata controller 
are somehow tied together and it could be problematic for concurrent 
accesses?

Jonas definitely has the same error messages:

 Dmesg output from the error(s): (sda and sdb are 2 * 74GB raptor SATA
 drives in a Linux software raid0)
 
 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata1.00: (BMDMA stat 0x20)
 ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
 ata1: port is slow to respond, please be patient
 ata1: port failed to respond (30 secs)
 ata1: soft resetting port
 ATA: abnormal status 0xD0 on port 0xFA07
 ATA: abnormal status 0xD0 on port 0xFA07
 ATA: abnormal status 0xD0 on port 0xFA07
 ATA: abnormal status 0xD0 on port 0xFA07
 ATA: abnormal status 0xD0 on port 0xFA07
 ATA: abnormal status 0xD0 on port 0xFA07

That all looks exactly like mine did.

Except:

 ata1: EH complete
 SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
 sda: Write Protect is off
 sda: Mode Sense: 00 3a 00 00
 SCSI device sda: drive cache: write back

Jonas' disks came back.

So while Jonas' behaviour/problems otherwise don't seem to match mine at 
all, there might be some underlying commonality..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scary Intel SATA problem: frozen

2006-11-28 Thread Linus Torvalds


On Tue, 28 Nov 2006, Jeff Garzik wrote:
 
 I was sorta wondering in that direction too.  If its in legacy mode (PATA and
 SATA smushed together), that's a possibility.  But native or AHCI modes, the
 channels are pretty independent (which is the nature of SATA).

Well, what I was more wondering about is whether perhaps the legacy mode 
emulation - even when it isn't actually used - means that there is simply 
some shared state (read: chipset bug that nobody noticed).

 Historical note:  ata_piix is IMO more complicated than ahci, because the
 silicon is emulating the PATA interface using an internal (probably huge)
 state machine, converting PATA behavior to sending/receiving SATA packets.

Well, there's bound to be the same big state machine working the other 
way, and maybe the chip simply internally gets confused. Or, as you say, 
simply because the emulation state machinery has to be taken into account, 
and _that_ ends up beign shared between the two otherwise independent 
channels..

How hard would it be to just force a shared spinlock between two sata 
channels on the same controller? It sounds like Jonas has a very 
repeatable setup, so even if I can't repeat my problem, if the performance 
degradation on writes is related, he can check his thing..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scary Intel SATA problem: frozen

2006-11-28 Thread Linus Torvalds


On Tue, 28 Nov 2006, Jeff Garzik wrote:
 
 ap-host (struct ata_host) already has a spinlock for precisely just that...
 :)

Right, but do we actually take it? I'm not seing any spin_lock's in 
ata_piix.c, but I don't know the SATA layers enough to say whether upper 
layers take it or not..

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [another PATCH] Fix crash on boot in kmalloc_node IDE changes

2005-07-07 Thread Linus Torvalds


On Thu, 7 Jul 2005, Christoph Lameter wrote:
 
 Here is IMHO the right way to fix this. Test for the hwif != NULL and
 test for pci_dev != NULL before determining the node number of the pci 
 bus that the device is connected to.

I think this is pretty unreadable.

If you make it use a trivial inline function for the thing, I think that 
would be ok, though.

Complex expressions are bad.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] IDE update

2005-07-05 Thread Linus Torvalds


On Tue, 5 Jul 2005, Jens Axboe wrote:
 
 Looks interesting, 2.6 spends oodles of times copying to user space.
 Lets check if raw reads perform ok, please try and time this app in 2.4
 and 2.6 as well.

I think it's just that 2.4.x used to allow longer command queues. I think
MAX_NR_REQUESTS is 1024 in 2.4.x, and just 128 in 2.6.x or something like
that.

Also, the congestion thresholds are questionable: we consider a queue
congested if it is within 12% of full, but then we consider it uncongested
whenever it falls to within 18% of full, which I bet means that for some
streaming loads we have just a 6% window that we keep adding new
requests to (we wait when we're almost full, but then we start adding
requests again when we're _still_ almost full). Jens, we talked about this
long ago, but I don't think we ever did any timings.

Making things worse, things like this are only visible on stupid hardware
that has long latencies to get started (many SCSI controllers used to have
horrid latencies), so you'll never even see any difference on a lot of 
hardware.

It's probably worth testing with a bigger request limit. I forget what the 
/proc interfaces are (and am too lazy to look it up), Jens can tell us ;)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html