Re: [PATCH] Prevent IDE boot ops on NUMA system in mainline
On Mon, 11 Feb 2008, Linus Torvalds wrote: So we should probably make pcibus_to_node() be an inline function for that case Or, we could just do the ugliest patch ever, namely -#define pcibus_to_node(node) (-1) +#define pcibus_to_node(node) ((int)(long)(node),-1) Wow. It's so ugly it's almost wraps around and comes out the other sides and looks pretty. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Prevent IDE boot ops on NUMA system in mainline
On Mon, 11 Feb 2008, Andi Kleen wrote: Without this patch a Opteron test system here oopses at boot with currentg git. Calling to_pci_dev() on a NULL pointer gives a negative value so the following NULL pointer check never triggers and then an illegal address is referenced. Check the unadjusted original device pointer for NULL instead. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Index: linux/include/linux/ide.h === --- linux.orig/include/linux/ide.h +++ linux/include/linux/ide.h @@ -1294,7 +1294,7 @@ static inline void ide_dump_identify(u8 static inline int hwif_to_node(ide_hwif_t *hwif) { struct pci_dev *dev = to_pci_dev(hwif-dev); - return dev ? pcibus_to_node(dev-bus) : -1; + return hwif-dev ? pcibus_to_node(dev-bus) : -1; } Ok, I applied this, but it causes a *lot* of noise about unused variable 'dev' because pcibus_to_node() is defined to be -1 when you don't do any strange NUMA thing (so that dev-bus usage thing is never even seen by the compiler. So we should probably make pcibus_to_node() be an inline function for that case, or just make that thing be return hwif-dev ? pcibus_to_node(to_pci_dev(hwif-dev)-bus) : -1; or something. Because now things are really noisy. Preferences, anybody? Bartlomiej? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] IDE updates part #4
On Sat, 2 Feb 2008, Bartlomiej Zolnierkiewicz wrote: * next part of IDE probing code re-organization saga (that would be me) This seems to cause very irritating and bogus messages for me: Probing IDE interface ide0... Probing IDE interface ide1... ide2: I/O resource 0x0-0x7 not free. ide2: ports already in use, skipping probe ide3: I/O resource 0x0-0x7 not free. ide3: ports already in use, skipping probe ide4: I/O resource 0x0-0x7 not free. ide4: ports already in use, skipping probe ide5: I/O resource 0x0-0x7 not free. ide5: ports already in use, skipping probe ide6: I/O resource 0x0-0x7 not free. ide6: ports already in use, skipping probe ide7: I/O resource 0x0-0x7 not free. ide7: ports already in use, skipping probe ide8: I/O resource 0x0-0x7 not free. ide8: ports already in use, skipping probe ide9: I/O resource 0x0-0x7 not free. ide9: ports already in use, skipping probe and that's just totally bogus. It shouldn't even request that region, since it's not been allocated! So that ide_device_add_all() is missing some checks. Should it check the probe[] array like ideprobe_init() used to, or what? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug: get EXT3-fs error Allocating block in system zone
On Mon, 10 Dec 2007, Marco Gatti wrote: I didn't compile completly. drivers/scsi/scsi_lib.c:1565:1: error: unterminated #else Heh. That #else should be an #endif, of course. It is a bit strange that it still tries to do IO to high memory. Either the whole 64 bit capability thing in AHCI is broken, or the bounce buffering doesn't work right. Or maybe you tried the iommu=off without the original patch that tried to turn off 64-bit DMA? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux 2.6.24-rc4
On Tue, 4 Dec 2007, Maciej Rutecki wrote: ata1: SATA max UDMA/133 irq_stat 0x00400040, connection status changed irq 220 ata2: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xe8585180 irq 220 ata3: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xe8585200 irq 220 ata4: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xe8585280 irq 220 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 failed (Emask=0x1 Stat=0x51 Err=0x04) ata1: failed to recover some devices, retrying in 5 secs ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 failed (Emask=0x1 Stat=0x51 Err=0x04) ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5) ata1.00: revalidation failed (errno=1) ata1: failed to recover some devices, retrying in 5 secs ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/133 dmesg, lspci, hdparm: http://www.unixy.pl/maciek/download/kernel/2.6.24-rc4/ Can you make a real bug report, which includes things like whether this ever worked without that timeout, and if so, when it last worked and what the dmesg was then? (Also, it's almost totally pointless to make me the primary To: target for something like this. Jeff Garzik hopefully picks this up from the linux-ide list, but I'm adding him to the Cc too) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] Add and use media change notification
On Sun, 4 Nov 2007, Jeff Garzik wrote: The end to CD-ROM polling... newer SATA ATAPI hardware will emit 'asynchronous notification' events when media is changed. This adds support. I *really* didn't want to pull this. Not only is it after the -rc1 period, but I also think you pushed this ina really offensive manner, and I don't like how you and James have made this whole thing personal. You guys need to sort it out, and there is no way you can blame James for all your troubles, since I've heard the same kind of complaints about every single maintainer out there (including you) about some driver or other infrastructure issue. So I'm unhappy about pulling this. That said, I did finally decide to just pull it. Partly because the concept apparently has been in -mm for a while anyway (even if the final patch has not - but the patch itself isn't that large, I'd worry more about thngs like certain hardware simply not doing things right), but mostly because I hate it when others hold up driver features, and I decided that in this case this really is something that is better done earlier rather than later, to get exposure as soon as possible. But I really think you need to lay off James, and help him rather than make every single complaint a big flame-war! Please, Jeff? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] libata fixes
On Tue, 30 Oct 2007, Jeff Garzik wrote: Mikael Pettersson (2): sata_promise: ASIC PRD table bug workaround, take 2 sata_promise: cleanups You and Mikael need to sort out the way you send/accept patches. Both of these commits had stuff like this: Signed-off-by: Mikael Pettersson [EMAIL PROTECTED] -- Changes since previous version: * use new PDC_MAX_PRD constant to initialise sg_tablesize drivers/ata/sata_promise.c | 87 ++--- 1 files changed, 83 insertions(+), 4 deletions(-) Signed-off-by: Jeff Garzik [EMAIL PROTECTED] which seems to be because Mikael uses two dashes instead of three to separate his real message from the stuff you have. So either you need to teach Mikael to use the proper separators, or you need to edit these things down to be something readable instead of keeping the extraneous commentary around... Pulled, but I'm hoping for cleaner commit messages in the future.. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] libata fixes
On Tue, 30 Oct 2007, Jeff Garzik wrote: Can we change git-am to accept two dashes as well as three? :) It seems pretty common, not just with Mikael but several others who send patches to me. Well, git-am actually used to be a lot less strict about the dashes, and we've made it *more* strict rather than less, because the more of these breaks we accept, the more likely it is that something that was intended to be part of the message gets thrown out.. So I'll say that I'm a bit nervous about extending it again. The reason for the three dashes is actually that that is what a *diff* starts with. So if you look at what closes a description as far as git-am is concerned, they are currently all things that are likely to start a patch: Index: or diff - or --- filename, and that last case was then extended to be manual break even without the filename information. See git/builtin-mailinfo.c: patchbreak(). But you could try to sell it to Junio. He's the maintainer, and while I care about some other things and will argue violently against them, when it comes to something like this, Junio is the guy to go to. That said, I really think you could just try to educate the people you work with. Maybe they just never even realized that three dashes is what you're supposed to use! Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regression: commit ide: constify struct ide_port_info causes breakage
On Fri, 26 Oct 2007, Russell King wrote: commit 8562043606430185cad26d085d46adcc7ad67fd1 is broken, causing: CC drivers/ide/pci/cmd64x.o CC drivers/ide/pci/hpt366.o drivers/ide/pci/hpt366.c:1428: error: hpt366_chipsets causes a section type conflict and therefore should be reverted. The problem arises because hpt366 has other data marked with __devinitdata, so the compiler tries to define the initdata section with read-write attributes at one point, and read-only attributes when encountering the const-but-devinitdata declaration: Sad. This is not too uncommon, but yeah, it's really sad that we cannot use const (as a compiler type safety marker) together with section definitions (to get it put in special init sections). Bartlomiej - I don't think we need to revert the whole commit, but all those static const xyz __devinitdata = { .. things probably do need to have the const removed, even if it would be otherwise correct ;( Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [1/2] 2.6.23-rc3: known regressions with patches
On Wed, 29 Aug 2007, Michal Piotrowski wrote: Clocks time Subject : double hpet clocksource hard freeze References : http://lkml.org/lkml/2007/8/23/257 Last known good : ? Submitter : Paolo Ornati [EMAIL PROTECTED] Caused-By : Tony Luck [EMAIL PROTECTED] commit 0aa366f351d044703e25c8425e508170e80d83b1 Handled-By : John Stultz [EMAIL PROTECTED] Tony Luck [EMAIL PROTECTED] Patch : http://lkml.org/lkml/2007/8/23/285 Status : patch available Should be solved by 3b2b64fd311c92f2137eb7cee7025794cd854057, although differently from the the patch suggested above. So the regression should be gone a of -rc5. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pata_hpt37x: Fix 2.6.22 clock PLL regression
On Tue, 24 Jul 2007, Alan Cox wrote: Just one version of Linux ago The PLL code broke - oh no! But set the right mode And fix up the code Makes the PLL timing sync go Alan, I'm getting a bit worried about you. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches 1/2] warnings: attack valid cases spotted by warnings
On Wed, 18 Jul 2007, Roland Dreier wrote: BTW, I noticed one interesting thing while starting on this cleanup. I wanted to make sure that the generated code didn't change with the first step, and I actually discovered that the patch below seems to make the generated code *better*, maybe because some gcc alias analysis doesn't get as paranoid without the void *: Absolutely. The way to get pretty much any compiler to generate better code is: - code it simply - don't have tons of variables with overlapping lifetime - use limited-scope variables (ie don't have the variables at the outermost scope, declare them in the smallest scope you can) and trying to split things up into functions helps all of these. In fact, you can often get better code even when the functions aren't even inlined, because of the simpler code issue. Gcc sometimes tries to be too clever with its CSE etc, and generates really nasty complex code with lots of register spills, just because it keeps stuff live rather than just regenerating them. So inlining a function doesn't even make it faster, all the time. You want to inline when - the function is so small that the call is literally a big part of it, and it doesn't even need more than a couple of registers, so the calling convention has more register pressure than inlining the function itself would have. OR - the callers tend to have constant arguments that can benefit from constant folding inside the function but inlining in many other circumstances actually generates *worse* code and just makes debugging harder and the I$ footprint bigger. And here's the patch itself -- I think this is a reasonable size step to break things up into. I assume that this is the basic form of the cleanup that you're proposing? Yes, this looks good. Doing these kinds of things for the various other things is likely to make the code more readable, and as you already found out, the generated code doesn't tend to be worse either. It might not _always_ be a win size and performance-wise, but it can be, and so readability should generally always be the #1 goal, because quite often it actually does help the compiler too. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches 1/2] warnings: attack valid cases spotted by warnings
On Tue, 17 Jul 2007, Andrew Morton wrote: (rofl, look at that mess: it was utterly impractical, unrealistic and stupid for gcc to go and UTFify the compiler output. Sigh. LANG=C, guys). Yeah, I absolutely *detest* how gcc does idiotic quoting just because you happen to be in a UTF-8 locale. There's no reason for it what-so-ever, and I don't understand the mindset of whoever did that crap. They should use a nice ASCII tick-mark ('), not some fancy quotation marks. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches 1/2] warnings: attack valid cases spotted by warnings
On Tue, 17 Jul 2007, Roland Dreier wrote: So setting a variable to something meaningless (guaranteeing that a garbage value is used in case of a bug) just to shut up a warning makes no sense -- it's no safer than leaving the code as is. Wrong. It's safer for two reasons: - now everybody will see the *same* behaviour - the meaningless value is guaranteed to not be a security leak but the whole shut up bogus warnings is the best reason. So it *is* safer than leaving the code as-is. Of course, usually the best approach is to rewrite the code to be simpler, so that even gcc sees that something is obviously initialized. Sadly, people seldom do the right thing, and sometimes gcc just blows incredibly hard. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches 1/2] warnings: attack valid cases spotted by warnings
On Tue, 17 Jul 2007, Roland Dreier wrote: I think this patch (on top of the previous one) actually makes the code clearer Quite frankly, calling this making the code clearer is a bit ridiculous. That code still is absolute *crap* from a readability angle. It doesn't follow any sane coding standards, and certainly not the most important ones (keep the function small, don't have tons of local variables), and it has absolutely ridiculously ugly casts that get repeated over and over and over again. Quite frankly, I don't quite understand where you get those enormous balls you have, that you can then talk about how ugly it is to just add a = 0 that shuts up a compiler warning. That's the _least_ ugly part of the whole damn function! So rather than sending out that idiotic patch, look at that code for five seconds, and ponder whether it really needs to be that ugly. Here's a few things that you could *really* do to make it somewhat more readable: - make that whole switch() statement from hell another function entirely, and have it return the size of the thing, so that you don't need to have wqe += xxx size += xxx / 16; repeated fifty times (and so that it's also obvious that the always matches). - make each switch case actually call a small function with the argument cast to the right pointer type, so that you need *one* cast per case, rather than a handful. End result? More readable source code, with functions that are 20 lines long (or less), rather than 200 lines of spagetti-coding. And you know what? That's actually more important than 16 bytes of object code, although looking at the size of the infiniband code, I *seriously* doubt any infiniband person has ever cared about object code size in their life. That thing is not for weak machines or stomachs. The warnign (and fixing it up) is the _least_ of the problems in that code, methinks. Anyway, here's a totally untested cleanup that compiles but probably doesn't work, because I didn't check that I did the right thing with all the pointer arithmetic (ie when I change wqe to a real structure pointer instead of just a void *, maybe I left some pointer arithmetic around that expected it to work as a byte pointer, but now really works on the whole structure size instead). So this patch is NOT meant to be applied, but it is meant to teach people how things like this should be done. They should *not* be one big function with lots of case statements. They should be lots of small functions! Linus --- drivers/infiniband/hw/mthca/mthca_qp.c | 236 --- 1 files changed, 122 insertions(+), 114 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 11f1d99..74da9bc 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1578,6 +1578,113 @@ static inline int mthca_wq_overflow(struct mthca_wq *wq, int nreq, return cur + nreq = wq-max; } +static int handle_next_seg(struct ib_send_wr *wr, struct mthca_next_seg * wqe) +{ + wqe-nda_op = 0; + wqe-ee_nds = 0; + wqe-flags =((wr-send_flags IB_SEND_SIGNALED) ? +cpu_to_be32(MTHCA_NEXT_CQ_UPDATE) : 0) | + ((wr-send_flags IB_SEND_SOLICITED) ? +cpu_to_be32(MTHCA_NEXT_SOLICIT) : 0) | + cpu_to_be32(1); + + if (wr-opcode == IB_WR_SEND_WITH_IMM || + wr-opcode == IB_WR_RDMA_WRITE_WITH_IMM) + wqe-imm = wr-imm_data; + + return sizeof(struct mthca_next_seg); +} + +static int handle_raddr_seg(struct mthca_dev *dev, struct mthca_qp *qp, struct ib_send_wr *wr, + struct mthca_raddr_seg *wqe, int ind) +{ + switch (qp-transport) { + case RC: + switch (wr-opcode) { + case IB_WR_ATOMIC_CMP_AND_SWP: + case IB_WR_ATOMIC_FETCH_AND_ADD: { + struct mthca_atomic_seg *atomic; + + wqe-raddr = cpu_to_be64(wr-wr.atomic.remote_addr); + wqe-rkey = cpu_to_be32(wr-wr.atomic.rkey); + wqe-reserved = 0; + + atomic = (struct mthca_atomic_seg *) (wqe+1); + + if (wr-opcode == IB_WR_ATOMIC_CMP_AND_SWP) { + atomic-swap_add = cpu_to_be64(wr-wr.atomic.swap); + atomic-compare = cpu_to_be64(wr-wr.atomic.compare_add); + } else { + atomic-swap_add = cpu_to_be64(wr-wr.atomic.compare_add); + atomic-compare = 0; + } + + return sizeof (struct mthca_raddr_seg) + + sizeof (struct mthca_atomic_seg); + } + + case IB_WR_RDMA_WRITE: + case
Re: [git patches 1/2] warnings: attack valid cases spotted by warnings
On Tue, 17 Jul 2007, Roland Dreier wrote: Anyway, here's a totally untested cleanup that compiles but probably doesn't work, because I didn't check that I did the right thing with all the pointer arithmetic (ie when I change wqe to a real structure pointer instead of just a void *, maybe I left some pointer arithmetic around that expected it to work as a byte pointer, but now really works on the whole structure size instead). Given that you took the time to do this, I'll get the patch into a working state and apply it. And maybe split it into reviewable chunks while I'm at it ;) Hey, I appreciate it, but I really do have to warn you that I did this all blind, and just meant for it to be a I think this kind of direction is more productive thing. I'm not going to guarantee that it works at all. I spent more time than I really wanted to on actually making sure the end result even compiles (quite often, I'm perfectly happy to just send out pseudo-code to indicate what I think should be done), but maybe I shouldn't have done that, just so that nobody thinks that the patch is necessarily going to *work*. In other words, it's a totally mindless cleanup and re-factorization. It *may* work, but quite frankly, I did it just for that one email, and spent the absolutly minimum possible time thinking about it. As a result, I really think it needs some serious looking over. I also think that my new handle_raddr_seg() function could itself be split up into subfunctions along the switch() subcases. So not only wasn't it meant to be guaranteed correct, but it wasn't even meant to be seen as the best possible final situation: it could be - and should be - factored out even more (and the same goes for a lot of the other functions in that file that I didn't really look at, just glance and notice that they have some of the same problem patterns). Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [1/2] 2.6.22-rc7: known regressions
On Wed, 4 Jul 2007, David Woodhouse wrote: Oh, and here's another one for you. My Bluetooth mouse just stopped working and hidd is deadlocked... Looks like it is stuck on hidp_session_sem. Nothing after 2.6.21 seems to have even touched that semaphore usage, and in fact there's not a whole lot of changes to the hidp code at all (and none of them look even remotely interesting). So I suspect it's something lower down in the bluetooth stack, or it's a long-standing problem that you are somehow able to trigger more easily now. Is it consistent? Can you showo the traces for the _other_ processes that are in bluetooth functions? Because there should be other processes there, holding that hidp_session_sem rwsem. [ Alternatively, there is some process that doesn't release it in an error case, but that is definitely not a regression if so: the changes to net/bluetooth/hidp/core.c since 2.6.21 really are trivial. ] IOW, more info needed, I think. Linus --- hidd D 1FE27798 5940 1695 1 (NOTLB) Call Trace: [ef3ddb70] [0004] 0x4 (unreliable) [ef3ddc30] [c0008e7c] __switch_to+0x50/0x68 [ef3ddc50] [c02d5998] schedule+0x3cc/0x480 [ef3ddc80] [c0137a20] rwsem_down_failed_common+0x1c4/0x1f4 [ef3ddcb0] [c02d7454] rwsem_down_write_failed+0x28/0x40 [ef3ddce0] [c004ff60] down_write+0x50/0x64 [ef3ddd00] [f27f2068] hidp_add_connection+0x168/0x75c [hidp] [ef3ddd40] [f27f2e44] hidp_sock_ioctl+0x140/0x414 [hidp] [ef3ddeb0] [c024da6c] sock_ioctl+0x248/0x284 [ef3dded0] [c00ab02c] do_ioctl+0x38/0x84 [ef3ddee0] [c00ab448] vfs_ioctl+0x3d0/0x404 [ef3ddf10] [c00ab4e4] sys_ioctl+0x68/0x98 -- dwmw2 - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [1/2] 2.6.22-rc7: known regressions
Looks input-related.. On Thu, 5 Jul 2007, David Woodhouse wrote: Hm, it's not something new. It's an oops I saw occasionally in 2.6.21-rc too, whenever we had CONFIG_SYSFS_DEPRECATED set. Unable to handle kernel paging request for data at address 0x6b6b6b6b Ok, that 0x6b is obviously the kfree() poisoning, ie it looks like a use-after-free problem with a pointer being loaded from a structure that had been free'd- And the trace seems to be (ignore the unreliable one): NIP [c001870c] strlen+0x4/0x18 LR [c0134fec] kobject_get_path+0x34/0xc4 Call Trace: [eed5be90] [c01d5124] class_uevent+0xac/0x1bc [eed5bed0] [c01357e4] kobject_uevent_env+0x23c/0x460 [eed5bf20] [c01d485c] class_device_del+0x178/0x1a0 [eed5bf40] [c01d489c] class_device_unregister+0x18/0x30 [eed5bf60] [c021f820] input_unregister_device+0xf4/0x130 [eed5bf70] [c0242f4c] hidinput_disconnect+0x2c/0x60 [eed5bf90] [f27f2bac] hidp_session+0x550/0x584 [hidp] [eed5bff0] [c0013e28] kernel_thread+0x44/0x60 Where we have a few missing functions due to inlining, ie the real sequence seems to be: class_device_del - kobject_uevent_env - class_uevent - kobject_get_path - get_kobj_path_length - parent = kobj; do { strlen(parent-k_name /* kobject_name(parent) */); parent = parent-parent; } while (parent); so either the kobj or one of it's parents had already been freed when it was unregistered due to the disconnect. I'm not seeing any reference counting or other protection for the device (input) on hid-inputs list. But I don't know the code. Dmitry? Jiri? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] libata fixes
On Tue, 3 Jul 2007, Alan Cox wrote: Chuck Ebbert (1): pata_ali: fix UDMA settings Could you please fix your git tree to have the proper credits for patches you pull from bugzilla. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=242472 I don't think it is Jeff who needs to fix anything: it seems that it is Chuck Ebbert you should talk to. He's the one that dropped the attribution. Probably because of some insane system he uses (he has a comment in that bugzilla about patch from comment #14 is in CVS now.. CVS? What kind if insane setup do you have there at Red Hat? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] libata fixes
On Wed, 27 Jun 2007, Jeff Garzik wrote: Such would be a diagnostic that would trigger on valid SCSI commands, when the user is doing nothing wrong and the system can indeed complete the command just fine. Additionally, this is moving us in the direction of what the IDE driver has apparently been doing. Indeed. At least the IDE CD-ROM driver does if ((rq-data_len 15) || (addr mask)) info-dma = 0; where the mask is the dma_alignment mask. So it requires the length to be a multiple of 16, and also requires a certain alignment of the data (which defaults to 32 bytes for some reason I cannot for the life of me remember). The generic BIO layer does that DMA alignment check too when mapping user pages. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Re: Linux v2.6.22-rc3
On Thu, 7 Jun 2007, Jeff Garzik wrote: Ack'ing the sata_promise change was easy, but with this one it would be nice to wait a bit before changing the core probe code that [now] every ATA setup goes through, based on a single bug report. So what's the resolution? Right now this is apparently the reasong for two of our current regressions, and we need to solve it. Just wait is not an option. The options are: - fix things - revert everything that caused the regressions and quite frankly, I don't think the second alternative is palatable to anybody, including you. The values assist in detecting ghost devices (same device appearing on both master and slave) and TF register malfunctions, and I would appreciate not breaking _that_ so late in 2.6.22-rc for a single report. Thankfully we have -some- ghost device prevention code elsewhere, but this is part of it. Apparently this isn't even true. As far as I can tell, the old code used to wait for lbal to be 1, but if it never became one, it would just silently time out and not *do* anything about it. Later commands would just work, and the device would go its merry ways. IOW, the new code is *crap*. The old code was arguably not wonderful either, but because of its nature, the old code not working didn't actually matter all that much. So the biggest change (and the one that broke peoples machines) is that the code that you claim matters, *never* apparently did matter, and now that we've made it matter (by returning an error, and aborting the SRST sequence as a failure), people are complaining. I really think we should apply Tejun's patch. Add a delay there, by all means if you are nervous: but no, it shouldn't be 150ms. This is the _post_ reset handling, we've already done the 150ms wait for the reset! In fact, if you look at ata_bus_post_reset(), you'll notice that for port0, we just do the ata_wait_ready() call. Tejun's patch just made us do the same (logical) thing for port1 too! If you think it breaks due to ome timeout issue, then the port0 thing was already broken. (And maybe we could make the ap-ops-dev_select(ap, 1); rc = ata_wait_ready(ap, deadline); instead be a ata_dev_select(ap, 1, 1, 1); and you'd wait even more? But that's actually a bigger change than Tejun's thing, and it uses ata_wait_idle() rather than ata_wait_ready(), and doesn't return any status.. I dunno. But right now, we have several known broken things, that apparently weren't broken before!) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libata: always use polling SETXFER
On Sun, 3 Jun 2007, Jeff Garzik wrote: If we have another -rc, I would probably be OK with pushing it for 2.6.22, otherwise I would prefer to wait for 2.6.23. We'll definitely have another -rc. I'll push -rc4 tonight, and while I'm hoping that we'll have resolved a number of the regressions, I can guarantee an -rc5 and I'd be very surprised if we don't have an -rc6 too. (In fact, -rc6 tends to be what I consider the sweet spot for when I start thinking that I'm ready for a release). Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux v2.6.22-rc3
On Fri, 1 Jun 2007, Jeff Garzik wrote: I'm about to dive into some heads-down RHEL backporting (whee), so I cannot look at the code in depth this weekend, but here are my basic thoughts: * We knew there would be fallout from the new reset-sequence code, and this is clearly in that category. * It worked before #reset-seq merge AFAICT, which implies the old method of probing -- which included SRST -- worked. Well, I don't think it really worked before. It apparently always had a bad 30-second timeout (probably because the reset just didn't work at all). It's just that the old code didn't care, and since the identify then worked, it was all good. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux v2.6.22-rc3
On Fri, 1 Jun 2007, Jeff Garzik wrote: With these old PATA devices, device reset is six of one, half-dozen of the other. Using SRST is the only way to kick some ATAPI devices into working: http://suif.stanford.edu/~csapuntz/blackmagic.html#reset Well, wouldn't it be a good thing to 1) if BUSY/DRQ is set even before you try the problem, obviously skip the two polite cases, and go to #4 2) try to just do an IDENTIFY 3) if that doesn't work, do a HOST RESET and then try again 4) if that doesn't work, do the full SRST (or some variation of the above). That at least has the potential to get you six working devices, and half a dozen other working devices, for a total of 12. With no unnecessary resets or long timeouts! (Btw, the 150ms wait after reset is really nasty. A few of those, and we're wasting seconds during bootup. Why the hell does it do that, when the old driver - and the spec - said 2ms?) I'm mainly interested in hearing feedback from Fedora 7 damage, before making a major decision about the probing code. If this is a single dain bramaged device, we should avoid punishing the majority. But if this is a trend, it warrants careful reconsideration. I thought the default in Fedora was to use the PATA driver first? If so, you're going to miss a lot of cases that just work, because they use the old driver. At least that was what my situation was on the P4 that I recently complained about the set transfer mode problem: it used to use the old PATA drivers that didn't have the issue, so I never even _knew_ the new libata-based code had problems. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux v2.6.22-rc3
On Fri, 1 Jun 2007, Dave Jones wrote: There are no old drivers in F7 and beyond. # CONFIG_IDE is not set Ahh, that's certainly going to root out the issues. Now let's hope that people install it.. (Fedora seems to make it hard on purpose to upgrade between major revisions, but maybe that's just me not reading the docs as usual ;) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux v2.6.22-rc3
On Fri, 1 Jun 2007, Tejun Heo wrote: Gregor's cdrom has broken SRST support which is extremely rare and broken. Well, the concept is neither rare nor arguably all that broken. The paper standards floating around in the industry are so much toilet paper. The only standard that seems to really matter is what Windows has traditionally done. We may not like it, but there it is... This bites us more when it comes to the real el-cheapo stuff, notably when it comes to various USB mass storage things (which have some random USB-flash controller cobbled together by a senile llama on crack), and is almost unheard of for anythign that is server-grade, but when it comes to no-brand random devices, it really does tend to be the case that the only testing they ever had was using Windows. And hardware is seldom any different from software: if it wasn't tested, it probably doesn't work. So it would be good if somebody knew what the Windows ID/startup sequence was/is. I think we figured it out by trial-and-error for the USB mass storage stuff. But it tends to boil down to: don't do things that aren't absolutely required (for SCSI, it was things like not asking for mode pages that weren't absolutely required, because some devices wouldn't support it, and would simply lock up if you did so!) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux v2.6.22-rc3
On Tue, 29 May 2007, Tejun Heo wrote: Aieee, so the drive doesn't like the new SRST sequence. It would appear that the old code largely ignored the SRST error entirely, no? If we *used* to do (in ata_bus_post_reset()): if (dev0) ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT); and you changed that to actually care about the return value: if (dev0) { rc = ata_wait_ready(ap, deadline); if (rc rc != -ENODEV) return rc; } (in _two_ places). That change also changed the same post_reset handling in a totally _different_ way: it used to do ata_busy_sleep() twice, now it still does it twice, but it does it with the same timeout value, so if the first one times out, then the second one won't be given any timeout AT ALL! And to make matters worse: the first timeout seems to be for ANOTHER PORT ENTIRELY! So you seem to break port 1 even if the timeout happened on port 0, as far as I can read that sequence. So I think your ata_bus_post_reset() changes are rather suspect. The fact that you don't change the timeout, and use the same deadline for two different ports (and for multiple commands to the same port, afaik), seems rather suspect. The old code also didn't care about failures in certain phases of the reset sequence, and it appears that it did so for good reason. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux v2.6.22-rc3
On Sun, 27 May 2007, Gregor Jasny wrote: 2007/5/27, Jeff Garzik [EMAIL PROTECTED]: Does the attached patch fix things? No it doesn't. Note that I'm using ata_piix. The commit message for that thing is badly worded. It definitely hits ata_piix too, and it concerns a lot more than just the LITE-ON CD-ROM drives (ie that patch fixes a bug for me on a PIIX system I have access to, with a TDK drive). But if it doesn't help for you, you have some other issue (which is not surprising - yours wasn't a SETFXSR error, and I don't think it would have worked very well before either). So since it apparently _did_ work for you before, can you bisect it? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] IDE fixes
On Thu, 24 May 2007, Bartlomiej Zolnierkiewicz wrote: diff --git a/drivers/ide/pci/serverworks.c b/drivers/ide/pci/serverworks.c index 6234f80..47bcd91 100644 --- a/drivers/ide/pci/serverworks.c +++ b/drivers/ide/pci/serverworks.c @@ -173,7 +179,7 @@ dma_pio: ((dma_stat(1(5+unit)))==(1(5+unit { u8 dmaspeed = dma_timing; - dma_timing = ~0xFF; + dma_timing = ~0xFFU; This is just crap. The old code was _also_ crap, but the new code just is worse. What's the point of this, really? dma_timing is a 8-bit value, so the above is just a *really* stupid and bad way of saying dma_timing = 0; and whoever wrote that code is just terminally confused. I pulled, but that driver is CRAP. Please don't add new crap blindly like that. Andrew, that was your change, it appears. Tssk. Please, we do NOT fix compiler warnings without understanding the code! That's a sure way to just introduce _new_ bugs, rather than fix old ones. So please, please, please, realize that the compiler is _stupid_, and fixing warnings without understanding the code is bad. In this case, anybody who actually spends 5 seconds looking at the code should have realized that the warning is just another way of saying that the author of the code was on some bad drugs, and the warnings WERE BETTER OFF REMAINING! Because that code _should_ have warnings. Big fat warnings about incompetence! Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] IDE fixes
On Wed, 23 May 2007, Linus Torvalds wrote: So please, please, please, realize that the compiler is _stupid_, and fixing warnings without understanding the code is bad. Btw, this is fundamental. If you don't need to understand the code, then the compiler could have just fixed it for you, and there was no need to warn. So compiler warnings have two cases: - the compiler is being a complete a**hole, and the warning should exist. It happens. - it can be fixed, but only by understanding what the code wants to do. In no case is it ok to just shut up the warning. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/3] 2.6.22-rc1: known regressions v2
On Wed, 16 May 2007, Christoph Lameter wrote: On Wed, 16 May 2007, Michal Piotrowski wrote: Memory management Subject: kernel BUG at include/linux/slub_def.h:88 kmalloc_index() References : http://bugzilla.kernel.org/show_bug.cgi?id=8476 Submitter : Cherwin R. Nooitmeer [EMAIL PROTECTED] Status : Unknown This a kmalloc(0) that needs fixing. Well, needs fixing is a bit strong. It's a scary message for something we've always handled, and that we still handle fine, we just complain about it. So we'll probably just turn the message off for 2.6.22, but in the meantime, we leave it on and try to fix as many of these as we can be bothered to. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
On Sat, 31 Mar 2007, Thomas Gleixner wrote: While I agree in principle with the patch, I'm a bit uncomfortable. The sys device suspend / resume ordering is not guaranteed and relies on the registering order. Well, this is why we probably should try to get away from the system device issue, exactly because system devices are totally outside the normal ordering and only have a random linear order. If the clocksources were actually in the device tree, you'd get all the normal guarantees about hierarchical ordering.. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
On Sat, 31 Mar 2007, Thomas Gleixner wrote: Right, but clock - sources/events need to be extremly late suspended and early resumed. How can we ensure this ? Make them be at the top of the device tree by adding them early. That's the whole point of the device tree after all - we have an ordering that is enforced by its topology, and that we sort by when things were added. Right now the way things work is (iirc - somebody like Greg should double-check me) is that we add all devices to the power list at device_add() time by traversing the devices fromt he root all the way out, and doing a device_add() which does a device_pm_add(), which in turn adds it to the power-management list - so that the list is always topologically sorted. So the only thing that needs to be done is to make sure that we add the timer devices early during bootup - something we have to do *anyway*. If a device is added early in bootup, that automatically means that it will be suspended late, and resumed early - because we maintain that order all the way through.. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
On Sat, 31 Mar 2007, Maxim Levitsky wrote: So maybe I was right afrer all, Maybe it is better to add a suspend/resume hook to each clock source and call it from timekeeping_resume() ? Umm.. WHy not make the device tree look like this: -- clocksource -- +-- HPET | +-- TSC | +-- i8259 | +-- lapic timer | .. whatever else and use the struct device that we *have* for this? The whole struct device is literally designed to do this, and to be embedded into whatever bigger structures you have that describes higher-level behaviour. Ie you'd put a struct device inside the struct clocksource. That thingalready *has* the suspend/resume hooks, and it will mean that people will see the clocks in the device tree rather than have a new notion. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions
On Thu, 29 Mar 2007, Maxim wrote: On Thursday 29 March 2007 07:08:58 Linus Torvalds wrote: (Or, better yet, shouldn't we set boot_hpet_disable when we decide not to use the HPET, and set hpet_virt_address to NULL?) This is done here out_nohpet: iounmap(hpet_virt_address); hpet_virt_address = NULL; No, that only clears hpet_virt_address, and thus makes all subsequent hpet_readl() and hpet_writel() calls oops. But it doesn't actually *tell* anybody that the HPET is disabled, so if you later on do if (is_hpet_capable()) { time = hpet_readl(..); .. you will just Oops! So as far as I can see, even with your latest patch, if hpet_enable() fails (and triggers the goto out_nohpet cases), you'll just oops immediately when you try to suspend/resume the HPET. THAT was what I meant - when we clear hpet_virt_address, we should also tell all potential subsequent users that the HPET is not there! Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
On Thu, 29 Mar 2007, Maxim Levitsky wrote: Subject: Add suspend/resume for HPET This adds support of suspend/resume on i386 for HPET Signed-off-by: Maxim Levitsky [EMAIL PROTECTED] --- arch/i386/kernel/hpet.c | 68 +++ Btw, what about arch/x86_64/kernel/hpet.c? That thing seems totally broken. Lookie here: arch/x86_64/kernel/hpet.c:irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs) drivers/char/rtc.c:extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id); anybody see a problem? The x86-64 version doesn't seem to be very well maintained. Is there some fundamental reason why this file isn't shared across architectures? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions
On Thu, 29 Mar 2007, Maxim Levitsky wrote: I agree, and as you said I did exactly that: Gaah. I'm blind. Sorry. Your patch did indeed do exactly that, I somehow overlooked it. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] libata fixes
On Wed, 28 Mar 2007, Jeff Garzik wrote: Jeff Garzik wrote: This disables libata ACPI, among other things. If a -rc6 is possible, that would be quite nice... Heh. I don't think -rc6 is possible - it's inevitable. We have too much fallout from the timer changes still outstanding. It looks people finally figured out a big issue with the HPET timer, and that hopefully resolves most of the remaining timer-related regressions, but yes, we most definitely _will_ have an -rc6. Andrew, what's your feeling apart from the timer fallout? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3/6] 2.6.21-rc4: known regressions
On Wed, 28 Mar 2007, Maxim wrote: Now I don't have a clue how to set those bits if only HPET is used as clock source because now clocksources don't have _any_ resume hook. One thing that drives me wild about that clocksource resume thing is that it seems to think that clocksources are somehow different from any other system devices.. Why isn't the HPET considered a device, and has it's own *device* suspend and resume? Why do we seem to think that only set_mode() etc should wake up clock sources? It's a *device*, dammit. It should save and resume like one (probably as a system device). The set_mode() etc stuff is at a completely different (higher) conceptual level. Thomas? It does seem like Maxim has hit the nail on the head (at least partly) on the HPET timer resume problems.. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [linux-pm] [3/6] 2.6.21-rc4: known regressions
On Wed, 28 Mar 2007, David Brownell wrote: On Wednesday 28 March 2007 9:38 am, Linus Torvalds wrote: It's a *device*, dammit. It should save and resume like one (probably as a system device). The set_mode() etc stuff is at a completely different (higher) conceptual level. Agreed, except about probably as a system device. Last I checked, there was no good reason to use sysdev suspend()/resume() rather than platform_device suspend_late()/early_resume(). Which more or less means no good reason to use sysdev in new code... I won't disagree - it might well be much nicer to just show it in the real device tree. I'm not 100% sure where in the tree it would go, though. It should probably be inside the root entry, before any of the PCI buses. It's generally what we've used those system device things for, but I agree that it would be better to just make system devices show up early on the regular device list than it is to have them be special cases. Bit I think that's a separate (and fairly small) issue compared to the don't use the clocksource infrastructure as a make-believe suspend/resume mechanism problem that Maxim's patch had. (Maxim, don't take that the wrong way - I think your analysis and patch were great, I just think another organization would be better) Also, making HPET use the legacy mode seems like a step backwards. I don't think that's actually legacy in any sense but the interrupt delivery, where the legacy mode bit is not so much that the HPET itself is legacy but that it *replaces* legacy devices. But I may have misunderstood the thing. I'm an old fart, so I know the old timers much better than I know the new ones ;). Somebody feel free to hit me with the clue-2x4. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ PATCH] Add suspend/resume for HPET was: Re: [3/6] 2.6.21-rc4: known regressions
On Thu, 29 Mar 2007, Maxim wrote: I am sending here a patch that as was discussed here adds hpet to list of system devices and adds suspend/resume hooks this way. I tested it and it works fine. Ok, it certainly looks better, but it *also* looks like it just assumes the HPET is there. Which would work in testing _with_ a HPET, but would likely break on hardware without one, no? Shouldn't there be at least something like a if (!is_hpet_capable()) return 0; at the top of that init routine? I'd also expect that you'd need to check that hpet_virt_address is valid or something? (Or, better yet, shouldn't we set boot_hpet_disable when we decide not to use the HPET, and set hpet_virt_address to NULL?) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ATA ACPI (was Re: Linux 2.6.21-rc5)
On Tue, 27 Mar 2007, Jeff Garzik wrote: FWIW, I'm still leaning towards disabling libata ACPI support by default for 2.6.21. Hey, I'm not going to argue against anything that says disable ACPI. Of *course* it should be disabled if there aren't thousands of machines that are in user hands that actually need it (and none that regress). Anybody want to send me a patch? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] IDE fixes
On Sat, 17 Mar 2007, Bartlomiej Zolnierkiewicz wrote: to receive the following updates: b/drivers/ide/Kconfig | 48 - b/drivers/ide/Makefile |1 b/drivers/ide/arm/icside.c | 13 b/drivers/ide/ide-dma.c |2 b/drivers/ide/ide.c |4 b/drivers/ide/mips/au1xxx-ide.c |3 b/drivers/ide/pci/Makefile |1 b/drivers/ide/pci/cmd64x.c | 45 - b/drivers/ide/pci/jmicron.c | 29 b/drivers/ide/pci/scc_pata.c| 858 b/drivers/ide/setup-pci.c |5 b/include/asm-mips/mach-au1x00/au1xxx_ide.h | 34 - drivers/ide/ppc/scc_pata.c | 858 13 files changed, 915 insertions(+), 986 deletions(-) Please use git diff -M --stat --summary to generate the diffstat (where the -M is the important part). Your scc_pata.c change was a pure rename, but because you didn't ask for rename detection, it got shown as a file create/delete pair. It *should* have looked like this: drivers/ide/Kconfig | 48 + drivers/ide/Makefile |1 - drivers/ide/arm/icside.c | 13 +--- drivers/ide/ide-dma.c |2 +- drivers/ide/ide.c |4 -- drivers/ide/mips/au1xxx-ide.c |3 +- drivers/ide/pci/Makefile |1 + drivers/ide/pci/cmd64x.c | 45 --- drivers/ide/pci/jmicron.c | 29 ++--- drivers/ide/{ppc = pci}/scc_pata.c |0 drivers/ide/setup-pci.c |5 --- include/asm-mips/mach-au1x00/au1xxx_ide.h | 34 12 files changed, 57 insertions(+), 128 deletions(-) rename drivers/ide/{ppc = pci}/scc_pata.c (100%) which is a lot more accurate than what you get by feeding a diff through diffstat that has no clue about renames (look at the summary for number of lines changed ;) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [git patches] libata fixes
On Sun, 11 Mar 2007, Paul Rolland wrote: Nope... I tried several patches from Tejun, and also some that Jeff posted to linux-ide, but no luck. The only way to have this DVD-RW working is to use irqpoll on the command line... So it has *never* worked? That's what I'm trying to see - you had a before and after dmesg in one of your posts, and the before one looked fine (as if it was working) because it didn't have the error messages. So I'm just trying to figure out where the regression started... To complete, here are some more output from the machine : What happens if you disable MSI entirely? Use pci=nomsi on the command line. The ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x104) messages happen for you on the controller that claims MSI: ata2: SATA max UDMA/133 cmd 0xc208a980 ctl 0x bmdma 0x irq 504 and quite frankly, we've had lots of bugs with MSI, both in hardware and in software. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [git patches] libata fixes
On Sun, 11 Mar 2007, Paul Rolland wrote: My machine is having two problems : the one you are describing above, which is due to a SIL controler being connected to one port of the ICH7 (at least, it seems to), and probing it goes timeout, but nothing is connected on it. Ok, so that's just a message irritation, not actually bothersome otherwise? The second problem is a Jmicron363 controler that is failing to detect the DVD-RW that is connected, unless I use the irqpoll option as Tejun has suggested. .. and this one has never worked without irqpoll? But, as you suggest it, I'm adding pci=nomsi to the command line rebooting... no change for this part of the problem. OK, the /proc/interrupt for this config, and the dmesg attached. 3 [23:22] [EMAIL PROTECTED]:~ cat /proc/interrupts CPU0 CPU1 0: 297549 0 IO-APIC-edge timer 1: 7 0 IO-APIC-edge i8042 4: 13 0 IO-APIC-edge serial 6: 5 0 IO-APIC-edge floppy 8: 1 0 IO-APIC-edge rtc 9: 0 0 IO-APIC-fasteoi acpi 12:126 0 IO-APIC-edge i8042 14: 8313 0 IO-APIC-edge libata 15: 0 0 IO-APIC-edge libata 16: 0 0 IO-APIC-fasteoi eth1, libata So it's the irq16 one that is the Jmicron controller and just isn't getting any interrupts? Since all the other interrupts work (and MSI worked for other controllers), I don't think it's interrupt-routing related. Especially as MSI shouldn't even care about things like that. And since it all works when irqpoll is used, that implies that the *only* thing that is broken is literally irq delivery. Is there possibly some jmicron-specific enable interrupts bit? PS : I'd like to try 2.6.21-rc3, but it seems that this is breaking my config : disk naming is no more the same, and I end up with a panic Warning: unable to open an initial console though i've been compiling with the same .config I was using for 2.6.21-rc2 Gaah. Can you get a log through serial console or netconsole to see what changed? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: and try remove another quirk on this computers Re: [3/6] 2.6.21-rc2: known regressions
On Sat, 10 Mar 2007, Sergio Monteiro Basto wrote: With this quirk I got this oops on hibernate (but computer still working) Well, strictly speaking it's a warning, not an oops per se. What happens is that the quirk wants to do an ioremap_nocache(), which allocates memory, and that happens very early during initialization when interrupts are disabled. And you're really not supposed to allocate memory, except using GFP_ATOMIC. But we've always been lax about that during early boot, so we have stuff that does. And resume ends up doing a lot of the same things early boot does, and shows issues like this. So the quirk is probably still a good idea, and the warning message is just that - a very scary warning message, but not an indicator that anything is seriously screwed up for you. (It is an indication of a real bug, though, even though it's harmless in practice in this case) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libata: Cable detection fixes
On Thu, 1 Mar 2007, Alan wrote: The patch switches the identify order so that we can do the drive side detection correctly. Secondly we add a -cable_detect() method called after the identify sequence which allows a host to do host side detection at this point should it wish, or to modify the results of the drive side identify. Alan, sign-offs? Jeff, should I just expect to get these things through you? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] libata updates (mostly fixes)
On Thu, 15 Feb 2007, Jeff Garzik wrote: diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index db185f3..d51f0f1 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -22,6 +22,7 @@ config IA64 config 64BIT bool + select ATA_NONSTANDARD if ATA default y Ok, this is just _strange_. Tying ATA_NONSTANDARD into ia64 by tying it to the 64BIT config variable may work (well, I _assume_ it does), but it's just psychedelic. How about adding a separate config entry like config IA64_ATA bool depends on ATA select ATA_NONSTANDARD default y which kind of makes sense when you squint just the right way.. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] libata updates (mostly fixes)
On Thu, 15 Feb 2007, Linus Torvalds wrote: Ok, this is just _strange_. Btw, I did pull, but I still think we shouldn't do those kinds of strange Kconfig file games. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
On Mon, 29 Jan 2007, Mike Galbraith wrote: The extremely unlikely winner is: 29b08d2bae854f66d3cfd5f57aaf2e7c2c7fce32 is first bad commit Yeah, that's not going to be it. You probably had a bad kernel there somewhere that you called good. Git bisect is wonderful for figuring out (reasonably quickly) where problems are, but exactly because it zeroes in on the bug so quickly, if you give it faulty data, it zeroes in on something *else* very quickly and without any kind of sense. There's just no redundancy there, so one small bit error in the input, and you get a totally wrong end result ;) I'm trying to narrow it down myself. It all *seemed* to work with the commit I suspected initially (the support larger block pc requests one). But yes, I haven't actually tried to burn anything either. I also just started up nero and looked whether I'd get the error messages in dmesg. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
On Mon, 29 Jan 2007, Linus Torvalds wrote: That said, I'm making progress with my bisection. 16 revisions left to test after this, and three of those sixteen are Remove unnecessary blk_queue_bounce in SG_IO fix SG_IO bio leak remove blk_queue_activity_fn I've now bisected to the point where those are the only commits left, so it's definitely one of the three. Two more reboots and I should know exactly which one broke nero. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
On Mon, 29 Jan 2007, Linus Torvalds wrote: Two more reboots and I should know exactly which one broke nero. This one. However, the scary thing is that I think the patch really is correct, and I wonder if nero has some strange work-around for an older bug.. Although I don't see how you could even have that, since afaik, the behaviour before the fix was literally just a leak that a user process shouldn't be able to see. Very strange. Will add some debugging printk's. Linus commit 77d172ce2719b5ad2dc0637452c8871d9cba344c Author: FUJITA Tomonori [EMAIL PROTECTED] Date: Mon Dec 11 10:01:34 2006 +0100 [PATCH] fix SG_IO bio leak This patch fixes bio leaks in SG_IO. rq-bio can be changed after io completion, so we need to reset rq-bio before calling blk_rq_unmap_user() http://marc.theaimsgroup.com/?l=linux-kernelm=116570666807983w=2 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] Signed-off-by: Jens Axboe [EMAIL PROTECTED] --- block/scsi_ioctl.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index b3e2107..045cabd 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -228,6 +228,7 @@ static int sg_io(struct file *file, request_queue_t *q, struct request *rq; char sense[SCSI_SENSE_BUFFERSIZE]; unsigned char cmd[BLK_MAX_CDB]; + struct bio *bio; if (hdr-interface_id != 'S') return -EINVAL; @@ -308,6 +309,7 @@ static int sg_io(struct file *file, request_queue_t *q, if (ret) goto out; + bio = rq-bio; rq-retries = 0; start_time = jiffies; @@ -338,6 +340,7 @@ static int sg_io(struct file *file, request_queue_t *q, hdr-sb_len_wr = len; } + rq-bio = bio; if (blk_rq_unmap_user(rq)) ret = -EFAULT; - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
Uwe, others, does this patch fix your problem? It will have a few printk's that it spews out, but if it fixes your problem, at least we know a bit more. Linus --- diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 2528a0c..f0ff151 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -333,8 +333,13 @@ static int sg_io(struct file *file, request_queue_t *q, hdr-sb_len_wr = len; } - if (blk_rq_unmap_user(bio)) + if (rq-bio != bio) + printk(rq-bio = %p, bio = %p\n, rq-bio, bio); + + if (blk_rq_unmap_user(rq-bio)) { + printk(blk_rq_unmap_user failed!\n); ret = -EFAULT; + } /* may not have succeeded, but output values written to control * structure (struct sg_io_hdr). */ - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
On Mon, 29 Jan 2007, Mike Christie wrote: rq-bio is NULL here, so no data is coped back to userspace and it seems nero just stops trying to talk to the drive after this. Well, except that's what we used to do in 2.6.19 too. So what changed? Because nero just gives up, no more commands are sent and we do not get flooded with status errors like before so it sort of looks like it solves the problem but it doesn't - at least that is what is happening here. Yeah, I can't burn with Nero either, although I'll also say that I've never even tried before, so my leet Nero skillz are nonexistant. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
On Mon, 29 Jan 2007, Mike Christie wrote: Ok. here is a fix with the overflow check sg.c has. Patch was made against Linus's tree and tested with nero. Userspace does not send us jiffies. Use msecs_to_jiffies and check for overflow like sg.c Signed-off-by: Mike Christie [EMAIL PROTECTED] Thanks. Fixed the ',' that Andrew pointed out, committed and pushed out. What a stupid bug. And I even _looked_ at that patch, since there weren't that many that changed anything in this area, and it looked just subtly right enough that no warning bells ever went off. Thanks to everybody, but especially Mike for showing us the errors of our ways. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
[ Added Jeff, Jens and Mike Christie to Cc. I would _guess_ this is associated with the larger block pc request stuff: Mike, Jens? James B added for good luck. It apparently started happening somewhere between 2.6.19 and 2.6.20-rc2, and doing a gitk v2.6.19..v2.6.20-rc2 block/scsi_ioctl.c drivers/ide/ I don't see anything else that really looks all that suspicious.. Unless maybe it's that Fix SG_IO leak. Jeff added because of the hddtemp issue, but I think that was effectively SATA-only, so probably isn't relevant. Damn, I just realized that Jens is in the middle of his vacation for a week.. Mike, can you please look at this and check? ] On Mon, 29 Jan 2007, Mike Galbraith wrote: FWIW, I just tried it with 2.6.20-rc6, and can confirm. Once nero is run, the kernel never gives up retrying whatever command failed, so I get... [ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete DataRequest } [ 4362.981475] ide: failed opcode was: unknown [ 4362.986183] hdd: drive not ready for command Ok, I tried the demo version you can download at http://www.nero.com/eng/nerolinux-prog.html and yes, nerolinux seems broken. I've never used it before, but it triggers: hda: irq timeout: status=0xc0 { Busy } ide: failed opcode was: unknown and after that there is indeed an endless stream of: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest } ide: failed opcode was: unknown hda: drive not ready for command Which eventially switches to hda: status error: status=0x59 { DriveReady SeekComplete DataRequest Error } hda: status error: error=0x40 { LastFailedSense=0x04 } ide: failed opcode was: unknown hda: drive not ready for command However, it appears to be rather hard to debug, with nerolinux being some closed black box. Does anybody who knows nero know if there is some way to get debug information out of it to see what it tried to do? Can somebody try to bisect this? Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
On Mon, 29 Jan 2007, Mike Galbraith wrote: Unfortunately, I'm git impaired. I am rummaging as we speak though. Ok, I'm personally heading to bed, but it rally should be as simple as - get the git tree in the first place - do git bisect good v2.6.19 git bisect bad v2.6.20-rc2 .. it will pick a point for you to try .. .. compile, boot, test .. git bisect {good|bad} depending on results - until (found) (Of course, you should check that -rc2 really is bad to make sure. I think that's what Uwe reported, though. And I don't think we've done anything after -rc2 that could impact this, so I don't doubt it). Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scary Intel SATA problem: frozen
On Wed, 29 Nov 2006, Tejun Heo wrote: You pushed your box really hard and the kernel can't get the memory it wants. Not really relevant to SATA problem. And it's not even really a bug - the caller is supposed to be ok with it. It's a warning message that the kernel spits out just because we've had problems in the past with callers that did _not_ handle an allocation error gracefully, so the warnign is spit out to (a) let us know something happened and (b) if there's a subsequent oops due to dereferencing a NULL pointer, it becomes easier to pinpoint what the sequence of events was. So it's an atomic allocation that happens on the receive path in the network when you've run out of pages (because you're getting enough network traffic that earlier receives have used up all buffers, and so much disk IO that we haven't had time to clean any new pages yet), and getting an allocation failure there really is normal, it's just very unusual. So that particular dump _looks_ scary, but it happens to be totally a non-issue unless something else happens afterwards to imply that the caller had trouble with the allocation failure. It's also a sign of trouble if you can trigger it _easily_. It should be something that only triggers under very high load and under unusual circumstances. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Scary Intel SATA errors..
Jeff, what does this mean: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: (BMDMA stat 0x21) ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout) ata1: port is slow to respond, please be patient (Status 0xd0) ata1: port failed to respond (30 secs, Status 0xd0) ata1: soft resetting port ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ata1.00: qc timeout (cmd 0xec) ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata1.00: revalidation failed (errno=-5) ata1: failed to recover some devices, retrying in 5 secs ata1: port is slow to respond, please be patient (Status 0xd0) ata1: port failed to respond (30 secs, Status 0xd0) ata1: soft resetting port ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ata1.00: qc timeout (cmd 0xec) ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata1.00: revalidation failed (errno=-5) ata1: failed to recover some devices, retrying in 5 secs ata1: port is slow to respond, please be patient (Status 0xd0) ata1: port failed to respond (30 secs, Status 0xd0) ata1: soft resetting port ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ata1.00: qc timeout (cmd 0xec) ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata1.00: revalidation failed (errno=-5) ata1.00: disabled ata1: EH complete followed by various IO errors: sd 0:0:0:0: SCSI error: return code = 0x0004 end_request: I/O error, dev sda, sector 335093 sd 0:0:0:0: SCSI error: return code = 0x0004 end_request: I/O error, dev sda, sector 896217445 Buffer I/O error on device dm-0, logical block 112001027 lost page write due to I/O error on dm-0 .. nasty, nasty, nasty. This is with ata_piix on a Intel i965 motherboard (everything Intel) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Scary Intel SATA problem: frozen
[ You may or may not have gotten my previous email. The kernel stayed working, but due to the IO errors the filesystem got re-mounted read-only, and I'm not sure that the email I sent out in that state actually ever made it out. I suspect it didn't. ] Jeff, I just had a scary thing on my nice new Intel i965 box (all Intel chipsets apart from some strange Marvell IDE interface that I'm not using and that no driver even detected, and a TI firewire thing that I'm similarly not using). The machine basically froze for about a minute or so (well, things worked surprisingly well, considering that apparently no disk IO happened - I initially thought it was just firefox that had frozen up, since my mail session seemed to be fine), and after it came back the filesystem was mounted read-only and nothing really worked any more.. I have no idea what status 0xD0 means: it looks like ATA_BUSY + ATA_DRDY + bit#4, but what is bit#4? And clearly, the soft-reset isn't doing squat. Ideas? Linus Boot-time messages: libata version 2.00 loaded. .. Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx Probing IDE interface ide0... Probing IDE interface ide1... ide-floppy driver 0.99.newide ata_piix :00:1f.2: version 2.00ac6 ata_piix :00:1f.2: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt :00:1f.2[A] - GSI 19 (level, low) - IRQ 19 PCI: Setting latency timer of device :00:1f.2 to 64 ata1: SATA max UDMA/133 cmd 0x2148 ctl 0x217E bmdma 0x2110 irq 19 ata2: SATA max UDMA/133 cmd 0x2140 ctl 0x217A bmdma 0x2118 irq 19 scsi0 : ata_piix ata1.00: ATA-7, max UDMA/133, 976773168 sectors: LBA48 NCQ (depth 0/32) ata1.00: ata1: dev 0 multi count 16 ata1.00: configured for UDMA/133 scsi1 : ata_piix ata2.00: ATAPI, max UDMA/66 ata2.00: configured for UDMA/66 scsi 0:0:0:0: Direct-Access ATA WDC WD5000YS-01M 07.0 PQ: 0 ANSI: 5 SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back sda: sda1 sda2 sd 0:0:0:0: Attached scsi disk sda sd 0:0:0:0: Attached scsi generic sg0 type 0 scsi 1:0:0:0: CD-ROMPLEXTOR DVDR PX-755A 1.04 PQ: 0 ANSI: 5 sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray Uniform CD-ROM driver Revision: 3.20 sr 1:0:0:0: Attached scsi CD-ROM sr0 sr 1:0:0:0: Attached scsi generic sg1 type 5 ata_piix :00:1f.5: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt :00:1f.5[A] - GSI 19 (level, low) - IRQ 19 PCI: Setting latency timer of device :00:1f.5 to 64 ata3: SATA max UDMA/133 cmd 0x2138 ctl 0x2176 bmdma 0x20F0 irq 19 ata4: SATA max UDMA/133 cmd 0x2130 ctl 0x2172 bmdma 0x20F8 irq 19 scsi2 : ata_piix ATA: abnormal status 0x7F on port 0x213F scsi3 : ata_piix ATA: abnormal status 0x7F on port 0x2137 Problem starts: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: (BMDMA stat 0x21) ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout) ata1: port is slow to respond, please be patient (Status 0xd0) ata1: port failed to respond (30 secs, Status 0xd0) ata1: soft resetting port ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ata1.00: qc timeout (cmd 0xec) ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata1.00: revalidation failed (errno=-5) ata1: failed to recover some devices, retrying in 5 secs ata1: port is slow to respond, please be patient (Status 0xd0) ata1: port failed to respond (30 secs, Status 0xd0) ata1: soft resetting port ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal status 0xD0 on port 0x214F ata1.00: qc timeout (cmd 0xec) ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata1.00: revalidation failed (errno=-5) ata1: failed to recover some devices, retrying in 5 secs ata1: port is slow to respond, please be patient (Status 0xd0) ata1: port failed to respond (30 secs, Status 0xd0) ata1: soft resetting port ATA: abnormal status 0xD0 on port 0x214F ATA: abnormal
Re: Scary Intel SATA problem: frozen
On Tue, 28 Nov 2006, Alan wrote: On Tue, 28 Nov 2006 09:31:51 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: I just had a scary thing on my nice new Intel i965 box (all Intel chipsets apart from some strange Marvell IDE interface that I'm not using and that no driver even detected, and a TI firewire thing that I'm Mr Morton has the Marvell libata driver in his tree waiting to head your way. Well, I don't actually personally want it (I have nothing connected to it, nor any intention of connecting anything in the future), I just want my bog-standard PIIX driver to not do the scary things to me. Mommy, mommy, the IDE messages/behaviour is scaring me! I just mentioned the Marvell chip because apart from those two (unused) chips, the box is absolutely and utterly bog-standard Intel-everything. The i965 may still be somewhat unusual right now, but that's going to change, and if there's something strange going on, we should try to fix it asap. It could be a one-off thing (knock wood), but on the other hand, I've only been using this machine for a couple of weeks now, and I can't remember seeing anything even remotely similar on my other machines (including the earlier-generation i945 SATA setup that I've had a lot longer). So I worry that it's something i965-specific, and that will be a _very_ common chipset soon enough. One data-point that may or may not be relevant: the afore-mentioned i945 machine that I've had longer is otherwise reasonably similar, but the DVD drive on that one is in legacy mode. Not that I see why it should matter (the problem happened on the harddisk, not the DVD)... Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scary Intel SATA problem: frozen
On Tue, 28 Nov 2006, Jeff Garzik wrote: Does jgarzik/libata-dev.git#upstream (don't pull, just test) work for you? Well, since I can't really test, I don't know. This problem has happened just once in the couple of weeks I've used that machine, and I wasn't even doing anything strange when it triggered (no heavy IO, no special programs, no nothing - I was literally just reading email and I think trying to browse over to news.com or something..) So I was more hoping that you'd say that it's a known issue, and already fixed, or that the status bits would give you some clue and make you say Ahh, we don't handle that case. I have nothing to test. The thing seems to work, and I have no known way to trigger the problem... I'm pretty sure this is already fixed, by the polling IDENTIFY for ata_piix patchset. Hmm. That sounds like it should just affect the bootup identification, which has always worked fine for me. Would it fix the softreset too? Anyway, I can certainly try yout current upstream branch, but as mentioned, the standard kernel works fine for me generally, so I don't really know what I can offer (except if upstream simply doesn't work at all, in which case I'll certainly let you know ;) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scary Intel SATA problem: frozen
On Tue, 28 Nov 2006, Jonas Lundgren wrote: Example of what I mean with crappy performance: dd if=/dev/zero of=test232 bs=1M count=100; time sync 100+0 records in 100+0 records out 104857600 bytes (105 MB) copied, 0.130424 s, 804 MB/s real 0m21.104s Ok, that's definitely not the same thing I see. I get real0m2.673s for the time sync part of your example, so the chipset can definitely do better than your 4-5 MB/s. And your read performance seems fine. Strange. I suspect it's related to your RAID usage. I've only got a single disk in my system. Maybe there is something problematic in sending commands to alternating SATA ports on the same controller with the i965 thing? The switching between SATA ports thing migt actually be a clue, because while I've had this thing for a few weeks, I only used the DVD drive for the first time the day before yesterday, and didn't actually even have the SCSI CD-ROM support compiled in until then (copied a config from another machine that had the DVD-rom on the legacy side, so it used the more common IDE-CD thing). So maybe these _are_ related somehow, and my problem showed up because I actually had concurrent access to my DVD drive (some KDE media daemon checking to see if I inserted a music CD or something?). Jeff, Tejun, is there any reason to believe that the two channels on a PIIX ata controller are somehow tied together and it could be problematic for concurrent accesses? Jonas definitely has the same error messages: Dmesg output from the error(s): (sda and sdb are 2 * 74GB raptor SATA drives in a Linux software raid0) ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: (BMDMA stat 0x20) ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout) ata1: port is slow to respond, please be patient ata1: port failed to respond (30 secs) ata1: soft resetting port ATA: abnormal status 0xD0 on port 0xFA07 ATA: abnormal status 0xD0 on port 0xFA07 ATA: abnormal status 0xD0 on port 0xFA07 ATA: abnormal status 0xD0 on port 0xFA07 ATA: abnormal status 0xD0 on port 0xFA07 ATA: abnormal status 0xD0 on port 0xFA07 That all looks exactly like mine did. Except: ata1: EH complete SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back Jonas' disks came back. So while Jonas' behaviour/problems otherwise don't seem to match mine at all, there might be some underlying commonality.. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scary Intel SATA problem: frozen
On Tue, 28 Nov 2006, Jeff Garzik wrote: I was sorta wondering in that direction too. If its in legacy mode (PATA and SATA smushed together), that's a possibility. But native or AHCI modes, the channels are pretty independent (which is the nature of SATA). Well, what I was more wondering about is whether perhaps the legacy mode emulation - even when it isn't actually used - means that there is simply some shared state (read: chipset bug that nobody noticed). Historical note: ata_piix is IMO more complicated than ahci, because the silicon is emulating the PATA interface using an internal (probably huge) state machine, converting PATA behavior to sending/receiving SATA packets. Well, there's bound to be the same big state machine working the other way, and maybe the chip simply internally gets confused. Or, as you say, simply because the emulation state machinery has to be taken into account, and _that_ ends up beign shared between the two otherwise independent channels.. How hard would it be to just force a shared spinlock between two sata channels on the same controller? It sounds like Jonas has a very repeatable setup, so even if I can't repeat my problem, if the performance degradation on writes is related, he can check his thing.. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scary Intel SATA problem: frozen
On Tue, 28 Nov 2006, Jeff Garzik wrote: ap-host (struct ata_host) already has a spinlock for precisely just that... :) Right, but do we actually take it? I'm not seing any spin_lock's in ata_piix.c, but I don't know the SATA layers enough to say whether upper layers take it or not.. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [another PATCH] Fix crash on boot in kmalloc_node IDE changes
On Thu, 7 Jul 2005, Christoph Lameter wrote: Here is IMHO the right way to fix this. Test for the hwif != NULL and test for pci_dev != NULL before determining the node number of the pci bus that the device is connected to. I think this is pretty unreadable. If you make it use a trivial inline function for the thing, I think that would be ok, though. Complex expressions are bad. Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] IDE update
On Tue, 5 Jul 2005, Jens Axboe wrote: Looks interesting, 2.6 spends oodles of times copying to user space. Lets check if raw reads perform ok, please try and time this app in 2.4 and 2.6 as well. I think it's just that 2.4.x used to allow longer command queues. I think MAX_NR_REQUESTS is 1024 in 2.4.x, and just 128 in 2.6.x or something like that. Also, the congestion thresholds are questionable: we consider a queue congested if it is within 12% of full, but then we consider it uncongested whenever it falls to within 18% of full, which I bet means that for some streaming loads we have just a 6% window that we keep adding new requests to (we wait when we're almost full, but then we start adding requests again when we're _still_ almost full). Jens, we talked about this long ago, but I don't think we ever did any timings. Making things worse, things like this are only visible on stupid hardware that has long latencies to get started (many SCSI controllers used to have horrid latencies), so you'll never even see any difference on a lot of hardware. It's probably worth testing with a bigger request limit. I forget what the /proc interfaces are (and am too lazy to look it up), Jens can tell us ;) Linus - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html