Re: 2.6.24-rc3-mm1: I/O error, system hangs
On 25.11.2007 21:39, Laurent Riffard wrote:
> On 25.11.2007 08:37, James Bottomley wrote:
>> On Sat, 2007-11-24 at 23:59 +0100, Laurent Riffard wrote:
>>> On 24.11.2007 14:26, James Bottomley wrote:
>>>> OK, could you post dmesgs again, please. I actually tested this with an
>>>> aic79xx card, and for me it does cause Domain Validation to succeed
>>>> again.
>>> James,
>>>
>>> Here is a dmesg produced by 2.6.24-rc3-mm1 + your patch "separates the
>>> BLOCK and QUIESCE states correctly" (http://lkml.org/lkml/2007/11/24/8).
>>> [...]
>>> [ 25.521256] scsi0 : pata_via
>>> [ 25.521711] scsi1 : pata_via
>>> [ 25.524089] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xb800 irq 14
>>> [ 25.524176] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xb808 irq 15
>>> [ 25.683141] ata1.00: ATA-5: ST340016A, 3.75, max UDMA/100
>>> [ 25.683208] ata1.00: 78165360 sectors, multi 16: LBA
>>> [ 25.683475] ata1.01: ATA-7: Maxtor 6Y080L0, YAR41BW0, max UDMA/133
>>> [ 25.684116] ata1.01: 160086528 sectors, multi 16: LBA
>>> [ 25.691127] ata1.00: configured for UDMA/100
>>> [ 25.699142] ata1.01: configured for UDMA/100
>>> [ 26.170860] ata2.00: ATAPI: HL-DT-ST DVDRAM GSA-4165B, DL05, max UDMA/33
>>> [ 26.171562] ata2.01: ATAPI: CD-950E/AKU, A4Q, max MWDMA2, CDB intr
>>> [ 26.330839] ata2.00: configured for UDMA/33
>>> [ 26.490828] ata2.01: configured for MWDMA2
>>> [ 26.503014] scsi 0:0:0:0: Direct-Access ATA ST340016A 3.75 PQ: 0 ANSI: 5
>>> [ 26.504670] scsi 0:0:1:0: Direct-Access ATA Maxtor 6Y080L0 YAR4 PQ: 0 ANSI: 5
>>> [ 26.509842] scsi 1:0:0:0: CD-ROM HL-DT-ST DVDRAM GSA-4165B DL05 PQ: 0 ANSI: 5
>>> [ 26.511673] scsi 1:0:1:0: CD-ROM E-IDE CD-950E/AKU A4Q PQ: 0 ANSI: 5
>> [...]
>>> [ 60.216113] sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>>> [ 60.216124] end_request: I/O error, dev sda, sector 16460
>> I think this one's quite easy: PATA devices in libata are queue depth 1
>> (since they don't do NCQ). Thus, they're peculiarly sensitive to the
>> bug where we fail over queue depth requests.
>>
>> On the other hand, I don't see how a filesystem request is getting
>> REQ_FAILFAST ... unless there's a bio or readahead issue involved.
>> Anyway, could you try this patch:
>>
>> http://marc.info/?l=linux-scsi&m=119592627425498
>>
>> Which should fix the queue depth issue, and see if the errors go away?
>
> No, this one doesn't help...

Still happens with 2.6.24-rc3-mm2...

--
laurent
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
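James's diagnosis in the message above (a queue-depth-1 device being hit hardest when over-queue-depth failfast requests are failed rather than requeued) can be sketched with a toy model. This is not kernel code: `toy_device`, `toy_dispatch`, `toy_count_failures` and the `fail_busy_failfast` switch are hypothetical names made up for illustration, and the model assumes only what the thread states about queue depth and REQ_FAILFAST.

```c
#include <stdbool.h>

/* Toy model only: these types and names are hypothetical, not kernel APIs. */
enum toy_result { TOY_DISPATCHED, TOY_REQUEUED, TOY_FAILED };

struct toy_device {
    int queue_depth;  /* 1 for a PATA device without NCQ */
    int in_flight;    /* commands currently outstanding */
};

/*
 * Decide what happens to one request.
 * fail_busy_failfast == true models the suspected bug: a failfast request
 * that arrives while the device is already at its queue depth is failed
 * instead of being requeued.
 */
static enum toy_result toy_dispatch(struct toy_device *dev, bool failfast,
                                    bool fail_busy_failfast)
{
    if (dev->in_flight < dev->queue_depth) {
        dev->in_flight++;
        return TOY_DISPATCHED;
    }
    if (failfast && fail_busy_failfast)
        return TOY_FAILED;   /* would surface as an I/O error */
    return TOY_REQUEUED;     /* would be retried once a slot frees up */
}

/* Count how many of n back-to-back failfast requests end up failed. */
static int toy_count_failures(int queue_depth, int n, bool fail_busy_failfast)
{
    struct toy_device dev = { queue_depth, 0 };
    int failures = 0;

    for (int i = 0; i < n; i++)
        if (toy_dispatch(&dev, true, fail_busy_failfast) == TOY_FAILED)
            failures++;
    return failures;
}
```

In this model a device with queue depth 1 fails every failfast request after the first while one command is outstanding, whereas a deeper queue absorbs the same burst. It says nothing about why a filesystem request would carry REQ_FAILFAST in the first place, which James leaves open.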
Re: 2.6.24-rc3-mm1: I/O error, system hangs
On 25.11.2007 08:37, James Bottomley wrote:
> On Sat, 2007-11-24 at 23:59 +0100, Laurent Riffard wrote:
>> On 24.11.2007 14:26, James Bottomley wrote:
>>> OK, could you post dmesgs again, please. I actually tested this with an
>>> aic79xx card, and for me it does cause Domain Validation to succeed
>>> again.
>> James,
>>
>> Here is a dmesg produced by 2.6.24-rc3-mm1 + your patch "separates the
>> BLOCK and QUIESCE states correctly" (http://lkml.org/lkml/2007/11/24/8).
>>
>> How to reproduce:
>> - boot
>> - switch to a text console
>> - capture dmesg in a file, sync, etc. There are 3 I/O errors, but the
>>   system does work.
>> - switch to the X console, log in to the Gnome Desktop, the system
>>   partially hangs.
>> - switch back to a text console: dmesg(1) still works, it shows some
>>   additional I/O errors. At this point, any disk access makes the system
>>   hang completely.
>>
>> Additional data:
>> - the I/O errors always happen on the same blocks.
>>
>> plain text document attachment (dmesg-2.6.24-rc3-mm1-patched)
> [...]
>> [ 25.521256] scsi0 : pata_via
>> [ 25.521711] scsi1 : pata_via
>> [ 25.524089] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xb800 irq 14
>> [ 25.524176] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xb808 irq 15
>> [ 25.683141] ata1.00: ATA-5: ST340016A, 3.75, max UDMA/100
>> [ 25.683208] ata1.00: 78165360 sectors, multi 16: LBA
>> [ 25.683475] ata1.01: ATA-7: Maxtor 6Y080L0, YAR41BW0, max UDMA/133
>> [ 25.684116] ata1.01: 160086528 sectors, multi 16: LBA
>> [ 25.691127] ata1.00: configured for UDMA/100
>> [ 25.699142] ata1.01: configured for UDMA/100
>> [ 26.170860] ata2.00: ATAPI: HL-DT-ST DVDRAM GSA-4165B, DL05, max UDMA/33
>> [ 26.171562] ata2.01: ATAPI: CD-950E/AKU, A4Q, max MWDMA2, CDB intr
>> [ 26.330839] ata2.00: configured for UDMA/33
>> [ 26.490828] ata2.01: configured for MWDMA2
>> [ 26.503014] scsi 0:0:0:0: Direct-Access ATA ST340016A 3.75 PQ: 0 ANSI: 5
>> [ 26.504670] scsi 0:0:1:0: Direct-Access ATA Maxtor 6Y080L0 YAR4 PQ: 0 ANSI: 5
>> [ 26.509842] scsi 1:0:0:0: CD-ROM HL-DT-ST DVDRAM GSA-4165B DL05 PQ: 0 ANSI: 5
>> [ 26.511673] scsi 1:0:1:0: CD-ROM E-IDE CD-950E/AKU A4Q PQ: 0 ANSI: 5
> [...]
>> [ 60.216113] sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>> [ 60.216124] end_request: I/O error, dev sda, sector 16460
>
> I think this one's quite easy: PATA devices in libata are queue depth 1
> (since they don't do NCQ). Thus, they're peculiarly sensitive to the
> bug where we fail over queue depth requests.
>
> On the other hand, I don't see how a filesystem request is getting
> REQ_FAILFAST ... unless there's a bio or readahead issue involved.
> Anyway, could you try this patch:
>
> http://marc.info/?l=linux-scsi&m=119592627425498
>
> Which should fix the queue depth issue, and see if the errors go away?

No, this one doesn't help...

--
laurent
Re: 2.6.24-rc3-mm1: I/O error, system hangs
On 24.11.2007 14:26, James Bottomley wrote:
> On Sat, 2007-11-24 at 13:57 +0100, Laurent Riffard wrote:
>> On 24.11.2007 07:42, James Bottomley wrote:
>>> On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote:
>>>> On 23.11.2007 12:38, Hannes Reinecke wrote:
[snip]
>>>> I can confirm: reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
>>>> does fix the problem.
>>>>
>>>>>> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an
>>>>>> error where I shouldn't. Checking ...
>>>>>>
>>>>> Ok, found it. We are blocking even special commands (ie requests with
>>>>> PREEMPT not set) when FAILFAST is set. Which is clearly wrong. The
>>>>> attached patch fixes this.
>>>> Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O
>>>> errors.
>>> I think the problem is the way we treat BLOCKED and QUIESCED (the latter
>>> is the state that the domain validation uses and which we cannot kill
>>> fastfail on). It's definitely wrong to kill fastfail requests when the
>>> state is QUIESCE.
>>>
>>> This patch (which is applied on top of Hannes original) separates the
>>> BLOCK and QUIESCE states correctly ... does this fix the problem?
>>
>> No, it doesn't help... (2.6.24-rc3-mm1 + your patch still has problems)
>
> OK, could you post dmesgs again, please. I actually tested this with an
> aic79xx card, and for me it does cause Domain Validation to succeed
> again.

James,

Here is a dmesg produced by 2.6.24-rc3-mm1 + your patch "separates the
BLOCK and QUIESCE states correctly" (http://lkml.org/lkml/2007/11/24/8).

How to reproduce:
- boot
- switch to a text console
- capture dmesg in a file, sync, etc. There are 3 I/O errors, but the
  system does work.
- switch to the X console, log in to the Gnome Desktop, the system
  partially hangs.
- switch back to a text console: dmesg(1) still works, it shows some
  additional I/O errors. At this point, any disk access makes the system
  hang completely.

Additional data:
- the I/O errors always happen on the same blocks.

--
laurent

[0.00] Linux version 2.6.24-rc3-mm1 ([EMAIL PROTECTED]) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #122 PREEMPT Fri Nov 23 18:47:58 CET 2007
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: - 0009fc00 (usable)
[0.00] BIOS-e820: 0009fc00 - 000a (reserved)
[0.00] BIOS-e820: 000f - 0010 (reserved)
[0.00] BIOS-e820: 0010 - 1ffec000 (usable)
[0.00] BIOS-e820: 1ffec000 - 1ffef000 (ACPI data)
[0.00] BIOS-e820: 1ffef000 - 1000 (reserved)
[0.00] BIOS-e820: 1000 - 2000 (ACPI NVS)
[0.00] BIOS-e820: - 0001 (reserved)
[0.00] 511MB LOWMEM available.
[0.00] Entering add_active_range(0, 0, 131052) 0 entries of 256 used
[0.00] sizeof(struct page) = 32
[0.00] Zone PFN ranges:
[0.00] DMA 0 -> 4096
[0.00] Normal 4096 -> 131052
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[1] active PFN ranges
[0.00] 0:0 -> 131052
[0.00] On node 0 totalpages: 131052
[0.00] Node 0 memmap at 0xC100 size 4194304 first pfn 0xC100
[0.00] DMA zone: 32 pages used for memmap
[0.00] DMA zone: 0 pages reserved
[0.00] DMA zone: 4064 pages, LIFO batch:0
[0.00] Normal zone: 991 pages used for memmap
[0.00] Normal zone: 125965 pages, LIFO batch:31
[0.00] Movable zone: 0 pages used for memmap
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000F6A80, 0014 (r0 ASUS )
[0.00] ACPI: RSDT 1FFEC000, 002C (r1 ASUS A7V133-C 30303031 MSFT 31313031)
[0.00] ACPI: FACP 1FFEC080, 0074 (r1 ASUS A7V133-C 30303031 MSFT 31313031)
[0.00] ACPI: DSDT 1FFEC100, 2CE1 (r1 ASUS A7V133-C 1000 MSFT 10B)
[0.00] ACPI: FACS 1000, 0040
[0.00] ACPI: BOOT 1FFEC040, 0028 (r1 ASUS A7V133-C 30303031 MSFT 31313031)
[0.00] ACPI: PM-Timer IO Port: 0xe408
[0.00] Allocating PCI resources starting at 3000 (gap: 2000:dfff)
[0.00] swsusp: Registered nosave memory region: 0009f000 - 000a
[0.00] swsusp: Registered nosave memory region: 000a - 000f
[0.00] swsusp: Registered nosave memory region: 000f - 0010
[0.
Re: 2.6.24-rc3-mm1: I/O error, system hangs
On 24.11.2007 07:42, James Bottomley wrote:
> On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote:
>> On 23.11.2007 12:38, Hannes Reinecke wrote:
>>> Hannes Reinecke wrote:
>>>> Laurent Riffard wrote:
>>>>> On 21.11.2007 23:41, Andrew Morton wrote:
>>>>>> On Wed, 21 Nov 2007 22:45:22 +0100
>>>>>> Laurent Riffard <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> On 21.11.2007 05:45, Andrew Morton wrote:
>>>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
>>>>>>> Hello,
>>>>>>>
>>>>>>> My system hangs shortly after I log in to the Gnome desktop. SysRq-W
>>>>>>> shows that a bunch of tasks are blocked in "D" state, they seem to
>>>>>>> wait for some I/O completion. I can try to hand-copy some data if
>>>>>>> requested.
>>>>>>>
>>>>>>> I found these messages in dmesg:
>>>>>>>
>>>>>>> ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1
>>>>>>> EXT3-fs: mounted filesystem with ordered data mode.
>>>>>>> sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>>>>>>> end_request: I/O error, dev sda, sector 16460
>>>>>>> ReiserFS: sda7: found reiserfs format "3.6" with standard journal
>>>>>>> ReiserFS: sda7: using ordered data mode
>>>>>>> --
>>>>>>> ReiserFS: sda7: Using r5 hash to sort names
>>>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>>>>>>> end_request: I/O error, dev sdb, sector 19632
>>>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>>>>>>> end_request: I/O error, dev sdb, sector 40037363
>>>>>>> Adding 1048568k swap on /dev/mapper/vglinux1-lvswap. Priority:-1 extents:1 across:1048568k
>>>>>>> lp0: using parport0 (interrupt-driven).
>>>>>>>
>>>>>>> These errors occur *only* with 2.6.24-rc3-mm1, they are 100%
>>>>>>> reproducible.
>>>>>>> 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
>>>>>>>
>>>>>>> Maybe something is broken in the pata_via driver?
>>>>>>>
>>>>>> Could be -
>>>>>> libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
>>>>>> and
>>>>>> pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
>>>>>> touch pata_via.c.
>>>>> None of the above...
>>>>>
>>>>> I did a bisection, it spotted git-scsi-misc.patch.
>>>>> I just ran 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
>>>>>
>>>>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not
>>>>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other
>>>>> commits are touching documentation or drivers I don't use. I'll try
>>>>> to revert only this one this evening.
>> I can confirm: reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
>> does fix the problem.
>>
>>>> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an
>>>> error where I shouldn't. Checking ...
>>>>
>>> Ok, found it. We are blocking even special commands (ie requests with
>>> PREEMPT not set) when FAILFAST is set. Which is clearly wrong. The
>>> attached patch fixes this.
>> Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O
>> errors.
>
> I think the problem is the way we treat BLOCKED and QUIESCED (the latter
> is the state that the domain validation uses and which we cannot kill
> fastfail on). It's definitely wrong to kill fastfail requests when the
> state is QUIESCE.
>
> This patch (which is applied on top of Hannes original) separates the
> BLOCK and QUIESCE states correctly ... does this fix the problem?

No, it doesn't help... (2.6.24-rc3-mm1 + your patch still has problems)

> James
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 13e7e09..a7cf23a 100644
>
Re: 2.6.24-rc3-mm1: I/O error, system hangs
On 23.11.2007 12:38, Hannes Reinecke wrote:
> Hannes Reinecke wrote:
>> Laurent Riffard wrote:
>>> On 21.11.2007 23:41, Andrew Morton wrote:
>>>> On Wed, 21 Nov 2007 22:45:22 +0100
>>>> Laurent Riffard <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> On 21.11.2007 05:45, Andrew Morton wrote:
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
>>>>> Hello,
>>>>>
>>>>> My system hangs shortly after I log in to the Gnome desktop. SysRq-W
>>>>> shows that a bunch of tasks are blocked in "D" state, they seem to
>>>>> wait for some I/O completion. I can try to hand-copy some data if
>>>>> requested.
>>>>>
>>>>> I found these messages in dmesg:
>>>>>
>>>>> ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1
>>>>> EXT3-fs: mounted filesystem with ordered data mode.
>>>>> sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sda, sector 16460
>>>>> ReiserFS: sda7: found reiserfs format "3.6" with standard journal
>>>>> ReiserFS: sda7: using ordered data mode
>>>>> --
>>>>> ReiserFS: sda7: Using r5 hash to sort names
>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sdb, sector 19632
>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sdb, sector 40037363
>>>>> Adding 1048568k swap on /dev/mapper/vglinux1-lvswap. Priority:-1 extents:1 across:1048568k
>>>>> lp0: using parport0 (interrupt-driven).
>>>>>
>>>>> These errors occur *only* with 2.6.24-rc3-mm1, they are 100% reproducible.
>>>>> 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
>>>>>
>>>>> Maybe something is broken in the pata_via driver?
>>>>>
>>>> Could be -
>>>> libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
>>>> and
>>>> pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
>>>> touch pata_via.c.
>>> None of the above...
>>>
>>> I did a bisection, it spotted git-scsi-misc.patch.
>>> I just ran 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
>>>
>>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not
>>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other
>>> commits are touching documentation or drivers I don't use. I'll try
>>> to revert only this one this evening.

I can confirm: reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
does fix the problem.

>> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an error
>> where I shouldn't. Checking ...
>>
> Ok, found it. We are blocking even special commands (ie requests with PREEMPT
> not set) when FAILFAST is set. Which is clearly wrong. The attached patch
> fixes this.

Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O errors.

--
laurent
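The rule argued over in this sub-thread (which requests may be failed, deferred, or let through depending on device state) can be written down as a small decision table. The sketch below is a toy model, not the patch itself: `toy_state`, `toy_action` and `toy_prep` are hypothetical names, and the BLOCK branch is an assumption, since the patches under discussion are not included in the thread. Only the QUIESCE rule (never kill failfast requests there) and the point that special commands should not be failed because of FAILFAST are taken from the messages; `special` here means a command the mid-layer issues itself, such as Domain Validation.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's scsi_device states. */
enum toy_state  { TOY_RUNNING, TOY_BLOCK, TOY_QUIESCE };
enum toy_action { TOY_PASS, TOY_DEFER, TOY_KILL };

/*
 * Decide the fate of one request from the device state and two flags:
 *   failfast - the request asked to fail quickly rather than be retried
 *   special  - the request is an internal command (e.g. Domain Validation)
 */
static enum toy_action toy_prep(enum toy_state state, bool failfast,
                                bool special)
{
    switch (state) {
    case TOY_RUNNING:
        return TOY_PASS;
    case TOY_QUIESCE:
        /* From the thread: failfast requests must not be killed while
         * quiesced. Special commands go through, the rest wait. */
        return special ? TOY_PASS : TOY_DEFER;
    case TOY_BLOCK:
        /* Assumption: only an ordinary failfast request is failed at
         * once; everything else is deferred until the device unblocks. */
        return (failfast && !special) ? TOY_KILL : TOY_DEFER;
    }
    return TOY_DEFER;
}
```

Under this model, `toy_prep(TOY_QUIESCE, true, false)` defers the request instead of failing it, which is the behaviour James describes as correct for the quiesced state.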
Re: 2.6.24-rc3-mm1: I/O error, system hangs
On 21.11.2007 23:41, Andrew Morton wrote:
> On Wed, 21 Nov 2007 22:45:22 +0100
> Laurent Riffard <[EMAIL PROTECTED]> wrote:
>
>> On 21.11.2007 05:45, Andrew Morton wrote:
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
>> Hello,
>>
>> My system hangs shortly after I log in to the Gnome desktop. SysRq-W
>> shows that a bunch of tasks are blocked in "D" state, they seem to
>> wait for some I/O completion. I can try to hand-copy some data if
>> requested.
>>
>> I found these messages in dmesg:
>>
>> ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1
>> EXT3-fs: mounted filesystem with ordered data mode.
>> sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>> end_request: I/O error, dev sda, sector 16460
>> ReiserFS: sda7: found reiserfs format "3.6" with standard journal
>> ReiserFS: sda7: using ordered data mode
>> --
>> ReiserFS: sda7: Using r5 hash to sort names
>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>> end_request: I/O error, dev sdb, sector 19632
>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>> end_request: I/O error, dev sdb, sector 40037363
>> Adding 1048568k swap on /dev/mapper/vglinux1-lvswap. Priority:-1 extents:1 across:1048568k
>> lp0: using parport0 (interrupt-driven).
>>
>> These errors occur *only* with 2.6.24-rc3-mm1, they are 100% reproducible.
>> 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
>>
>> Maybe something is broken in the pata_via driver?
>>
>
> Could be -
> libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
> and
> pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
> touch pata_via.c.

None of the above...

I did a bisection, it spotted git-scsi-misc.patch.
I just ran 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.

I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not
requeue requests if REQ_FAILFAST is set" is the real culprit. The other
commits are touching documentation or drivers I don't use. I'll try
to revert only this one this evening.

--
laurent
Re: [Bugme-new] [Bug 7667] New: BUG at drivers/scsi/scsi_lib.c:1118 caused by "pktsetup dvd /dev/sr0"
This BUG (http://bugzilla.kernel.org/show_bug.cgi?id=7667) still happens
with 2.6.20-rc2-mm1. Fortunately, Christoph Hellwig's patch
(http://bugzilla.kernel.org/show_bug.cgi?id=7667#c5) still fixes it.

--
laurent
Re: [Bugme-new] [Bug 7667] New: BUG at drivers/scsi/scsi_lib.c:1118 caused by "pktsetup dvd /dev/sr0"
On 12.12.2006 15:21, Boaz Harrosh wrote:
> Christoph Hellwig wrote:
>> On Tue, Dec 12, 2006 at 11:38:42AM +0100, Christoph Hellwig wrote:
>>> This is because the packet driver tries to send down read/write BLOCK_PC
>>> commands that don't use a bio and do not use sg lists. As part of the
>>> patch you mentioned I added strict assertions for that case because
>>> the scsi layer doesn't handle those anymore.
>>>
>>> The right fix is to replace all the packet_command stuff in the packet
>>> driver by scsi_execute() which needs to be lifted from scsi code to the
>>> block code for that. I'll prepare a patch this weekend unless someone
>>> beats me in doing that work.
>>
>> Please try the patch below to fix the bug for now. It's not the full
>> way to a generic execute block pc infrastructure but should fix the
>> bug for the time being:
>>
>> Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>
>>
>> Index: linux-2.6/drivers/block/pktcdvd.c
>> ===
>> --- linux-2.6.orig/drivers/block/pktcdvd.c 2006-12-12 11:30:45.0 +0100
>> +++ linux-2.6/drivers/block/pktcdvd.c 2006-12-12 14:23:37.0 +0100
>> @@ -765,47 +765,34 @@
>>   */
>>  static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *cgc)
>>  {
>> -	char sense[SCSI_SENSE_BUFFERSIZE];
>> -	request_queue_t *q;
>> +	request_queue_t *q = bdev_get_queue(pd->bdev);
>>  	struct request *rq;
>> -	DECLARE_COMPLETION_ONSTACK(wait);
>> -	int err = 0;
>> +	int ret = 0;
>>
>> -	q = bdev_get_queue(pd->bdev);
>> +	rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ?
>> +			     WRITE : READ, __GFP_WAIT);
>> +
>> +	if (cgc->buflen) {
>> +		if (blk_rq_map_kern(q, rq, cgc->buffer, cgc->buflen, __GFP_WAIT))
>> +			goto out;
>> +	}
>> +
>> +	rq->cmd_len = COMMAND_SIZE(rq->cmd[0]);
>> +	memcpy(rq->cmd, cgc->cmd, CDROM_PACKET_SIZE);
>> +	if (sizeof(rq->cmd) > CDROM_PACKET_SIZE)
>> +		memset(rq->cmd + CDROM_PACKET_SIZE, 0, sizeof(rq->cmd) - CDROM_PACKET_SIZE);
>>
>> -	rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ? WRITE : READ,
>> -			     __GFP_WAIT);
>> -	rq->errors = 0;
>> -	rq->rq_disk = pd->bdev->bd_disk;
>> -	rq->bio = NULL;
>> -	rq->buffer = NULL;
>>  	rq->timeout = 60*HZ;
>> -	rq->data = cgc->buffer;
>> -	rq->data_len = cgc->buflen;
>> -	rq->sense = sense;
>> -	memset(sense, 0, sizeof(sense));
>> -	rq->sense_len = 0;
>>  	rq->cmd_type = REQ_TYPE_BLOCK_PC;
>>  	rq->cmd_flags |= REQ_HARDBARRIER;
>>  	if (cgc->quiet)
>>  		rq->cmd_flags |= REQ_QUIET;
>> -	memcpy(rq->cmd, cgc->cmd, CDROM_PACKET_SIZE);
>> -	if (sizeof(rq->cmd) > CDROM_PACKET_SIZE)
>> -		memset(rq->cmd + CDROM_PACKET_SIZE, 0, sizeof(rq->cmd) - CDROM_PACKET_SIZE);
>> -	rq->cmd_len = COMMAND_SIZE(rq->cmd[0]);
>> -
>> -	rq->ref_count++;
>> -	rq->end_io_data = &wait;
>> -	rq->end_io = blk_end_sync_rq;
>> -	elv_add_request(q, rq, ELEVATOR_INSERT_BACK, 1);
>> -	generic_unplug_device(q);
>> -	wait_for_completion(&wait);
>> -
>> -	if (rq->errors)
>> -		err = -EIO;
>> +	blk_execute_rq(rq->q, pd->bdev->bd_disk, rq, 0);
>> +	ret = rq->errors;
>> +out:
>>  	blk_put_request(rq);
>> -	return err;
>> +	return ret;
>>  }
>>
>>  /*
>
> I'm afraid this might not be enough because of code in
> drivers/ide/ide-cd.c. It does IO off of request->data.
>
> [background]
> pkt_generic_packet and a ton of other places, mainly cd(s), floppy(s), and
> other ide code paths, are using what I call BLACK requests. They put some
> data on request->buffer or request->data, stick it in the Q and then
> advance on them later down the ladder. Removing "buffer" and "data" from
> struct request will reveal all these places. At one time I had plans to do
> just that. But 1/2 way through I gave up, it is too risky, too much
> Hardware that I don't have, that needs checking.
>
> below patch combined with your patch might get a bit closer for this code
> path. In struct request I have changed the name of the "data" member to
> "user_data", then changed the code paths that used "data" as IO to use
> request->buffer instead. This is just as bad but is a more common
> practice. I suspect there is a problem with what I did in scsi_lib.c.
> Christoph, please check me out with the new BUG_ON.
>
> Mainly what you need from below is only the code in ide-cd.c. (And there
> are 3-4 places that do exactly like pkt_generic_packet, though I'm not
> sure they end up through SCSI. At first I thought this code doesn't
> either)

[patch snipped]

Christoph's patch fixed the BUG, while Boaz's patch didn't fix anything
(both tested with kernel 2.6.16-rc6-mm2). Please note I don't use ide-cd,
I use libata+pata_via+sr_mod.

Boaz, when you wrote "below patch combined with y