Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-28 Thread Laurent Riffard
Le 25.11.2007 21:39, Laurent Riffard a écrit :
> Le 25.11.2007 08:37, James Bottomley a écrit :
>> On Sat, 2007-11-24 at 23:59 +0100, Laurent Riffard wrote:
>>> Le 24.11.2007 14:26, James Bottomley a écrit :
>>>> OK, could you post dmesgs again, please.  I actually tested this
>>> with an
>>>> aic79xx card, and for me it does cause Domain Validation to succeed
>>>> again.
>>> James, 
>>>
>>> Here is a dmesg produced by 2.6.24-rc3-mm1 + your patch "separates
>>> the 
>>> BLOCK and QUIESCE states
>>> correctly" (http://lkml.org/lkml/2007/11/24/8).
>>>
[...]
>>> [   25.521256] scsi0 : pata_via
>>> [   25.521711] scsi1 : pata_via
>>> [   25.524089] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xb800 irq 
>>> 14
>>> [   25.524176] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xb808 irq 
>>> 15
>>> [   25.683141] ata1.00: ATA-5: ST340016A, 3.75, max UDMA/100
>>> [   25.683208] ata1.00: 78165360 sectors, multi 16: LBA 
>>> [   25.683475] ata1.01: ATA-7: Maxtor 6Y080L0, YAR41BW0, max UDMA/133
>>> [   25.684116] ata1.01: 160086528 sectors, multi 16: LBA 
>>> [   25.691127] ata1.00: configured for UDMA/100
>>> [   25.699142] ata1.01: configured for UDMA/100
>>> [   26.170860] ata2.00: ATAPI: HL-DT-ST DVDRAM GSA-4165B, DL05, max UDMA/33
>>> [   26.171562] ata2.01: ATAPI: CD-950E/AKU, A4Q, max MWDMA2, CDB intr
>>> [   26.330839] ata2.00: configured for UDMA/33
>>> [   26.490828] ata2.01: configured for MWDMA2
>>> [   26.503014] scsi 0:0:0:0: Direct-Access ATA  ST340016A 3.75 PQ: 
>>> 0 ANSI: 5
>>> [   26.504670] scsi 0:0:1:0: Direct-Access ATA  Maxtor 6Y080L0 YAR4 
>>> PQ: 0 ANSI: 5
>>> [   26.509842] scsi 1:0:0:0: CD-ROMHL-DT-ST DVDRAM GSA-4165B 
>>> DL05 PQ: 0 ANSI: 5
>>> [   26.511673] scsi 1:0:1:0: CD-ROME-IDECD-950E/AKU A4Q  
>>> PQ: 0 ANSI: 5
>> [...]
>>> [   60.216113] sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>> [   60.216124] end_request: I/O error, dev sda, sector 16460
>> I think this one's quite easy:  PATA devices in libata are queue depth 1
>> (since they don't do NCQ).  Thus, they're peculiarly sensitive to the
>> bug where we fail over queue depth requests.
>>
>> On the other hand, I don't see how a filesystem request is getting
>> REQ_FAILFAST ... unless there's a bio or readahead issue involved.
>> Anyway, could you try this patch:
>>
>> http://marc.info/?l=linux-scsi&m=119592627425498
>>
>> Which should fix the queue depth issue, and see if the errors go away?
> 
> No, this one doesn't help...
 
still happens with 2.6.24-rc3-mm2...
-- 
laurent
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-25 Thread Laurent Riffard
Le 25.11.2007 08:37, James Bottomley a écrit :
> On Sat, 2007-11-24 at 23:59 +0100, Laurent Riffard wrote:
>> Le 24.11.2007 14:26, James Bottomley a écrit :
>>> OK, could you post dmesgs again, please.  I actually tested this
>> with an
>>> aic79xx card, and for me it does cause Domain Validation to succeed
>>> again.
>> James, 
>>
>> Here is a dmesg produced by 2.6.24-rc3-mm1 + your patch "separates
>> the 
>> BLOCK and QUIESCE states
>> correctly" (http://lkml.org/lkml/2007/11/24/8).
>>
>> How to reproduce :
>> - boot
>> - switch to a text console
>> - capture dmesg in a file, sync, etc. There are 3 I/O errors, but the 
>>   system does work.
>> - switch to X console, log in the Gnome Desktop, the system partially 
>>   hangs.
>> - switch back to a text console: dmesg(1) still works, it shows some 
>>   additonal I/O errors. At this point, any disk access makes the system 
>>   completely hung.
>>
>> Additionnal data:
>> - the I/O errors always happen on the same blocks.
>>
>> plain text document attachment (dmesg-2.6.24-rc3-mm1-patched)
> [...]
>> [   25.521256] scsi0 : pata_via
>> [   25.521711] scsi1 : pata_via
>> [   25.524089] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xb800 irq 
>> 14
>> [   25.524176] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xb808 irq 
>> 15
>> [   25.683141] ata1.00: ATA-5: ST340016A, 3.75, max UDMA/100
>> [   25.683208] ata1.00: 78165360 sectors, multi 16: LBA 
>> [   25.683475] ata1.01: ATA-7: Maxtor 6Y080L0, YAR41BW0, max UDMA/133
>> [   25.684116] ata1.01: 160086528 sectors, multi 16: LBA 
>> [   25.691127] ata1.00: configured for UDMA/100
>> [   25.699142] ata1.01: configured for UDMA/100
>> [   26.170860] ata2.00: ATAPI: HL-DT-ST DVDRAM GSA-4165B, DL05, max UDMA/33
>> [   26.171562] ata2.01: ATAPI: CD-950E/AKU, A4Q, max MWDMA2, CDB intr
>> [   26.330839] ata2.00: configured for UDMA/33
>> [   26.490828] ata2.01: configured for MWDMA2
>> [   26.503014] scsi 0:0:0:0: Direct-Access ATA  ST340016A 3.75 PQ: 0 
>> ANSI: 5
>> [   26.504670] scsi 0:0:1:0: Direct-Access ATA  Maxtor 6Y080L0 YAR4 
>> PQ: 0 ANSI: 5
>> [   26.509842] scsi 1:0:0:0: CD-ROMHL-DT-ST DVDRAM GSA-4165B 
>> DL05 PQ: 0 ANSI: 5
>> [   26.511673] scsi 1:0:1:0: CD-ROME-IDECD-950E/AKU A4Q  PQ: 
>> 0 ANSI: 5
> [...]
>> [   60.216113] sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
>> driverbyte=DRIVER_OK,SUGGEST_OK
>> [   60.216124] end_request: I/O error, dev sda, sector 16460
> 
> I think this one's quite easy:  PATA devices in libata are queue depth 1
> (since they don't do NCQ).  Thus, they're peculiarly sensitive to the
> bug where we fail over queue depth requests.
> 
> On the other hand, I don't see how a filesystem request is getting
> REQ_FAILFAST ... unless there's a bio or readahead issue involved.
> Anyway, could you try this patch:
> 
> http://marc.info/?l=linux-scsi&m=119592627425498
> 
> Which should fix the queue depth issue, and see if the errors go away?

No, this one doesn't help...

-- 
laurent
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-24 Thread Laurent Riffard
Le 24.11.2007 14:26, James Bottomley a écrit :
> On Sat, 2007-11-24 at 13:57 +0100, Laurent Riffard wrote:
>> Le 24.11.2007 07:42, James Bottomley a écrit :
>>> On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote:
>>>> Le 23.11.2007 12:38, Hannes Reinecke a écrit :
[snip]
>>>> I can confirm : reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 
>>>> does fix the problem.
>>>>
>>>>>> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an 
>>>>>> error where
>>>>>> I shouldn't. Checking ...
>>>>>>
>>>>> Ok, found it. We are blocking even special commands (ie requests with 
>>>>> PREEMPT not set)
>>>>> when FAILFAST is set. Which is clearly wrong. The attached patch fixes 
>>>>> this.
>>>> Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O 
>>>> errors.
>>> I think the problem is the way we treat BLOCKED and QUIESCED (the latter
>>> is the state that the domain validation uses and which we cannot kill
>>> fastfail on).  It's definitely wrong to kill fastfail requests when the
>>> state is QUIESCE.
>>>
>>> This patch (which is applied on top of Hannes original) separates the
>>> BLOCK and QUIESCE states correctly ... does this fix the problem?
>>
>> No, it doesn't help... (2.6.24-rc3-mm1 + your patch still has problems)
> 
> OK, could you post dmesgs again, please.  I actually tested this with an
> aic79xx card, and for me it does cause Domain Validation to succeed
> again.

James, 

Here is a dmesg produced by 2.6.24-rc3-mm1 + your patch "separates the 
BLOCK and QUIESCE states correctly" (http://lkml.org/lkml/2007/11/24/8).

How to reproduce :
- boot
- switch to a text console
- capture dmesg in a file, sync, etc. There are 3 I/O errors, but the 
  system does work.
- switch to X console, log in the Gnome Desktop, the system partially 
  hangs.
- switch back to a text console: dmesg(1) still works, it shows some 
  additonal I/O errors. At this point, any disk access makes the system 
  completely hung.

Additionnal data:
- the I/O errors always happen on the same blocks.

-- 
laurent
[0.00] Linux version 2.6.24-rc3-mm1 ([EMAIL PROTECTED]) (gcc version 
4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #122 PREEMPT Fri Nov 23 
18:47:58 CET 2007
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 1ffec000 (usable)
[0.00]  BIOS-e820: 1ffec000 - 1ffef000 (ACPI data)
[0.00]  BIOS-e820: 1ffef000 - 1000 (reserved)
[0.00]  BIOS-e820: 1000 - 2000 (ACPI NVS)
[0.00]  BIOS-e820:  - 0001 (reserved)
[0.00] 511MB LOWMEM available.
[0.00] Entering add_active_range(0, 0, 131052) 0 entries of 256 used
[0.00] sizeof(struct page) = 32
[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   Normal   4096 ->   131052
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[1] active PFN ranges
[0.00] 0:0 ->   131052
[0.00] On node 0 totalpages: 131052
[0.00] Node 0 memmap at 0xC100 size 4194304 first pfn 0xC100
[0.00]   DMA zone: 32 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 4064 pages, LIFO batch:0
[0.00]   Normal zone: 991 pages used for memmap
[0.00]   Normal zone: 125965 pages, LIFO batch:31
[0.00]   Movable zone: 0 pages used for memmap
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000F6A80, 0014 (r0 ASUS  )
[0.00] ACPI: RSDT 1FFEC000, 002C (r1 ASUS   A7V133-C 30303031 MSFT 
31313031)
[0.00] ACPI: FACP 1FFEC080, 0074 (r1 ASUS   A7V133-C 30303031 MSFT 
31313031)
[0.00] ACPI: DSDT 1FFEC100, 2CE1 (r1   ASUS A7V133-C 1000 MSFT  
10B)
[0.00] ACPI: FACS 1000, 0040
[0.00] ACPI: BOOT 1FFEC040, 0028 (r1 ASUS   A7V133-C 30303031 MSFT 
31313031)
[0.00] ACPI: PM-Timer IO Port: 0xe408
[0.00] Allocating PCI resources starting at 3000 (gap: 
2000:dfff)
[0.00] swsusp: Registered nosave memory region: 0009f000 - 
000a
[0.00] swsusp: Registered nosave memory region: 000a - 
000f
[0.00] swsusp: Registered nosave memory region: 000f - 
0010
[0.

Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-24 Thread Laurent Riffard


Le 24.11.2007 07:42, James Bottomley a écrit :
> On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote:
>> Le 23.11.2007 12:38, Hannes Reinecke a écrit :
>>> Hannes Reinecke wrote:
>>>> Laurent Riffard wrote:
>>>>> Le 21.11.2007 23:41, Andrew Morton a écrit :
>>>>>> On Wed, 21 Nov 2007 22:45:22 +0100
>>>>>> Laurent Riffard <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Le 21.11.2007 05:45, Andrew Morton a écrit :
>>>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
>>>>>>> Hello, 
>>>>>>>
>>>>>>> My system hangs shortly after I logged in Gnome desktop. SysRq-W shows
>>>>>>> that a bunch of task are blocked in "D" state, they seem to wait for
>>>>>>> some I/O completion. I can try to hand-copy some data if requested.
>>>>>>>
>>>>>>> I found these messages in dmesg:
>>>>>>>
>>>>>>> ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 
>>>>>>> EXT3-fs: mounted filesystem with ordered data mode.
>>>>>>> sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
>>>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>>>> end_request: I/O error, dev sda, sector 16460
>>>>>>> ReiserFS: sda7: found reiserfs format "3.6" with standard journal
>>>>>>> ReiserFS: sda7: using ordered data mode
>>>>>>> --
>>>>>>> ReiserFS: sda7: Using r5 hash to sort names
>>>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
>>>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>>>> end_request: I/O error, dev sdb, sector 19632
>>>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
>>>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>>>> end_request: I/O error, dev sdb, sector 40037363
>>>>>>> Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 
>>>>>>> extents:1 across:1048568k
>>>>>>> lp0: using parport0 (interrupt-driven).
>>>>>>>
>>>>>>> These errors occur *only* with 2.6.24-rc3-mm1, they are 100% 
>>>>>>> reproducible.
>>>>>>> 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
>>>>>>>
>>>>>>> Maybe something is broken in pata_via driver ?
>>>>>>>
>>>>>> Could be - 
>>>>>> libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
>>>>>> and 
>>>>>> pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
>>>>>> touch pata_via.c.
>>>>> None of the above...
>>>>>
>>>>> I did a bisection, it spotted git-scsi-misc.patch. 
>>>>> I just run 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
>>>>>
>>>>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not 
>>>>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other 
>>>>> commits are touching documentation or drivers I don't use. I'll try 
>>>>> to revert only this one this evening.
>> I can confirm : reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 
>> does fix the problem.
>>
>>>> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an 
>>>> error where
>>>> I shouldn't. Checking ...
>>>>
>>> Ok, found it. We are blocking even special commands (ie requests with 
>>> PREEMPT not set)
>>> when FAILFAST is set. Which is clearly wrong. The attached patch fixes this.
>> Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O 
>> errors.
> 
> I think the problem is the way we treat BLOCKED and QUIESCED (the latter
> is the state that the domain validation uses and which we cannot kill
> fastfail on).  It's definitely wrong to kill fastfail requests when the
> state is QUIESCE.
> 
> This patch (which is applied on top of Hannes original) separates the
> BLOCK and QUIESCE states correctly ... does this fix the problem?


No, it doesn't help... (2.6.24-rc3-mm1 + your patch still has problems)


> James
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 13e7e09..a7cf23a 100644
>

Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-23 Thread Laurent Riffard
Le 23.11.2007 12:38, Hannes Reinecke a écrit :
> Hannes Reinecke wrote:
>> Laurent Riffard wrote:
>>> Le 21.11.2007 23:41, Andrew Morton a écrit :
>>>> On Wed, 21 Nov 2007 22:45:22 +0100
>>>> Laurent Riffard <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Le 21.11.2007 05:45, Andrew Morton a écrit :
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
>>>>> Hello, 
>>>>>
>>>>> My system hangs shortly after I logged in Gnome desktop. SysRq-W shows
>>>>> that a bunch of task are blocked in "D" state, they seem to wait for
>>>>> some I/O completion. I can try to hand-copy some data if requested.
>>>>>
>>>>> I found these messages in dmesg:
>>>>>
>>>>> ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 
>>>>> EXT3-fs: mounted filesystem with ordered data mode.
>>>>> sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sda, sector 16460
>>>>> ReiserFS: sda7: found reiserfs format "3.6" with standard journal
>>>>> ReiserFS: sda7: using ordered data mode
>>>>> --
>>>>> ReiserFS: sda7: Using r5 hash to sort names
>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sdb, sector 19632
>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sdb, sector 40037363
>>>>> Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 
>>>>> extents:1 across:1048568k
>>>>> lp0: using parport0 (interrupt-driven).
>>>>>
>>>>> These errors occur *only* with 2.6.24-rc3-mm1, they are 100% reproducible.
>>>>> 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
>>>>>
>>>>> Maybe something is broken in pata_via driver ?
>>>>>
>>>> Could be - 
>>>> libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
>>>> and 
>>>> pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
>>>> touch pata_via.c.
>>> None of the above...
>>>
>>> I did a bisection, it spotted git-scsi-misc.patch. 
>>> I just run 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
>>>
>>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not 
>>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other 
>>> commits are touching documentation or drivers I don't use. I'll try 
>>> to revert only this one this evening.

I can confirm : reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 
does fix the problem.

>> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an error 
>> where
>> I shouldn't. Checking ...
>>
> Ok, found it. We are blocking even special commands (ie requests with PREEMPT 
> not set)
> when FAILFAST is set. Which is clearly wrong. The attached patch fixes this.

Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O errors.

-- 
laurent

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-22 Thread Laurent Riffard

Le 21.11.2007 23:41, Andrew Morton a écrit :
> On Wed, 21 Nov 2007 22:45:22 +0100
> Laurent Riffard <[EMAIL PROTECTED]> wrote:
> 
>> Le 21.11.2007 05:45, Andrew Morton a écrit :
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
>> Hello, 
>>
>> My system hangs shortly after I logged in Gnome desktop. SysRq-W shows
>> that a bunch of task are blocked in "D" state, they seem to wait for
>> some I/O completion. I can try to hand-copy some data if requested.
>>
>> I found these messages in dmesg:
>>
>> ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 
>> EXT3-fs: mounted filesystem with ordered data mode.
>> sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
>> driverbyte=DRIVER_OK,SUGGEST_OK
>> end_request: I/O error, dev sda, sector 16460
>> ReiserFS: sda7: found reiserfs format "3.6" with standard journal
>> ReiserFS: sda7: using ordered data mode
>> --
>> ReiserFS: sda7: Using r5 hash to sort names
>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
>> driverbyte=DRIVER_OK,SUGGEST_OK
>> end_request: I/O error, dev sdb, sector 19632
>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
>> driverbyte=DRIVER_OK,SUGGEST_OK
>> end_request: I/O error, dev sdb, sector 40037363
>> Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 extents:1 
>> across:1048568k
>> lp0: using parport0 (interrupt-driven).
>>
>> These errors occur *only* with 2.6.24-rc3-mm1, they are 100% reproducible.
>> 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
>>
>> Maybe something is broken in pata_via driver ?
>>
> 
> Could be - 
> libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
> and 
> pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
> touch pata_via.c.

None of the above...

I did a bisection, it spotted git-scsi-misc.patch. 
I just run 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.

I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not 
requeue requests if REQ_FAILFAST is set" is the real culprit. The other 
commits are touching documentation or drivers I don't use. I'll try 
to revert only this one this evening.

-- 
laurent


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 7667] New: BUG at drivers/scsi/scsi_lib.c:1118 caused by "pktsetup dvd /dev/sr0"

2006-12-28 Thread Laurent Riffard


This BUG (http://bugzilla.kernel.org/show_bug.cgi?id=7667) still happens with 
2.6.20-rc2-mm1.


Fortunately, Christoph Hellwig's patch 
(http://bugzilla.kernel.org/show_bug.cgi?id=7667#c5) still fix it.


--
laurent
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 7667] New: BUG at drivers/scsi/scsi_lib.c:1118 caused by "pktsetup dvd /dev/sr0"

2006-12-12 Thread Laurent Riffard

Le 12.12.2006 15:21, Boaz Harrosh a écrit :

Christoph Hellwig wrote:

On Tue, Dec 12, 2006 at 11:38:42AM +0100, Christoph Hellwig wrote:

This is because the packet driver tries to send down read/write
BLOCK_PC commands that don't use a bio and do not use sg lists.
As part of the patch you mentioned I added strict assertations for that
case because the scsi layer doesn't handle those anymore.

The right fix is to replace all the packet_command stuff in the packet
driver by scsi_execute() which needs to be lifted from scsi code to
the block code for that.  I'll prepare a patch this weekend unless
someone beets me in doing that work.

Please try the patch below to fix the bug for now.  It's not the
full way to a generic execute block pc infrastcuture but should fix
the bug for the time beeing:


Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/block/pktcdvd.c
===
--- linux-2.6.orig/drivers/block/pktcdvd.c  2006-12-12 11:30:45.0 
+0100
+++ linux-2.6/drivers/block/pktcdvd.c   2006-12-12 14:23:37.0 +0100
@@ -765,47 +765,34 @@
  */
 static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command 
*cgc)
 {
-   char sense[SCSI_SENSE_BUFFERSIZE];
-   request_queue_t *q;
+   request_queue_t *q = bdev_get_queue(pd->bdev);
struct request *rq;
-   DECLARE_COMPLETION_ONSTACK(wait);
-   int err = 0;
+   int ret = 0;
 
-	q = bdev_get_queue(pd->bdev);

+   rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ?
+WRITE : READ, __GFP_WAIT);
+
+   if (cgc->buflen) {
+   if (blk_rq_map_kern(q, rq, cgc->buffer, cgc->buflen, 
__GFP_WAIT))
+   goto out;
+   }
+
+   rq->cmd_len = COMMAND_SIZE(rq->cmd[0]);
+   memcpy(rq->cmd, cgc->cmd, CDROM_PACKET_SIZE);
+   if (sizeof(rq->cmd) > CDROM_PACKET_SIZE)
+   memset(rq->cmd + CDROM_PACKET_SIZE, 0, sizeof(rq->cmd) - 
CDROM_PACKET_SIZE);
 
-	rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ? WRITE : READ,

-__GFP_WAIT);
-   rq->errors = 0;
-   rq->rq_disk = pd->bdev->bd_disk;
-   rq->bio = NULL;
-   rq->buffer = NULL;
rq->timeout = 60*HZ;
-   rq->data = cgc->buffer;
-   rq->data_len = cgc->buflen;
-   rq->sense = sense;
-   memset(sense, 0, sizeof(sense));
-   rq->sense_len = 0;
rq->cmd_type = REQ_TYPE_BLOCK_PC;
rq->cmd_flags |= REQ_HARDBARRIER;
if (cgc->quiet)
rq->cmd_flags |= REQ_QUIET;
-   memcpy(rq->cmd, cgc->cmd, CDROM_PACKET_SIZE);
-   if (sizeof(rq->cmd) > CDROM_PACKET_SIZE)
-   memset(rq->cmd + CDROM_PACKET_SIZE, 0, sizeof(rq->cmd) - 
CDROM_PACKET_SIZE);
-   rq->cmd_len = COMMAND_SIZE(rq->cmd[0]);
-
-   rq->ref_count++;
-   rq->end_io_data = &wait;
-   rq->end_io = blk_end_sync_rq;
-   elv_add_request(q, rq, ELEVATOR_INSERT_BACK, 1);
-   generic_unplug_device(q);
-   wait_for_completion(&wait);
-
-   if (rq->errors)
-   err = -EIO;
 
+	blk_execute_rq(rq->q, pd->bdev->bd_disk, rq, 0);

+   ret = rq->errors;
+out:
blk_put_request(rq);
-   return err;
+   return ret;
 }
 
 /*

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

I'm afraid this might not be enough because of code in drivers/ide/ide-cd.c. It 
does IO off of request->data.

[background]
pkt_generic_packet and ton of other places mainly cd(s), floppy(s), and other 
ide code paths,
are using what I call BLACK requests. They put some data on request->buffer or 
request->data
stick it in the Q and than advance on them later down the ladder. Remove of "buffer" and 
"data" from
struct request will reveal all these places. At one time I had plans to do just 
that. But 1/2 way through I gave
up, it is too risky, too much Hardware that I don't have, that needs checking.

below patch combined with your patch might get a bit closer for this code path.
At struct request I have changed the name of "data" member to "user_data". than 
changed the code paths that used
"data" as IO to use request->buffer instead. This is just as bad but is a more 
common practice.

I suspect there is a problem with what I did in scsi_lib.c Christoph please 
check me out with the new BUG_ON.
Mainly what you need from below is only the code in ide-cd.c.
(And there are 3-4 places that do exactly like pkt_generic_packet, though I'm 
not sure they end up through SCSI.
At first I thought this code doesn't either)


[patch snipped]

Christoph's patch fixed the BUG, while Boaz's patch didn't fix anything 
(both tested with kernel 2.6.16-rc6-mm2). 


Please note I don't use ide-cd, I use libata+pata_via+sr_mod.

Boaz, when you wrote "below patch combined with y