date:20071210

Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-10 Thread Tejun Heo

Bill Davidsen wrote:
> Jan Engelhardt wrote:
>> On Dec 1 2007 06:26, Justin Piszcz wrote:
>>> I ran the following:
>>>
>>> dd if=/dev/zero of=/dev/sdc
>>> dd if=/dev/zero of=/dev/sdd
>>> dd if=/dev/zero of=/dev/sde
>>>
>>> (as it is always a very good idea to do this with any new disk)
>>
>> Why would you care about what's on the disk? fdisk, mkfs and
>> the day-to-day operation will overwrite it _anyway_.
>>
>> (If you think the disk is not empty, you should look at it
>> and copy off all usable warez beforehand :-)
>>
> Do you not test your drive for minimum functionality before using them?

I personally don't.

> Also, if you have the tools to check for relocated sectors before and
> after doing this, that's a good idea as well. S.M.A.R.T is your friend.
> And when writing /dev/zero to a drive, if it craps out you have less
> emotional attachment to the data.

Writing all zeros isn't too useful, though.  A drive failing to
reallocate on write is a catastrophic failure.  It means that the drive
wants to relocate the sector but can't because it has used up all its
spare space, which usually indicates that something else is seriously
wrong with the drive.  The drive will have to go in the trash can.  This
is all serious and bad, but the catch is that in such cases the problem
usually sticks out like a sore thumb, so either the vendor doesn't ship
such a drive or you'll find the failure very early.  I personally
haven't seen any such failure yet.  Maybe I'm lucky.

Most data loss occurs when the drive fails to read back what it thought
it had written successfully.  For that case, reading and dumping the
whole disk to /dev/null periodically is probably much better than
writing zeros, as it allows the drive to find a deteriorating sector
early, while it's still readable, and relocate it.  But then again, I
think it's overkill.
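
A periodic read scan of this kind could be sketched as follows. This is only an illustration: the function name `scan_readable` and the 1 MiB chunk size are assumptions, and a real scan of a block device would normally need root and would want to bypass the page cache so that the reads actually reach the drive.

```python
import os


def scan_readable(path, chunk_size=1024 * 1024):
    """Read `path` start to end; return (bytes_read, unreadable_offsets)."""
    bad_offsets = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Read error: record where it happened and skip this chunk.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of device
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad_offsets
```

The returned offsets only say which chunks failed to read; narrowing a failure down to a single sector would need a second pass with a smaller chunk size.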

Writing zeros to sectors is more useful as a cure than as prevention.
If your drive fails to read a sector, write any value to that sector.
The drive will forget about the data on the damaged sector, reallocate
it, and write the new data.  Of course, you lose the data which was
originally on the sector.
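
As a rough sketch of that cure, the snippet below overwrites a single sector with zeros. The helper name `overwrite_sector`, the 512-byte sector size, and the idea of addressing the sector by LBA through the device node are assumptions made for illustration; many drives use 4096-byte physical sectors, and the write destroys whatever was stored there.

```python
import os


def overwrite_sector(path, lba, sector_size=512):
    """Overwrite one logical sector with zeros; the old contents are lost."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # bytes(n) is n zero bytes; pwrite places them at the sector's offset.
        written = os.pwrite(fd, bytes(sector_size), lba * sector_size)
        os.fsync(fd)  # push the write out instead of leaving it cached
    finally:
        os.close(fd)
    return written
```

Whether the drive actually reallocates the sector is up to its firmware; comparing the S.M.A.R.T. reallocated-sector count before and after is one way to check.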

I personally think it's enough to just throw in an extra disk, make it
RAID1 or 5, and rebuild the array if a read fails on one of the disks.
If a write fails or read failures continue, replace the disk.  Of
course, if you want to be extra cautious, good for you.  :-)

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-10 Thread Tejun Heo

Justin Piszcz wrote:
> The badblocks did not do anything; however, when I built a software raid
> 5 and then performed a dd:
> 
> /usr/bin/time dd if=/dev/zero of=fill_disk bs=1M
> 
> [42332.936615] ata5.00: exception Emask 0x2 SAct 0x7000 SErr 0x0 action
> 0x2 frozen
> [42332.936706] ata5.00: spurious completions during NCQ issue=0x0
> SAct=0x7000 FIS=004040a1:0800
> 
> Next test, I will turn off NCQ and try to make the problem re-occur.
> If anyone else has any thoughts here..?
> I ran long smart tests on all 3 disks, they all ran successfully.
> 
> Perhaps these drives need to be NCQ BLACKLISTED with the P35 chipset?

That was me being stupid.  Patches for both upstream and -stable
branches are posted.  These will go away.

Thanks.

-- 
tejun

Re: ATA ACPI (was Re: Linux 2.6.24-rc4)

2007-12-10 Thread Tejun Heo

Maciej Rutecki wrote:
> 2007/12/5, Jeff Garzik <[EMAIL PROTECTED]>:
> 
>> _If_ libata is built into the kernel, and not a kernel module, then you
>> can supply "libata.noacpi=1" on the kernel command line.  I don't think
>> that works with modules.

JFYI: fix for this and other ACPI issues is being tested.  Please take a
look at the following bug.

  http://bugzilla.kernel.org/show_bug.cgi?id=9320

-- 
tejun

Re: Comments - using the qc_defer hook for serializing controllers ?

2007-12-10 Thread Tejun Heo

Alan Cox wrote:
> +/**
> + *   sl82c105_qc_defer   -   implement serialization
> + *   @qc: command
> + *
> + *   We must issue one command per host not per channel because
> + *   of the reset bug.
> + *
> + *   Q: is the scsi host lock sufficient ?
> + */
> +
> +static int sl82c105_qc_defer(struct ata_queued_cmd *qc)
> +{
> +	struct ata_host *host = qc->ap->host;
> +	int rc;
> +
> +	/* First apply the usual rules */
> +	rc = ata_std_qc_defer(qc);
> +	if (rc != 0)
> +		return rc;
> +
> +	/* Now apply serialization rules. Only allow a command if the
> +	   other channel state machine is idle */
> +	if (host->port[0] != qc->ap &&
> +	    host->port[0]->hsm_task_state != HSM_ST_IDLE)
> +		return ATA_DEFER_PORT;
> +	if (host->port[1] != qc->ap &&
> +	    host->port[1]->hsm_task_state != HSM_ST_IDLE)
> +		return ATA_DEFER_PORT;
> +	return 0;
> +}

hsm_task_state is not necessarily protected by host lock.  I think
testing ap->qc_active would be better.
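
To make the suggestion concrete, here is a small userspace model of the rule, not kernel code. The `Port` and `Host` classes and the boolean return value are stand-ins invented for this sketch; the only idea taken from the thread is that a command is deferred while another port on the same host still has commands in flight, judged by a `qc_active` bitmask rather than by `hsm_task_state`. The first step of the quoted patch (`ata_std_qc_defer()`) and all locking are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    qc_active: int = 0  # bitmask of in-flight command tags (0 = idle)


@dataclass
class Host:
    ports: list = field(default_factory=list)


def qc_defer(host, port):
    """Return True if a new command on `port` has to wait."""
    # Host-wide serialization: any active command on another port blocks us.
    # Identity comparison is used because two idle ports compare equal.
    return any(p is not port and p.qc_active != 0 for p in host.ports)
```

With two ports, a command on the first port is deferred exactly while the second has a non-zero `qc_active`, and the other way round.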

-- 
tejun

Re: PROBLEM: WARNING: at kernel/irq/manage.c:158 enable_irq() during boot

2007-12-10 Thread Tejun Heo

(cc'ing Bartlomiej)
Hello,

[EMAIL PROTECTED] wrote:
> Dec  6 11:58:23 titanium kernel: WARNING: at kernel/irq/manage.c:158 enable_irq()
> Dec  6 11:58:23 titanium kernel:  [] enable_irq+0x6e/0xa2
> Dec  6 11:58:23 titanium kernel:  [] probe_hwif+0x6d8/0x7c7 [ide_core]
> Dec  6 11:58:23 titanium kernel:  [] probe_hwif_init_with_fixup+0xc/0x80 [ide_core]
> Dec  6 11:58:23 titanium kernel:  [] elf_core_dump+0x627/0xb60
> Dec  6 11:58:23 titanium kernel:  [] ide_setup_pci_device+0x6f/0x9c [ide_core]
> Dec  6 11:58:23 titanium kernel:  [] pdc202new_init_one+0xf/0x10 [pdc202xx_new]
> Dec  6 11:58:23 titanium kernel:  [] pci_device_probe+0x36/0x55
> Dec  6 11:58:23 titanium kernel:  [] driver_probe_device+0xc8/0x14b
> Dec  6 11:58:23 titanium kernel:  [] __driver_attach+0x52/0x87
> Dec  6 11:58:23 titanium kernel:  [] bus_for_each_dev+0x35/0x57
> Dec  6 11:58:23 titanium kernel:  [] driver_attach+0x16/0x18
> Dec  6 11:58:23 titanium kernel:  [] __driver_attach+0x0/0x87
> Dec  6 11:58:23 titanium kernel:  [] bus_add_driver+0x6d/0x153
> Dec  6 11:58:23 titanium kernel:  [] __pci_register_driver+0x4b/0x77
> Dec  6 11:58:23 titanium kernel:  [] sys_init_module+0x1525/0x15fb
> Dec  6 11:58:23 titanium kernel:  [] ide_config_drive_speed+0x0/0x314 [ide_core]
> Dec  6 11:58:23 titanium kernel:  [] syscall_call+0x7/0xb
> Dec  6 11:58:23 titanium kernel:  [] wireless_nlevent_process+0x15/0x31
> Dec  6 11:58:23 titanium kernel:  ===

That means the IRQ is being enabled more times than it should be.  Can
you please give 2.6.24-rc4 a shot and see whether the problem is still
there?
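
What "enabled more times than it should be" means can be shown with a toy model. It assumes the usual nesting-counter scheme, in which every disable bumps a depth counter and every enable drops it again. The `IrqLine` class and its `warnings` field are invented for this sketch and are not the kernel's data structures.

```python
class IrqLine:
    """Toy model of nested IRQ disable/enable accounting."""

    def __init__(self):
        self.depth = 0     # number of outstanding disables; 0 means enabled
        self.warnings = 0  # count of unbalanced enable calls

    def disable(self):
        self.depth += 1

    def enable(self):
        if self.depth == 0:
            # Nothing left to undo: this enable has no matching disable.
            self.warnings += 1
            return
        self.depth -= 1

    @property
    def enabled(self):
        return self.depth == 0
```

A disable followed by one enable leaves the line enabled with no warning; a second enable with no matching disable is the unbalanced case that such a warning points at.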

-- 
tejun

Re: Hard drives only detected when booting from CD on nVidia MCP67 SATA

2007-12-10 Thread Tejun Heo

Chuck Ebbert wrote:
> On 12/05/2007 06:53 PM, Chuck Ebbert wrote:
>> With kernel 2.6.23 on an Acer 7220 notebook using nVidia MCP67 SATA,
>> hard drives are only detected after first booting from a CD.
>>
>> Boot from hard drive            No drives detected
>>
>> Boot live CD                    Detected
>>
>> Boot CD to GRUB menu,           Detected
>> then warm-boot from hard
>> drive
>>
>>
>> Non-detect case:
>>
>> ahci :00:09.0: version 2.3
>> ACPI: PCI Interrupt Link [LSI0] enabled at IRQ 23
>> ACPI: PCI Interrupt :00:09.0[A] -> Link [LSI0] -> GSI 23 (level, low) -> 
>> IRQ 16
>> input: ImPS/2 Generic Wheel Mouse as /class/input/input2
>> ahci :00:09.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl IDE mode
>> ahci :00:09.0: flags: 64bit sntf led clo pmp pio slum part
>> PCI: Setting latency timer of device :00:09.0 to 64
>> scsi0 : ahci
>> scsi1 : ahci
>> scsi2 : ahci
>> scsi3 : ahci
>> ata1: SATA max UDMA/133 cmd 0xf8854100 ctl 0x bmdma 0x irq 
>> 221
>> ata2: SATA max UDMA/133 cmd 0xf8854180 ctl 0x bmdma 0x irq 
>> 221
>> ata3: SATA max UDMA/133 cmd 0xf8854200 ctl 0x bmdma 0x irq 
>> 221
>> ata4: SATA max UDMA/133 cmd 0xf8854280 ctl 0x bmdma 0x irq 
>> 221
>> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> ata2: SATA link down (SStatus 0 SControl 300)
>> ata3: SATA link down (SStatus 0 SControl 300)
>> ata4: SATA link down (SStatus 0 SControl 300)
>> Waiting for driver initialization
>>
>> But, if a LiveCD is used to boot or if a LiveCD was used before a hot
>> reboot (without a power off), disks are correctly found:
>>
>> Loading ahci.ko
>> ahci :00:09.0: version 2.3
>> ACPI: PCI Interrupt Link [LSI0] enabled at IRQ 23
>> ACPI: PCI Interrupt :00:09.0[A] -> Link [LSI0] -> GSI 23 (level, low) -> 
>> IRQ 16
>> input: ImPS/2 Generic Wheel Mouse as /class/input/input2
>> ahci :00:09.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl IDE mode
>> ahci :00:09.0: flags: 64bit sntf led clo pmp pio slum part
>> PCI: Setting latency timer of device :00:09.0 to 64
>> scsi0 : ahci
>> scsi1 : ahci
>> scsi2 : ahci
>> scsi3 : ahci
>> ata1: SATA max UDMA/133 cmd 0xf8854100 ctl 0x bmdma 0x irq 
>> 221
>> ata2: SATA max UDMA/133 cmd 0xf8854180 ctl 0x bmdma 0x irq 
>> 221
>> ata3: SATA max UDMA/133 cmd 0xf8854200 ctl 0x bmdma 0x irq 
>> 221
>> ata4: SATA max UDMA/133 cmd 0xf8854280 ctl 0x bmdma 0x irq 
>> 221
>> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> ata1.00: ATA-7: Hitachi HTS541612J9SA00, SBDOC70P, max UDMA/100
>> ata1.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 0/32)
>> ata1.00: configured for UDMA/100
>> ata2: SATA link down (SStatus 0 SControl 300)
>> ata3: SATA link down (SStatus 0 SControl 300)
>> ata4: SATA link down (SStatus 0 SControl 300)
>>
>>
> 
> Possibly fixed by:
> 
> Commit: 3cc3eb1148e4b2dfabf7a1dcf36fd8be1331ca95
> [libata] AHCI: enable AHCI mode, before using AHCI reset
> 
> Plus:
> 
> Commit: ab6fc95f609b372a19e18ea689986846ab1ba29c
> [libata] AHCI: fix newly introduced host-reset bug
> 
> ??

Is it fixed by the above two patches?  Or are you saying that the above
two patches look like they might fix the problem?
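
For reading the SStatus values in the quoted logs: the register is printed in hex and packs three 4-bit fields, DET (device detection) in bits 3:0, SPD (negotiated speed) in bits 7:4 and IPM (interface power state) in bits 11:8. The helper below is only a reading aid; its name `decode_sstatus` is made up here.

```python
def decode_sstatus(value):
    """Split a SATA SStatus register value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # 3 = device present, phy link established
        "spd": (value >> 4) & 0xF,  # 1 = Gen1 (1.5 Gbps), 0 = none negotiated
        "ipm": (value >> 8) & 0xF,  # 1 = interface active
    }
```

Applied to the logs, `SStatus 113` (0x113) decodes to DET=3, SPD=1, IPM=1, a link up at 1.5 Gbps, while `SStatus 0` has all fields zero, which matches the "link down" lines. In both the failing and the working boot, ata1 reports 113, so the link itself comes up either way.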

Thanks.

-- 
tejun

Re: Failure with SATA DVD-RW

2007-12-10 Thread Tejun Heo

Andrew Morton wrote:
> (argh, shit, resent.  Please don't massage the cc list.  Do reply-to-all)
> 
> On Thu, 6 Dec 2007 01:33:16 + (UTC)
> Parag Warudkar <[EMAIL PROTECTED]> wrote:
> 
>> Tom Lanyon  gmail.com> writes:
>>
>>> scsi4: ahci
>>> ata5: SATA link up at 1.5 Gbps (SStatus 113 SControl 300)
>>> ata5.00: ATAPI, max UDMA/66
>>> ata5.00: qc timeout (cmd 0xef)
>>> ata5.00: failed to set xfermode (err_mask=0x104)
>>> ata5.00: limiting speed to UDMA/44
>>> ata5: failed to recover some devices, retrying in 5 secs
>>> ata5: port is slow to respond, please be patient (Status 0x80)
>>> ata5: port failed to respond (30 secs, status 0x80)
>>> ata5: COMRESET failed (device not ready)
>>> ata5: hardreset failed, retrying in 5 secs
>>> ata5: SATA link up at 1.5 Gbps (SStatus 113 SControl 300)
>>> ata5.00: ATAPI, max UDMA/66
>>> ata5.00: qc timeout (cmd 0xef)
>>> ata5.00: failed to set xfermode (err_mask=0x104)
>>> ata5.00: limiting speed to PIO0
>>> ata5: failed to recover some devices, retrying in 5 secs
>>> ata5: port is slow to respond, please be patient (Status 0x80)
>>> ata5: port failed to respond (30 secs, status 0x80)
>>> ata5: COMRESET failed (device not ready)
>>> ata5: hardreset failed, retrying in 5 secs
>>> ata5.00: ATAPI, max UDMA/66
>>> ata5.00: qc timeout (cmd 0xef)
>>> ata5.00: failed to set xfermode (err_mask=0x104)
>>> ata5.00: disabled
>>>
>> Looks like it is trying to set transfer mode to UDMA/66 and failing. After 
>> that it tried UDMA/44 and failed again. Next UDMA/66 again with unsurprising 
>> result - failed. After that PIO0 which seems to cause some kind of trouble, 
>> then it tries UDMA/66 again, and I am not stating the result again :) ! 
>>
>>> Any ideas what to try to get it working under AHCI?
>>>
>> I recall reading somewhere - the Pioneer drive needs UDMA/33 which it did
>> not try in your case - need to somehow have it try UDMA/33 but I don't find
>> a boot parameter which will do that. So maybe adding a quirk for this
>> device to limit the xfer mode to 33 may work.
>>
>> What does your dmesg output for the drives look like when you run in IDE 
>> compat mode? (Particularly the DMA for this drive?)
>>

Also, does irqpoll help?

-- 
tejun

Re: ata_piix init/performance regression ?

2007-12-10 Thread Tejun Heo

[EMAIL PROTECTED] wrote:
> Hello !
> 
> on my old fujitsu-siemens lifebook, booting is at least 20 seconds slower
> than before.
> 
> on ata_piix init i can see 2 longer delays of ~10 seconds each, which didn't
> happen before.
> 
> i'm using SuSE kernel of the day from
> http://ftp.suse.com/pub/projects/kernel/kotd/
> 
> problem exists with these kernels:
> kernel-vanilla-2.6.24_rc3_git3-20071201095839
> kernel-default-2.6.24_rc3_git3-20071201095839
> kernel-default-2.6.24_rc4_git3-20071206172629
> kernel-vanilla-2.6.24_rc4_git3-20071206172629
> 
> problem didn't exist with this one:
> kernel-default-2.6.22.12-0.1 (opensuse 10.3 default)
> 
> please see https://bugzilla.novell.com/show_bug.cgi?id=345442 for more 
> details.

That's a TSC problem on suse kernel.  I'll follow up on the bugzilla.

-- 
tejun

Re: [PATCH] hpt366: fix HPT37x PIO mode timings (take 2)

2007-12-10 Thread Sergei Shtylyov


Hello.

Bartlomiej Zolnierkiewicz wrote:


>> ---
>> Many PIO modes at 55/66 MHz (as well as MWDMA modes at all clocks) are also
>> underclocked but I decided not to touch them, at least for the time being.
>> The patch is against the Linus' tree, with PIO0 setup time correct this time...

> Since it is against Linus' tree I assume that it is safe enough to be merged
> for 2.6.24, right?

   Yes, I think so. This is, however, at your discretion.

MBR, Sergei

Re: Possibly SATA related freeze killed networking and RAID

2007-12-10 Thread noah

2007/11/21, noah <[EMAIL PROTECTED]>:
> 2007/11/21, Alan Cox <[EMAIL PROTECTED]>:
> > > I've had other freezes before but this was the first time I was able
> > > to see what was actually going on.
> > > IRQ 21 appears to be shared between sata_nv and ethernet.
> > >
> > > Does this mean my hardware/BIOS is broken somehow?
> >
> > > Not necessarily. It could be a bug in one of the drivers using IRQ 21
> > (sata_nv or the nvidia ethernet), it could be another inactive device, or
> > it could be a hardware funny.
>
> How can I tell if there's an inactive device?
>
> > Nvidia stuff can be quite hard to diagnose as we have no documentation
> > but we can try. The first question is whether it is network or disk
> > triggered - seeing if heavy loads to one or the other trigger the problem
> > might be a first plan.
>
> I haven't managed to trigger it again yet but at the time the CPU was
> heavily loaded and I was re-indexing a database which caused a lot of
> disk activity. I'm quite confident the network was pretty much idle at
> the time.

The same thing has happened twice now, both times during the weekly
check of the md0 and md1 RAID1 arrays. That is, networking on the
primary interface is dead. Its interrupt (irq 21) is shared between
sata_nv and forcedeth.

Is there anything I can do to debug this problem?

I don't have access to the logs right now but will have later.

  -- noah

Re: [PATCH pata-2.6] hpt366: change timing register masks

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


>> Since PIO autotuning is now done always, there's no need anymore to program
>> the taskfile timings also on DMA modes, so change the IDE timing register
>> masks accordingly, "inverting the polarity" of the masks while at it...
>>
>> Signed-off-by: Sergei Shtylyov <[EMAIL PROTECTED]>

> applied with a whitespace fix to shut up checkpatch.pl:
>
> ERROR: use tabs not spaces
> #66: FILE: drivers/ide/pci/hpt366.c:718:
> +^I^I^I^I ^I^I^I  0x303c);$

   Sorry for not running checkpatch.pl... :-<

WBR, Sergei

Re: [PATCH 1/20] ide: fix ->io_32bit race in set_io_32bit()

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> set_io_32bit() (ide_procset_t function) can race against running
> PIO transfers.  Fix it by using ide_spin_wait_hwgroup().
>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>


Acked-by: Sergei Shtylyov <[EMAIL PROTECTED]>

MBR, Sergei

Re: [PATCH 5/20] ide: fix final status check in task_in_intr()

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> Check for DRQ bit being cleared on the final status check.
>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>


Acked-by: Sergei Shtylyov <[EMAIL PROTECTED]>

MBR, Sergei

Re: [PATCH 4/20] ide: clear HOB bit for REQ_TYPE_ATA_CMD requests in ide_end_drive_cmd()

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> ide_dump_status() may set HOB bit before ide_end_drive_cmd() is called.
>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>


Acked-by: Sergei Shtylyov <[EMAIL PROTECTED]>

MBR, Sergei

Re: [PATCH 11/20] ide: kill DATA_READY define

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>


Acked-by: Sergei Shtylyov <[EMAIL PROTECTED]>

MBR, Sergei


Re: [PATCH 12/20] ide: use wait_drive_not_busy() in drive_cmd_intr()

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>


Acked-by: Sergei Shtylyov <[EMAIL PROTECTED]>


> Index: b/drivers/ide/ide-taskfile.c
> ===
> --- a/drivers/ide/ide-taskfile.c
> +++ b/drivers/ide/ide-taskfile.c
> @@ -260,7 +260,7 @@ static ide_startstop_t task_no_data_intr
>  	return ide_stopped;
>  }
>  
> -static u8 wait_drive_not_busy(ide_drive_t *drive)
> +u8 wait_drive_not_busy(ide_drive_t *drive)
>  {
>  	ide_hwif_t *hwif = HWIF(drive);
>  	int retries;


  I think you should remove the comment line below:

* (drive_cmd_intr waits that long).

MBR, Sergei


Re: [PATCH 13/20] ide: initialize rq->cmd_type in ide_init_drive_cmd() callers

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> * Initialize rq->cmd_type in ide_wait_cmd(), ide_cmd_ioctl() and
>   set_pio_mode() (other callers were already over-riding rq->cmd_type).
>
> * Remove no longer needed rq->cmd_type setup from ide_init_drive_cmd().
>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>

   Although the patch hardly seems necessary to me:

Acked-by: Sergei Shtylyov <[EMAIL PROTECTED]>

MBR, Sergei

Re: [PATCH 16/20] ide: check BUSY and ERROR status bits before reading data in drive_cmd_intr()

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>


Acked-by: Sergei Shtylyov <[EMAIL PROTECTED]>

MBR, Sergei

Re: [PATCH 15/20] ide: don't enable local IRQs for PIO-in in driver_cmd_intr()

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> Don't enable local IRQs for PIO-in protocol in driver_cmd_intr().
>
> While at it:
>
> * Remove redundant rq->cmd_type check.
>
> * Read status register after enabling local IRQs for no-data protocol.
>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>
>
> Index: b/drivers/ide/ide-io.c
> ===
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -638,17 +638,18 @@ static ide_startstop_t drive_cmd_intr (i
>  {
>  	struct request *rq = HWGROUP(drive)->rq;
>  	ide_hwif_t *hwif = HWIF(drive);
> -	u8 *args = (u8 *) rq->buffer;
> -	u8 stat = hwif->INB(IDE_STATUS_REG);
> +	u8 *args = (u8 *)rq->buffer, pio_in = (args && args[3]) ? 1 : 0, stat;
>  
> -	local_irq_enable_in_hardirq();
> -	if (rq->cmd_type == REQ_TYPE_ATA_CMD &&
> -	    (stat & DRQ_STAT) && args && args[3]) {
> +	if (pio_in) {
>  		u8 io_32bit = drive->io_32bit;
> +		stat = hwif->INB(IDE_STATUS_REG);


   You've lost the DRQ=1 check (which is restored in the next patch, however)...

MBR, Sergei

Re: Recent kernel hosing partition

2007-12-10 Thread Tejun Heo

Business Kid wrote:
> Yeah, dd will do that but I'm not too sure whether that would be
> helpful.  
> 
> That's a bit rough! Hexedit with style?  :-).

:-)

> The drive is triggering all sorts of errors.  Can you post the
> result of 'smartctl -a /dev/sdX' where sdX is the offending drive.  Also
> please restore cc to linux-ide@vger.kernel.org.
> 
> 
> Attached smartctl -a /dev/sda > smartctl.out
> 
> I see where you're going, and I think you're wrong. The drive is only 2
> months old. I had heavy toolchain compiles and massive copies/deletions
> pass off without incident on sda8 while F7's root partition (sda3) was
> lightly loaded by comparison. sda3 picked up _all_ the errors.  I never
> hit an error on the hard work - no dodgy exits. The console stuff on
> sda3 was all fine. It only screwed every application I was running under
> X - Firefox particularly. I could still compile with the tools & libs on
> sda3 when X was screwed. Badblocks never found a thing (e2fsck -cf).
> Lost+found is empty on every other partition.

Right, it doesn't look like your harddrive is bad.

> Sadly, "errors all over the place" is common enough with Via chipsets
> and Seagate disks.  I've seen it before. I'm stuck with Via in this box.
> I would not have bought Seagate, but when someone gives it to you and
> you're unemployed...

I'm not aware of any specific issues with VIA + Seagate drives.  Have
pointers?

> Another issue here is that the old ide driver could get through the
> mess, whereas the newer one cannot. I get "Drive reset: success" and the
> old ide driver recovers, whereas the new one goes out to lunch. The log
> snippets show a 60 seconds gap between errors. That's a 60 second freeze.

Hmmm...

1. So, the IDE driver suffers from error conditions too?  Do you have
logs around?

2. Do you have logs of the libata driver going out to lunch?

3. Can you post the boot log from your current setup?

Thanks.

-- 
tejun

Re: [PATCH 17/20] ide: fix final status check in drive_cmd_intr()

2007-12-10 Thread Sergei Shtylyov


Bartlomiej Zolnierkiewicz wrote:


> Don't check for READY_STAT bit being set for PIO-in protocol (makes the
> final status check in drive_cmd_intr() match the one in task_in_intr()).
>
> Also fix function name reported by ide_error() call while at it.
>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>


Acked-by: Sergei Shtylyov <[EMAIL PROTECTED]>

MBR, Sergei

Re: laptop reboots right after hibernation

2007-12-10 Thread Tejun Heo

Kjartan Maraas wrote:
> ma., 10.12.2007 kl. 10.03 +0900, skrev Tejun Heo:
>> Kjartan Maraas wrote:
>>>> Hmmm... Ah.. okay.  Wrongly splitted patch.  Can you please do it one
>>>> more time?
>>>>
>>> Attached.
>> Alright, it works now but it seems both dmesgs are from no-filter patch.
>>  I'm pretty sure it works too because one of your previous dmesgs showed
>> it worked.  Please double check.
>>
> Hmm, not sure what happened there. Attaching the filter dmesg output
> here.

Cool, thanks.

-- 
tejun

Re: laptop reboots right after hibernation

2007-12-10 Thread Kjartan Maraas


ma., 10.12.2007 kl. 10.03 +0900, skrev Tejun Heo:
> Kjartan Maraas wrote:
> >> Hmmm... Ah.. okay.  Wrongly splitted patch.  Can you please do it one
> >> more time?
> >>
> > Attached.
> 
> Alright, it works now but it seems both dmesgs are from no-filter patch.
>  I'm pretty sure it works too because one of your previous dmesgs showed
> it worked.  Please double check.
> 
Hmm, not sure what happened there. Attaching the filter dmesg output
here.

Cheers
Kjartan

Initializing cgroup subsys cpuset
Linux version 2.6.24-rc4 ([EMAIL PROTECTED]) (gcc version 4.1.2 20071124 (Red 
Hat 4.1.2-35)) #6 SMP Sun Dec 9 21:36:41 CET 2007
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - bf7d (usable)
 BIOS-e820: bf7d - bf7e5600 (reserved)
 BIOS-e820: bf7e5600 - bf7f8000 (ACPI NVS)
 BIOS-e820: bf7f8000 - bf80 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fed2 - fed9b000 (reserved)
 BIOS-e820: feda - fedc (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ffb0 - ffc0 (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
2167MB HIGHMEM available.
896MB LOWMEM available.
Entering add_active_range(0, 0, 784336) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   229376
  HighMem229376 ->   784336
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0:0 ->   784336
On node 0 totalpages: 784336
  DMA zone: 56 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4040 pages, LIFO batch:0
  Normal zone: 3080 pages used for memmap
  Normal zone: 00 pages, LIFO batch:31
  HighMem zone: 7587 pages used for memmap
  HighMem zone: 547373 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
DMI 2.4 present.
Using APIC driver default
ACPI: RSDP 000F78B0, 0024 (r2 HP)
ACPI: XSDT BF7E57C8, 007C (r1 HPQOEM SLIC-MPC1 HP  1)
ACPI: FACP BF7E5684, 00F4 (r4 HP 30AD3 HP  1)
ACPI: DSDT BF7E5ACC, FE7B (r1 HP   nc64001 MSFT  10E)
ACPI: FACS BF7F7E80, 0040
ACPI: SLIC BF7E5844, 0176 (r1 HPQOEM SLIC-MPC1 HP  1)
ACPI: HPET BF7E59BC, 0038 (r1 HP 30AD1 HP  1)
ACPI: APIC BF7E59F4, 0068 (r1 HP 30AD1 HP  1)
ACPI: MCFG BF7E5A5C, 003C (r1 HP 30AD1 HP  1)
ACPI: TCPA BF7E5A98, 0032 (r2 HP 30AD1 HP  1)
ACPI: SSDT BF7F5947, 0059 (r1 HP   HPQNLP1 MSFT  10E)
ACPI: SSDT BF7F59A0, 032D (r1 HP   HPQSAT1 MSFT  10E)
ACPI: SSDT BF7F64E0, 025F (r1 HP  Cpu0Tst 3000 INTL 20060317)
ACPI: SSDT BF7F673F, 00A6 (r1 HP  Cpu1Tst 3000 INTL 20060317)
ACPI: SSDT BF7F67E5, 04D7 (r1 HPCpuPm 3000 INTL 20060317)
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 6:15 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 1, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
ACPI: HPET id: 0x8086a201 base: 0xfed0
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at c000 (gap: bf80:3f40)
swsusp: Registered nosave memory region: 0009f000 - 000a
swsusp: Registered nosave memory region: 000a - 000e
swsusp: Registered nosave memory region: 000e - 0010
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 773613
Kernel command line: ro root=LABEL=/1 pci=assign-busses selinux=off
mapped APIC to b000 (fee0)
mapped IOAPIC to a000 (fec0)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0821000 soft=c0801000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 1828.865 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:8
... MAX_LOCK_DEPTH:  30
... MAX_LOCKDEP_KEYS:2048
... CLASSHASH_SIZE:   1024
... MAX_LOCKDEP_ENTRIES: 8192
... MAX_LOCKDEP_CHAINS:

Re: SAS v SATA interface performance

2007-12-10 Thread Jens Axboe

On Mon, Dec 10 2007, Tejun Heo wrote:
> There's one thing we can do to improve the situation tho.  Several
> drives including raptors and 7200.11s suffer serious performance hit if
> sequential transfer is performed by multiple NCQ commands.  My 7200.11
> can do > 100MB/s if non-NCQ command is used or only upto two NCQ
> commands are issued; however, if all 31 (maximum currently supported by
> libata) are used, the transfer rate drops to miserable 70MB/s.
> 
> It seems that what we need to do is not issuing too many commands to one
> sequential stream.  In fact, there isn't much to gain by issuing more
> than two commands to one sequential stream.

Well... CFQ won't go to deep queue depths across processes if they are
doing streaming IO, but it won't stop a single process from doing so.
I'd like to know what real-life process would issue streaming IO in some
async manner so as to get 31 pending commands sequentially? Not very
likely :-)

So I'd consider your case above a microbenchmark result. I'd also claim
that the firmware is very crappy, if it performs as described.

There's another possibility as well - that the queueing by the drive
generates a worse issue IO pattern, and that is why the performance
drops. Did you check with blktrace what the generated IO looks like?

> Both raptors and 7200.11 perform noticeably better on random workload
> with NCQ enabled.  So, it's about time to update IO schedulers
> accordingly, it seems.

Definitely. Again, microbenchmarks showed 30-40% improvements
when I last tested. That's a pure random workload though, again not
something that you would see in real life.

I tend to always run with a depth around 4 here. It seems to be a good
value, you get some benefits from NCQ but you don't allow the drive
firmware to screw you over.
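
For reference, a minimal sketch of capping the depth from userspace. It
assumes the disk is driven by libata/SCSI and so exposes
/sys/block/<dev>/device/queue_depth; the helper names and the 1..31 range
check (31 being the libata maximum mentioned above) are illustrative, not
part of any kernel interface.

```python
from pathlib import Path


def queue_depth_path(device: str, sysfs_root: str = "/sys") -> Path:
    """Return the sysfs file holding the queue depth of a SCSI/libata disk."""
    return Path(sysfs_root) / "block" / device / "device" / "queue_depth"


def read_queue_depth(device: str, sysfs_root: str = "/sys") -> int:
    """Read the queue depth currently configured for the device."""
    return int(queue_depth_path(device, sysfs_root).read_text().strip())


def set_queue_depth(device: str, depth: int, sysfs_root: str = "/sys") -> None:
    """Write a new queue depth (needs root when used on the real /sys)."""
    # Range check is an assumption taken from the thread: libata issues
    # at most 31 NCQ commands, and a depth of 1 means no queueing.
    if not 1 <= depth <= 31:
        raise ValueError("queue depth must be between 1 and 31")
    queue_depth_path(device, sysfs_root).write_text(f"{depth}\n")
```

Something like set_queue_depth("sdc", 4) would then give the depth
discussed here; the device name is only an example.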

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: SAS v SATA interface performance

2007-12-10 Thread James Bottomley

On Mon, 2007-12-10 at 16:33 +0900, Tejun Heo wrote:
> There's one thing we can do to improve the situation tho.  Several
> drives including raptors and 7200.11s suffer serious performance hit if
> sequential transfer is performed by multiple NCQ commands.  My 7200.11
> can do > 100MB/s if non-NCQ command is used or only upto two NCQ
> commands are issued; however, if all 31 (maximum currently supported by
> libata) are used, the transfer rate drops to miserable 70MB/s.
> 
> It seems that what we need to do is not issuing too many commands to one
> sequential stream.  In fact, there isn't much to gain by issuing more
> than two commands to one sequential stream.

You're entering an area of perennial debate even for SCSI drives.  What
we know is that for drives whose firmware elevator doesn't perform very
well, a lower TCQ depth (2-4) is better than a high one, the only use
for tags being to saturate the transport.  For high end arrays and
better performing firmware drives, the situation is much more murky.  It
boils down to whose elevator you trust: the drive/array's or the
kernel's.  If the latter, then you want a depth of around 4, and if the
former, you want a depth as high as possible (arrays like 64-128).

Given the way IDE drives are made, I'd bet they fall into the category
of firmware elevator that doesn't perform very well, so you probably
want a low NCQ depth with them (just sufficient to saturate the
transport, but not high enough to allow the drive to make too many head
scheduling decisions).

James


Re: SAS v SATA interface performance

2007-12-10 Thread Mark Lord


Tejun Heo wrote:
..
> Mark, how is marvell PMP support going?
..

It will be good once it happens -- the newer 6042/7042 chips support
full FIS-based switching, as well as command-based switching,
with large queues and all of the trimmings.

Currently stuck in legalese, though.

Cheers



Re: SAS v SATA interface performance

2007-12-10 Thread Mark Lord


Jens Axboe wrote:
> On Mon, Dec 10 2007, Tejun Heo wrote:
>> There's one thing we can do to improve the situation tho.  Several
>> drives including raptors and 7200.11s suffer serious performance hit if
>> sequential transfer is performed by multiple NCQ commands.  My 7200.11
>> can do > 100MB/s if non-NCQ command is used or only upto two NCQ
>> commands are issued; however, if all 31 (maximum currently supported by
>> libata) are used, the transfer rate drops to miserable 70MB/s.
>>
>> It seems that what we need to do is not issuing too many commands to one
>> sequential stream.  In fact, there isn't much to gain by issuing more
>> than two commands to one sequential stream.
>
> Well... CFQ wont go to deep queue depths across processes if they are
> doing streaming IO, but it wont stop a single process from doing so. I'd
> like to know what real life process would issue a streaming IO in some
> async manner as to get 31 pending commands sequentially? Not very likely
..

In the case of the WD Raptors, their firmware has changed slightly over
the years.  The ones I had here would *disable* internal read-ahead
for TCQ/NCQ commands, effectively killing any hope of sequential throughput
even for a queuesize of "1".   This was acknowledged by people with inside
knowledge of the firmware at the time.

Cheers

Re: SAS v SATA interface performance

2007-12-10 Thread Mark Lord


Tejun Heo wrote:
..
> NCQ is not more advanced than SCSI TCQ.  NCQ is "native" and "advanced"
> compared to old IDE style bus-releasing queueing support which was one
> ugly beast which no one really supported well.  The only example I can
> remember which actually worked was first gen raptors paired with
> specific controller with custom driver on windows.
..

I wrote PATA drivers for some chipsets that had hardware support for TCQ,
and it did make a very impressive throughput difference when enabled.
The IBM/Hitachi Deathst.. err.. Deskstar.. drives always had the best
support in firmware.  I believe we also used some WD drives, though their
firmware didn't perform as well.

ISTR that NCQ wins over TCQ (ATA) because multiple drives can interleave
their data transfers on the bus -- with TCQ, a drive took over the bus
at the start of data transfer and never released it until the command completed.

Cheers

Re: Bug: get EXT3-fs error Allocating block in system zone

2007-12-10 Thread Mark Lord


Linus Torvalds wrote:
> On Sun, 9 Dec 2007, Robert Hancock wrote:
>> The obvious suspect with a filesystem problem would be the disk
>> controller driver, AHCI here. However, the controller appears to set the
>> flag to indicate that it supports 64-bit DMA, so it should be fine,
>> unless it lies of course (which we know that ATI SB600 chipset does, but
>> I don't believe Intel is known to).
>>
>> Could still be a DMA mapping bug that only shows up when IOMMU is used.
>> However, AHCI is a pretty well tested driver..
>
> AHCI is a pretty well tested driver, but 99%+ of all testers still tend to
> have less than 4GB of memory. So I do *not* believe that the highmem bits
> are all that well tested at all.
>
> Can somebody who knows the driver send Marco a test-patch to just limit
> DMA to the low 32 bits, and then Marco can at least verify that yes,
> that's it. While it looks like DMA problems, there could obviously be some
> other subtle issue with big-memory machines (ie the PCI allocations etc
> tend to change too!)
..

We have another outstanding bug report of a Marvell chipset being
used in a funky 32-bit PPC situation with memory above the 4GB mark.

Possibly related, possibly not.

The Marvell SATA driver is still VERY EXPERIMENTAL right now,
missing some errata and stuff.  This should improve over the next few months.

Cheers

RE: Revisiting - 2.6.23.8 - Hang with sata_mv (7042) + Flat 4Gig (no holes) Memory

2007-12-10 Thread Morrison, Tom

I'll try - it will take me a little while to get back to this - had to
reconfigure my target for a different test...more later~!

Tom


-----Original Message-----
From: Mark Lord [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 10, 2007 11:44 AM
To: Morrison, Tom
Cc: Jeff Garzik; linux-ide@vger.kernel.org; Tejun Heo; [EMAIL PROTECTED]
Subject: Re: Revisiting - 2.6.23.8 - Hang with sata_mv (7042) + Flat
4Gig (no holes) Memory

Morrison, Tom wrote:
..
> To re-state the problem
>
> Hardware/Configuration:
>    MPC8548E with a 7042 (rev 2 - connected internal via a PEX switch)
>    2.6.23.8 (using PHYS_64BIT & PTE_64BIT - for 36 bit addressing
>      & MSI is NOT compiled in)
>    Flat 4Gig Memory Map (no holes - 0 - 0x0__ defined - special
>      low reserve memory is also used)
>
>    Local Bus & PCI Express IOMem mapped to unique space in
>       0xC__ with extensions to the ioremap routines
>       to create the appropriate requested physical address...
>       This is (and should be) transparent to the requesting
>       function that calls ioremap.
>
>    2 SATA hard drives connected.
>
> To recreate:
>    Write a large file (now greater than >310Mbytes) - hangs
>    and soft lockup is detected by kernel - no useful info
>    in stack trace...
>
> Of interest:
>    a) Replace sata_mv.c - with the 'old' Marvell's reference
>       driver and it works perfectly!!
>
>    b) Also, sata_mv works perfectly in all conditions - if we boot with
>       less than the ~3750M from the command line (which I note is ~below
>       where its PEX IOmemory space is located).
..

Tom:  could you please try and follow the recent thread here (linux-ide)
entitled "Bug: get EXT3-fs error Allocating block in system zone".

Somebody there has a similar problem on x86-32 with RAM above 4GB
using a standard AHCI SATA controller.

Jens has posted a couple of debugging patches there to try and isolate
things (patches attached here for convenience, though you'll have to
modify/hack the first one for sata_mv instead of ahci).

-ml


Re: Bug: get EXT3-fs error Allocating block in system zone

2007-12-10 Thread Marco Gatti


Mark Lord wrote:
>> AHCI is a pretty well tested driver, but 99%+ of all testers still
>> tend to have less than 4GB of memory. So I do *not* believe that the
>> highmem bits are all that well tested at all.
>>
>> Can somebody who knows the driver send Marco a test-patch to just
>> limit DMA to the low 32 bits, and then Marco can at least verify that
>> yes, that's it. While it looks like DMA problems, there could
>> obviously be some other subtle issue with big-memory machines (ie the
>> PCI allocations etc tend to change too!)
> ..
>
> We have another outstanding bug report of a Marvell chipset being
> used in a funky 32-bit PPC situation with memory above the 4GB mark.
>
> Possibly related, possibly not.
>
> The Marvell SATA driver is still VERY EXPERIMENTAL right now,
> missing some errata and stuff.  This should improve over the next few
> months.

Hello,

I don't have a Marvell chipset, and I didn't compile that driver in. I have
an "Intel Corporation PT IDER Controller" and an Intel Matrix Storage
controller, so I don't think this applies to my case...

Greetings

Re: Bug: get EXT3-fs error Allocating block in system zone

2007-12-10 Thread Marco Gatti


Jens Axboe wrote:
>>>> Hello Jens,
>>>> Thanks for help. I just applied the patch. Unfortunately it doesn't work.
>>>
>>> Can you try and additionally boot with iommu=off as a boot parameter?
>>
>> Yes. This is the end of getting any sata devices. See screenshots for
>> errors. It continued until ata4. At the end no root device was found.
>
> Hmm, even though the address is set to 0x we still seem to
> receive requests outside that range. Lets assume it's the scsi logic,
> can you test this? IOW, patch + iommu=off + this patch.
>
> I probably wont see any more mails tonight, we can continue this
> tomorrow (or someone else can step in, whatever comes first :-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2faced6..769ce3a 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1572,7 +1572,9 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
>  #endif
>  
>  	blk_queue_max_sectors(q, shost->max_sectors);
> +#if 0
>  	blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
> +#else
>  	blk_queue_segment_boundary(q, shost->dma_boundary);
>  
>  	if (!shost->use_clustering)

I applied the patch. Got "Hunk #1 succeeded at 1562 with fuzz 2 (offset
-10 lines)".

It didn't compile completely:

drivers/scsi/scsi_lib.c:1565:1: error: unterminated #else
make[2]: *** [drivers/scsi/scsi_lib.o] Error 1
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2



Re: Bug: get EXT3-fs error Allocating block in system zone

2007-12-10 Thread Jens Axboe

On Mon, Dec 10 2007, Marco Gatti wrote:
> Jens Axboe wrote:
> Hello Jens,
> Thanks for help. I just applied the patch. Unfortunately it doesn't 
> work.
> >>>Can you try and additionally boot with iommu=off as a boot parameter?
> >>>
> >>Yes. This is the end of getting any sata devices. See screenshots for 
> >>errors. It continued untill ata4. At the end no root device was found.
> >
> >Hmm, even though the address is set to 0x we still seem to
> >receive requests outside that range. Lets assume it's the scsi logic,
> >can you test this? IOW, patch + iommu=off + this patch.
> >
> >I probably wont see any more mails tonight, we can continue this
> >tomorrow (or someone else can step in, whatever comes first :-)
> >
> >diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >index 2faced6..769ce3a 100644
> >--- a/drivers/scsi/scsi_lib.c
> >+++ b/drivers/scsi/scsi_lib.c
> >@@ -1572,7 +1572,9 @@ struct request_queue *__scsi_alloc_queue(struct 
> >Scsi_Host *shost,
> > #endif
> > 
> > blk_queue_max_sectors(q, shost->max_sectors);
> >+#if 0
> > blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
> >+#else
> > blk_queue_segment_boundary(q, shost->dma_boundary);
> > 
> > if (!shost->use_clustering)
> I applied the path. Got Hunk #1 succeeded at 1562 with fuzz 2 (offset 
> -10 lines).
> 
> I didn't compile completly.
> 
> drivers/scsi/scsi_lib.c:1565:1: error: unterminated #else
> make[2]: *** [drivers/scsi/scsi_lib.o] Error 1
> make[1]: *** [drivers/scsi] Error 2
> make: *** [drivers] Error 2

Doh sorry, that #else wants to be an #endif
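
A quick way to catch this kind of slip before building is to check that
the conditionals balance. The following is only a sketch: the function
name is made up, it looks at #if/#ifdef/#ifndef/#elif/#else/#endif lines
alone, and it ignores comments and string literals, so it is no
substitute for the compiler.

```python
import re
from typing import List, Tuple

_DIRECTIVE = re.compile(r"^\s*#\s*(if|ifdef|ifndef|elif|else|endif)\b")


def unterminated_conditionals(source: str) -> List[Tuple[int, str]]:
    """Return (line number, directive) for every conditional left open.

    For each open block the most recent branch directive is reported,
    which is how the "unterminated #else" diagnostic above reads.
    """
    stack: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _DIRECTIVE.match(line)
        if not match:
            continue
        kind = match.group(1)
        if kind in ("if", "ifdef", "ifndef"):
            stack.append((lineno, "#" + kind))
        elif kind in ("elif", "else"):
            if stack:
                # Same block, newer branch: remember where it started.
                stack[-1] = (lineno, "#" + kind)
        elif stack:
            # "#endif" closes the innermost open block.
            stack.pop()
    return stack


# The hunk as posted, and the same hunk with the #else turned into #endif.
broken = """#if 0
blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
#else
blk_queue_segment_boundary(q, shost->dma_boundary);
"""
fixed = broken.replace("#else", "#endif")
```

Running unterminated_conditionals() on the two strings flags the #else in
the first and nothing in the second.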

-- 
Jens Axboe


Re: Bug: get EXT3-fs error Allocating block in system zone

2007-12-10 Thread Linus Torvalds

On Mon, 10 Dec 2007, Marco Gatti wrote:
> 
> I didn't compile completly.
> 
> drivers/scsi/scsi_lib.c:1565:1: error: unterminated #else

Heh. That #else should be an #endif, of course.

It is a bit strange that it still tries to do IO to high memory. Either 
the whole "64 bit capability" thing in AHCI is broken, or the bounce 
buffering doesn't work right. Or maybe you tried the "iommu=off" without 
the original patch that tried to turn off 64-bit DMA?

Linus

Re: Recent kernel hosing partition

2007-12-10 Thread For Junk Mail

Business.kid using this address because gmail keeps bouncing linux-ide
for silly reasons.

On Mon, 2007-12-10 at 22:39 +0900, Tejun Heo wrote:

> 
> > The drive is triggering all sorts of errors.  Can you post the
> > result of 'smartctl -a /dev/sdX' where sdX is the offending drive.  Also
> > please restore cc to linux-ide@vger.kernel.org
> > .
> > 
> > 
> > Attached smartctl -a /dev/sda > smartctl.out
> > 
> > I see where you're going, and I think you're wrong. The drive is only 2
> > months old. I had heavy toolchain compiles and massive copies/ deletions
> > pass of without incident on sda8 while F7's root partition (sda3) was
> > lightly loaded by comparison. sda3 picked up _all_ the errors.  I never
> > hit an error on the hard work - no dodgy exits. The console stuff on
> > sda3 was all fine. It only screwed every application I was running under
> > X - Firefox particularly. I could still compile with the tools & libs on
> > sda3 when X was screwed. Badblocks never found a thing (e2fsck -cf).
> > Lost+found is empty on every other partition.
> 
> Right, it doesn't look like your harddrive is bad.
> 
> > Sadly, "errors all over the place" is common enough with Via chipsets
> > and Seagate disks.  I've seen it before. I'm stuck with Via in this box.
> > I would not have bought Seagate, but when someone gives it to you and
> > you're unemployed...
> 
> I'm not aware of any specific issues with via + Segate drives.  Have
> pointers?

Remember the infamous via 'hardware error' which via insist is a
configuration error from the MVP3 chipset? This 8235 southbridge is
basically the same southbridge, shrunk down and sped up. They never liked
Seagate drives, which seem to use non-standard dma - fine with a windows
driver, but dodgy in linux. I did some crash testing for mandrake on disk
optimizing scripts in times (far) past. They built a database of drives
and how fast they could safely be set, and Seagate never got past PIO
4. So I never bought Seagate.
> 
> > Another issue here is that the old ide driver could get through the
> > mess, whereas the newer one cannot. I get "Drive reset: success" and the
> > old ide driver recovers, whereas the new one goes out to lunch. The log
> > snippets show a 60 seconds gap between errors. That's a 60 second freeze.
> 
> Hmmm...
> 
> 1. So, the IDE driver suffers from error conditions too?  Do you have
> logs around?
> 
There is only IDE. No SATA. 80-wire ribbon cable. But Fedora only uses the
ATA driver, so it's sda, and not hda as per normal. Sorry for the confusion.
This is not a new box (2004/2005).

> 2. Do you have logs of libata driver goes out to lunch?
> 
Catch 22. Did you see the film? I've only one hard disk. Reset to get
out of trouble, so how does it log the disk going out to lunch? Where
would I log it to?
https://bugzilla.redhat.com/attachment.cgi?id=281341 is the output of
grep -C10 frozen /var/log/messages > errors.out which gives context. I
have the whole /var/log/messages. The recorded errors are mainly in the
bootup phase, as sda3 was unmountable every time after an
'out-to-lunch' episode.

Typically, in an 'out to lunch' period, the lines from 'exception
Emask' down as far as 'DPO or FUA' would repeat on stdout. Some disk
error would precede it, e.g. '/usr/lib/something.so: no such file or
directory'. That file would probably migrate to lost+found on the next
e2fsck pass, and when I went to check it 2 reboots later it was indeed
missing. Then we got to the stage where the entire /usr/lib/firefox/
directory migrated, and we departed from reality at that point.

Somewhere, I actually have the datasheet for the actual chip, the Via
vt8235 southbridge. I acquired it around kernel 2.6.19 and did the test
work here on one of the dodgiest boxes in the universe to rid the
usb-2.0 driver of syslog spam about overcurrent change.

What was done then worked quite well. A patch was written to log the
values of certain registers to syslog. Then what was going wrong could
be seen, and it became evident the via hardware broke standards on 2 usb
ports. Via's solution was to disable those 2 ports :-/, but I had the
early rev of the chipset where they were in.


> 3. Can you post boot log from you current setup?
I presume you want the dmesg output - boot.log is dhcp stuff here. This
is the last dmesg from that kernel, which is clean. Just checking inside
the initrd, these are preloaded:
/tmp/temp/lib/ata_generic.ko
/tmp/temp/lib/libata.ko
/tmp/temp/lib/scsi_mod.ko
/tmp/temp/lib/ehci-hcd.ko
/tmp/temp/lib/mbcache.ko
/tmp/temp/lib/scsi_wait_scan.ko
/tmp/temp/lib/ext3.ko
/tmp/temp/lib/ohci-hcd.ko
/tmp/temp/lib/sd_mod.ko
/tmp/temp/lib/jbd.ko
/tmp/temp/lib/pata_via.ko
/tmp/temp/lib/uhci-hcd.ko


If we can provoke the error, I feel the way to trap it is
1. make intelligent recoverable changes to ide partition /dev/sda3 on
firefox files.
2. Directly or indirectly, Mount my 1 gig usb disk on /var

Re: [PATCH 27/28] blk_end_request: changing scsi mid-layer for bidi (take 3)

2007-12-10 Thread Kiyoshi Ueda

Hi Boaz,

On Sun, 09 Dec 2007 11:43:31 +0200, Boaz Harrosh <[EMAIL PROTECTED]> wrote:
> >>> Index: 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c
> 
> >> No I don't like it. The only client left for blk_end_request_callback()
> >> is bidi,
> > 
> > ide-cd (cdrom_newpc_intr) is another client.
> > So I can't drop blk_end_request_callback() even if bidi doesn't use it.
>
> It looks like all is needed for the ide-cd case is a flag that says
> "don't complete the request". And the call-back is not actually used.
> (Inspecting the last: [PATCH 26/28] blk_end_request: changing ide-cd (take 3))
> The same exact flag could also help the bidi case. Perhaps have an API
> for that?

Thank you for looking at the ide-cd part, too.

But, no, I don't want to add the "don't complete" flag to the API.
It could be an alternative, but having such a flag explicitly may blur
the purpose of the blk_end_request interfaces, which is that drivers hand
over ownership of the request to the block layer when they call them.
(With such a flag, the API looks like it explicitly allows drivers
 to keep ownership of the request even after the API is called,
 and it becomes like end_that_request_chunk().)
Such an API would also look easy for drivers to use, and that might
encourage other tricky drivers in the future.

I would like to go with the callback API, since the usage in ide-cd
shows that the driver is very tricky and the API can be used for
other requirements like "a driver wants to do something after data
completion and before request completion."

> > Index: 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c
> > ===
> > --- 2.6.24-rc3-mm2.orig/drivers/scsi/scsi_lib.c
> > +++ 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c
> > @@ -629,28 +629,6 @@ void scsi_run_host_queues(struct Scsi_Ho
> > scsi_run_queue(sdev->request_queue);
> >  }
> >  
snip
> > +int blk_end_request(struct request *rq, int uptodate, int nr_bytes)
> > +{
> > +   return blk_end_io(rq, uptodate, nr_bytes, 0, NULL);
> > +}
> >  EXPORT_SYMBOL_GPL(blk_end_request);
> >  
> 
> All above looks fine, thanks.

OK, I'll update the patch-set using the bidi API and blk_end_io().
I'm currently updating the patch-set based on 2.6.24-rc4, not -mm.
So the new patch-set will include the blk_end_bidi_request() API
but not the scsi bidi changes.

Thanks,
Kiyoshi Ueda

Re: [Bugme-new] [Bug 9533] New: 2.6.24-rc4: some ahci/acpi interaction causes delays during boot

2007-12-10 Thread Andrew Morton

On Mon, 10 Dec 2007 05:55:20 -0800 (PST)
[EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9533

Another box-killing regression to track, please.  Either ATA or ACPI.

Re: [Bugme-new] [Bug 9533] New: 2.6.24-rc4: some ahci/acpi interaction causes delays during boot

2007-12-10 Thread Andrew Morton

On Mon, 10 Dec 2007 12:52:43 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Mon, 10 Dec 2007 05:55:20 -0800 (PST)
> [EMAIL PROTECTED] wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=9533
> 
> Another box-killing regression to track, please.  Either ATA or ACPI.

er, no, not box-killing - just scary warnings.

It's not clear what "kernel doesn't get to userland yet" is referring to - 
something else I guess.

Your desire to avoid doing a bisection search is a good one - I've been
trying to do one for a couple of days on and off and there are so many
fatally buggy bisection points between 2.6.23 and 2.6.24-rc1 that I've
given up on the attempt.

Re: [Bugme-new] [Bug 9533] New: 2.6.24-rc4: some ahci/acpi interaction causes delays during boot

2007-12-10 Thread Chuck Ebbert

On 12/10/2007 03:52 PM, Andrew Morton wrote:
> On Mon, 10 Dec 2007 05:55:20 -0800 (PST)
> [EMAIL PROTECTED] wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=9533
> 
> Another box-killing regression to track, please.  Either ATA or ACPI.

Quite possibly the BIOS is buggy and handed a bad taskfile to the driver.


Re: Bug: get EXT3-fs error Allocating block in system zone

2007-12-10 Thread Robert Hancock


Linus Torvalds wrote:
> On Mon, 10 Dec 2007, Marco Gatti wrote:
>> I didn't compile completly.
>>
>> drivers/scsi/scsi_lib.c:1565:1: error: unterminated #else
>
> Heh. That #else should be an #endif, of course.
>
> It is a bit strange that it still tries to do IO to high memory. Either
> the whole "64 bit capability" thing in AHCI is broken, or the bounce
> buffering doesn't work right. Or maybe you tried the "iommu=off" without
> the original patch that tried to turn off 64-bit DMA?
>
> Linus



From what I can see, it appears that iommu=off disables the IOMMU but 
doesn't actually do anything to prevent attempts to DMA above 4GB. If 
you try to map something over 4GB it just chokes with that mask overflow 
(in arch/x86/kernel/pci-nommu_64.c).


The iommu=off option actually seems rather useless, as it's the default 
in the only case where it will actually work (no memory above 4GB)..
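
The failing condition can be illustrated with a little arithmetic. This
is a simplified sketch, not the code in pci-nommu_64.c: the function name
and the example addresses are made up, and the only assumption is that a
device limited to 32-bit DMA can reach bus addresses up to 0xffffffff.

```python
DMA_32BIT_MASK = (1 << 32) - 1  # highest bus address a 32-bit device can reach


def fits_dma_mask(bus_addr: int, size: int, mask: int = DMA_32BIT_MASK) -> bool:
    """True if every byte of [bus_addr, bus_addr + size) lies under the mask."""
    if size <= 0:
        raise ValueError("size must be positive")
    # The last byte of the buffer is the one that can overflow the mask.
    return bus_addr + size - 1 <= mask


# A page that ends exactly at the 4GB boundary still fits ...
last_low_page_ok = fits_dma_mask(0xFFFFF000, 0x1000)
# ... but a sector-sized buffer just above 4GB does not, and without an
# IOMMU or bounce buffer there is nothing to remap it.
above_4g_ok = fits_dma_mask(0x1_0000_0000, 512)
```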


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: Iomega ZIP-100 drive unsupported with jmicron JMB361 chip?

2007-12-10 Thread Robert Hancock


(linux-ide cc'ed)

trash can wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I have tolerated this problem for a year and do not post to this list in
haste. I have posted on forums and searched the community over the past
year. I have looked at the list archive on gossamer-threads.com for
solutions. With Fedora Core 6 unsupported (the last kernel for which my
zip drive worked), it is time for my last attempt at a solution. Please
CC: any response as I have not joined the list. I have compiled a
kernel-debug RPM and can run this if its output would help. Thank you
for any time you might devote to this problem.

motherboard: MSI P965 Platinum/Intel P965 Express Chipset Based (MS-7238
series)
Fedora 8 : kernel 2.6.23.1-42.fc8
Iomega Zip drive internal Model Z100ATAPI

lspci
03:00.0 SATA controller: JMicron Technologies, Inc. JMB361 AHCI/IDE (rev 02)
03:00.1 IDE interface: JMicron Technologies, Inc. JMB361 AHCI/IDE (rev 02)

# lsmod | grep ata
pata_jmicron            8257  0
ata_generic             8901  0
ata_piix               16709  0
libata                 99633  4 ahci,pata_jmicron,ata_generic,ata_piix
scsi_mod              119757  4 sr_mod,sg,libata,sd_mod

I have recently changed the BIOS setting for the SATA#1 Controller from
[IDE] to [AHCI] with no effect. I assume AHCI is correct?


AHCI is better, yes. It shouldn't be relevant to this problem though.



Text below attached as text.txt for readability.
from dmesg:
libata version 2.21 loaded.
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: [EMAIL PROTECTED]
PCI: Enabling device :03:00.1 ( -> 0001)
ACPI: PCI Interrupt :03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device :03:00.1 to 64
scsi0 : pata_jmicron
scsi1 : pata_jmicron
ata1: PATA max UDMA/100 cmd 0x0001cc00 ctl 0x0001c882 bmdma 0x0001c400 irq 17
ata2: PATA max UDMA/100 cmd 0x0001c800 ctl 0x0001c482 bmdma 0x0001c408 irq 17
ata1.00: ATAPI: LITE-ON DVDRW SOHW-1693S, KS0B, max UDMA/66
ata1.01: ATAPI: IOMEGA  ZIP 100   ATAPI, 05.H, max MWDMA1, CDB intr
ata1.00: configured for UDMA/66
ata1.01: configured for MWDMA1
scsi 0:0:0:0: CD-ROMLITE-ON  DVDRW SOHW-1693S KS0B PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access IOMEGA   ZIP 100  05.H PQ: 0 ANSI: 5
sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: 00 40 00 00
sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: 00 40 00 00
sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda:<6>sd 0:0:1:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 0:0:1:0: [sda] Sense Key : Hardware Error [current]
sd 0:0:1:0: [sda] Add. Sense: Scsi parity error
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0

If a disk is inserted into the drive (/var/log/messages)
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Spinning up disk.<5>sd 0:0:1:0: [sda] Spinning up diskready
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB)
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write Protect is off
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB)
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write Protect is off
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 14:22:53 localhost kernel:  sda:<6>sd 0:0:1:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Sense Key : Hardware Error [current]
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Add. Sense: Scsi parity error
Dec 10 14:22:53 localhost kernel: end_request: I/O error, dev sda, sector 0
Dec 10 14:22:53 localhost kernel: printk: 42 messages suppressed.
Dec 10 14:22:53 localhost kernel: Buffer I/O error on device sda, logical block 0


That is rather curious. There's no sign of any libata error handling 
going on.. Maybe the drive is actually returning that error code in the 
ATAPI CDB, or at least we think it is?
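
As a way to cross-check what the sd layer printed, fixed-format sense
data can be decoded by hand. The sketch below assumes the standard SPC
fixed format (response code 0x70/0x71, sense key in the low nibble of
byte 2, ASC/ASCQ in bytes 12 and 13). The sample buffer is hypothetical,
since the report doesn't include raw sense bytes, and the lookup tables
are deliberately partial.

```python
from typing import Dict, Tuple

# Partial tables: only a few common sense keys and the one ASC/ASCQ pair
# that matches the message in the log above.
SENSE_KEYS: Dict[int, str] = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0xB: "Aborted Command",
}
ADDITIONAL_SENSE: Dict[Tuple[int, int], str] = {
    (0x47, 0x00): "Scsi parity error",
}


def decode_fixed_sense(sense: bytes) -> Dict[str, object]:
    """Decode the sense key and ASC/ASCQ from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    return {
        "current": response_code == 0x70,  # 0x71 would be a deferred error
        "sense_key": SENSE_KEYS.get(key, f"unknown (0x{key:x})"),
        "asc": asc,
        "ascq": ascq,
        "additional_sense": ADDITIONAL_SENSE.get((asc, ascq), "not in table"),
    }


# Hypothetical 18-byte buffer that would produce the two log lines above.
sample = bytes([0x70, 0x00, 0x04, 0, 0, 0, 0, 0x0A,
                0, 0, 0, 0, 0x47, 0x00, 0, 0, 0, 0])
decoded = decode_fixed_sense(sample)
```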


You are sure that this drive still works with older kernels using 
drivers/ide, and that the hardware didn't break at some point, I assume?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: Recent kernel hosing partition

2007-12-10 Thread Tejun Heo

Hello,

For Junk Mail wrote:
>> I'm not aware of any specific issues with via + Segate drives.  Have
>> pointers?
> 
> Remember the infamous VIA 'hardware error' which VIA insists is a
> configuration error from the MVP3 chipset? This 8235 southbridge is the
> same southbridge basically, shrunk down and sped up. They never liked
> Seagate drives, which seem to use non-standard DMA - fine with a Windows
> driver, but dodgy in Linux. I did some crash testing for Mandrake on disk
> optimizing scripts in times (far) past. They built a database of drives
> and how fast they could safely set them, and Seagate never got past PIO
> 4. So I never bought Seagate.

AFAIK, there currently isn't any known problem specific to the VIA +
Seagate combination.  sata_via surely has some issues under error
conditions, though.

>>> Another issue here is that the old ide driver could get through the
>>> mess, whereas the newer one cannot. I get "Drive reset: success" and the
>>> old ide driver recovers, whereas the new one goes out to lunch. The log
>>> snippets show a 60 seconds gap between errors. That's a 60 second freeze.
>> Hmmm...
>>
>> 1. So, the IDE driver suffers from error conditions too?  Do you have
>> logs around?
>>
> There is only IDE. No SATA. 80 ribbon cable. But Fedora only uses ATA
> driver so it's sda, and not hda as per normal. Sorry for the confusion.
> This is not a new box (2004/2005)

I meant the old drivers/ide/* drivers.

>> 2. Do you have logs of the libata driver going out to lunch?
>>
> Catch 22. Did you see the film? I've only one hard disk. Reset to get
> out of trouble, so how does it log the disk going out to lunch? Where
> would I log it to?

Ah.. Catch 22 is the name of a film.  I knew what it meant but never knew
where the expression came from.  Anyway, in such cases the log is usually
collected via serial console or netconsole, via USB or other storage if
you have a quasi-working userland, or with a digital camera as a last
resort.
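
For the netconsole route, the receiving end can be very small, because netconsole sends kernel messages as plain text in UDP packets. The following is only a minimal sketch of a listener to run on a second machine. It assumes the affected box is configured to send to UDP port 6666 (the usual netconsole target default); `open_listener` and `receive_packets` are illustrative names, not part of any existing tool.

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket on which netconsole packets are expected.

    Port 6666 is assumed here; use whatever port the sending machine
    was actually configured with.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_packets(sock, sink, max_packets=None):
    """Pass each received message to sink(), prefixed with the sender address.

    sink is any callable taking one string. max_packets limits how many
    datagrams are read (None means loop until interrupted).
    Returns the number of datagrams handled.
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, (addr, _port) = sock.recvfrom(4096)
        text = data.decode("utf-8", errors="replace").rstrip()
        sink(f"{addr}: {text}")
        count += 1
    return count


# Example use on the logging machine (runs until interrupted):
#
#     with open("netconsole.log", "a") as log:
#         def write_line(line):
#             log.write(line + "\n")
#             log.flush()  # keep the file current in case this box dies too
#         receive_packets(open_listener(), write_line)
```

Since the messages leave the failing machine over the network, they survive even when its only disk has stopped responding.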

> https://bugzilla.redhat.com/attachment.cgi?id=281341 is the output of 
> grep -C10 frozen /var/log/messages > errors.out which gives context. I
> have the whole /var/log/messages. The recorded errors are mainly in the
> bootup phase, as sda3 was unmountable every time there after an
> 'out-to-lunch' episode.
> 
> Typically, in an 'out to lunch' period, the line beginning 'exception
> Emask' down as far as 'DPO or FUA' would repeat on stdout. Some disk
> error would precede it, e.g. '/usr/lib/something.so: no such file or
> directory'. That file would probably migrate to lost+found on the next
> e2fsck pass and when I went to check it 2 reboots later it was indeed
> missing. Then we got to the stage where the
> entire /usr/lib/firefox/  directory migrated and we departed
> from reality at that point.

Ah... I'd really like to see the log.

> If we can provoke the error, I feel the way to trap it is
> 1. make intelligent recoverable changes to ide partition /dev/sda3 on
> firefox files.
> 2. Directly or indirectly, Mount my 1 gig usb disk on /var/log :-D.
> Would that get around the Catch-22? I can stick in another (old) disk if
> needed, but I only have ide, and we freeze, so that will hardly be much
> good.

Usually the best way is serial or net console.

> 3. Go browsing and hope that trouble starts. 
> 
> Looking at the lost+found files in detail, I was struck by the #numbers.
> There are a number of strings there: At least 3 from Firefox; at least
> one each from openoffice, /etc/rc.d, and one I think from Evolution. 

There are other reports of sata_via freezing up after transport errors
and sadly there isn't too much to do about it.  The controller hangs
while holding the PCI bus and no software can recover from that.  I'm
currently not sure whether the controller locks up on transmission
errors or as a response to libata's error handling sequence.  If the
latter, we may be able to avoid it by changing the EH sequence but
unfortunately I don't have access to affected hardware or time at the
moment.

What worries me is that your case actually resulted in data corruption.
libata's EH is safe.  Another possibility is that your filesystem got
corrupted while going through several lockup-reboot sequences, in which
case data sure is lost.  But still, journaling and barriers should be
able to avoid filesystem corruption.  You have barriers enabled, right?
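
One way to check the barrier question is to look at the mount options the kernel reports. The sketch below parses text in the `/proc/mounts` format and lists any `barrier`-related option per mount point. It is an illustration only: `barrier_options` is a made-up helper name, the set of filesystem types is an assumption, and a missing option only means the filesystem default applies, which depends on the filesystem and kernel version.

```python
def barrier_options(mounts_text, fstypes=("ext3", "ext4", "reiserfs", "xfs")):
    """Return {mount point: barrier option or None} for the given fs types.

    mounts_text is expected in /proc/mounts format:
        device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype not in fstypes:
            continue
        found = [opt for opt in options.split(",")
                 if opt.startswith("barrier") or opt == "nobarrier"]
        result[mountpoint] = found[0] if found else None
    return result


# Example use on a live system:
#
#     with open("/proc/mounts") as f:
#         for mountpoint, opt in barrier_options(f.read()).items():
#             print(mountpoint, opt or "(no barrier option listed)")
```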

-- 
tejun

SATA drive keeps hard resetting

2007-12-10 Thread Matías Alejandro Torres


Hi all,

I have a SATA drive in a motherboard with an ATI SB600 chipset that 
worked just fine until now.
The disk seems fine, but after a while it starts making noises (like 
spinning up) and the computer freezes for 2 or 3 seconds. I think my 
SATA drive is broken, or maybe the motherboard. This is what dmesg says:


[  229.096000] ata4: exception Emask 0x10 SAct 0x0 SErr 0x90200 action 0x6 frozen
[  229.096000] ata4: (irq_stat 0x0040, PHY RDY changed)
[  229.096000] ata4: hard resetting port
[  233.40] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  233.404000] ata4.00: configured for UDMA/133
[  233.404000] ata4: EH complete
[  233.404000] sd 3:0:0:0: [sda] 160836480 512-byte hardware sectors (82348 MB)
[  233.404000] sd 3:0:0:0: [sda] Write Protect is off
[  233.404000] sd 3:0:0:0: [sda] Mode Sense: 00 3a 00 00
[  233.404000] sd 3:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA



This happens randomly, maybe after a few seconds, maybe after an hour. 
Sometimes if the computer case receives a little bump or is moved, 
the problem appears immediately.
What's the kernel complaining about? Something is broken, isn't it? I 
hope it's the cable...
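
As a rough aid for reading the `SErr 0x90200` value in the log above, here is a minimal sketch of a decoder. The bit positions follow the SATA SError register layout as mirrored by the SERR_* constants in the kernel's include/linux/libata.h; the table is reproduced from memory and should be checked against that header. `decode_serr` is a hypothetical helper name.

```python
# SError bit positions (assumed to match the SERR_* constants in libata.h).
SERR_BITS = {
    0: "recovered data error",
    1: "recovered communication error",
    8: "unrecovered data error",
    9: "persistent communication or data integrity error",
    10: "protocol error",
    11: "internal error",
    16: "PHY ready changed",
    17: "PHY internal error",
    18: "COMM wake",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unrecognized FIS",
    26: "device exchanged",
}


def decode_serr(value):
    """Return descriptions of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items())
            if value & (1 << bit)]


for description in decode_serr(0x90200):
    print(description)
```

Under that table, 0x90200 has bits 9, 16 and 19 set: a persistent communication or data integrity error, a PHY ready change, and a 10b to 8b decode error. These are link-level conditions, which fits the "PHY RDY changed" line in the log, but on their own they do not say whether the cable, the drive or the port is at fault.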


I'm using Ubuntu 7.10 with kernel 2.6.22-14-generic in a dual processor 
computer. Below there's some hardware information. The output of dmesg 
is attached. Thanks!


Matías.



lspci:

00:00.0 Host bridge: ATI Technologies Inc RS480 Host Bridge (rev 10)
00:02.0 PCI bridge: ATI Technologies Inc RS480 PCI-X Root Port
00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0)
00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1)
00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2)
00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3)
00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4)
00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI)
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 13)
00:14.1 IDE interface: ATI Technologies Inc SB600 IDE
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia
00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation G71 [GeForce 7300 GS] (rev a1)
02:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)




lshw:

*-pci:0
 description: Host bridge
 product: RS480 Host Bridge
 vendor: ATI Technologies Inc
 physical id: 100
 bus info: [EMAIL PROTECTED]:00:00.0
 version: 10
 width: 32 bits
 clock: 66MHz
  
  *-storage

  description: SATA controller
  product: SB600 Non-Raid-5 SATA
  vendor: ATI Technologies Inc
  physical id: 12
  bus info: [EMAIL PROTECTED]:00:12.0
  version: 00
  width: 32 bits
  clock: 66MHz
  capabilities: storage ahci_1.0 bus_master cap_list
  configuration: driver=ahci latency=64 module=ahci


