2.6.24 found 5 exceptions in dmesg after not using the system for a while

2008-02-09 Thread Andrew Paprocki
I haven't used the system with these errors in a day or two and I came
back and noticed 5 exceptions in dmesg. These are all the same:

ata1: exception Emask 0x10 SAct 0x0 SErr 0x90200 action 0xe frozen
ata1: irq_stat 0x0040, PHY RDY changed
ata1: SError: { Persist PHYRdyChg 10B8B }
ata1: hard resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
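
For reference, the SErr value in the exception line is a bit mask, and the names in braces are its set bits. A minimal userspace sketch of that decoding follows. The bit positions are taken from the SATA SError register layout, the short names are meant to match the strings libata prints, and `serr_bits` and `decode_serr` are illustrative names, not kernel symbols.

```c
#include <stddef.h>
#include <stdio.h>

/* One SError bit and the short name printed for it.
 * Table and function names are illustrative only. */
struct serr_bit {
	unsigned int mask;
	const char *name;
};

static const struct serr_bit serr_bits[] = {
	{ 1u << 0,  "RecovData" },
	{ 1u << 1,  "RecovComm" },
	{ 1u << 8,  "UnrecovData" },
	{ 1u << 9,  "Persist" },
	{ 1u << 10, "Proto" },
	{ 1u << 11, "HostInt" },
	{ 1u << 16, "PHYRdyChg" },
	{ 1u << 17, "PHYInt" },
	{ 1u << 18, "CommWake" },
	{ 1u << 19, "10B8B" },
	{ 1u << 20, "Dispar" },
	{ 1u << 21, "BadCRC" },
	{ 1u << 22, "Handshk" },
	{ 1u << 23, "LinkSeq" },
	{ 1u << 24, "TrStaTrns" },
	{ 1u << 25, "UnrecFIS" },
	{ 1u << 26, "DevExch" },
};

/* Write the space-separated names of all set bits into buf. */
static void decode_serr(unsigned int serr, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (i = 0; i < sizeof(serr_bits) / sizeof(serr_bits[0]); i++) {
		int n;

		if (!(serr & serr_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", serr_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer too small, stop with what fits */
		used += (size_t)n;
	}
}
```

Calling `decode_serr(0x90200, buf, sizeof(buf))` should leave `Persist PHYRdyChg 10B8B` in `buf`, which matches the log above: bits 9, 16 and 19 are set.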

Any idea what caused these? This is on an ATI SB600 SATA controller:

00:12.0 SATA controller [0106]: ATI Technologies Inc SB600 Non-Raid-5
SATA [1002:4380] (prog-if 01 [AHCI 1.0])
Subsystem: Albatron Corp. Unknown device [17f2:5999]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- SERR-

From boot time:

ata1: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfe02f100 irq 18
ata2: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfe02f180 irq 18
ata3: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfe02f200 irq 18
ata4: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfe02f280 irq 18
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: HDT722525DLA380, V44OA96A, max UDMA/133
ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access ATA  HDT722525DLA380  V44O PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO
 or FUA
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO
 or FUA
 sda: sda1
sd 0:0:0:0: [sda] Attached SCSI disk
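
As a side note on reading the "SStatus 113" value in these link-up lines: the number is printed in hex and packs three 4-bit fields. A small sketch of splitting it, assuming the SATA SStatus layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the struct and function names are made up for illustration.

```c
/* Fields of the SATA SStatus register. Names are illustrative. */
struct sstatus {
	unsigned int det;	/* bits 3:0, device detection state */
	unsigned int spd;	/* bits 7:4, negotiated speed generation */
	unsigned int ipm;	/* bits 11:8, interface power state */
};

static struct sstatus decode_sstatus(unsigned int raw)
{
	struct sstatus s;

	s.det = raw & 0xf;
	s.spd = (raw >> 4) & 0xf;
	s.ipm = (raw >> 8) & 0xf;
	return s;
}

/* Human-readable link speed for the SPD field. */
static const char *spd_name(unsigned int spd)
{
	switch (spd) {
	case 0:
		return "no speed negotiated";
	case 1:
		return "1.5 Gbps";
	case 2:
		return "3.0 Gbps";
	default:
		return "unknown";
	}
}
```

For 0x113 this gives DET=3 (device present, PHY communication established), SPD=1 (1.5 Gbps) and IPM=1 (interface active), consistent with "SATA link up 1.5 Gbps". SControl uses the same field positions; in "SControl 300" the IPM field is 3, which disallows transitions to the partial and slumber power states.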

Thanks,
-Andrew
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hitachi 7K1000 1tb drives and sata_sil 3114 chipset

2008-01-31 Thread Andrew Paprocki
I've been encountering many, many problems with Hitachi 1TB drives
under a Sil3114 chipset and I'm wondering if there could be something
wrong with the driver/chipset in relation to these drives.
Statistically, I've had 6 out of 7 drives exhibit very strange failure
conditions while being used under this controller.

Some symptoms:
- Clicking noises while in operation
- What appears to sound/feel like the drive spins down quickly and
back up again with no console output
- SMART reporting seek read errors which mysteriously appear/disappear
completely
- Failed I/O requests

I most recently swapped out the 1TB drives for 500GB Hitachi models
and have not experienced any of the problems above. The most recent
failed I/O requests output lots of messages, which I've pasted below.
I triggered the I/O errors by setting up lots of simultaneous copies
of large files between two drives to test the configuration.

If anyone has any ideas whether this could be some kind of
incompatibility or bug, let me know. If anyone has any
positive/negative experiences with these drives on this controller, it
would also help.

Thanks,
-Andrew

sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 1559281795
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 1563089767
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 1563115585
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 7775
printk: 124 messages suppressed.
Buffer I/O error on device sdb1, logical block 3856
Buffer I/O error on device sdb1, logical block 3857
Buffer I/O error on device sdb1, logical block 3858
Buffer I/O error on device sdb1, logical block 3859
EXT3-fs error (device sdb1): ext3_readdir: directory #2 contains a
hole at offset 0

# lspci -vvnnxxx -s 00:11.0
00:11.0 RAID bus controller [0104]: Silicon Image, Inc. SiI 3114
[SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
Subsystem: Silicon Image, Inc. SiI 3114 SATARaid Controller [1095:6114]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- SERR-


Re: 2.6.24 sata_sil Sil3114 drive clicking / restarting?

2008-01-27 Thread Andrew Paprocki
Both drives already had PM disabled, visible in hdparm -i:
"AdvancedPM=yes: disabled (255) WriteCache=enabled"

Looking at the smart reporting, it is showing both drives have a
FAILING_NOW condition for Seek_Error_Rate. I don't know what to
believe, because it seems like whatever drives I attach to this system
are chewed up and start showing Seek_Error_Rate failure conditions.

/dev/sda:
  7 Seek_Error_Rate 0x000b   046   046   067    Pre-fail  Always   FAILING_NOW 393853
/dev/sdb:
  7 Seek_Error_Rate 0x000b   044   044   067    Pre-fail  Always   FAILING_NOW 2556544

I swapped in 2 more drives of the same model, and one exhibits the
same Seek_Error_Rate FAILING_NOW condition. I now have 4 out of 5 of
this same model drive which are failing. They appear to be from the
same batch, so I'm not ruling out some kind of manufacturing defect,
but this definitely seems strange. I guess I'm just fishing to see if
there is anything on the system that could have damaged the drives.
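
For context on what FAILING_NOW means in the output above: the columns after the flag are the normalized value, the worst value seen, and the vendor threshold. A rough sketch of the comparison, assuming the usual rule that an attribute is failing when its normalized value is at or below a non-zero threshold; the type and function names are hypothetical and this is not smartmontools code.

```c
/* Possible outcomes of comparing an attribute with its threshold. */
enum smart_state {
	SMART_OK,
	SMART_FAILED_PAST,	/* worst value crossed the threshold earlier */
	SMART_FAILING_NOW	/* current value is at or below the threshold */
};

struct smart_attr {
	unsigned char value;	/* current normalized value */
	unsigned char worst;	/* lowest normalized value recorded */
	unsigned char thresh;	/* vendor failure threshold */
};

static enum smart_state smart_check(struct smart_attr a)
{
	if (a.thresh == 0)
		return SMART_OK;	/* assumed: threshold 0 never fails */
	if (a.value <= a.thresh)
		return SMART_FAILING_NOW;
	if (a.worst <= a.thresh)
		return SMART_FAILED_PAST;
	return SMART_OK;
}
```

With the numbers above, 046 and 044 are both below the threshold of 067, so both drives are flagged. The raw counts (393853 and 2556544) are vendor-specific and do not enter this comparison.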

Thanks,
-Andrew

On Jan 27, 2008 1:33 PM, Jim Paris <[EMAIL PROTECTED]> wrote:
> Andrew Paprocki wrote:
> > I've been noticing something strange on an AMD Geode LX board that I
> > have.. I have two SATA drives connected to the onboard Sil3114 chip,
> > and the drives appear to be continually restarting (soft resetting?)
> > during normal operation when nothing at all is happening on the
> > machine. You can hear the drives doing it as well as feel it
> > physically if you touch the drive. They are spinning down and back up
> > again over and over again. All the while the OS never prints out any
> > ata/scsi problems. The only manifestation of this in the kernel is
> > that if you're doing something w/ the drives, it pauses momentarily
> > while this happens (for instance, during an ext3 format).
>
> It could be drive power management.  Try "hdparm -B 255" or "hdparm -B
> 254" to turn that off.  The output of "smartctl -A" output can also be
> helpful to figure out what's causing it.
>
> -jim
>
>


2.6.24 sata_sil Sil3114 drive clicking / restarting?

2008-01-27 Thread Andrew Paprocki
I've been noticing something strange on an AMD Geode LX board that I
have.. I have two SATA drives connected to the onboard Sil3114 chip,
and the drives appear to be continually restarting (soft resetting?)
during normal operation when nothing at all is happening on the
machine. You can hear the drives doing it as well as feel it
physically if you touch the drive. They are spinning down and back up
again over and over again. All the while the OS never prints out any
ata/scsi problems. The only manifestation of this in the kernel is
that if you're doing something w/ the drives, it pauses momentarily
while this happens (for instance, during an ext3 format).

I thought this might be a bad drive because smartctl listed some
errors, but I have a stack of drives here and after swapping out the
drive doing this, the replacement is doing it as well. These drives
are all new 1TB Hitachi drives less than 6 months old. Now, I'm
wondering if this is some Sil3114 problem w/ libata. Has anyone else
seen this type of behavior before with no errors showing up in the
console?

Thanks,
-Andrew

Some info (unknown partition tables are because these are an md RAID1 pair):

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Probing IDE interface ide0...
hda: , ATA DISK drive
Probing IDE interface ide1...
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: max request size: 128KiB
hda: 256000 sectors (131 MB) w/0KiB Cache, CHS=500/16/32
 hda: hda1
Driver 'sd' needs updating - please use bus_type methods
sata_sil :00:11.0: version 2.3
ACPI: PCI Interrupt Link [LNKD] BIOS reported IRQ 0, using IRQ 10
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
ACPI: PCI Interrupt :00:11.0[A] -> Link [LNKD] -> GSI 10 (level,
low) -> IRQ 10
sata_sil :00:11.0: Applying R_ERR on DMA activate FIS errata fix
PCI: Setting latency timer of device :00:11.0 to 64
scsi0 : sata_sil
scsi1 : sata_sil
scsi2 : sata_sil
scsi3 : sata_sil
ata1: SATA max UDMA/100 mmio [EMAIL PROTECTED] tf 0xefffb080 irq 10
ata2: SATA max UDMA/100 mmio [EMAIL PROTECTED] tf 0xefffb0c0 irq 10
ata3: SATA max UDMA/100 mmio [EMAIL PROTECTED] tf 0xefffb280 irq 10
ata4: SATA max UDMA/100 mmio [EMAIL PROTECTED] tf 0xefffb2c0 irq 10
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: ATA-7: Hitachi HDS721010KLA330, GKAOA70F, max UDMA/133
ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/100
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: ATA-7: Hitachi HDS721010KLA330, GKAOA70F, max UDMA/133
ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata2.00: configured for UDMA/100
ata3: SATA link down (SStatus 0 SControl 310)
ata4: SATA link down (SStatus 0 SControl 310)
scsi 0:0:0:0: Direct-Access ATA  Hitachi HDS72101 GKAO PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
 sda: unknown partition table
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 1:0:0:0: Direct-Access ATA  Hitachi HDS72101 GKAO PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
 sdb: unknown partition table
sd 1:0:0:0: [sdb] Attached SCSI disk

# hdparm -i /dev/sda

/dev/sda:
hdparm: ioctl 0x304 failed: Inappropriate ioctl for device

 Model=Hitachi HDS721010KLA330 , FwRev=GKAOA70F,
SerialNo=  GTJ000PAG2L50C
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=52
 BuffType=(3) DualPortCache, BuffSize=31157kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D rev.1:  ATA/ATAPI-2
ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * current active mode

# smartctl -a /dev/sda
...
=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDS721010KLA330
Serial Number:GTJ000PAG2L50C
Firmware Version:

Re: About forcing 32bit DMA patch for AMD690G(SB600)

2008-01-25 Thread Andrew Paprocki
I'll try to get that configuration together. Right now I only have two
1GB sticks installed on the board, so I would need to track down 2GB
ones. If I can find some lying around, I'll let you know.

Thanks,
-Andrew

On Jan 25, 2008 12:50 AM, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Andrew Paprocki wrote:
> > I have an SB600/RS690 here with SATA drives connected. I haven't been
> > following this thread, but I can help test something if it would help.
>
> We're trying to determine whether SB600 ahci controller can do 64bit DMA
> or not.  Srihari's couldn't but Shane's test result tells a different
> story.  Do you have memory mapped over 4G (if you have 4G some of them
> will be over 4G, you can know this by looking at the e820 map printed
> during boot)?
>
> --
> tejun
>
>


Re: About forcing 32bit DMA patch for AMD690G(SB600)

2008-01-24 Thread Andrew Paprocki
Tejun,

I have an SB600/RS690 here with SATA drives connected. I haven't been
following this thread, but I can help test something if it would help.

Thanks,
-Andrew

On Jan 24, 2008 7:21 PM, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Hello, Shane.  Sorry about the delay.  Got caught up in other stuff.
>
> Shane Huang wrote:
> > Quoting Tejun:
> >> Uh-oh, wait a bit. Nope. Until we figure out what the something else
> >> is and positively verify 64bit DMA works fine, the quirk stays in.
> >
> > Our HW engineer has confirmed that our SB600 SATA controller indeed
> > has some MSI issue, and we do not have any workaround.
> >
> > The workaround "quirk_msi_intx_disable_bug" to SB700 SATA controller
> > can NOT work to SB600 SATA controller in my debug, while disablement
> > to RS690 MSI in kernel source can fix it.
>
> Hmmm... Okay.  Is the SB600 SATA controller culprit or the north bridge
> - RS690?  If the former is the case, proper way to work around it is to
> add AHCI_HFLAG_NO_MSI for SB600 AHCI.
>
> > As to the SB600 64 bit DMA capacity, do you have any methods to do
> > further verification? I do NOT find any problem in my debug after I
> > disabled RS690 MSI in kernel 2.6.24-rc7.
>
> The problem is that we didn't actually prove anything.  In the tests
> you've done, pci=nomsi didn't fix the problem but disable_all_msi quirk
> did.  pci=nomsi and disable_all_msi quirk are identical.  Also,
> Srihari's problem was not reproduced, so currently we can't say much
> from the test results.  Srihari, do you still have the system around?
>
> Thanks.
>
> --
> tejun


Re: [PATCH] pata_cs5536: ATA driver for Geode companion chip

2007-10-14 Thread Andrew Paprocki
Long story short -- this has nothing to do with pata_cs5536.

It isn't in that list! I just patched my kernel to print the reason
why it is being blacklisted and this turned up:

ata5.00: ATA-0: , 060729DA, max MWDMA2
ata5.00: 256000 sectors, multi 1: LBA
ata5.00: device is on DMA blacklist, disabling DMA
ata5.00: device matched DMA blacklist, model: WDC AC11000H
ata5.00: configured for PIO4

The strn_pattern_cmp function does not handle blank model names. I
would like to give this Taiwanese manufacturer a bug for not bothering
to put a model name on their device, but I don't think they'll care
too much.. :)

I just submitted a patch to fix strn_pattern_cmp to handle the
strlen(name)==0 case appropriately (ie only match it against "*" or
"").

With that change, it is properly detected as MWDMA2 again:

ata5.00: ATA-0: , 060729DA, max MWDMA2
ata5.00: 256000 sectors, multi 1: LBA
ata5.00: configured for MWDMA2
ata5.00: configured for MWDMA2
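
The blank-name behaviour is easy to reproduce outside the kernel. The sketch below is a userspace copy of the comparison before and after the change, with the local declarations filled in and `unlikely()` dropped; the `_old`/`_new` suffixes are only for this illustration.

```c
#include <string.h>

/* Logic before the fix: with an empty name and a pattern that has no
 * trailing wildcard, len is 0 and strncmp() reports a match. */
static int strn_pattern_cmp_old(const char *patt, const char *name,
				int wildchar)
{
	size_t len;
	const char *p;

	/* a wildcard only counts if it is the last character */
	p = strchr(patt, wildchar);
	if (p && ((*(p + 1)) == 0))
		len = p - patt;
	else
		len = strlen(name);

	return strncmp(patt, name, len);
}

/* Logic after the fix: an empty name matches only "*" or "". */
static int strn_pattern_cmp_new(const char *patt, const char *name,
				int wildchar)
{
	size_t len;
	const char *p;

	p = strchr(patt, wildchar);
	if (p && ((*(p + 1)) == 0))
		len = p - patt;
	else {
		len = strlen(name);
		if (len == 0)
			return strlen(patt) == 0 ? 0 : -1;
	}

	return strncmp(patt, name, len);
}
```

With `name` set to `""`, `strn_pattern_cmp_old("WDC AC11000H", "", '*')` returns 0 (a match), which is why the first blacklist entry was hit, while `strn_pattern_cmp_new` returns non-zero for the same arguments and still returns 0 for the patterns `"*"` and `""`.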

Thanks,
-Andrew

On 10/14/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> On Sun, 14 Oct 2007 15:42:19 -0400
> "Andrew Paprocki" <[EMAIL PROTECTED]> wrote:
>
> > Just noticed something.. I'm not sure if this is due to a libata-dev
> > change or me switching to pata_cs5536, but my 128MB DOM on the PATA
> > port is hitting the ata_dma_blacklisted() case and it was not
> > previously. This did not happen under 2.6.22.6 using pata_amd. The
> > system is noticeably slower when forced to use PIO4 (as you would
> > expect).
> >
> > Is this expected in the newer code, or is it a bug?
>
> Sounds like someone added it wrongly to the blacklist. Remove the
> blacklist entry, test again and if DMA is working we need to get that
> fixed ASAP in .2 and 2.6.24.
>
> Alan


[PATCH] libata: prevent devices with blank model names from being DMA blacklisted

2007-10-14 Thread Andrew Paprocki
The strn_pattern_cmp routine does not handle a blank name parameter
properly. The only patterns which should match a blank name are "*"
and an explicit "". If the function is passed a blank name in current
code, it will always match against the patt parameter. The bug manifests
itself as the device with the empty model name always matching the first
device in the DMA blacklist, forcing it to revert to PIO mode.

Signed-off-by: Andrew Paprocki <[EMAIL PROTECTED]>
---
 drivers/ata/libata-core.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 4e11e39..e73b7b4 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4013,8 +4013,19 @@ int strn_pattern_cmp(const char *patt, const char *name, int wildchar)
 	p = strchr(patt, wildchar);
 	if (p && ((*(p + 1)) == 0))
 		len = p - patt;
-	else
+	else {
 		len = strlen(name);
+		/* If the model name parameter is empty, it should not match
+		 * against anything other than "*" or "".
+		 */
+		if (unlikely(len == 0)) {
+			/* In the rare case your pattern is "". */
+			if (strlen(patt) == 0)
+				return 0;
+			else
+				return -1;
+		}
+	}

 	return strncmp(patt, name, len);
 }
--
1.5.3.4.g58ba4


Re: [PATCH] pata_cs5536: ATA driver for Geode companion chip

2007-10-14 Thread Andrew Paprocki
Just noticed something.. I'm not sure if this is due to a libata-dev
change or me switching to pata_cs5536, but my 128MB DOM on the PATA
port is hitting the ata_dma_blacklisted() case and it was not
previously. This did not happen under 2.6.22.6 using pata_amd. The
system is noticeably slower when forced to use PIO4 (as you would
expect).

Is this expected in the newer code, or is it a bug?

Previous 2.6.22.6 kernel using pata_amd:

pata_amd :00:0f.2: version 0.3.8
scsi4 : pata_amd
scsi5 : pata_amd
ata5: PATA max UDMA/100 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001ff00 irq 14
ata6: PATA max UDMA/100 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001ff08 irq 15
ata5.00: ATA-0: , 060729DA, max MWDMA2
ata5.00: 256000 sectors, multi 1: LBA
ata5.00: configured for MWDMA2
ata6: port disabled. ignoring.

Up-to-date libata-dev using pata_cs5536:

scsi4 : pata_cs5536
scsi5 : pata_cs5536
ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xff00 irq 14
ata6: DUMMY
ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xff00 irq 14
ata5.00: ATA-0: , 060729DA, max MWDMA2
ata5.00: 256000 sectors, multi 1: LBA
ata5.00: device is on DMA blacklist, disabling DMA
ata5.00: configured for PIO4
ata5.00: device is on DMA blacklist, disabling DMA
ata5.00: configured for PIO4
ata5: EH complete

Thanks,
-Andrew

On 10/14/07, Andrew Paprocki <[EMAIL PROTECTED]> wrote:
> On 10/11/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> > On Thu, 11 Oct 2007 03:38:19 -0400
> > "Martin K. Petersen" <[EMAIL PROTECTED]> wrote:
> >
> > >
> > > This is a driver for the ATA controller on the Geode CS5536 companion
> > > chip.  The PCI device ID for this device was previously claimed by
> > > pata_amd.c but the PIO timings were not correct.  This driver also
> > > works around a bug in some BIOSes that handle unaligned access to the
> > > PCI config registers poorly.  Finally, the driver allows fallback to
> > > using MSR registers for configuration on BIOSes that are truly
> > > broken.
> > >
> > > Signed-off-by: Martin K. Petersen <[EMAIL PROTECTED]>
> >
> > Acked-by: Alan Cox <[EMAIL PROTECTED]>
>
> I've been using the driver (boot drive on the port) since Martin's
> post and haven't experienced any problems.
>
> Tested-by: Andrew Paprocki <[EMAIL PROTECTED]>


Re: [PATCH] pata_cs5536: ATA driver for Geode companion chip

2007-10-13 Thread Andrew Paprocki
On 10/11/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> On Thu, 11 Oct 2007 03:38:19 -0400
> "Martin K. Petersen" <[EMAIL PROTECTED]> wrote:
>
> >
> > This is a driver for the ATA controller on the Geode CS5536 companion
> > chip.  The PCI device ID for this device was previously claimed by
> > pata_amd.c but the PIO timings were not correct.  This driver also
> > works around a bug in some BIOSes that handle unaligned access to the
> > PCI config registers poorly.  Finally, the driver allows fallback to
> > using MSR registers for configuration on BIOSes that are truly
> > broken.
> >
> > Signed-off-by: Martin K. Petersen <[EMAIL PROTECTED]>
>
> Acked-by: Alan Cox <[EMAIL PROTECTED]>

I've been using the driver (boot drive on the port) since Martin's
post and haven't experienced any problems.

Tested-by: Andrew Paprocki <[EMAIL PROTECTED]>


Re: [PATCH #upstream 2/2] libata: track SLEEP state and issue SRST to wake it up

2007-10-13 Thread Andrew Paprocki
Tejun,

This patch applied on top of your set works for me. It clears the
error mask and completes any ATA_CMD_SLEEP when the drive is already
sleeping. I tried `hdparm -Y` twice and it didn't loop like before.
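
Stripped of the libata plumbing, the decision the patch makes at issue time can be sketched as a pure function. This is only a model: the two constants use the values from the SLEEP-tracking patch, while `issue_action` and `decide_issue` are hypothetical names that do not exist in libata.

```c
#define ATA_CMD_SLEEP		0xE6		/* value from the SLEEP-tracking patch */
#define ATA_DFLAG_SLEEPING	(1u << 15)	/* value from the SLEEP-tracking patch */

/* What to do with a command that is about to be issued. */
enum issue_action {
	ISSUE_TO_DEVICE,	/* device is awake, issue normally */
	WAKE_WITH_SRST,		/* device asleep, schedule a soft reset */
	COMPLETE_AS_NOOP	/* SLEEP sent to a sleeping device */
};

static enum issue_action decide_issue(unsigned int dev_flags,
				      unsigned char command)
{
	if (!(dev_flags & ATA_DFLAG_SLEEPING))
		return ISSUE_TO_DEVICE;

	/* short-circuit: never wake a drive just to put it to sleep */
	if (command == ATA_CMD_SLEEP)
		return COMPLETE_AS_NOOP;

	return WAKE_WITH_SRST;
}
```

The middle branch is the only new behaviour: a second `hdparm -Y` on a sleeping drive completes immediately instead of scheduling a reset.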

Thanks,
-Andrew

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 45b781b..7e0627f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5763,6 +5763,16 @@ void ata_qc_issue(struct ata_queued_cmd *qc)

 	/* if device is sleeping, schedule softreset and abort the link */
 	if (unlikely(qc->dev->flags & ATA_DFLAG_SLEEPING)) {
+		if (unlikely(qc->tf.command == ATA_CMD_SLEEP)) {
+			/* to prevent a loop, do not wake up if sleeping
+			 * and a sleep cmd is sent. instead, simply clear
+			 * the error mask and complete as if it was
+			 * successful.
+			 */
+			qc->err_mask = 0;
+			ata_qc_complete(qc);
+			return;
+		}
 		link->eh_info.action |= ATA_EH_SOFTRESET;
 		ata_ehi_push_desc(&link->eh_info, "waking up from sleep");
 		ata_link_abort(link);

On 10/13/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Jeff, please forget about this patchset.  I'll re-post updated version.


Re: [PATCH #upstream 2/2] libata: track SLEEP state and issue SRST to wake it up

2007-10-12 Thread Andrew Paprocki
Tejun,

I'm able to break my system using this patch. I had a hunch this might
be possible.. :) In short, if you issue a sleep command while the
drive is already sleeping, it puts libata into an infinite loop
resetting the port. I've illustrated the working test and the evil
hunch below. The sleep command itself will need a short-circuit out of
this logic in order to prevent this loop.

Also, in the working case below the hddtemp command actually blocked
until the drive was spun up before returning a valid temp. While
testing, I was able to get hddtemp to trigger the drive wake-up when
it was sleeping, but hddtemp then returned stating the drive was
sleeping. Re-running hddtemp until the drive was fully spun up
(another 5 seconds) kept returning that it was sleeping. I'll see if I
can reproduce this reliably. Am I correct in assuming the process
which triggers the wake-up should block?

-Andrew

Working case:

# hddtemp /dev/sdb
/dev/sdb: Hitachi HDS721010KLA330 : 35 C
# hdparm -Y /dev/sdb

/dev/sdb:
 issuing sleep command
# time hddtemp /dev/sdb
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata2.00: waking up from sleep
ata2: soft resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: configured for UDMA/100
ata2: EH complete
sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
/dev/sdb: Hitachi HDS721010KLA330 : 34 C
real    0m 10.89s
user    0m 0.00s
sys     0m 0.00s
# time hddtemp /dev/sdb
/dev/sdb: Hitachi HDS721010KLA330 : 34 C
real    0m 0.26s
user    0m 0.00s
sys     0m 0.00s

Evil DoS case:

# hddtemp /dev/sdb
/dev/sdb: Hitachi HDS721010KLA330 : 35 C
# hdparm -Y /dev/sdb

/dev/sdb:
 issuing sleep command
# hdparm -Y /dev/sdb
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata2.00: waking up from sleep
ata2: soft resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: configured for UDMA/100
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata2.00: waking up from sleep
ata2: soft resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: configured for UDMA/100
ata2: EH complete

to infinity

On 10/12/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> ATA devices in SLEEP mode don't respond to any commands.  SRST is
> necessary to wake it up.  Till now, when a command is issued to a
> device in SLEEP mode, the command times out, which makes EH reset the
> device and retry the command after that, causing a long delay.
>
> This patch makes libata track SLEEP state and issue SRST automatically
> if a command is about to be issued to a device in SLEEP.
>
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> Cc: Bruce Allen <[EMAIL PROTECTED]>
> Cc: Andrew Paprocki <[EMAIL PROTECTED]>
> ---
>  drivers/ata/libata-core.c |   12 ++++++++++++
>  drivers/ata/libata-eh.c   |    4 +++-
>  include/linux/ata.h       |    1 +
>  include/linux/libata.h    |    1 +
>  4 files changed, 17 insertions(+), 1 deletion(-)
>
> Index: work/include/linux/ata.h
> ===
> --- work.orig/include/linux/ata.h
> +++ work/include/linux/ata.h
> @@ -179,6 +179,7 @@ enum {
> ATA_CMD_VERIFY  = 0x40,
> ATA_CMD_VERIFY_EXT  = 0x42,
> ATA_CMD_STANDBYNOW1 = 0xE0,
> +   ATA_CMD_SLEEP   = 0xE6,
> ATA_CMD_IDLEIMMEDIATE   = 0xE1,
> ATA_CMD_INIT_DEV_PARAMS = 0x91,
> ATA_CMD_READ_NATIVE_MAX = 0xF8,
> Index: work/include/linux/libata.h
> ===
> --- work.orig/include/linux/libata.h
> +++ work/include/linux/libata.h
> @@ -145,6 +145,7 @@ enum {
> ATA_DFLAG_PIO   = (1 << 12), /* device limited to PIO mode */
> ATA_DFLAG_NCQ_OFF   = (1 << 13), /* device limited to non-NCQ mode */
> ATA_DFLAG_SPUNDOWN  = (1 << 14), /* XXX: for spindown_compat */
> +   ATA_DFLAG_SLEEPING  = (1 << 15), /* device is sleeping */
> ATA_DFLAG_INIT_MASK = (1 << 16) - 1,
>
> ATA_DFLAG_DETACH= (1 << 16),
> Index: work/drivers/ata/libata-core.c
> ===
> --- work.orig/drivers/ata/libata-core.c
> +++ work/drivers/ata/libata-core.c
> @@ -5553,6 +5553,10 @@ void __ata_qc_complete(struct ata_queued
> case ATA_CMD_SET_MULTI: /* multi_count changed */
> eh_action |= ATA_EH_REVALIDATE;
> break;
> +

Re: pata_cs5536: ATA driver for Geode companion chip

2007-10-11 Thread Andrew Paprocki
Martin,

Just wanted to report that the 2.6.23 libata-dev with the MSR
pata_cs5536 applied to it is working fine with my LX board. I am using
the first PATA port as my boot/root drive right now.

Thanks,
-Andrew

scsi4 : pata_cs5536
scsi5 : pata_cs5536
ata5: PATA max UDMA/100 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001ff00 irq 14
ata6: DUMMY
ata5.00: ATA-0: , 060729DA, max MWDMA2
ata5.00: 256000 sectors, multi 1: LBA
ata5.00: configured for MWDMA2
scsi 4:0:0:0: Direct-Access ATA   0607 PQ: 0 ANSI: 5
sd 4:0:0:0: [sdc] 256000 512-byte hardware sectors (131 MB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't
support DPO or FUA


Re: smartd causing SATA timeouts on sleeping drives

2007-10-10 Thread Andrew Paprocki
Bruce/Tejun,

Just so you both know, even when specifying '-n standby,q' in smartd,
it still triggers timeouts on my system. The timeouts are no longer
coming from the default half-hour checks, but from my configured
self-test times with the '-s' option. It appears smartd overrides the
'-n' parameter in this case, triggering the libata soft reset. This is
another case that would be fixed if libata does the SRST
automatically.

Thanks,
-Andrew

Oct 11 02:16:52 (none) daemon.info smartd[23848]: Device: /dev/sdb,
STANDBY mode ignored due to scheduled self test (47 checks skipped)
Oct 11 02:17:03 (none) user.err kernel: ata2.00: exception Emask 0x0
SAct 0x0 SErr 0x0 action 0x2 frozen
Oct 11 02:17:03 (none) user.err kernel: ata2.00: cmd
b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
Oct 11 02:17:03 (none) user.warn kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 11 02:17:08 (none) user.warn kernel: ata2: port is slow to
respond, please be patient (Status 0xd0)
Oct 11 02:17:10 (none) user.info kernel: ata2: soft resetting port
Oct 11 02:17:10 (none) user.info kernel: ata2: SATA link up 1.5 Gbps
(SStatus 113 SControl 310)
Oct 11 02:17:10 (none) user.info kernel: ata2.00: configured for UDMA/100
Oct 11 02:17:10 (none) user.info kernel: ata2: EH complete
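
The `cmd` line in the log above packs the taskfile registers into one string. A minimal sketch of pulling the first group apart, assuming the field order command/feature:nsect:lbal:lbam:lbah that libata uses for this message; `tf_dump`, `parse_tf` and `is_smart_return_status` are names invented here.

```c
#include <stdio.h>

/* First register group of a libata "cmd" dump. Names are illustrative. */
struct tf_dump {
	unsigned int command;
	unsigned int feature;
	unsigned int nsect;
	unsigned int lbal;
	unsigned int lbam;
	unsigned int lbah;
};

/* Parse e.g. "b0/da:00:00:4f:c2/..."; returns 0 on success, -1 otherwise. */
static int parse_tf(const char *s, struct tf_dump *tf)
{
	int n = sscanf(s, "%x/%x:%x:%x:%x:%x",
		       &tf->command, &tf->feature, &tf->nsect,
		       &tf->lbal, &tf->lbam, &tf->lbah);

	return n == 6 ? 0 : -1;
}

/* SMART (0xb0) with feature 0xda and the 0x4f/0xc2 signature. */
static int is_smart_return_status(const struct tf_dump *tf)
{
	return tf->command == 0xb0 && tf->feature == 0xda &&
	       tf->lbam == 0x4f && tf->lbah == 0xc2;
}
```

For `b0/da:00:00:4f:c2/...` this yields command 0xb0 (SMART) with feature 0xda (SMART RETURN STATUS) and the 0x4f/0xc2 signature in LBA mid/high, so the command that timed out looks like smartd's status query reaching the sleeping drive.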

On 10/10/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Maybe what should be done is to track sleep mode in libata and issue
> SRST automatically if a command is issued to a sleeping drive.  I'll
> work on it.


Re: smartd causing SATA timeouts on sleeping drives

2007-10-10 Thread Andrew Paprocki
On 10/10/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Maybe what should be done is to track sleep mode in libata and issue
> SRST automatically if a command is issued to a sleeping drive.  I'll
> work on it.

Another tidbit of info.. I just went through the pain of tracking down
everything in my system (system apps as well as my own code)
responsible for waking up sleeping drives. My end goal was to make
sure sleeping drives stayed asleep to reduce power consumption and
wear due to unnecessary spin-ups. I'm sure distros targeting laptops
or embedded systems that use live disks go through this pain
frequently.

Would all SRST cmds sent from libata come from the ata_std_softreset()
call? Could something like SystemTap be used without modifying libata
to track all pids which cause that function to be called? If that
would work, it could be an easy way to do what I did manually. That
is, unless someone knows of an easier way that I'm overlooking.. :) I
might give that a try to see if it works well and document the result.

-Andrew


Permanent disk shutdown instead of soft/hard reset?

2007-10-10 Thread Andrew Paprocki
I'm currently running into a situation where I have 4 SATA drives in a
striped array where one of the drives is failing (or has failed). The
single drive failure manifests itself as ext3 errors and libata SCSI
media errors which occur non-stop as software attempts to read/write
to the mounted array. Because libata is seeing media errors, the bad
drive endlessly soft resets while the software is still running and
attempting to access the drive. This winds up hanging the entire
system because the software (think of a 'find' command running on
the drive) runs from the init.d boot scripts. The end result is that a
login prompt is not reached until the software finishes what it is
doing, after hours of soft resets.

Is there any way that this behavior can be stopped by permanently
disconnecting the drive after a configurable number of errors that
would otherwise soft reset? Does the layer allow for the concept of a
full disk shutdown rather than a reset? I assume this would have to
forcefully unmount any active mounts which use the drive/array to
ensure that no subsequent cmds would cause libata to attempt to
reconnect to the bad drive(s). Is this even possible?
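
A minimal sketch of the policy asked about above, done in user space rather than inside libata: count the reset messages the kernel logs for each ATA port and flag a port once it passes a configurable limit. The function name, the default limit and the sample log are hypothetical, and nothing here is an existing libata feature. What to do with a flagged port (offlining the disk, stopping the job that keeps hitting it) is deliberately left out.

```python
import re
from collections import Counter

# Matches the reset messages seen in this thread, e.g.
# "kernel: ata2: soft resetting port" or "ata1: hard resetting link".
RESET_RE = re.compile(r"\b(ata\d+): (?:soft|hard) resetting (?:port|link)\b")


def ports_over_limit(log_lines, max_resets=3):
    """Return the ATA ports whose reset count exceeds max_resets.

    max_resets is a hypothetical, user-chosen threshold.
    """
    counts = Counter()
    for line in log_lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(port for port, n in counts.items() if n > max_resets)


# Hypothetical sample: five soft resets on ata2, one hard reset on ata1.
sample = ["Oct 11 02:17:10 (none) user.info kernel: ata2: soft resetting port"] * 5
sample.append("ata1: hard resetting link")
print(ports_over_limit(sample))
```

A boot script could run something like this over the kernel log before starting the job that walks the array, and skip the array if any of its ports are flagged.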

Using smartd is invaluable for detecting failing drives, but when the
failed drive prevents the system from booting, it is hard to recover
remotely. It may not be possible to "recover" (e.g. if the failed
drive is the boot drive), but that should be up to the system
designer. In my case, I would still want to boot into the system (I do
not boot from the array), establish network connectivity, and "phone
home" that a permanent hardware failure has occurred in the array.

Thanks,
-Andrew


Re: pata_cs5536: ATA driver for Geode companion chip

2007-10-10 Thread Andrew Paprocki
On 10/10/07, Jordan Crouse <[EMAIL PROTECTED]> wrote:
> I never heard back from anybody - so either nobody is using pata_amd
> (which I suspect), or they didn't have any problems.  Go ahead and merge,
> and I'll do a sanity check on a few boards next week just to make sure.

I am using the CS5536 currently with pata_amd and it worked without
issue. I only used the PATA port for a short time, though, before
switching to USB + SATA. I will now be going back to using the PATA
this week, so I'll try out the new driver in place of pata_amd and
write back if there are any problems.

Thanks,
-Andrew


Re: smartd causing SATA timeouts on sleeping drives

2007-10-07 Thread Andrew Paprocki
Yes, the drives were in sleep mode. That is the only case where these
timeouts/resets occur. It seems like the "-n never" mode of smartd
should send the SRST if the drive is truly sleeping, otherwise libata
will soft reset the drive when it sees the timeout. The "-n standby"
option sounds like a more sane default, but there might be legacy
reasons why it isn't configured that way.

On 10/8/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Andrew Paprocki wrote:
> > I found out after posting that this is governed by the -n parameter to
> > smartd. The default behavior is "-n never" which means smartd will
> > send the cmds regardless of the drive status. The man page indicates
> > that may cause the drive to spin-up to answer the cmds. It appears for
> > some drives (?) the cmds just timeout and libata performs a soft
> > reset. I'm going to change my setup to "-n standby", but it seems
> > strange to me that "-n never" is the default if it has this drastic of
> > a result (at least under Linux). Is there any way to know if the drive
> > will actually spin up as a result of the cmd instead of timing out?
>
> If in standby mode, the drive would automatically spin up to process
> command.  If in sleep mode, it needs SRST to spin back up.  Was your
> drive in sleep mode?


Re: smartd causing SATA timeouts on sleeping drives

2007-10-07 Thread Andrew Paprocki
I found out after posting that this is governed by the -n parameter to
smartd. The default behavior is "-n never" which means smartd will
send the cmds regardless of the drive status. The man page indicates
that may cause the drive to spin-up to answer the cmds. It appears for
some drives (?) the cmds just timeout and libata performs a soft
reset. I'm going to change my setup to "-n standby", but it seems
strange to me that "-n never" is the default if it has this drastic of
a result (at least under Linux). Is there any way to know if the drive
will actually spin up as a result of the cmd instead of timing out?

On 10/6/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> smartd should probably issue CHECK POWER MODE (0xe5) before issuing
> other commands.  Bruce?


smartd causing SATA timeouts on sleeping drives

2007-10-05 Thread Andrew Paprocki
Tejun/Bruce,

I tracked down the source of timeouts I have been frequently getting.
It appears smartd is not properly handling drives that are spun down
by the BIOS ACPI settings. I have SATA timeouts which occur every half
hour (the default -i 1800 in smartd) that do not occur when smartd is
not running. The drives smartd is configured to look at have a sleep
time configured in the BIOS. When the drives are asleep, I get a soft
reset every half hour as smartd attempts to access the drives. While
in this state, smartd also reports bad state to syslog (e.g.
temperature changes to 200C). Just for comparison, hddtemp knows the
drives are sleeping:

# hddtemp /dev/sda
/dev/sda: Hitachi HDS721010KLA330 : drive is sleeping
# ls /storage
... wakes up the drives ...
# hddtemp /dev/sda
/dev/sda: Hitachi HDS721010KLA330 :  29 C or  F

I'm pasting the example cmd / timeout error / soft reset below. Also,
I'm pasting the invalid settings which smartd detects when in this
state. What needs to change for smartd to recognize drives are
sleeping and either not perform its checks, or forcefully wake them up
to perform them? (Should that be a configuration parameter in smartd?)

Thanks,
-Andrew

# uname -a
Linux (none) 2.6.22.6 #5 Mon Sep 10 02:15:22 EDT 2007 i586 unknown
(Using sata_sil on 3114 chips)

# smartctl -V
smartmontools release 5.38 dated 2006/12/20 at 20:37:59 UTC
...
smartctl compile dated Sep 17 2007 at 13:47:25
(repository code checked out on Sep 17th)

# cat /var/run/smartd.conf
/dev/sda -d ata -a -S on -s (S/../.././02|L/../../6/03)
/dev/sdb -d ata -a -S on -s (S/../.././02|L/../../6/03)

What happens every 30 minutes when drives are sleeping:

Oct  6 01:05:48 (none) user.err kernel: ata2.00: exception Emask 0x0
SAct 0x0 SErr 0x0 action 0x2 frozen
Oct  6 01:05:48 (none) user.err kernel: ata2.00: cmd
b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
Oct  6 01:05:48 (none) user.warn kernel:  res
40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  6 01:05:53 (none) user.warn kernel: ata2: port is slow to
respond, please be patient (Status 0xd0)
Oct  6 01:05:55 (none) user.info kernel: ata2: soft resetting port
Oct  6 01:05:56 (none) user.info kernel: ata2: SATA link up 1.5 Gbps
(SStatus 113 SControl 310)
Oct  6 01:05:56 (none) user.info kernel: ata2.00: configured for UDMA/100
Oct  6 01:05:56 (none) user.info kernel: ata2: EH complete
Oct  6 01:05:56 (none) user.notice kernel: sd 1:0:0:0: [sdb]
1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 01:05:56 (none) user.notice kernel: sd 1:0:0:0: [sdb] Write
Protect is off
Oct  6 01:05:56 (none) user.debug kernel: sd 1:0:0:0: [sdb] Mode
Sense: 00 3a 00 00
Oct  6 01:05:56 (none) user.notice kernel: sd 1:0:0:0: [sdb] Write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
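
To see which command timed out, the "cmd" dump can be decoded by hand. The sketch below assumes the first six bytes are command/feature:nsect:lbal:lbam:lbah, which is how libata's error report is laid out as far as I know. The helper name and the small lookup table are mine, and the table lists only a few SMART subcommands.

```python
import re

# SMART subcommands, selected by the feature register when the command is 0xb0.
SMART_FEATURES = {
    0xD0: "READ DATA",
    0xD8: "ENABLE OPERATIONS",
    0xD9: "DISABLE OPERATIONS",
    0xDA: "RETURN STATUS",
    0xDB: "ENABLE/DISABLE AUTOMATIC OFF-LINE",
}

# Assumed field order: command/feature:nsect:lbal:lbam:lbah/...
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/")


def describe_cmd(line):
    """Name the ATA command in a libata 'cmd' line, or return None if there is none."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    command = int(match.group(1), 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in match.group(2).split(":"))
    if command != 0xB0:
        return f"command 0x{command:02x}"
    name = SMART_FEATURES.get(feature, f"feature 0x{feature:02x}")
    # SMART commands carry the signature 0x4f/0xc2 in the LBA mid/high registers.
    signed = (lbam, lbah) == (0x4F, 0xC2)
    return f"SMART {name}" if signed else f"SMART {name} (signature missing)"


# The cmd line from the log above, with the mail client's line wrap undone.
print(describe_cmd("ata2.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0"))
```

On that reading, the command that times out is smartd's periodic SMART status check, which fits the 30-minute interval.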

Invalid attribute values:

Oct  2 22:35:21 (none) daemon.info smartd[585]: Device: /dev/sda,
SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 87 to 86
Oct  2 23:35:21 (none) daemon.info smartd[585]: Device: /dev/sda,
SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 86 to 85
Oct  5 20:05:56 (none) daemon.info smartd[585]: Device: /dev/sdb,
SMART Prefailure Attribute: 3 Spin_Up_Time changed from 84 to 85
Oct  6 01:05:38 (none) daemon.info smartd[585]: Device: /dev/sda,
SMART Usage Attribute: 194 Temperature_Celsius changed from 200 to 206
Oct  6 01:05:56 (none) daemon.info smartd[585]: Device: /dev/sdb,
SMART Usage Attribute: 194 Temperature_Celsius changed from 193 to 200

Once the drives are started up, those values report:

  3 Spin_Up_Time            0x0007   085   085   024    Pre-fail  Always       -       821 (Average 820)
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Lifetime Min/Max 24/67)


Chipset selection for NCQ md RAID

2007-09-25 Thread Andrew Paprocki
Given the current kernel (2.6.22.8), which SATA chipset would you feel
most comfortable plugging in to achieve:

- 16 1TB Hitachi 7K1000 drives
- NCQ support
- Kernel md RAID (i.e. hardware RAID not necessary)
- Max transfer rate to OS (PCIe should handle 16 drives given their
current transfer rate?)

(Assuming the required number of chips of any chipset could be made
available on the bus.)
If support for SiI chips is the most robust, would multiple
SiI3124/SiI3132 chips be most reliable? Or is support for something
like a Marvell 88SX6081 good enough so that only 2 chips are needed?
I have not owned any of these chips, and my impression is that the SiI
chips have some of the most robust support. The comments at the top of
sata_mv.c scare me, even if they might be out of date... :)

What about port multiplier support? Do they ever introduce stability
problems with the drivers?

Is native libata support for high-density Areca cards planned? Does
anyone know if the manufacturer driver is good enough to rely on for
the above config to "just work"? I'd personally like to stay within
libata. (Comments from any Areca Linux users with an ARC-1261ML?
http://www.areca.com.tw/products/pcie341.htm)

Thanks in advance, -Andrew


Re: 2.6.22.6 sata_sil device errors & timeouts

2007-09-18 Thread Andrew Paprocki
It appears to be the '-o on' causing the problem. If I remove that,
the errors go away. The strange part is that according to the smartctl
documentation, my drives support it:

# smartctl -c /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-7 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (4797) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  80) minutes.
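
As a cross-check of the listing above, the 0x5b capability byte can be split into its bits. The bit assignments below are my reading of how smartctl interprets the byte. They agree with the lines it printed for this drive, but treat them as an assumption rather than a reference. The helper name is hypothetical.

```python
# Assumed bit meanings for the "Offline data collection capabilities" byte.
# Bit 2 selects abort (1) or suspend (0) on a new command, so it is handled apart.
OFFLINE_CAPABILITY_BITS = {
    0: "SMART execute Offline immediate",
    1: "Auto Offline data collection on/off support",
    3: "Offline surface scan supported",
    4: "Self-test supported",
    5: "Conveyance Self-test supported",
    6: "Selective Self-test supported",
}


def offline_capabilities(value):
    """List the capabilities whose bit is set in the capability byte."""
    names = [name for bit, name in sorted(OFFLINE_CAPABILITY_BITS.items())
             if value & (1 << bit)]
    on_new_command = "Abort" if value & (1 << 2) else "Suspend"
    names.append(f"{on_new_command} Offline collection upon new command")
    return names


for line in offline_capabilities(0x5B):
    print(line)
```

Bit 1 is set in 0x5b, so the drive does advertise the auto offline on/off feature that '-o on' uses, which is what makes the device error surprising.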

Thanks, -Andrew

On 9/18/07, Bruce Allen <[EMAIL PROTECTED]> wrote:
> Does removing '-o on' and/or '-S on' eliminate the errors?
>
>
> On Mon, 17 Sep 2007, Andrew Paprocki wrote:
>
> > Bruce,
> >
> > Just built it -- it eliminated the HSM violations, but I still get the
> > device errors:
> >
> > smartmontools release 5.38 dated 2006/12/20 at 20:37:59 UTC
> > (I see the above date, even though I verified it is built from CVS head)
> >
> > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > ata2.00: cmd b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
> > res 51/04:f8:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
> > ata2.00: configured for UDMA/100
> > ata2: EH complete
> >
> > This is what it is in smartd.conf:
> > /dev/sda -d ata -a -o on -S on
> > /dev/sdb -d ata -a -o on -S on
> > /dev/sdc -d ata -a -o on -S on
> >
> > Thanks, -Andrew


Re: 2.6.22.6 sata_sil device errors & timeouts

2007-09-17 Thread Andrew Paprocki
Bruce,

Just built it -- it eliminated the HSM violations, but I still get the
device errors:

smartmontools release 5.38 dated 2006/12/20 at 20:37:59 UTC
(I see the above date, even though I verified it is built from CVS head)

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: cmd b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
 res 51/04:f8:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
ata2.00: configured for UDMA/100
ata2: EH complete

This is what it is in smartd.conf:
/dev/sda -d ata -a -o on -S on
/dev/sdb -d ata -a -o on -S on
/dev/sdc -d ata -a -o on -S on

Thanks, -Andrew

On 9/17/07, Bruce Allen <[EMAIL PROTECTED]> wrote:
> Hi Andrew,
>
> Please build the CVS version (unreleased) of smartmontools.  The versions
> below are dated 2006/12/20 and 2006/04/12.  You need to build a code
> version based on the past few weeks of code.
>
> Cheers,
> Bruce


Re: 2.6.22.6 sata_sil device errors & timeouts

2007-09-17 Thread Andrew Paprocki
On 9/17/07, Andrew Paprocki <[EMAIL PROTECTED]> wrote:
> On 9/17/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> > Upgrading smartd should fix it.  Which version are you using?
>
> smartmontools release 5.36 dated 2006/04/12 at 17:39:01 UTC
> smartmontools configure arguments: '--prefix=/opt/smartmontools'
>
> I see a newer experimental 5.37 is out. I'll give it a go and see if
> the trace goes away.

Upgrading made it worse.. I now receive the same device errors as well
as a slew of new "HSM violation" errors when smartd starts up:

smartmontools release 5.37 dated 2006/12/20 at 20:37:59 UTC
smartmontools configure arguments:  '--prefix=/opt/smartmontools'

ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 126976 in
 res 50/00:f8:00:4f:c2/00:00:00:00:00/a0 Emask 0x202 (HSM violation)
ata5: soft resetting port
ata5.00: configured for UDMA/100
ata5: EH complete

# smartctl -i /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar T7K250 series
Device Model:     HDT722525DLA380
Serial Number:    VDK41GT5F3S4JK
Firmware Version: V44OA96A
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Mon Sep 17 15:25:29 2007 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Thanks, -Andrew


Re: 2.6.22.6 sata_sil device errors & timeouts

2007-09-17 Thread Andrew Paprocki
On 9/17/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> [cc'ing Bruce Allen]
>
> Andrew Paprocki wrote:
> > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > ata2.00: cmd b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
> >  res 51/04:f8:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
> > ata2.00: configured for UDMA/100
> > ata2: EH complete
>
> Upgrading smartd should fix it.  Which version are you using?

smartmontools release 5.36 dated 2006/04/12 at 17:39:01 UTC
smartmontools configure arguments: '--prefix=/opt/smartmontools'

I see a newer experimental 5.37 is out. I'll give it a go and see if
the trace goes away.

Thanks, -Andrew


Re: 2.6.22.6 sata_sil device errors & timeouts

2007-09-17 Thread Andrew Paprocki
On 9/17/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Andrew Paprocki wrote:
> > boot configuration more complicated if booting off the pata drive. Is
> > there any way to control which order the drives are assigned when not
> > building w/ modules?
>
> Please use mount-by-LABEL or UUID.

Thanks, wasn't aware of that functionality. Works like a charm.

> > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x240 action 0x2 frozen
> > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x28 action 0x0
>
> In both cases, SError is indicating transmission problem. Handshake
> error and Unrecognized FIS type in the first case, 10b to 8b decode
> error and CRC error on the second case.  I can't tell why but signals
> flying through those redish cables are getting corrupted.

I've replaced the cables with a different brand I had lying around,
and I haven't seen a problem yet. I'll need to test it heavily, though,
to see if I can trigger anything to pop up.
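
For reference, a sketch of how an SError value maps to the flags Tejun names. The bit positions and short names follow the SATA SError register layout as libata prints it, to the best of my knowledge. Tejun's reading corresponds to the values 0x02400000 and 0x00280000. The shorter 0x240 and 0x28 in the log as archived are assumed here to have lost a run of zeros, which is a guess based on that reading.

```python
# SError bit positions and the short names used in libata's "SError: { ... }" line.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the SError flags set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if (value >> bit) & 1]


print(decode_serror(0x02400000))  # handshake error + unrecognized FIS type
print(decode_serror(0x00280000))  # 10b-to-8b decode error + CRC error
```

All four flags sit in the diagnostic half of the register (bits 16 and up), and all four describe link-level transmission problems, which fits the cable explanation.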

I didn't mention it before, but I'm also getting these errors every
time I boot. I'm thinking they're related to the drive not supporting
cmds that smartd is sending it. If so, is there any way that
libata/smartd can handle this more gracefully? This stuff spews into
dmesg and gives a scare that there is a real hardware problem that may
cause data corruption. I get exactly 6 instances of each of these two
blocks of output prior to reaching the login prompt:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: cmd b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
 res 51/04:f8:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
ata1.00: configured for UDMA/100
ata1: EH complete

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: cmd b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
 res 51/04:f8:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
ata2.00: configured for UDMA/100
ata2: EH complete

Thanks, -Andrew


2.6.22.6 sata_sil device errors & timeouts

2007-09-17 Thread Andrew Paprocki
I have a sata_sil 3114 integrated chipset with 2 Hitachi 250gb sata
drives connected, and I'm seeing errors print out during use. The
problems seem to get much worse when I switch from these 250gb drives
to brand new Hitachi HDS721010KLA330 1tb drives, and eventually the
system hangs. With the 250gb drives, I haven't seen a hang, but I
still see the errors below.

Also, I'm seeing two other "issues":

1) When built with modules disabled, and libata handling the sata +
pata (AMD CS5536) connections, the pata drives come _after_ the sata
drives (i.e. w/ 2 sata drives, the first IDE drive is sdc). This makes
boot configuration more complicated if booting off the pata drive. Is
there any way to control which order the drives are assigned when not
building w/ modules?

2) The drives display that they support udma6 in hdparm -I, but only
udma5 is being used. And hdparm -i only shows up to udma2.. ?

Any ideas? Thanks, -Andrew


ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x240 action 0x2 frozen
ata2.00: cmd 35/00:00:80:31:54/00:04:02:00:00/e0 tag 0 cdb 0x0 data 524288 out
 res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata2: port is slow to respond, please be patient (Status 0xd1)
ata2: SRST failed (errno=-16)
ata2: hard resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: configured for UDMA/100
ata2: EH complete
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA


ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x28 action 0x0
ata1.00: (BMDMA2 stat 0x617d9009)
ata1.00: cmd 25/00:80:00:d6:bd/00:02:0b:00:00/e0 tag 0 cdb 0x0 data 327680 in
 res 51/04:e0:9f:d7:bd/00:00:0b:00:00/eb Emask 0x1 (device error)
ata1.00: configured for UDMA/100
ata1: EH complete
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA


# hdparm -i /dev/sda

/dev/sda:

 Model=HDT722525DLA380 , FwRev=V44OA96A,
SerialNo=  VDK41GT5F3S4JK
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=52
 BuffType=DualPortCache, BuffSize=7674kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D revision 1:  ATA/ATAPI-2
ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

# hdparm -I /dev/sda | grep udma
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6

# lspci -vv -d 1095:3114
0000:00:11.0 0180: 1095:3114 (rev 02)
Subsystem: 1095:3114
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- SERR-


Re: JMicron JMB363 issue fixed / ICH8 RAID volume trace

2007-05-24 Thread Andrew Paprocki

Tejun, fdisk -l output is attached

Basically, in the ICH8 BIOS:
/dev/sda + sdb = 2 500GB drives in RAID1 configuration
/dev/sdc + sdd + sde + sdf = 4 320GB drives in RAID5 configuration

/dev/sdg is a 320GB boot drive connected to the JMB363 chipset

Is there some kind of problem when probing these partitions because
they are fake software RAID through the ICH8? The messages only spew
at boot time.

Thanks, -Andrew

On 5/24/07, Tejun Heo <[EMAIL PROTECTED]> wrote:

Andrew Paprocki wrote:
> Ethan, I believe my 2.6.22-rc2 kernel *is* working with respect to the
> libata problem. By removing CONFIG_IDE, the system now works fine. The
> reason why I thought that libata was still having a problem was
> because the system would hang after agpgart printed:
> "agpgart: detected an Intel 965G chipset."
>
> I *thought* the system was once again waiting for the root drive to
> become available, but it turns out it was actually hung. I found
> another user with a Gigabyte board with the same issue. I also have
> 4GB ram.. http://lists.opensuse.org/opensuse-amd64/2007-04/msg1.html
>
> I added "mem=4096M" to the boot line and now everything is working
> properly. The IDE subsystem is off and libata is handling everything.
> I'll post on the kernel mailing list to see if this is a known issue
> w/ agpgart or amd64+4gb.
>
> I do see some trace print out complaining about reads past the end of
> the device.. Does anyone have an idea if these are harmful? They are
> coming from my ICH8 RAID volumes:
>
> sda: sda1
> sda: p1 exceeds device capacity
> sdb: unknown partition table
> sdc: sdc1
> sdc: p1 exceeds device capacity
> sdf1
> sdf: p1 exceeds device capacity
> ...
> attempt to access beyond end of device
> sda: rw=0, want=1953533832, limit=976773168
> Buffer I/O error on device sda1, logical block 244191472
> (repeats about 25 times)
> attempt to access beyond end of device
> sdf: rw=0, want=1875410824, limit=625142448
> (repeats about 25 times)
> sdc: rw=0, want=1875410824, limit=625142448
> attempt to access beyond end of device
> (repeats about 25 times)

What does 'fdisk -l' say?

--
tejun


Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/sdd doesn't contain a valid partition table
Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1      121602   976765952    7  HPFS/NTFS

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Disk /dev/sdc: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      116739   937704448    7  HPFS/NTFS

Disk /dev/sdd: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Disk /dev/sde: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Disk /dev/sdf: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1      116739   937704448    7  HPFS/NTFS

Disk /dev/sdg: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1       32680   262500352    7  HPFS/NTFS
/dev/sdg2           32680       32713      262144   82  Linux swap / Solaris
/dev/sdg3   *       32713       38914    49806336   83  Linux
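
A quick arithmetic check on the "attempt to access beyond end of device" messages, using the want/limit sector counts from the kernel log and the 512-byte sector size from the fdisk output. The helper name is hypothetical.

```python
SECTOR_BYTES = 512


def overshoot(want_sectors, limit_sectors):
    """Return (device size in bytes, requested sector as a multiple of the device size)."""
    return limit_sectors * SECTOR_BYTES, round(want_sectors / limit_sectors, 2)


# (disk, want, limit) taken from the kernel messages quoted in this thread.
for disk, want, limit in [("sda", 1953533832, 976773168),
                          ("sdc", 1875410824, 625142448)]:
    size, ratio = overshoot(want, limit)
    print(f"{disk}: device is {size} bytes, read reached {ratio}x the device size")
```

The device sizes match what fdisk reports (500107862016 and 320072933376 bytes), and the reads land at about two and three times the size of a single disk. That would be consistent with the partition tables describing the whole BIOS RAID volume rather than the member disk they are read from, though that interpretation is an assumption and not something confirmed in the thread.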


Re: JMicron JMB363 issue fixed / ICH8 RAID volume trace

2007-05-23 Thread Andrew Paprocki

Ethan, I believe my 2.6.22-rc2 kernel *is* working with respect to the
libata problem. By removing CONFIG_IDE, the system now works fine. The
reason why I thought that libata was still having a problem was
because the system would hang after agpgart printed:
"agpgart: detected an Intel 965G chipset."

I *thought* the system was once again waiting for the root drive to
become available, but it turns out it was actually hung. I found
another user with a Gigabyte board with the same issue. I also have
4GB ram.. http://lists.opensuse.org/opensuse-amd64/2007-04/msg1.html

I added "mem=4096M" to the boot line and now everything is working
properly. The IDE subsystem is off and libata is handling everything.
I'll post on the kernel mailing list to see if this is a known issue
w/ agpgart or amd64+4gb.

I do see some trace print out complaining about reads past the end of
the device.. Does anyone have an idea if these are harmful? They are
coming from my ICH8 RAID volumes:

sda: sda1
sda: p1 exceeds device capacity
sdb: unknown partition table
sdc: sdc1
sdc: p1 exceeds device capacity
sdf1
sdf: p1 exceeds device capacity
...
attempt to access beyond end of device
sda: rw=0, want=1953533832, limit=976773168
Buffer I/O error on device sda1, logical block 244191472
(repeats about 25 times)
attempt to access beyond end of device
sdf: rw=0, want=1875410824, limit=625142448
(repeats about 25 times)
sdc: rw=0, want=1875410824, limit=625142448
attempt to access beyond end of device
(repeats about 25 times)

I've attached the full dmesg output as dmesg.052207.txt.

Thanks -Andrew

On 5/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> Everyone, I tried rebuilding 2.6.22-rc2 last night with CONFIG_IDE
> disabled, but it still produces the same problem. The relevant config
> options:
>
> # CONFIG_IDE is not set
> CONFIG_ATA=y
> CONFIG_ATA_ACPI=y
> CONFIG_SATA_AHCI=y
> CONFIG_ATA_PIIX=y
> CONFIG_PATA_JMICRON=y
>
> Ethan, you mention that this is a known issue.. I can't find any link
> to this problem. This is happening on a cleanly rebuilt kernel, so I'm
> not sure if this has to do with Debian 4.0r0 probing the module
> incorrectly. I already have the OS installed, and I figured a newer
> kernel would have resolved this issue. Unlike the bug report I linked
> to, I am not seeing the driver detect JMB363 & JMB361 in the same boot
> log. Even when everything works, it only detects a JMB361.

I've tried to reproduce this issue on a GA-965P-DQ6.
The JMB363 works fine in 2.6.22-rc2.
The dmesg log is attached.
Note that pata_jmicron does not print any device name.

> A working boot looks like this:
>
> JMB361: IDE controller at PCI slot 0000:04:00.1
> ACPI: PCI Interrupt 0000:04:00.1[B] -> GSI 18 (level, low) -> IRQ 58
> JMB361: chipset revision 2
> JMB361: 100% native mode on irq 58
> ide0: BM-DMA at 0xa000-0xa007, BIOS settings: hda:pio, hdb:pio
> ide1: BM-DMA at 0xa008-0xa00f, BIOS settings: hdc:DMA, hdd:DMA

If you disabled the old IDE driver entirely, these messages should not
appear. The output should look like this:

ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 18 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:03:00.1 to 64
scsi6 : pata_jmicron
scsi7 : pata_jmicron
ata7: PATA max UDMA/100 cmd 0x0001a000 ctl 0x0001a402 bmdma 0x0001b000 irq 0
ata8: PATA max UDMA/100 cmd 0x0001a800 ctl 0x0001ac02 bmdma 0x0001b008 irq 0

> Neither acpi=off, noapic, nor nolapic seems to affect whether the
> JMB361 is detected properly.
>
> Any next steps? Ethan, do you have more information about this
> particular issue with the DQ6 motherboard?
>
> Thanks -Andrew


Linux version 2.6.22-rc2-plateado ([EMAIL PROTECTED]) (gcc version 4.1.2 
20061115 (prerelease) (Debian 4.1.1-21)) #3 SMP Mon May 21 19:38:38 EDT 2007
Command line: root=/dev/sdg3 ro mem=4096M
BIOS-provided physical RAM map:
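 BIOS-e820: 0000000000000000 - 0000000000097c00 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
 BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
 BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
 BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
 BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000120000000 (usable)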
Entering add_active_range(0, 0, 151) 0 entries of 256 used
Entering add_active_range(0, 256, 917216) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F6E70, 0014 (r0 GBT   )
ACPI: RSDT DFEE3040, 0034 (r1 GBTGBTUACPI 42302E31 GBTU  1010101)
ACPI: FACP DFEE30C0, 0074 (r1 GBTGBTUACPI 42302E31 GBTU  1010101)
ACPI: DSDT DFEE3180, 49F4 (r1 GBTGBTUACPI 1000 MSFT  10C)
ACPI: FACS DFEE, 0040
ACPI: HPET DFEE7CC0, 0038 (r1 GBTGBTUACPI 42302E31 GBTU   98)
ACPI: MCFG DFEE7D40, 003C (r1 GBTGBTUACPI 42302E31 GBTU  1010101)

Re: JMicron JMB361 sporadically failing to initialize from at least 2.6.18.4 to 2.6.22-rc2

2007-05-22 Thread Andrew Paprocki

Everyone, I tried rebuilding 2.6.22-rc2 last night with CONFIG_IDE
disabled, but it still produces the same problem. The relevant config
options:

# CONFIG_IDE is not set
CONFIG_ATA=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_AHCI=y
CONFIG_ATA_PIIX=y
CONFIG_PATA_JMICRON=y

Ethan, you mention that this is a known issue, but I can't find any link
to this problem. This is happening on a cleanly rebuilt kernel, so I'm
not sure if this has to do with Debian 4.0r0 probing the module
incorrectly. I already have the OS installed, and I figured a newer
kernel would have resolved this issue. Unlike the bug report I linked
to, I am not seeing the driver detect JMB363 & JMB361 in the same boot
log. Even when everything works, it only detects a JMB361.

A working boot looks like this:

JMB361: IDE controller at PCI slot :04:00.1
ACPI: PCI Interrupt :04:00.1[B] -> GSI 18 (level, low) -> IRQ 58
JMB361: chipset revision 2
JMB361: 100% native mode on irq 58
   ide0: BM-DMA at 0xa000-0xa007, BIOS settings: hda:pio, hdb:pio
   ide1: BM-DMA at 0xa008-0xa00f, BIOS settings: hdc:DMA, hdd:DMA

None of acpi=off, noapic, or nolapic seems to affect whether the
JMB361 is detected properly.

Any next steps? Ethan, do you have more information about this
particular issue with the DQ6 motherboard?

Thanks -Andrew

On 5/21/07, Alan Cox <[EMAIL PROTECTED]> wrote:

> Does anyone know what is causing this and if it is fixed in any dev
> branch? I've tried the stock Debian etch netinst 2.6.18.4 kernel, as
> well as my own build of 2.6.21.1 and 2.6.22-rc2 and they all exhibit
> the same problem.
>
> Let me know what I can do to help debug this on my end.

What configuration options have you got selected - in particular if you
have the libata support for SATA enabled then the kernel configures the
hardware to expose the AHCI interface for SATA and the PATA interface
separately. This requires you to be using the libata drivers for both the
SATA and PATA components of the hardware if you wish to use both.
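
A minimal sketch of checking a kernel .config against that requirement
follows. It only uses option names that appear in this thread
(CONFIG_IDE, CONFIG_ATA, CONFIG_SATA_AHCI, CONFIG_PATA_JMICRON); the
function names and the exact set of checks are assumptions for
illustration, not an official tool.

```python
NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text):
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            opts[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def libata_problems(opts):
    """List reasons the config may not drive both SATA and PATA via libata."""
    problems = []
    if opts.get("CONFIG_IDE", "n") != "n":
        problems.append("CONFIG_IDE is enabled; the old IDE driver may claim the PATA function")
    for name in ("CONFIG_ATA", "CONFIG_SATA_AHCI", "CONFIG_PATA_JMICRON"):
        if opts.get(name, "n") == "n":
            problems.append(name + " is not enabled")
    return problems
```

An empty list means the config at least matches the combination
described above; it does not prove the controller will initialize.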



-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html





JMicron JMB361 sporadically failing to initialize from at least 2.6.18.4 to 2.6.22-rc2

2007-05-21 Thread Andrew Paprocki

I have a Gigabyte GA-965P-DQ6 motherboard which has onboard Intel ICH8
RAID as well as a "Gigabyte" (rebranded JMicron) chipset for 2
separate SATA ports. When I boot the machine, it sporadically fails to
initialize the JMB361 chipset which it detects, claiming "dma_base is
invalid". When it works (~20% of the time), it correctly detects the
chip and all drives connected to it. It feels like a race condition.

I have a DVD-RW & my boot SATAII drive connected to the controller, so
this bug has the nasty side effect of hanging my machine for eternity
waiting for the root drive to appear.

The dma_base trace is listed below:

JMB361: IDE controller at PCI slot :03:00.0
ACPI: PCI Interrupt :03:00.0[A] -> GSI 17 (level, low) -> IRQ 177
JMB361: chipset revision 3
JMB361: 100% native mode on irq 177
JMB361: dma_base is invalid
ide0: JMB361 Bus-Master DMA disabled (BIOS)
JMB361: dma_base is invalid
ide1: JMB361 Bus-Master DMA disabled (BIOS)
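
To put a number on how often this happens, saved boot logs can be
scanned for the failure line. The sketch below is only an illustration:
the function names are made up, and it assumes each log is plain text
with the "dma_base is invalid" line formatted exactly as in the trace
above.

```python
import re

# Matches the failure line from the trace, e.g. "JMB361: dma_base is invalid"
DMA_FAIL = re.compile(r"^(JMB\d+): dma_base is invalid$")


def dma_base_failures(log_text):
    """Count 'dma_base is invalid' lines per chipset name in one boot log."""
    counts = {}
    for line in log_text.splitlines():
        m = DMA_FAIL.match(line.strip())
        if m:
            counts[m.group(1)] = counts.get(m.group(1), 0) + 1
    return counts


def failure_rate(boot_logs):
    """Fraction of boot logs that contain at least one dma_base failure."""
    if not boot_logs:
        return 0.0
    failed = sum(1 for log in boot_logs if dma_base_failures(log))
    return failed / len(boot_logs)
```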

This seems to be the same problem as reported here:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg339806.html

And numerous other people seem to be hitting this in newer kernels. A
Google search for 'jmb361 dma_base' turns up a lot of hits.

Does anyone know what is causing this and if it is fixed in any dev
branch? I've tried the stock Debian etch netinst 2.6.18.4 kernel, as
well as my own build of 2.6.21.1 and 2.6.22-rc2 and they all exhibit
the same problem.

Let me know what I can do to help debug this on my end.

Thanks
-Andrew