Re: Promise SATA300 TX4: errors, oops in ext3 code

2007-10-02 Thread Clemens Koller

Alexander Sabourenkov wrote:
> > Have you checked your memory already (memtest86)?
> > [...]
> > Again... sounds like bad memory to me.
> Nightly memtest86 run: 11 hours, 23 passes, 0 errors.

Okay, I have no idea about any bugs there.
You have several options: find a vanilla kernel that works 100% for your
problem (minimal configuration, skip e.g. the sound stuff, ...),
and then git bisect between it and a known bad kernel.
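The bisection step is a binary search over an ordered list of kernel revisions, so it needs roughly log2(n) test builds for n revisions. A minimal Python sketch of the idea, assuming the revisions are ordered oldest to newest, the first is known good and the last is known bad. `first_bad` and `is_bad` are hypothetical names; `is_bad` stands in for building, booting and stress-testing one kernel (with real git, each result is recorded with `git bisect good` or `git bisect bad`):

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad revision, the way git bisect does.

    Assumes revisions are ordered oldest to newest, revisions[0] is known
    good, revisions[-1] is known bad, and every revision after the first
    bad one is also bad.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    good, bad = 0, len(revisions) - 1  # invariant: good is good, bad is bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):  # hypothetical: build, boot, stress-test
            bad = mid
        else:
            good = mid
    return revisions[bad]


if __name__ == "__main__":
    # Hypothetical version labels and failure predicate, for illustration only.
    versions = ["v1", "v2", "v3", "v4", "v5"]
    print(first_bad(versions, lambda v: v >= "v4"))  # prints v4
```

The sketch relies on the same assumption as git bisect: the bug, once introduced, stays. An intermittent fault like the one in this thread can make a single "good" result misleading, so each test should run long enough to be trusted.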

Same thing in hardware: move components (Controllers + HDD) to/from a working
machine and verify...

Regards,

Clemens Koller
__
RD Imaging Devices
Anagramm GmbH
Rupert-Mayer-Straße 45/1
Linhof Werksgelände
D-81379 München
Tel.089-741518-50
Fax 089-741518-19
http://www.anagramm-technology.com
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Promise SATA300 TX4: errors, oops in ext3 code

2007-10-02 Thread Alexander Sabourenkov

Clemens Koller wrote:

> Okay, I have no idea about any bugs there.
> You have several options: find a vanilla kernel that works 100% for your
> problem (minimal configuration, skip e.g. the sound stuff, ...),
> and then git bisect between it and a known bad kernel.

I'm afraid there is no 100% working kernel. Problems were reported as
far back as 2.6.11, and I never found a single mailing-list thread that
ended with the problem solved (not counting PSU and thermal issues).

> Same thing in hardware: move components (Controllers + HDD) to/from a
> working machine and verify...

Unfortunately, right now I have no unaffected machine to test with:
both machines I have show the same problems.

Time permitting, I'll test the 2.6.23 kernel, the libata-dev branch,
SATA300/SATA150 modes and aggressive card cooling, as you suggested in
your other email, and document all this on a separate page or maybe a wiki.

--

./lxnt


Re: Promise SATA300 TX4: errors, oops in ext3 code

2007-10-01 Thread Clemens Koller

Alexander Sabourenkov wrote:
> Hardware: Athlon64, Asus A8V, Promise SATA300 TX4, 2x Seagate 7200.10
> 320G, jumper-limited to SATA150.
> Kernel: 2.6.22.9 amd64
>
> Problem:
> Heavy load causes errors and triggers oops.

Have you checked your memory already (memtest86)?

We have several applications with Promise controllers on strange
hardware, and we never had integrity problems, even with e.g. non-standard
SATA connections over custom vacuum-tight connectors.

> Problems were blamed:
>   - SATA300 being too 'hot' (jumpered the drives)

Is this a commonly known problem with your hard drives or controller?
(Ask Google.) Otherwise, it sounds like a problem with broken hardware.

>   - cables (work perfectly on onboard controller)
>   - interrupt sharing (found the only slot which does not share
>     interrupt line)
>   - cooling (3 fans installed, smartctl-reported temperature at max load
>     dropped to 35C)

Try to heat up your memory a little (your wife's hair dryer).
If it fails more often, your memory is most likely broken.

>   - weak PSU (installed 600W FSP)
>   - kernel bugs (upgraded to 2.6.22.9)
>
> All those measures significantly dropped error rate (from about 20 to
> 2-4 per mirror rebuild) but did not eliminate the problem.

Again... sounds like bad memory to me.

Just my $0.05.
Regards,

Clemens Koller


Re: Promise SATA300 TX4: errors, oops in ext3 code

2007-10-01 Thread Alexander Sabourenkov

Clemens Koller wrote:

> Alexander Sabourenkov wrote:
> > Hardware: Athlon64, Asus A8V, Promise SATA300 TX4, 2x Seagate 7200.10
> > 320G, jumper-limited to SATA150.
> > Kernel: 2.6.22.9 amd64
> >
> > Problem:
> > Heavy load causes errors and triggers oops.
>
> Have you checked your memory already (memtest86)?

Last run was about a year ago.

This box is updated regularly (all installed software gets rebuilt),
so I'm reasonably certain the memory is OK; gcc is almost as
sensitive as memtest.

Will recheck anyway.

> We have several applications with Promise controllers on strange
> hardware, and we never had integrity problems, even with e.g. non-standard
> SATA connections over custom vacuum-tight connectors.

Judging from the Linux and FreeBSD mailing lists, the TX4 is now quite
well known for intermittent problems, which are hard to reproduce on
different hardware.

I have two machines with these controllers: a FreeBSD-6.2 box on an MSI
K8Neo2 motherboard (ATI chipset), and this one. The FreeBSD box does not
exhibit this problem under the little load it gets, but the 6-STABLE and
7-CURRENT branches have shown similar symptoms since around 19 April 2007,
with rare (but not unheard-of) occurrences before that.

Thus I am unable to keep the machines up to date, and before I have to
dump $140 worth of hardware, I'd like to help fix this problem, or at
least be certain that these controllers are indeed unusable.

> > Problems were blamed:
> >   - SATA300 being too 'hot' (jumpered the drives)
>
> Is this a commonly known problem with your hard drives or controller?
> (Ask Google.) Otherwise, it sounds like a problem with broken hardware.

This is a common problem with at least VIA onboard controllers and
Seagate disks, and I think with SATA150 controllers and speed
negotiation in general.

This step was suggested on some mailing list as a general precaution,
but it actually made no difference to the error rate.

I have not re-jumpered the drives back to SATA300, so that I can still
easily connect them to the onboard controller.



Re: Promise SATA300 TX4: errors, oops in ext3 code

2007-10-01 Thread Alexander Sabourenkov

> Have you checked your memory already (memtest86)?
> [...]
> Again... sounds like bad memory to me.

Nightly memtest86 run: 11 hours, 23 passes, 0 errors.
