[asterisk-users] Asterisk Server : IDE HDD frequent crash

2006-10-06 Thread Matthew Rubenstein
I partitioned/formatted a new WD2500 with NTFS on a WinXP machine,
filled it with data (mostly 10MB FLAC and SHN soundfiles). Then
transferred it to an AAH Asterisk server box with a Digium TDM400P
(1FXO/1FXS) and an Audigy2 soundcard. I installed it as hdb, booting off
hda (no other drives). I mounted that drive with ntfs-fuse, and then
remotely mounted it from another machine (Ubuntu) with sshfs. fuse
doesn't fully work: when I removed some files from the NTFS volume, it
failed to remove the last file specified for removal from some
directories (and therefore the directories themselves). I then opened several of
the existing remote files from my local workstation.

After about 6 hours, I got a CentOS kernel panic from the AAH server
with the NTFS drive, indicating an IRQ conflict. When I rebooted, it
continued to kernel panic until I rebooted with the Audigy2 soundcard
removed, which forced CentOS to uninstall the driver. After that
I deleted the AC97 module for the motherboard soundchip, just to be
safe, then shut down, reinserted the Audigy2, restarted, and let CentOS
automatically remove the AC97 configs, add the Audigy2 configs, and
continue normally. However, the drive is now marked "dirty", requiring
"chkdsk", which doesn't run on Linux and has no Linux equivalent. The
NTFS tools that come with fuse, which fix the most basic state problems, had
no effect. But if I force the mount, the drive mounts and reads files fine
(I don't write to it in its dirty state).

Then I shut down, added another WD2500 to the IDE as hdc, booted, and
the kernel didn't find hdc when it probed the IDE, though it did see
that there was a device on IDE1. I shut down, moved both WD2500s to
IDE1, booted, and the kernel found neither hdc nor hdd. So I can't dd
the NTFS drive to an ext3 (etc) Linux drive. Even with the Audigy2
removed, the TDM400P left in, and the AC97 module restored, the kernel
does not find the second IDE drive on probe, no matter where I install it
on the IDE buses.

I can recover the drive with chkdsk on the WinXP machine that formatted
it, and either copy across the LAN or possibly mount in a USB enclosure
locally to the Ubuntu machine, then copy across USB to a locally mounted
Linux drive.

But it looks like an IRQ conflict, or maybe DMA, or other conflict at
that level, is interfering with the IDE. The conflict didn't happen with
Audigy2 + TDM400P + IDE0/hda, but it does happen when adding hdb/c/d to
the mix, unless I remove the soundcard. Maybe the Audigy2 conflicts with
the TDM400P in a way that interferes with the IDE. This problem seems
like it could destroy drives quicker than their MTBF, so I thought I'd
throw it out there.
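
One way to follow up on the IRQ-conflict suspicion is to look for interrupt lines that carry more than one device. The following is a minimal sketch, not something from the original post: it assumes the 2.6-era /proc/interrupts layout (one controller column, then comma-separated device names), and the sample text, including the IRQ number and the EMU10K1 name, is made up for illustration. Sharing is normal for level-triggered PCI interrupts, so a hit is only a lead, not proof of a conflict.

```python
def parse_interrupts(text):
    """Map each IRQ label to (per-CPU counts, trailing description)."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/device columns
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def shared_irqs(table):
    """Return {irq: [devices]} for numbered IRQs listing more than one device."""
    shared = {}
    for irq, (_counts, desc) in table.items():
        if not irq.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR, MIS
        parts = desc.split(None, 1)  # controller type, then device names
        if len(parts) < 2:
            continue
        devices = [name.strip() for name in parts[1].split(",")]
        if len(devices) > 1:
            shared[irq] = devices
    return shared


# Made-up sample, NOT taken from either poster's machine.
SAMPLE = """\
           CPU0       CPU1
 14:     134159     135915    IO-APIC-edge  ide0
 15:         10         12    IO-APIC-edge  ide1
169:       5000       4000   IO-APIC-level  wctdm, EMU10K1
NMI:          0          0
"""

print(shared_irqs(parse_interrupts(SAMPLE)))
```

On a live box the same functions could be fed the contents of /proc/interrupts read with open().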


On Fri, 2006-10-06 at 00:26 -0700,
[EMAIL PROTECTED] wrote:
> Date: Thu, 5 Oct 2006 16:44:10 +0000 (UTC)
> From: Dushyanth <[EMAIL PROTECTED]>
> Subject: [asterisk-users] Asterisk Server : IDE HDD frequent crash 
> To: asterisk-users@lists.digium.com
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii
> 
> Hey guys,
> 
> I am having a peculiar problem with my Asterisk installation. The
> specs are:
> 
> [EMAIL PROTECTED] ~]# asterisk -V
> Asterisk 1.2.7.1
> 
> Wildcard: Digium Wildcard TE110P T1/E1
> Wildcard TDM: Wildcard TDM400P REV I (4 modules) ( 2 FXO, 2 FXS)
> Wildcard TDM: Wildcard TDM400P REV I (4 modules) ( 1 FXO, 3 FXS)
> Wildcard TDM: Wildcard TDM2400P Prototype (24 modules) (12 FXO's -
> rest empty)
> 
> In total there are 15 FXOs and 5 FXS, of which 5 to 6 FXO/FXS ports
> are in use. We have about 300 active SIP accounts.
> 
> Queues, SIP extensions, and agents are in a MySQL database using
> Asterisk realtime static.
> 
> CPU : Intel(R) Xeon(TM) CPU 3.06GHz with Hyper threading
> RAM : 1G
> Mobo : Intel SE7501HG2
> 
> The system is stable; however, the IDE disk crashes every 3 to 4
> months. There are DMA timeout errors for the IDE disk before it fails
> completely. The same issue occurred with the past three disks, and I
> suspected the recommended hdparm setting
> 
> 'hdparm -d 1 -X udma2 -c 3 /dev/IDE Device'
> 
> So I removed this setting after the last crash, and the system worked
> fine for another 3 months. Yesterday, the disk failed again with the
> same symptoms. All the disks were Seagate Barracuda IDE drives.
> 
> zttool doesn't show any IRQ misses even without the above hdparm
> setting, and there is no noticeable problem in Asterisk with the PRI
> line etc. Below is my /proc/interrupts as well as /dev/hda settings.
> 
> [EMAIL PROTECTED] ~]# cat /proc/interrupts
>            CPU0       CPU1
>   0:   24771857   24719039    IO-APIC-edge  timer
>   1:        102         62    IO-APIC-edge  i8042
>   8:          1          0    IO-APIC-

Re: [asterisk-users] Asterisk Server : IDE HDD frequent crash

2006-10-06 Thread Jay R. Ashworth
On Thu, Oct 05, 2006 at 11:41:32PM -0700, Sam Norris wrote:
> Heat = #1 cause of disk failure. If they are roasting to the touch they 
> will fail in 2-3 months.

One word: "smartd".

I didn't know it existed, and I'm amazed I didn't.  Everyone on this
list should be running smartd, and know what it's saying.
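
For anyone who wants to check the heat theory directly, here is a minimal sketch, not from the thread: it assumes smartmontools is installed, uses its `smartctl -A` attribute listing, and reads the raw value of attribute 194 (Temperature_Celsius), which not every drive reports. The sample output and the 50 C limit are illustrative assumptions, not measurements from anyone's server.

```python
import subprocess


def read_attributes(device):
    """Run `smartctl -A <device>` (usually needs root) and return its text output."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout  # smartctl's exit status is a bitmask, so it is not checked


def parse_temperature(smartctl_output):
    """Return the raw value of attribute 194, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "194" and fields[9].isdigit():
            return int(fields[9])
    return None


def too_hot(temperature_c, limit_c=50):
    """Hypothetical alarm rule; the 50 C default is an assumption, not a spec."""
    return temperature_c is not None and temperature_c >= limit_c


# Illustrative sample output, NOT taken from the original poster's disks.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2600
194 Temperature_Celsius     0x0022   052   060   000    Old_age   Always       -       52 (0 18 0 0)
"""

print(parse_temperature(SAMPLE), too_hot(parse_temperature(SAMPLE)))
```

On a real machine, parse_temperature(read_attributes("/dev/hda")) would be the call to try; smartd itself can do the same monitoring continuously.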

Cheers,
-- jra
-- 
Jay R. Ashworth  [EMAIL PROTECTED]
Designer  Baylink  RFC 2100
Ashworth & Associates  The Things I Think  '87 e24
St Petersburg FL USA  http://baylink.pitas.com +1 727 647 1274

"That's women for you; you divorce them, and 10 years later,
  they stop having sex with you."  -- Jennifer Crusie; _Fast_Women_
___
--Bandwidth and Colocation provided by Easynews.com --

asterisk-users mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users


Re: [asterisk-users] Asterisk Server : IDE HDD frequent crash

2006-10-05 Thread Sam Norris
Heat = #1 cause of disk failure. If they are roasting to the touch they will 
fail in 2-3 months.





Re: [asterisk-users] Asterisk Server : IDE HDD frequent crash

2006-10-05 Thread Stuart Sheldon
I would look at ventilation if I were you. Drive failures at the rate
you are talking about can usually be traced back to thermal failures.

Just a thought

Stu





[asterisk-users] Asterisk Server : IDE HDD frequent crash

2006-10-05 Thread Dushyanth
Hey guys,

I am having a peculiar problem with my Asterisk installation. The specs 
are:

[EMAIL PROTECTED] ~]# asterisk -V
Asterisk 1.2.7.1

Wildcard: Digium Wildcard TE110P T1/E1
Wildcard TDM: Wildcard TDM400P REV I (4 modules) ( 2 FXO, 2 FXS)
Wildcard TDM: Wildcard TDM400P REV I (4 modules) ( 1 FXO, 3 FXS)
Wildcard TDM: Wildcard TDM2400P Prototype (24 modules) (12 FXO's - rest 
empty)

In total there are 15 FXOs and 5 FXS, of which 5 to 6 FXO/FXS ports are in use. We have 
about 300 active SIP accounts. 

Queues, SIP extensions, Agents are in MySQL database using asterisk 
realtime static.

CPU : Intel(R) Xeon(TM) CPU 3.06GHz with Hyper threading
RAM : 1G
Mobo : Intel SE7501HG2

The system is stable; however, the IDE disk crashes every 3 to 4 months. There 
are DMA timeout errors for the IDE disk before it fails completely. The 
same issue occurred with the past three disks, and I suspected the 
recommended hdparm setting 

'hdparm -d 1 -X udma2 -c 3 /dev/IDE Device'

So I removed this setting after the last crash, and the system worked fine 
for another 3 months. Yesterday, the disk failed again with the same symptoms. 
All the disks were Seagate Barracuda IDE drives.
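
To see how often the DMA timeouts show up before a disk dies, the kernel log can be scanned per drive. This is a minimal sketch under assumptions: the log lines below are illustrative examples in the style of the old IDE driver, not lines from this server, and the pattern is deliberately loose (any hdX line mentioning DMA together with "timeout" or "timer_expiry").

```python
import re
from collections import Counter

# Loose pattern: an hdX device prefix, then "dma" followed later by a timeout marker.
DMA_ERROR = re.compile(r"\b(hd[a-z]):.*dma.*(timeout|timer_expiry)", re.IGNORECASE)


def count_dma_errors(log_text):
    """Count DMA timeout style messages per IDE device name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DMA_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative log lines, NOT copied from the poster's machine.
SAMPLE_LOG = """\
Oct  5 03:12:01 pbx kernel: hda: dma_timer_expiry: dma status == 0x21
Oct  5 03:12:11 pbx kernel: hda: DMA timeout error
Oct  5 03:12:12 pbx kernel: hdb: DMA interrupt recovery
Oct  5 03:13:00 pbx kernel: eth0: link up
"""

print(dict(count_dma_errors(SAMPLE_LOG)))
```

Run against successive copies of the system log, a rising count for one device would be an early warning worth acting on before the drive fails completely.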

zttool doesn't show any IRQ misses even without the above hdparm setting, and
there is no noticeable problem in Asterisk with the PRI line etc. Below is 
my /proc/interrupts as well as /dev/hda settings.

[EMAIL PROTECTED] ~]# cat /proc/interrupts
           CPU0       CPU1
  0:   24771857   24719039    IO-APIC-edge  timer
  1:        102         62    IO-APIC-edge  i8042
  8:          1          0    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 14:     134159     135915    IO-APIC-edge  ide0
185:   32988610   16463264   IO-APIC-level  wctdm
193:   22173177   27275710   IO-APIC-level  wctdm
201:   21737611   27711650   IO-APIC-level  wctdm24xxp
209:   22038077   27401613   IO-APIC-level  wcte11xp
225:   18992311          0   IO-APIC-level  eth1
233:1171166879   IO-APIC-level  eth0
NMI:          0          0
LOC:   49493157   49493156
ERR:          0
MIS:          0

[EMAIL PROTECTED] ~]# hdparm -i /dev/hda

/dev/hda:

 Model=ST340014A, FwRev=3.06, SerialNo=5JX96VFV
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=78165360
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:

 * signifies the current active mode
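
The listing above marks the active transfer mode with an asterisk, so it can be compared against whatever mode an hdparm command asked for. A minimal sketch, assuming only the `hdparm -i` text layout shown in this post; the function names are hypothetical, and the sample is the mode section quoted above.

```python
import re


def active_modes(hdparm_output):
    """Return the mode names that `hdparm -i` marks with a leading '*'."""
    # The legend line "* signifies ..." has a space after '*', so it does not match.
    return re.findall(r"\*(\w+)", hdparm_output)


def matches_requested(hdparm_output, requested_mode):
    """True if the requested mode (e.g. 'udma2') is the one currently active."""
    return requested_mode in active_modes(hdparm_output)


# Mode section of the `hdparm -i /dev/hda` output shown in this post.
SAMPLE = """\
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled

 * signifies the current active mode
"""

print(active_modes(SAMPLE), matches_requested(SAMPLE, "udma2"))
```

With the hdparm setting removed, the drive here is running at udma5 rather than the udma2 that the earlier command requested, which is the kind of difference this check makes visible.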

I looked through the mailing lists and couldn't find any such issues reported. 

Please advise. Should I be using SCSI disks in RAID 1 or something? Will 
that help?

Also, should I be looking at a motherboard other than the Intel SE7501HG2? I am 
planning to put in another Asterisk server as a failover and would 
appreciate input about the kind of hardware I should be using for a system 
with the specs I mentioned.

Thanks
Dushyanth
