At 16:37 18/08/2005, you wrote:
Quoting Stephen Tait <[EMAIL PROTECTED]>:

I'm just in the process of setting up a Sarge server to be used as a sort of backup server. The main PATA discs are used to boot the OS off software RAID1, with the rest of the disc space used in JBOD for not-so-important backups. However, I'm having problems getting the new disc array up and running.

We've put a SATA controller in the box, a cheap-as-chips PCI Adaptec 1210SA which, according to lspci, uses the Silicon Image SI3112 chipset to provide two SATA channels. Connected to this are two 320GB drives which I want to turn into a RAID1 array. When the system first booted, I used mdadm to create the RAID1 array md2 (mdadm --create /dev/md2 --level=1 --raid-disks=2 /dev/sda1 /dev/sdb1), checked /proc/mdstat to wait for the array to finish syncing, and then formatted it ext3 and mounted it. Everything seemed to work fine until I rebooted, whereupon the mount failed with the report that it wasn't a valid ext[2|3] superblock; fsck confirmed this and on further inspection it seemed that it wasn't a RAID device any more either.
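
For anyone following along, here is a minimal sketch of the "wait for the sync to finish" step described above. The function name resync_in_progress is made up for illustration; it only looks for the resync/recovery progress lines that /proc/mdstat prints while an array is still syncing.

```shell
#!/bin/sh
# resync_in_progress (hypothetical helper): reads /proc/mdstat-style text
# on stdin and exits 0 if any array is still resyncing or recovering.
# It also matches the "resync=DELAYED" / "resync=PENDING" forms.
resync_in_progress() {
    grep -Eq '(resync|recovery) *='
}

# Typical use on a live system (assumes the md driver is loaded):
#   while resync_in_progress < /proc/mdstat; do sleep 60; done
#
# After a reboot, checking whether the md superblocks survived on the
# member partitions may help narrow things down:
#   mdadm --examine /dev/sda1
#   mdadm --examine /dev/sdb1
```

If mdadm --examine still reports a superblock on both partitions after the reboot, the array data is probably intact and the problem is more likely in how the array gets assembled at boot.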

...and booted with that instead after editing GRUB's menu.lst. The exact same error occurred, and I'm now at a bit of a loss to explain what's happening. If I try to mount the discs on their own (i.e. mount /dev/sdX /mnt/somedir) then they work just fine, so the hardware works fine - so I'm almost certain it's a problem with initting the RAID arrays at boot. At the moment I'm just rebuilding the array to see what happens when I don't try to mount it at boot, but only after the OS has finished booting, but of course that'll only be a temporary workaround. If it's any help, here are my fstab and mdadm.conf:

[EMAIL PROTECTED]:~$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
/dev/md1        /               ext3    defaults,errors=remount-ro 0       1
/dev/md0        /boot           ext2    defaults        0       2
/dev/hdb9       /home           ext3    defaults        0       2
/dev/hdb4       /mnt/avj-backup ext3    defaults        0       2
/dev/hda7       /mnt/dcj-backup ext3    defaults        0       2
/dev/hdb8       /tmp            ext3    defaults        0       2
/dev/md4        /usr            ext3    defaults        0       2
/dev/md3        /var            ext3    defaults        0       2
/dev/hdb7       none            swap    sw              0       0
/dev/hdc        /media/cdrom0   iso9660 ro,user,noauto  0       0
#/dev/md2       /mnt/dcj-archive        ext3    defaults        0       2

===============================================

[EMAIL PROTECTED]:~$ cat /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=b8093124:a6d6f876:a29eecb7:e1b332f3
   devices=/dev/hda6,/dev/hdb6
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=1973b0c3:e38869d2:ffef0cde:92048042
   devices=/dev/hda5,/dev/hdb5
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=78a3be5a:f0838fe2:4d4ce7ed:3a969954
   devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=51d55d28:3e653dce:631dd682:8dd52a37
   devices=/dev/hda2,/dev/hdb2
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=56e09876:a751356e:b86535d0:95091b5b
   devices=/dev/hda1,/dev/hdb1

As you can see, most of the important directories are mounted in software RAID1 on the two PATA discs with unimportant stuff on JBOD, although of course this shouldn't make any difference. All the usual dmesg etc. stuff doesn't seem to tell me anything I don't already know. If anyone has experienced this before or has any pointers as to how I can troubleshoot it, I'd be much obliged!
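
One thing that may be worth ruling out with a config like the one above: each run of mdadm --create gives the array a fresh UUID, so an ARRAY line written for an earlier incarnation of md2 would no longer match the superblocks on disc. The sketch below (conf_uuids is a hypothetical helper name) pulls the device/UUID pairs out of mdadm.conf-style text so they can be compared with what the discs themselves report. Whether this is the actual cause here is only a guess.

```shell
#!/bin/sh
# conf_uuids (hypothetical helper): reads mdadm.conf-style text on stdin
# and prints "<md device> <uuid>" for every ARRAY line that has a UUID=.
conf_uuids() {
    awk '$1 == "ARRAY" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/)
                print $2, substr($i, 6)   # strip the leading "UUID="
    }'
}

# Possible use on a live system (run as root):
#   conf_uuids < /etc/mdadm/mdadm.conf    | sort > /tmp/conf.uuids
#   mdadm --examine --scan | conf_uuids   | sort > /tmp/disc.uuids
#   diff /tmp/conf.uuids /tmp/disc.uuids
```

mdadm --examine --scan prints ARRAY lines in the same general shape, which is why the same function can read both. Any difference in the md2 line would suggest that mdadm.conf needs regenerating after the array is recreated.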

I have had some trouble getting a RAID array to initialize on boot in the past.
My fix was to remove its entry from the mdadm.conf file and re-cfdisk the disks with the auto-detect-raid setting. Then I created the RAID array and rebooted, and it came up just fine.
Other than that, I'm not sure what else could be wrong.
Hopefully someone else on the list has some better ideas.
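
The "auto-detect-raid setting" mentioned above presumably means partition type fd (Linux raid autodetect). As a rough sketch, the helper below (non_autodetect_partitions is a made-up name) reads fdisk -l style output and lists the partitions that do not carry that type. It relies only on the "Linux raid autodetect" label that fdisk prints for type fd.

```shell
#!/bin/sh
# non_autodetect_partitions (hypothetical helper): reads "fdisk -l" style
# output on stdin and prints every /dev/... partition whose type is NOT
# shown as "Linux raid autodetect" (type fd).
non_autodetect_partitions() {
    awk '/^\/dev\// && !/Linux raid autodetect/ { print $1 }'
}

# Possible use on a live system (run as root):
#   fdisk -l /dev/sda /dev/sdb | non_autodetect_partitions
#
# A partition's type can be changed interactively in fdisk with the "t"
# command followed by the code "fd", or in cfdisk via its Type menu.
```

An empty result for the two SATA discs would mean both member partitions are already marked for autodetection.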

Cheers,
Mike

Thanks for the tip Mika, I have just tried this and a number of other configurations, and the RAID array just "dies" (or doesn't initialise) on every single reboot, meaning I have to rebuild the array, reformat it, etc etc every time - obviously not what I want for a backup server without a UPS! I simply don't get it; AFAICT all the modules I need to init a SATA RAID1 array at boot exist within the initrd, and they all seem to get loaded at the right time (since when modprobe does its thing later on in the boot process I see lots of "loading sata_sil... module already loaded" type messages). I'll post the relevant section of dmesg in case anyone can spot anything I'm not familiar with; other than that I'm going to try building another custom kernel with everything relevant compiled into the kernel (already tried one but I must've missed something as it panicked at boot).

Snipped dmesg follows:

RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 4716 blocks [1 disk] into ram disk... done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing unused kernel memory: 168k freed
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
hda: WDC WD2500JB-00EVA0, ATA DISK drive
hdb: WDC WD2000JB-00GVA0, ATA DISK drive
hdc: Compaq CRD-8484B, ATAPI CD/DVD-ROM drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
AMD7441: IDE controller at PCI slot 0000:00:07.1
AMD7441: chipset revision 4
AMD7441: not 100% native mode: will probe irqs later
AMD7441: 0000:00:07.1 (rev 04) UDMA100 controller
AMD7441: port 0x01f0 already claimed by ide0
AMD7441: port 0x0170 already claimed by ide1
AMD7441: neither IDE port enabled (BIOS)
SCSI subsystem initialized
libata version 1.02 loaded.
device-mapper: 4.1.0-ioctl (2003-12-10) initialised: [EMAIL PROTECTED]
sata_sil version 0.54
ACPI: PCI interrupt 0000:02:05.0[A] -> GSI 17 (level, low) -> IRQ 169
ata1: SATA max UDMA/100 cmd 0xE0823080 ctl 0xE082308A bmdma 0xE0823000 irq 169
ata2: SATA max UDMA/100 cmd 0xE08230C0 ctl 0xE08230CA bmdma 0xE0823008 irq 169
ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata1: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata2: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
  Vendor: ATA       Model: WDC WD3200JD-00K  Rev: 08.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sda: drive cache: write back
 /dev/scsi/host0/bus0/target0/lun0: p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: ATA       Model: WDC WD3200JD-00K  Rev: 08.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdb: drive cache: write back
 /dev/scsi/host1/bus0/target0/lun0: p1
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: raid1 personality registered as nr 3
cpci_hotplug: CompactPCI Hot Plug Core version: 0.2
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: HPC vendor_id 1022 device_id 700d ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7448 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
vesafb: probe of vesafb0 failed with error -6
NET: Registered protocol family 1
hda: max request size: 1024KiB
hda: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63
 /dev/ide/host0/bus0/target0/lun0: p1 p2 p3 < p5 p6 p7 >
hdb: max request size: 1024KiB
hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63
 /dev/ide/host0/bus0/target1/lun0: p1 p2 p3 < p5 p6 p7 p8 p9 > p4
md: md1 stopped.
md: bind<hdb2>
md: bind<hda2>
raid1: raid set md1 active with 2 out of 2 mirrors
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1951856k swap on /dev/hdb7.  Priority:-1 extents:1
EXT3 FS on md1, internal journal
hdc: ATAPI 48X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.20
ieee1394: Initialized config rom entry `ip1394'
sbp2: $Rev: 1219 $ Ben Collins <[EMAIL PROTECTED]>
ACPI: PCI interrupt 0000:02:06.0[A] -> GSI 18 (level, low) -> IRQ 185
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
0000:02:06.0: 3Com PCI 3c905C Tornado at 0xa400. Vers LK1.1.19
Capability LSM initialized
md: md4 stopped.
md: bind<hdb6>
md: bind<hda6>
raid1: raid set md4 active with 2 out of 2 mirrors
md: md3 stopped.
md: bind<hdb5>
md: bind<hda5>
raid1: raid set md3 active with 2 out of 2 mirrors
md: md2 stopped.
md: md0 stopped.
md: bind<hdb1>
md: bind<hda1>
raid1: raid set md0 active with 2 out of 2 mirrors

As you can see, the only mention of md2 is the "md: md2 stopped" line, whereas of course I'd be expecting a "raid1: raid set md2 active with 2 out of 2 mirrors" message. Does anyone more au fait with kernel software RAID know why the kernel won't even attempt to start md2?
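
The same check can be done mechanically on a longer log. This is a small sketch (stopped_not_started is a hypothetical name) that reads dmesg text and prints every md array that has a "stopped" line but never gets a "raid set ... active" line. It assumes the 2.6-era message wording shown in the log above.

```shell
#!/bin/sh
# stopped_not_started (hypothetical helper): reads dmesg text on stdin and
# prints md arrays that were "stopped" but never reported as active.
stopped_not_started() {
    awk '
        /md: md[0-9]+ stopped/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^md[0-9]+$/) seen[$i] = 1
        }
        /raid set md[0-9]+ active/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^md[0-9]+$/) active[$i] = 1
        }
        END {
            for (n in seen)
                if (!(n in active)) print n
        }
    ' | sort
}

# Possible use on a live system:
#   dmesg | stopped_not_started
```

Run against the snippet above, it should single out md2, since md0, md1, md3 and md4 each have a matching "active" line.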

Should I try a newer kernel? Were there problems with SATA and software RAID in 2.6.8? So many questions, and an angry boss!

P.S. I don't know if it's anything remotely significant, but after setting up software RAID on Gentoo I was led to believe that RAID configuration was done via the help of /etc/raidtab which the Sarge installer didn't put on my machine, so I assumed it wasn't needed and everything was done via mdadm.conf; I doubt it'd help my current situation, but would it do any harm to put one in there? Gentoo, by default, has an empty mdadm.conf so I'm assuming that the two both serve a similar function.

Yours, one very confused Debian user!

Stephen Tait
