At 16:37 18/08/2005, you wrote:
Quoting Stephen Tait <[EMAIL PROTECTED]>:
I'm just in the process of setting up a Sarge server to be used as a sort
of backup server. The main PATA discs are used to boot the OS off of
software RAID1, with the rest of the disc space used as JBOD for
not-so-important backups. However, I'm having problems getting the new
disc array up and running.
We've put a SATA controller in the box, a cheap-as-chips PCI Adaptec
1210SA which, according to lspci, uses the SIlicon Image SI3112 chipset
to provide two SATA channels. Connected to this are two 320GB drives
which I want to turn into a RAID1 array. After the system first booted, I
used mdadm to create the RAID1 array md2 (mdadm --create /dev/md2
--level=1 --raid-disks=2 /dev/sda1 /dev/sdb1), watched /proc/mdstat until
the array finished syncing, and then formatted it ext3 and mounted it.
Everything seemed to work fine until I rebooted, whereupon the mount
failed with the report that it wasn't a valid ext2/ext3 superblock; fsck
confirmed this, and on further inspection it seemed that it wasn't a RAID
device any more either.
...and booted with that instead after editing GRUB's menu.lst. The exact
same error occurred, and I'm now at a bit of a loss to explain what's
happening. If I try and mount the discs on their own (i.e. mount /dev/sdX
/mnt/somedir) then they work just fine, so the hardware works fine - so
I'm almost certain it's a problem with initting the RAID arrays at boot.
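Before tearing the array down and rebuilding it, it's worth checking whether the md superblocks actually survived the reboot. A sketch of what I'd try (device names are the ones from the setup above; recreating an array should be a last resort, since it triggers a full resync):

```shell
# Inspect the md superblock on each member. With the old v0.90 format the
# superblock lives near the END of the partition, which is why a plain
# ext3 mount of a bare member can still appear to work.
mdadm --examine /dev/sda1
mdadm --examine /dev/sdb1

# If both members still show a valid superblock with matching UUIDs,
# assemble by hand instead of recreating:
mdadm --assemble /dev/md2 /dev/sda1 /dev/sdb1

# A verbose scan-based assembly reports exactly why a device was skipped:
mdadm --assemble --scan --verbose
```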
At the moment I'm rebuilding the array to see what happens when I don't
try to mount it at boot but only after the OS has finished booting - but
of course that'll only be a temporary workaround. If it's any help, here
are my fstab and mdadm.conf:
[EMAIL PROTECTED]:~$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc       /proc             proc     defaults                   0 0
/dev/md1   /                 ext3     defaults,errors=remount-ro 0 1
/dev/md0   /boot             ext2     defaults                   0 2
/dev/hdb9  /home             ext3     defaults                   0 2
/dev/hdb4  /mnt/avj-backup   ext3     defaults                   0 2
/dev/hda7  /mnt/dcj-backup   ext3     defaults                   0 2
/dev/hdb8  /tmp              ext3     defaults                   0 2
/dev/md4   /usr              ext3     defaults                   0 2
/dev/md3   /var              ext3     defaults                   0 2
/dev/hdb7  none              swap     sw                         0 0
/dev/hdc   /media/cdrom0     iso9660  ro,user,noauto             0 0
#/dev/md2  /mnt/dcj-archive  ext3     defaults                   0 2
===============================================
[EMAIL PROTECTED]:~$ cat /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md4 level=raid1 num-devices=2
    UUID=b8093124:a6d6f876:a29eecb7:e1b332f3
    devices=/dev/hda6,/dev/hdb6
ARRAY /dev/md3 level=raid1 num-devices=2
    UUID=1973b0c3:e38869d2:ffef0cde:92048042
    devices=/dev/hda5,/dev/hdb5
ARRAY /dev/md2 level=raid1 num-devices=2
    UUID=78a3be5a:f0838fe2:4d4ce7ed:3a969954
    devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md1 level=raid1 num-devices=2
    UUID=51d55d28:3e653dce:631dd682:8dd52a37
    devices=/dev/hda2,/dev/hdb2
ARRAY /dev/md0 level=raid1 num-devices=2
    UUID=56e09876:a751356e:b86535d0:95091b5b
    devices=/dev/hda1,/dev/hdb1
As you can see, most of the important directories are mounted in software
RAID1 on the two PATA discs with unimportant stuff on JBOD, although of
course this shouldn't make any difference. All the usual dmesg etc. stuff
doesn't seem to tell me anything I don't already know. If anyone has
experienced this before or has any pointers as to how I can troubleshoot
it, I'd be much obliged!
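One thing worth checking with a setup like the above: on Sarge, if I remember rightly, the initrd is built with a snapshot of /etc/mdadm/mdadm.conf, so an array created after the initrd was generated won't be assembled at boot until the initrd is rebuilt. A hedged sketch (the mkinitrd invocation assumes the stock Sarge initrd-tools packaging):

```shell
# Append ARRAY lines for all currently running arrays. This appends
# rather than replaces, so check for duplicate entries afterwards:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

# Rebuild the initrd so it carries the updated mdadm.conf (and the
# sata_sil/raid1 modules) for the running kernel:
mkinitrd -o /boot/initrd.img-$(uname -r) $(uname -r)
```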
I have had some trouble getting a RAID array to initialize on boot in the
past. My fix was to remove its entry from the mdadm.conf file and
re-cfdisk the disks with the auto-detect-raid setting. Then I created the
RAID array again, rebooted, and it came up just fine.
Other than that, I'm not sure what else could be wrong.
Hopefully someone else on the list has some better ideas.
Cheers,
Mike
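For reference, the "auto-detect-raid setting" Mike mentions is partition type 0xfd (Linux raid autodetect). A sketch with sfdisk, assuming the same device names as above; note that kernel autodetection only applies when the md/raid1 drivers are compiled into the kernel (or the initrd runs raid autorun), not when they are loaded later as modules:

```shell
# Show the partition table; the Id column should read "fd" for any
# member the kernel is meant to autodetect:
sfdisk -l /dev/sda

# Non-interactive equivalent of the 't' (change type) command in
# fdisk/cfdisk:
sfdisk --change-id /dev/sda 1 fd
sfdisk --change-id /dev/sdb 1 fd
```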
Thanks for the tip Mike. I have just tried this and a number of other
configurations, and the RAID array just "dies" (or doesn't initialise) on
every single reboot, meaning I have to rebuild the array, reformat it,
etc. every time - obviously not what I want for a backup server without a
UPS! I simply don't get it; AFAICT all the modules I need to init a SATA
RAID1 array at boot exist within the initrd, and they all seem to get
loaded at the right time (since when modprobe does its thing later on in
the boot process I see lots of "loading sata_sil... module already loaded"
type messages). I'll post the relevant section of dmesg in case anyone can
spot anything I'm not familiar with; other than that I'm going to try
building another custom kernel with everything relevant compiled in
(I already tried one, but I must've missed something as it panicked at boot).
Snipped dmesg follows:
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 4716 blocks [1 disk] into ram disk... done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing unused kernel memory: 168k freed
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
hda: WDC WD2500JB-00EVA0, ATA DISK drive
hdb: WDC WD2000JB-00GVA0, ATA DISK drive
hdc: Compaq CRD-8484B, ATAPI CD/DVD-ROM drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
AMD7441: IDE controller at PCI slot 0000:00:07.1
AMD7441: chipset revision 4
AMD7441: not 100% native mode: will probe irqs later
AMD7441: 0000:00:07.1 (rev 04) UDMA100 controller
AMD7441: port 0x01f0 already claimed by ide0
AMD7441: port 0x0170 already claimed by ide1
AMD7441: neither IDE port enabled (BIOS)
SCSI subsystem initialized
libata version 1.02 loaded.
device-mapper: 4.1.0-ioctl (2003-12-10) initialised: [EMAIL PROTECTED]
sata_sil version 0.54
ACPI: PCI interrupt 0000:02:05.0[A] -> GSI 17 (level, low) -> IRQ 169
ata1: SATA max UDMA/100 cmd 0xE0823080 ctl 0xE082308A bmdma 0xE0823000 irq 169
ata2: SATA max UDMA/100 cmd 0xE08230C0 ctl 0xE08230CA bmdma 0xE0823008 irq 169
ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata1: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata2: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
Vendor: ATA Model: WDC WD3200JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host0/bus0/target0/lun0: p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: ATA Model: WDC WD3200JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdb: drive cache: write back
/dev/scsi/host1/bus0/target0/lun0: p1
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: raid1 personality registered as nr 3
cpci_hotplug: CompactPCI Hot Plug Core version: 0.2
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: HPC vendor_id 1022 device_id 700d ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7448 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
vesafb: probe of vesafb0 failed with error -6
NET: Registered protocol family 1
hda: max request size: 1024KiB
hda: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63
/dev/ide/host0/bus0/target0/lun0: p1 p2 p3 < p5 p6 p7 >
hdb: max request size: 1024KiB
hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63
/dev/ide/host0/bus0/target1/lun0: p1 p2 p3 < p5 p6 p7 p8 p9 > p4
md: md1 stopped.
md: bind<hdb2>
md: bind<hda2>
raid1: raid set md1 active with 2 out of 2 mirrors
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1951856k swap on /dev/hdb7. Priority:-1 extents:1
EXT3 FS on md1, internal journal
hdc: ATAPI 48X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.20
ieee1394: Initialized config rom entry `ip1394'
sbp2: $Rev: 1219 $ Ben Collins <[EMAIL PROTECTED]>
ACPI: PCI interrupt 0000:02:06.0[A] -> GSI 18 (level, low) -> IRQ 185
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
0000:02:06.0: 3Com PCI 3c905C Tornado at 0xa400. Vers LK1.1.19
Capability LSM initialized
md: md4 stopped.
md: bind<hdb6>
md: bind<hda6>
raid1: raid set md4 active with 2 out of 2 mirrors
md: md3 stopped.
md: bind<hdb5>
md: bind<hda5>
raid1: raid set md3 active with 2 out of 2 mirrors
md: md2 stopped.
md: md0 stopped.
md: bind<hdb1>
md: bind<hda1>
raid1: raid set md0 active with 2 out of 2 mirrors
As you can see, the only mention of md2 is the "md: md2 stopped" line,
whereas of course I'd be expecting a "raid1: raid set md2 active with 2 out
of 2 mirrors" message. Does anyone more au fait with kernel software RAID
know why the kernel won't even attempt to start md2?
Should I try a newer kernel? Were there problems with SATA and software
RAID in 2.6.8? So many questions, and an angry boss!
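Since the dmesg above shows the initrd is a cramfs image, one way to see what the boot environment actually knows about md2 is to loop-mount it and look inside. The paths below are guesses at the Sarge initrd layout; adjust the image name to match what's in /boot:

```shell
# Loop-mount the initrd image (cramfs is read-only anyway):
mkdir -p /mnt/initrd
mount -o loop -t cramfs /boot/initrd.img-$(uname -r) /mnt/initrd

# Does the initrd's copy of the mdadm config (if it has one) mention
# md2, and are the sata_sil/raid1 modules actually in there?
grep -r md2 /mnt/initrd/etc 2>/dev/null
find /mnt/initrd -name 'sata_sil*' -o -name 'raid1*'

umount /mnt/initrd
```

If md2 is missing from the initrd's config while md0/md1/md3/md4 are present, that would explain why only md2 is reported "stopped" but never assembled.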
P.S. I don't know if it's anything remotely significant, but after setting
up software RAID on Gentoo I was led to believe that RAID configuration was
done with the help of /etc/raidtab, which the Sarge installer didn't put on
my machine, so I assumed it wasn't needed and everything was done via
mdadm.conf. I doubt it'd help my current situation, but would it do any
harm to put one there? Gentoo has an empty mdadm.conf by default, so I'm
assuming the two serve a similar function.
Yours one very confused Debian user!
Stephen Tait
--