Re: Losing my mind, RAID1 on Sparc completely broken?
I'm having the exact same problem. Check out the thread starting with
http://lists.debian.org/debian-sparc/2004/debian-sparc-200402/msg00077.html.
Booting with nodma has made my IDE subsystem completely stable, including an
md RAID1 mirror pair. I'm still trying to get the bug workaround described in
that thread working, but my source has been out of town, so I'm stalled for
now.

Marc
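If nodma also stabilises things for you, the workaround can be made permanent by passing it from the boot loader instead of typing it at the prompt every time. A sketch only, assuming a stock /etc/silo.conf layout; the `ide=nodma` spelling is my assumption, so use whatever parameter actually worked for you at the boot prompt:

```
# /etc/silo.conf (sketch) -- disable IDE DMA on every boot
partition = 1
root = /dev/hda1
timeout = 100

image = /vmlinuz
        label = linux
        append = "ide=nodma"
```

Unlike LILO, SILO reads silo.conf at boot time, so editing the file should be enough without re-running the installer.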
Re: Losing my mind, RAID1 on Sparc completely broken?
On Wed, Mar 10, 2004 at 09:18:11AM +1000, Adam Conrad wrote:
> Antonio Prioglio wrote:
> > You need to leave the block 0 alone and start the partitions on
> > block 1. It is a known issue on sparcs.
>
> It's my understanding that SILO needs to be installed to a partition
> that begins on cylinder 0, or it won't boot. Regardless, I fail to see
> how this could be affecting a RAID on partitions later on the disks.

Which kernel version are you using? IIRC there are bugs in 2.4.18 WRT the md
code. When I had md working on sparc (I no longer use md), I believe I got it
working with 2.4.21, so any recent kernel should do. My Ultra 60 is currently
running 2.4.25 plus Debian and Ben's patches, and it seems to be happy.

Good luck,
--
Nathan Norman - Incanus Networking
mailto:[EMAIL PROTECTED]

No. Should I include quotations after my reply?
Re: Losing my mind, RAID1 on Sparc completely broken?
I have an Ultra 2 doing RAID1 just fine (crosses fingers..):

  Linux ultrasparcy 2.4.25 #1 SMP Wed Feb 18 11:32:17 EST 2004 sparc64 GNU/Linux

  20:35:49 up 21 days, 8:19, 2 users, load average: 0.08, 0.03, 0.00

  Filesystem      Size  Used Avail Use% Mounted on
  /dev/md0        2.0G  958M  938M  51% /

  ultrasparcy:~# cat /proc/mdstat
  Personalities : [raid1]
  read_ahead 1024 sectors
  md0 : active raid1 sdb1[1] sda1[0]
        2076992 blocks [2/2] [UU]

  unused devices: <none>

  ultrasparcy:~# dd if=/dev/md0 of=/dev/null
  4153984+0 records in
  4153984+0 records out
  2126839808 bytes transferred in 251.762802 seconds (8447792 bytes/sec)

I used mdadm to create it; have you tried using that instead? Though silo is
broken... I managed to get it to work by booting with the rescue CD, mounting
/dev/sda1 read-only, and running silo from a chroot; anything else refuses to
work... see #224870.

--
-Justin
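For reference, creating such a mirror with mdadm goes along these lines. This is a hedged sketch rather than Justin's actual command; the device names are taken from the output above, the command is destructive to those partitions, and it needs root on real hardware:

```
# Sketch: build a two-disk RAID1 from sda1 and sdb1 (destroys their contents).
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Watch the initial resync complete before trusting the array.
cat /proc/mdstat
```

mdadm writes persistent superblocks by default, so the array can then be assembled automatically at boot.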
Re: Losing my mind, RAID1 on Sparc completely broken?
On Tuesday 09 March 2004 03:10, Adam Conrad wrote:
> Disk /dev/hda (Sun disk label): 16 heads, 255 sectors, 19156 cylinders
> Units = cylinders of 4080 * 512 bytes
>
>    Device Flag    Start       End    Blocks   Id  System
> /dev/hda1             0        20     40800   83  Linux native
> /dev/hda2            21     18665  38033760   83  Linux native
> /dev/hda3             0     19156  39078240    5  Whole disk
> /dev/hda4         18666     19156    999600   83  Linux native

No swap? Guess that's a problem...

Bye
Antonello
RE: Losing my mind, RAID1 on Sparc completely broken?
Antonio Prioglio wrote:
> You need to leave the block 0 alone and start the partitions on block 1.
> It is a known issue on sparcs.

It's my understanding that SILO needs to be installed to a partition that
begins on cylinder 0, or it won't boot. Regardless, I fail to see how this
could be affecting a RAID on partitions later on the disks.

... Adam
Re: Losing my mind, RAID1 on Sparc completely broken?
> The problem comes in when I make both disks a member of the same array
> (in this case, hda2 and hdc2 are members of md1). As soon as I sync the
> array and write to it, it pretty much instantly corrupts. Unmounting md1
> and running a fsck on it shows a large number of illegal blocks. md5sums
> of various binaries on the system are wrong, and apt's and dpkg's status
> files get so horribly corrupted that they segfault or refuse to run. All
> within minutes or even seconds of writing to the md device.

This is the opposite of the problem I have. I have an SS20 with dual
HyperSPARC 125MHz CPUs and dual 4.1GB SCA drives on an espfast controller
inside the machine. After getting to a point where I was happy with the
Debian installation on a single disk, I copied the partition data across to
the second disk and started using the failed-disk method to copy data onto
the RAID. I found that when copying quite a lot of data the kernel would
oops, and I had to restart the process several times. I did eventually find
I was able to copy the ~1.5GB install across without oopsing if I added the
-u argument to cp (copy only if the source file is newer than the target
file). It seems to me that the kernel was oopsing when writing constantly
to the RAID array, but was okay as long as it was also being read from
quite a lot.

Anyway, I have succeeded in getting it all up and running with both disks in
the array, using auto-detect to assemble the RAID set at boot time and
mounting it as root. It seems to be working okay now, and I haven't noticed
any data corruption on my ext3 partition. I am using a home-built Linux
2.4.25 kernel.

One thing is that SILO now only gets as far as "S" and then I get "Program
terminated". I'm not sure what's causing this, but now I have to do

    boot cdrom /iommu/.;1/vmlinuz root=/dev/md0 ro

to start my machine. Anyone have any ideas about what's up with SILO? I have
the latest SILO from sarge installed.

Cheers.
---
James Whatever
My niece called me Uncle Robot on the weekend!
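For anyone unfamiliar with the failed-disk method mentioned above: it is driven from /etc/raidtab. A sketch only, with device names assumed to match a two-disk SCA setup like the one described (sda1 holding the existing install, sdb1 the new mirror half):

```
raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        persistent-superblock   1
        device                  /dev/sdb1
        raid-disk               0
        device                  /dev/sda1
        failed-disk             1
```

mkraid then builds a degraded mirror on sdb1 alone, leaving the live data on sda1 untouched; you copy the running system onto the array, boot from it, change failed-disk back to raid-disk, and raidhotadd sda1 so it resyncs into the mirror.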
Giant Robot Ltd, http://www.giantrobot.co.nz/
Re: Losing my mind, RAID1 on Sparc completely broken?
Following up to my previous post, here is the config for the RAID array that
I just can't keep alive:

--- 8< ---
cranx:~# cat /etc/raidtab
raiddev /dev/md1
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        chunk-size              4
        persistent-superblock   1
        device                  /dev/hda2
        raid-disk               0
        device                  /dev/hdc2
        raid-disk               1

cranx:~# fdisk -l /dev/hda

Disk /dev/hda (Sun disk label): 16 heads, 255 sectors, 19156 cylinders
Units = cylinders of 4080 * 512 bytes

   Device Flag    Start       End    Blocks   Id  System
/dev/hda1             0        20     40800   83  Linux native
/dev/hda2            21     18665  38033760   83  Linux native
/dev/hda3             0     19156  39078240    5  Whole disk
/dev/hda4         18666     19156    999600   83  Linux native

cranx:~# fdisk -l /dev/hdc

Disk /dev/hdc (Sun disk label): 16 heads, 255 sectors, 19156 cylinders
Units = cylinders of 4080 * 512 bytes

   Device Flag    Start       End    Blocks   Id  System
/dev/hdc1             0        20     40800   83  Linux native
/dev/hdc2            21     18665  38033760   83  Linux native
/dev/hdc3             0     19156  39078240    5  Whole disk
/dev/hdc4         18666     19156    999600   83  Linux native

cranx:~# cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid1 hdc2[1] hda2[0]
      38033664 blocks [2/2] [UU]
--- 8< ---

If I set up a brand new filesystem on /dev/md1, mount it, write a bunch of
data to it (in this case, a copy of my root filesystem), umount it, then
fsck it, nine times out of ten I get illegal blocks, and once I got
something about a bad filename on /usr/include/linux/[somethingorother].
Basically, the FS is being corrupted in record time. No errors are thrown
in dmesg, no oopses, no nothing. Just silent corruption. Any ideas?

... Adam Conrad

--
backup [n] (bak'up): The duplicate copy of crucial data that no one
bothered to make; used only in the abstract.

1024D/C6CEA0C9  C8B2 CB3E 3225 49BB 5ED2  0002 BE3C ED47 C6CE A0C9
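A quicker way to catch this kind of corruption than a full fsck cycle is a checksum round-trip on a freshly written file. A minimal sketch; here /tmp stands in for the md1 mount point so the commands are safe to try anywhere, but pointing the path at the mounted array is the real test:

```shell
# Write a known file, checksum it, sync to disk, re-read and compare.
# On a healthy filesystem this prints OK; on an array corrupting like
# the one above, mismatches should show up within minutes of writing.
dd if=/dev/zero of=/tmp/md1-probe bs=1024 count=1024 2>/dev/null
before=$(md5sum /tmp/md1-probe | cut -d' ' -f1)
sync
after=$(md5sum /tmp/md1-probe | cut -d' ' -f1)
if [ "$before" = "$after" ]; then echo OK; else echo CORRUPT; fi
```

Running it in a loop while the array resyncs would also tell you whether the corruption correlates with sync activity.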