Re: Losing my mind, RAID1 on Sparc completely broken?

2004-03-14 Thread Marc Horowitz
I'm having the exact same problem.  Check out the thread starting with
http://lists.debian.org/debian-sparc/2004/debian-sparc-200402/msg00077.html.

Booting with nodma has made my IDE subsystem completely stable,
including an md RAID1 mirror pair.  I'm still trying to get the bug
workaround described in that thread working, but my source has been
out of town, so I'm stalled for now.
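For anyone else wanting to try this: the boot parameter can be passed once at
the SILO prompt, or made permanent in silo.conf.  A minimal sketch, assuming
the "nodma" in this thread refers to the IDE driver's ide=nodma kernel option
(the label and image path are placeholders):

```
# /etc/silo.conf -- hypothetical entry; "ide=nodma" is an assumption
image=/vmlinuz
        label=linux-nodma
        root=/dev/md0
        read-only
        append="ide=nodma"
```

Equivalently, typing "linux ide=nodma" at the SILO boot prompt should work for
a one-off test before committing the change to silo.conf.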

Marc



Re: Losing my mind, RAID1 on Sparc completely broken?

2004-03-10 Thread Nathan Eric Norman
On Wed, Mar 10, 2004 at 09:18:11AM +1000, Adam Conrad wrote:
> Antonio Prioglio wrote:
> >
> > You need to leave the block 0 alone and start the partitions
> > on block 1.
> >
> > It is a known issue on sparcs.
>
> It's my understanding that SILO needs to be installed to a partition
> that begins on cylinder 0, or it won't boot.  Regardless, I fail to see
> how this could be affecting a RAID on partitions later on the disks.

Which kernel version are you using?  IIRC there are bugs in 2.4.18 WRT
the md code.  When I had md working on sparc (I no longer use md) I
believe I got it working with 2.4.21, so any recent kernel should do.
My Ultra 60 is currently running 2.4.25+debian+ben's patches and it
seems to be happy.

Good luck,

-- 
Nathan Norman - Incanus Networking mailto:[EMAIL PROTECTED]
  No.
   Should I include quotations after my reply?



Re: Losing my mind, RAID1 on Sparc completely broken?

2004-03-10 Thread Justin A
I have an Ultra 2 doing RAID1 just fine (crosses fingers...)

Linux ultrasparcy 2.4.25 #1 SMP Wed Feb 18 11:32:17 EST 2004 sparc64
GNU/Linux
20:35:49 up 21 days,  8:19,  2 users,  load average: 0.08, 0.03, 0.00

FilesystemSize  Used Avail Use% Mounted on
/dev/md0  2.0G  958M  938M  51% /

ultrasparcy:~# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
  2076992 blocks [2/2] [UU]
  
unused devices: <none>

ultrasparcy:~# dd if=/dev/md0 of=/dev/null
4153984+0 records in
4153984+0 records out
2126839808 bytes transferred in 251.762802 seconds (8447792 bytes/sec)


I used mdadm to create it; have you tried using that instead?
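For reference, creating a two-disk RAID1 like the one above is a one-liner
with mdadm.  A sketch only, assuming the same sda1/sdb1 layout as my box (the
syntax is the mdadm 1.x form shipped in sarge; double-check the man page, and
note this destroys anything already on the partitions):

```shell
# Build a two-disk RAID1 from sda1 and sdb1 (wipes existing contents!)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Watch the initial resync progress
cat /proc/mdstat
```

mdadm writes the persistent superblocks itself, so there is no raidtab to
keep in sync with reality.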

Though SILO is broken... I managed to get it to work by booting with the
rescue CD, mounting /dev/sda1 read-only, and running silo from a chroot;
anything else refuses to work.  See #224870.
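Spelled out, the workaround looks roughly like this; a sketch only, and the
device name and mount point are assumptions from my setup:

```shell
# After booting the rescue CD:
mount -o ro /dev/sda1 /mnt    # mount the root/boot partition read-only
chroot /mnt silo              # reinstall SILO against the real root
umount /mnt
```

A read-only mount is fine here because silo writes the boot block directly to
the device, not through the mounted filesystem.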

-- 
-Justin



Re: Losing my mind, RAID1 on Sparc completely broken?

2004-03-09 Thread Antonello
On Tuesday 09 March 2004 03:10, Adam Conrad wrote:

> Disk /dev/hda (Sun disk label): 16 heads, 255 sectors, 19156 cylinders
> Units = cylinders of 4080 * 512 bytes
>
>    Device Flag    Start       End    Blocks   Id  System
> /dev/hda1             0        20     40800   83  Linux native
> /dev/hda2            21     18665  38033760   83  Linux native
> /dev/hda3             0     19156  39078240    5  Whole disk
> /dev/hda4         18666     19156    999600   83  Linux native

No swap? Guess that's a problem...

Bye
Antonello



RE: Losing my mind, RAID1 on Sparc completely broken?

2004-03-09 Thread Adam Conrad
Antonio Prioglio wrote:
>
> You need to leave the block 0 alone and start the partitions
> on block 1.
>
> It is a known issue on sparcs.

It's my understanding that SILO needs to be installed to a partition
that begins on cylinder 0, or it won't boot.  Regardless, I fail to see
how this could be affecting a RAID on partitions later on the disks.

... Adam



Re: Losing my mind, RAID1 on Sparc completely broken?

2004-03-08 Thread James Whatever

> The problem comes in when I make both disks a member of the same array
> (in this case, hda2 and hdc2 are members of md1).  As soon as I sync the
> array, I write to it, and it pretty much instantly corrupts.  Unmounting
> md1 and running a fsck on it shows a large number of illegal blocks.
> md5sums of various binaries on the system are wrong, apt and dpkg's
> status files get so horribly corrupted that they segfault or refuse to
> run.  All within minutes or even seconds of writing to the md device.


This is the opposite of the problem I have:

I have an SS20 with dual hypersparc 125MHz CPU's and dual 4.1GB SCA 
drives on an espfast controller inside the machine.


After getting to a point where I was happy with the Debian installation 
on a single disk, I copied the partition data across to the second disk 
and started using the failed-disk method to copy data onto the RAID.  I 
found that when copying quite a lot of data the kernel would oops, and I 
had to restart the process several times.  I did eventually find I was 
able to copy the ~1.5GB install across without oopsing if I added 
the -u argument to cp (copy only when the source file is newer than the 
target) - it seems to me that the kernel was oopsing when writing 
constantly to the RAID array, but was okay as long as it was read from 
quite a lot.
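For anyone following along, the failed-disk method above amounts to building
the array degraded so the disk holding the live install is untouched until
the data is copied.  A sketch of the raidtab entry, with the device names
assumed rather than taken from my actual machine:

```
# /etc/raidtab -- build the mirror degraded, sparing the install disk
raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        chunk-size              4
        persistent-superblock   1
        device                  /dev/sdb1    # new, empty disk
        raid-disk               0
        device                  /dev/sda1    # disk holding the existing install
        failed-disk             1
```

Then run mkraid /dev/md0, copy the install onto the degraded array, switch
failed-disk back to raid-disk, and raidhotadd /dev/sda1 to trigger the resync.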


Anyway, I have succeeded in getting it all up and running with both 
disks in the array and using auto-detect to create the raidset at boot 
time and mounting it as root.  It seems to be working okay now, and I 
haven't noticed any data corruption on my ext3 partition.


I am using a home built Linux 2.4.25 kernel.

One thing is that SILO now only gets as far as "S" and then I get 
"Program terminated" - I'm not sure what's causing this, but now I have 
to do "boot cdrom /iommu/.;1/vmlinuz root=/dev/md0 ro" to start my 
machine.  Anyone have any ideas about what's up with SILO?


I have the latest SILO from sarge installed.

Cheers.

---
James Whatever
My niece called me Uncle Robot on the weekend!
Giant Robot Ltd, http://www.giantrobot.co.nz/



Re: Losing my mind, RAID1 on Sparc completely broken?

2004-03-08 Thread Adam Conrad
Following up to my previous post, here is the config for the RAID array
that I just can't keep alive:

--- 8< ---
cranx:~# cat /etc/raidtab
raiddev /dev/md1
raid-level  1
nr-raid-disks   2
nr-spare-disks  0
chunk-size  4
persistent-superblock   1
device  /dev/hda2
raid-disk   0
device  /dev/hdc2
raid-disk   1
cranx:~# fdisk -l /dev/hda

Disk /dev/hda (Sun disk label): 16 heads, 255 sectors, 19156 cylinders
Units = cylinders of 4080 * 512 bytes

   Device Flag    Start       End    Blocks   Id  System
/dev/hda1             0        20     40800   83  Linux native
/dev/hda2            21     18665  38033760   83  Linux native
/dev/hda3             0     19156  39078240    5  Whole disk
/dev/hda4         18666     19156    999600   83  Linux native
cranx:~# fdisk -l /dev/hdc

Disk /dev/hdc (Sun disk label): 16 heads, 255 sectors, 19156 cylinders
Units = cylinders of 4080 * 512 bytes

   Device Flag    Start       End    Blocks   Id  System
/dev/hdc1             0        20     40800   83  Linux native
/dev/hdc2            21     18665  38033760   83  Linux native
/dev/hdc3             0     19156  39078240    5  Whole disk
/dev/hdc4         18666     19156    999600   83  Linux native
cranx:~# cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid1 hdc2[1] hda2[0]
  38033664 blocks [2/2] [UU]
--- 8< ---

If I set up a brand new filesystem on /dev/md1, mount it, write a bunch
of data to it (in this case, a copy of my root filesystem), umount it,
then fsck it, nine times out of ten I get illegal blocks, and once I got
something about a bad filename on /usr/include/linux/[somethingorother].
Basically, the FS is being corrupted in record time.

No errors are thrown in dmesg, no oopses, no nothing.  Just silent
corruption.
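For anyone wanting to reproduce this, the test cycle boils down to the
following; a sketch only, with the mount point an assumption:

```shell
mke2fs /dev/md1              # fresh filesystem on the array
mount /dev/md1 /mnt
cp -ax / /mnt                # write a root filesystem's worth of data
umount /mnt
e2fsck -f /dev/md1           # forced check; this is where the illegal
                             # blocks show up
```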

Any ideas?

... Adam Conrad

--
backup [n] (bak'up): The duplicate copy of crucial data that no one
 bothered to make; used only in the abstract.

1024D/C6CEA0C9  C8B2 CB3E 3225 49BB 5ED2  0002 BE3C ED47 C6CE A0C9