Re: Problem with --manage

2006-07-18 Thread Benjamin Schieder
On 18.07.2006 15:46:53, Neil Brown wrote:
 On Monday July 17, [EMAIL PROTECTED] wrote:
  
  /dev/md/0 on /boot type ext2 (rw,nogrpid)
  /dev/md/1 on / type reiserfs (rw)
  /dev/md/2 on /var type reiserfs (rw)
  /dev/md/3 on /opt type reiserfs (rw)
  /dev/md/4 on /usr type reiserfs (rw)
  /dev/md/5 on /data type reiserfs (rw)
  
  I'm running the following kernel:
  Linux ceres 2.6.16.18-rock #1 SMP PREEMPT Sun Jun 25 10:47:51 CEST 2006 
  i686 GNU/Linux
  
  and mdadm 2.4.
  Now, hdb seems to be broken, even though smart says everything's fine.
  After a day or two, hdb would fail:
  
  Jul 16 16:58:41 ceres kernel: raid5: Disk failure on hdb3, disabling 
  device. Operation continuing on 2 devices
  Jul 16 16:58:41 ceres kernel: raid5: Disk failure on hdb5, disabling 
  device. Operation continuing on 2 devices
  Jul 16 16:59:06 ceres kernel: raid5: Disk failure on hdb7, disabling 
  device. Operation continuing on 2 devices
  Jul 16 16:59:37 ceres kernel: raid5: Disk failure on hdb8, disabling 
  device. Operation continuing on 2 devices
  Jul 16 17:02:22 ceres kernel: raid5: Disk failure on hdb6, disabling 
  device. Operation continuing on 2 devices
 
 Very odd... no other message from the kernel?  You would expect
 something if there was a real error.

This was the only output on the console. But I just checked /var/log/messages
now... ouch...

---
Jul 16 16:59:36 ceres kernel: hdb: status error: status=0x00 { }
Jul 16 16:59:36 ceres kernel: ide: failed opcode was: 0xea
Jul 16 16:59:36 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:36 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
Jul 16 16:59:36 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:36 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:36 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
Jul 16 16:59:36 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:36 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:36 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
Jul 16 16:59:36 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:36 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:36 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
Jul 16 16:59:36 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:37 ceres kernel: ide0: reset: success
Jul 16 16:59:37 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
Jul 16 16:59:37 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:37 ceres kernel: hdb: status error: status=0x00 { }
Jul 16 16:59:37 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:37 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
Jul 16 16:59:37 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:37 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
Jul 16 16:59:37 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:37 ceres kernel: ide0: reset: success
Jul 16 16:59:37 ceres kernel: hdb: status error: status=0x00 { }
Jul 16 16:59:37 ceres kernel: ide: failed opcode was: unknown
Jul 16 16:59:37 ceres kernel: end_request: I/O error, dev hdb, sector 488391932
Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:37 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
Jul 16 16:59:37 ceres kernel: ide: failed opcode was: 0xea
Jul 16 16:59:37 ceres kernel: raid5: Disk failure on hdb8, disabling device. 
Operation continuing on 2 devices
Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
Jul 16 16:59:37 ceres kernel: RAID5 conf printout:
Jul 16 16:59:37 ceres kernel:  --- rd:3 wd:2 fd:1
Jul 16 16:59:37 ceres kernel:  disk 0, o:0, dev:hdb8
Jul 16 16:59:37 ceres kernel:  disk 1, o:1, dev:hda8
Jul 16 16:59:37 ceres kernel:  disk 2, o:1, dev:hdc8
Jul 16 16:59:37 ceres kernel: RAID5 conf printout:
Jul 16 16:59:37 ceres kernel:  --- rd:3 wd:2 fd:1
Jul 16 16:59:37 ceres kernel:  disk 1, o:1, dev:hda8
Jul 16 16:59:37 ceres kernel:  disk 2, o:1, dev:hdc8
---

Now, is this a broken IDE controller or harddisk? Because smartctl claims
that everything is fine.

  The problem now is, the machine hangs after the last message and I can only
  turn it off by physically removing the power plug.
 
 alt-sysrq-P  or alt-sysrq-T give anything useful?

I tried alt-sysrq-o and -b, to no avail. Support for it is in my kernel and
it works (tested earlier).

  When I now reboot the machine, `mdadm -A /dev/md[1-5]' will not start the
  arrays cleanly. They will all be lacking the hdb device and be 'inactive'.
  `mdadm -R' will not start them in this state. According to
  `mdadm --manage --help' using `mdadm --manage /dev/md/3 -a 

Re: Problem with --manage

2006-07-18 Thread Neil Brown
On Tuesday July 18, [EMAIL PROTECTED] wrote:
 Jul 16 16:59:37 ceres kernel: ide: failed opcode was: unknown
 Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
 Jul 16 16:59:37 ceres kernel: ide0: reset: success
 Jul 16 16:59:37 ceres kernel: hdb: status error: status=0x00 { }
 Jul 16 16:59:37 ceres kernel: ide: failed opcode was: unknown
 Jul 16 16:59:37 ceres kernel: end_request: I/O error, dev hdb, sector 
 488391932
 Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
 Jul 16 16:59:37 ceres kernel: hdb: status error: status=0x10 { SeekComplete }
 Jul 16 16:59:37 ceres kernel: ide: failed opcode was: 0xea
 Jul 16 16:59:37 ceres kernel: raid5: Disk failure on hdb8, disabling device. 
 Operation continuing on 2 devices
 Jul 16 16:59:37 ceres kernel: hdb: drive not ready for command
 Jul 16 16:59:37 ceres kernel: RAID5 conf printout:
 Jul 16 16:59:37 ceres kernel:  --- rd:3 wd:2 fd:1
 Jul 16 16:59:37 ceres kernel:  disk 0, o:0, dev:hdb8
 Jul 16 16:59:37 ceres kernel:  disk 1, o:1, dev:hda8
 Jul 16 16:59:37 ceres kernel:  disk 2, o:1, dev:hdc8
 Jul 16 16:59:37 ceres kernel: RAID5 conf printout:
 Jul 16 16:59:37 ceres kernel:  --- rd:3 wd:2 fd:1
 Jul 16 16:59:37 ceres kernel:  disk 1, o:1, dev:hda8
 Jul 16 16:59:37 ceres kernel:  disk 2, o:1, dev:hdc8
 ---
 
 Now, is this a broken IDE controller or harddisk? Because smartctl claims
 that everything is fine.
 

Ouch indeed.  I've no idea whose 'fault' this is.  Maybe ask on
linux-ide.

 
 I don't have a script log or something, but here's what I did from an initrd
 with init=/bin/bash
 
 #  <mount /dev /proc /sys /tmp>
 #  <start udevd udevtrigger udevsettle>
while read a dev c ; do
  [ "$a" != "ARRAY" ] && continue
  [ -e /dev/md/${dev##*/} ] || /bin/mknod $dev b 9 ${dev##*/}
  /sbin/mdadm -A ${dev}
done < /etc/mdadm.conf

mdadm -As --auto=yes
should be sufficient

 
 Personalities : [linear] [raid0] [raid1] [raid5] [raid4]
 md5 : inactive raid5 hda8[1] hdc8[2]
   451426304 blocks level 5, 64k chunk, algorithm 2 [2/3] [_UU]

Ahh, Ok, make that

  mdadm -As --force --auto=yes

A crash combined with a drive failure can cause undetectable data
corruption.  You need to give --force to effectively acknowledge
that.

I should get mdadm to explain what is happening so that I don't have
to as much.
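
In the meantime, a minimal sketch of the suggested sequence; the array name in
the last step is only an example:

  mdadm --assemble --scan --force --auto=yes   # long form of "mdadm -As --force --auto=yes"
  cat /proc/mdstat                             # arrays should now be active, but degraded
  mdadm --detail /dev/md/5                     # confirm which member is missing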

NeilBrown


mdadm -X bitmap status off by 2^16

2006-07-18 Thread Janos Farkas
Hi!

Another pseudo-problem :) I've just set up a RAID5 array by creating a
three-disk one from two disks, and later adding the third.  Everything
seems normal, but the mdadm (2.5.2) -X output:

Filename : /dev/hda3
   Magic : 6d746962
 Version : 4
UUID : 293ceee6.d1811fb1.a8b316e6.b54abcc7
  Events : 12
  Events Cleared : 12
   State : OK
   Chunksize : 1 MB
  Daemon : 5s flush period
  Write Mode : Normal
   Sync Size : 292784512 (279.22 GiB 299.81 GB)
  Bitmap : 285923 bits (chunks), 65536 dirty (22.9%)

# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 65536 dirty (22.9%)
# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 1 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 1 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 65537 dirty (22.9%)
# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 7 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 7 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 65543 dirty (22.9%)
# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 65536 dirty (22.9%)

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 hdd3[2] hdb3[0] hda3[1]
  585569024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
  bitmap: 0/140 pages [0KB], 1024KB chunk

Is this going to bite me later on, or just a harmless display problem?

Janos


Re: XFS and write barrier

2006-07-18 Thread Nathan Scott
On Mon, Jul 17, 2006 at 01:32:38AM +0800, Federico Sevilla III wrote:
 On Sat, Jul 15, 2006 at 12:48:56PM +0200, Martin Steigerwald wrote:
  I am currently gathering information to write an article about journal
  filesystems with emphasis on write barrier functionality, how it
  works, why journalling filesystems need write barrier and the current
  implementation of write barrier support for different filesystems.
 
 Cool! Would you by any chance have information on the interaction
 between journal filesystems with write barrier functionality, and
 software RAID (md)? Based on my experience with 2.6.17, XFS detects that
 the underlying software RAID 1 device does not support barriers and
 therefore disables that functionality.

No one here seems to know; maybe Neil or the other folks on linux-raid
can help us out with details on the status of MD and write barriers?

cheers.

-- 
Nathan


Re: trying to brute-force my RAID 5...

2006-07-18 Thread Francois Barre

What are you expecting fdisk to tell you?  fdisk lists partitions and
I suspect you didn't have any partitions on /dev/md0
More likely you want something like
   fsck -n -f /dev/md0

and see which one produces the least noise.


Maybe a simple file -s /dev/md0 could do the trick, and would only
produce output different from plain data when the right
configuration is found...

--
F.-E.B.


Re: trying to brute-force my RAID 5...

2006-07-18 Thread Brad Campbell

Francois Barre wrote:

What are you expecting fdisk to tell you?  fdisk lists partitions and
I suspect you didn't have any partitions on /dev/md0
More likely you want something like
   fsck -n -f /dev/md0

and see which one produces the least noise.


Maybe a simple file -s /dev/md0 could do the trick, and would only
produce output different from the mere data when the good
configuration is found...

More likely to produce an output whenever the 1st disk in the array is in the right place as it will 
just look at the 1st couple of sectors for the superblock.


I'd go with the fsck idea as it will try to inspect the rest of the filesystem 
also.

Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: trying to brute-force my RAID 5...

2006-07-18 Thread Francois Barre

More likely to produce an output whenever the 1st disk in the array is in the 
right place as it will
just look at the 1st couple of sectors for the superblock.

I'd go with the fsck idea as it will try to inspect the rest of the filesystem 
also.



Obviously that's true, but it's still a good way to be sure of the
first disk, and the time cost of the file -s is
negligible... Personally, I would have done both.
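
A minimal sketch of doing both, for a single candidate ordering; the device
names, chunk size and layout here are assumptions, "missing" keeps the array
degraded so no rebuild starts, and --assume-clean avoids writing parity:

  mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=6 \
        --chunk=64 --layout=left-symmetric \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing
  file -s /dev/md0        # cheap check: does the start of the array look like a filesystem?
  fsck -n -f /dev/md0     # thorough check: read-only walk of the whole filesystem
  mdadm --stop /dev/md0   # tear down before trying the next ordering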


Re: Still can't get md arrays that were started from an initrd to shutdown

2006-07-18 Thread Christian Pernegger

with lvm you have to stop lvm before you can stop the arrays... i wouldn't
be surprised if evms has the same issue...


AFAIK there's no counterpart to evms_activate.

Besides, I'm no longer using EVMS, I just included it in my testing
since this issue bit me there first.

Thanks,

Christian


read-ahead cache on indiv. raid members and entire md device

2006-07-18 Thread Roy Waldspurger

Hi,

I'm looking for advice on tuning the read-ahead cache for an md device...

for example, should I merely set the read-ahead for the md device:

blockdev --setra ### /dev/md2

or should I start touching the individual raid member devices:

blockdev --setra ### /dev/sdc1
blockdev --setra ### /dev/sdd1
blockdev --setra ### /dev/sde1, etc.

I'm not sure about the relation between the two caches.

Also, does the read-ahead cache for the entire md device show up in 
/proc/ or /sys/block/mdX?  I of course find the read-ahead for the 
individual devices under, for example:


/sys/block/sdc/queue/read_ahead_kb

Many thanks in advance.

Cheers,

-- roy




Re: which disk is the one that data is on?

2006-07-18 Thread Shai

Hi,

So if I were to want to stop the resync process on a very large array
(1.4T), since it is in the middle of the day and makes work slower...
How can I tell which drive is the one that is being used to check all
the rest of the data? Or in other words, how can I stop the resync
process and let it continue later?

Shai

On 7/18/06, Neil Brown [EMAIL PROTECTED] wrote:

On Tuesday July 18, [EMAIL PROTECTED] wrote:
 Hi,

 I rebooted my server today to find out that one of the arrays is being
 re-synced (see output below)
 .
 1. What does the (S) to the right of hdh1[5](S) mean?

Spare.

 2. How do I know, from this output, which disk is the one holding the
 most current data and from which all the other drives are syncing
 from? Or are they all containing the data and this sync process is
 something else? Maybe I'm just not understanding what is being done
 exactly?

This is raid5.  The data is distributed over all of the drives.
There are also 'parity' blocks used for coping with missing devices.
This 'resync' process is checking that the parity blocks are all
correct and will correct any that are wrong.
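
For reference, a hedged sketch of watching and driving this from sysfs
(these md attributes exist on recent 2.6 kernels; the array name is an example):

  cat /sys/block/md0/md/sync_action            # "resync", "check", "repair" or "idle"
  cat /sys/block/md0/md/mismatch_cnt           # parity mismatches found by the last check
  echo check > /sys/block/md0/md/sync_action   # start a read-only parity check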

NeilBrown



 Thanks in advance for the help.

 Shai
 ---

 md0 : active raid5 hdd1[1] hdc1[0] hdh1[5](S) hdg1[4] hdf1[3] hde1[2]
   781433344 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
   [=>...................]  resync =  5.1% (9965184/195358336)
 finish=180.5min speed=17110K/sec





Re: [PATCH] enable auto=yes by default when using udev

2006-07-18 Thread Christian Pernegger

I think I'm leaning towards auto-creating names if they look like
standard names (or are listed in mdadm.conf?), but requiring
auto=whatever to create anything else.


The auto= option has the disadvantage that it is different for
partitionable and regular arrays -- is there no way to detect from the
array if it is supposed to be partitionable or not?

As it is, scripts are better off creating the node with the correct
major/minor and assembling without auto=.
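
A rough sketch of what such a script ends up doing (names and minor numbers
are examples only):

  mknod /dev/md3 b 9 3          # non-partitionable array: major 9, minor = array number
  mdadm --assemble /dev/md3 /dev/sda3 /dev/sdb3
  grep mdp /proc/devices        # a partitionable array instead uses the dynamically
                                # allocated "mdp" block major listed here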

Regards,

C.


Re: Raid and LVM and LILO

2006-07-18 Thread Paul Waldo

Hi Du,

Did you create a /boot partition?  /boot cannot be on LVM (AFAIK), and 
can be a regular partition or raid1.  HTH.
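
For reference, a hedged lilo.conf sketch for this kind of layout; the device
names and the LVM root path are assumptions:

  boot=/dev/md0                        # /boot on a RAID1 array
  raid-extra-boot=/dev/sda,/dev/sdb    # also write a boot sector to each member disk
  root=/dev/vg0/root                   # root filesystem on LVM
  image=/boot/vmlinuz
      label=Linux
      read-only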


Paul

Du wrote:
Hi, I was/am trying to install Debian Sarge r2 with 2 SATA HDs working 
on RAID 1 via software, and on this new MD device I put LVM. All works 
fine and Debian installs well, but when LILO tries to install, it tells 
me that I don't have an active partition, and no matter what I do, it 
doesn't install.


Does anybody know what is happening?

Is it a bad idea to work with software RAID and LVM?


Re: which disk is the one that data is on?

2006-07-18 Thread Bill Davidsen

Shai wrote:


Hi,

I rebooted my server today to find out that one of the arrays is being
re-synced (see output below)
.
1. What does the (S) to the right of hdh1[5](S) mean?
2. How do I know, from this output, which disk is the one holding the
most current data and from which all the other drives are syncing
from? Or are they all containing the data and this sync process is
something else? Maybe I'm just not understanding what is being done
exactly? 


In addition to what you have already been told, if you find out that 
the array is in rebuild I would be a lot more worried about finding out why. 
If it was from an unclean shutdown you really should look into a bitmap if 
you don't have one.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: issue with internal bitmaps

2006-07-18 Thread Bill Davidsen

Bill Davidsen wrote:


Neil Brown wrote:


On Thursday July 6, [EMAIL PROTECTED] wrote:
 


hello, i just realized that internal bitmaps do not seem to work
anymore.
  



I cannot imagine why.  Nothing you have listed show anything wrong
with md...

Maybe you were expecting
  mdadm -X /dev/md100
to do something useful.  Like -E, -X must be applied to a component
device.  Try
  mdadm -X /dev/sda1

To take this from the other end, why should -X apply to a component? 
Since the components can and do change names, and you frequently 
mention assembly by UUID, why aren't the component names determined 
from the invariant array name when mdadm wants them, instead of having 
a user or script check the array to get the components?


Boy, I didn't say that well... what I meant to suggest is that when -E 
or -X are applied to the array as a whole, would it not be useful to 
iterate them over all of the components rather than looking for 
non-existent data in the array itself?


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Raid and LVM and LILO

2006-07-18 Thread Paul Waldo
I assume that your /boot was raid1...  I had similar issues with the 
Debian installer, trying to install a file server using LVM on top of 
RAID.  I never did work out the problem; I installed Fedora Core :-/ 
Sorry I can't be of more help :-(


Paul

Du wrote:

Paul Waldo wrote:

Hi Du,

Did you create a /boot partition?  /boot cannot be on LVM (AFAIK), and 
can be a regular partition or raid1.  HTH.
The second thing I tried was that. I made a 200 MB /dev/md0 to be the 
/boot partition and the rest to be /dev/md1, where the system will be 
under LVM. But LILO tells me that I don't have an active partition. I set 
/dev/md0 to be bootable, and set the 2 Linux RAID autodetect 
partitions that make up /dev/md0 bootable too. But LILO never installs...




Re: trying to brute-force my RAID 5...

2006-07-18 Thread Molle Bestefich

Sevrin Robstad wrote:

I created the RAID when I installed Fedora Core 3 some time ago,
didn't do anything special so the chunks should be 64kbyte and
parity should be left-symmetric ?


I have no idea what's default on FC3, sorry.


Any Idea ?


I missed that you were trying to fdisk -l /dev/md0..
As others have suggested, search for filesystems using fsck, or mount,
or what not ;-).


Re: mdadm -X bitmap status off by 2^16

2006-07-18 Thread Paul Clements

Janos Farkas wrote:


# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 65536 dirty (22.9%)


This indicates that the _on-disk_ bits are cleared on two disks, but set 
on the third.




# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 hdd3[2] hdb3[0] hda3[1]
  585569024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
  bitmap: 0/140 pages [0KB], 1024KB chunk


This indicates that the _in-memory_ bits are all cleared.

At array startup, md initializes the in-memory bitmap from the on-disk 
copy. It then uses the in-memory bitmap from that point on, shadowing 
any changes there into the on-disk bitmap.


At the end of a rebuild (which should have happened after you added the 
third disk), the bits should all be cleared. The on-disk bits get 
cleared lazily, though. Is there any chance that they are cleared now? 
If not, it sounds like a bug to me.


--
Paul


md reports: unknown partition table

2006-07-18 Thread David Greaves
Hi

After a powercut I'm trying to mount an array and failing :(

teak:~# mdadm --assemble /dev/media --auto=p /dev/sd[bcdef]1
mdadm: /dev/media has been started with 5 drives.

Good

However:
teak:~# mount /media
mount: /dev/media1 is not a valid block device

teak:~# dd if=/dev/media1 of=/dev/null
dd: opening `/dev/media1': No such device or address

teak:~# dd if=/dev/media of=/dev/null
792442+0 records in
792441+0 records out
405729792 bytes transferred in 4.363571 seconds (92981135 bytes/sec)
(after ^C)

dmesg shows:
raid5: device sdb1 operational as raid disk 0
raid5: device sdf1 operational as raid disk 4
raid5: device sde1 operational as raid disk 3
raid5: device sdd1 operational as raid disk 2
raid5: device sdc1 operational as raid disk 1
raid5: allocated 5235kB for md_d127
raid5: raid level 5 set md_d127 active with 5 out of 5 devices, algorithm 2
RAID5 conf printout:
 --- rd:5 wd:5 fd:0
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sde1
 disk 4, o:1, dev:sdf1
md_d127: bitmap initialized from disk: read 1/1 pages, set 0 bits, status: 0
created bitmap (5 pages) for device md_d127
 md_d127: unknown partition table

That last line looks odd...

It was created like so:

mdadm --create /dev/media --level=5 -n 5 -e1.2 --bitmap=internal
--name=media --auto=p /dev/sd[bcdef]1

and the xfs fstab entry is:
  /dev/media1 /media xfs rw,noatime,logdev=/dev/media2 0 0

fdisk /dev/media
shows:
 Device Boot  Start End  Blocks   Id  System
/dev/media1   1   312536035  1250144138   83  Linux
/dev/media2   312536036   312560448   97652   da  Non-FS data

cfdisk even gets the filesystem right...

Which is expected.

teak:~# ll /dev/media*
brw-rw---- 1 root disk 254, 192 2006-07-18 17:18 /dev/media
brw-rw---- 1 root disk 254, 193 2006-07-18 17:18 /dev/media1
brw-rw---- 1 root disk 254, 194 2006-07-18 17:18 /dev/media2
brw-rw---- 1 root disk 254, 195 2006-07-18 17:18 /dev/media3
brw-rw---- 1 root disk 254, 196 2006-07-18 17:18 /dev/media4

teak:~# uname -a
Linux teak 2.6.16.19-teak-060602-01 #3 PREEMPT Sat Jun 3 09:20:24 BST
2006 i686 GNU/Linux
teak:~# mdadm -V
mdadm - v2.5.2 -  27 June 2006


David



Re: mdadm -X bitmap status off by 2^16

2006-07-18 Thread Janos Farkas
Hi!

On 2006-07-18 at 11:30:42, Paul Clements wrote:
 Personalities : [raid1] [raid6] [raid5] [raid4]
 md0 : active raid5 hdd3[2] hdb3[0] hda3[1]
   585569024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
   bitmap: 0/140 pages [0KB], 1024KB chunk
 This indicates that the _in-memory_ bits are all cleared.

Makes sense.

 At array startup, md initializes the in-memory bitmap from the on-disk 
 copy. It then uses the in-memory bitmap from that point on, shadowing 
 any changes there into the on-disk bitmap.
 
 At the end of a rebuild (which should have happened after you added the 
 third disk), the bits should all be cleared. The on-disk bits get 
 cleared lazily, though. Is there any chance that they are cleared now? 
 If not, it sounds like a bug to me.

I just removed/re-added the bitmap as follows, but before that, the 65536
was still there as of 5 minutes ago.

# mdadm /dev/md0 --grow -b none
# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 3 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 3 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 65539 dirty (22.9%)
# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 3 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 3 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 65539 dirty (22.9%)

(Bitmaps still present, probably I was just too impatient after the
removal)

# mdadm /dev/md0 --grow -b internal
# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 hdd3[2] hdb3[0] hda3[1]
  585569024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
  bitmap: 140/140 pages [560KB], 1024KB chunk

unused devices: <none>

(Ouch, I hoped there wouldn't be another resync :)

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 hdd3[2] hdb3[0] hda3[1]
  585569024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
  bitmap: 1/140 pages [4KB], 1024KB chunk

unused devices: <none>

(Now the in-memory bitmaps seems to be emptied again)

# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
  Bitmap : 285923 bits (chunks), 285923 dirty (100.0%)
# for i in hdb3 hdd3 hda3 ; mdadm -X /dev/$i|grep map
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)
  Bitmap : 285923 bits (chunks), 0 dirty (0.0%)

And fortunately the on-disk ones too...

This discrepancy was there after at least two reboots after the whole
resync had been done.  I also did a scrub (check) on the array, and
it still did not change.


Re: XFS and write barrier

2006-07-18 Thread David Chinner
On Tue, Jul 18, 2006 at 06:58:56PM +1000, Neil Brown wrote:
 On Tuesday July 18, [EMAIL PROTECTED] wrote:
  On Mon, Jul 17, 2006 at 01:32:38AM +0800, Federico Sevilla III wrote:
   On Sat, Jul 15, 2006 at 12:48:56PM +0200, Martin Steigerwald wrote:
I am currently gathering information to write an article about journal
filesystems with emphasis on write barrier functionality, how it
works, why journalling filesystems need write barrier and the current
implementation of write barrier support for different filesystems.
 
 "Journalling filesystems need write barrier" isn't really accurate.
 They can make good use of write barrier if it is supported, and where
 it isn't supported, they should use blkdev_issue_flush in combination
 with regular submit/wait.

blkdev_issue_flush() causes a write cache flush - just like a
barrier typically causes a write cache flush up to the I/O with the
barrier in it.  Both of these mechanisms provide the same thing - an
I/O barrier that enforces ordering of I/Os to disk.

Given that filesystems already indicate to the block layer when they
want a barrier, wouldn't it be better to get the block layer to issue
this cache flush if the underlying device doesn't support barriers
and it receives a barrier request?

FWIW, only XFS and Reiser3 use this function, and only then when
issuing a fsync when barriers are disabled to make sure a common
test (fsync then power cycle) doesn't result in data loss...

  No one here seems to know; maybe Neil or the other folks on linux-raid
  can help us out with details on the status of MD and write barriers?
 
 In 2.6.17, md/raid1 will detect if the underlying devices support
 barriers and if they all do, it will accept barrier requests from the
 filesystem and pass those requests down to all devices.
 
 Other raid levels will reject all barrier requests.

Any particular reason for not supporting barriers on the other types
of RAID?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: md reports: unknown partition table - fixed.

2006-07-18 Thread David Greaves
David Greaves wrote:
 Hi
 
 After a powercut I'm trying to mount an array and failing :(

A reboot after tidying up /dev/ fixed it.

The first time through I'd forgotten to update the boot scripts and they
were assembling the wrong UUID. That was fine; I realised this and ran
the manual assemble:

  mdadm --assemble /dev/media /dev/sd[bcdef]1
  dmesg
  cat /proc/mdstat

All OK (but I'd forgotten that this was a partitioned array). I suspect
the device entries for /dev/media[1234] from last time were hanging about.

  mount /media
  fdisk /dev/media
So I guess this fails because the major-minor are for a non-p md device?

  mdadm --assemble /dev/media --auto=p /dev/sd[bcdef]1
  mdadm --stop /dev/media
This fails because I'm on mdadm 2.4.1

  mdadm --assemble /dev/media --auto=p /dev/sd[bcdef]1
  cat /proc/mdstat
  mdadm --stop /dev/md_d0
  mdadm --stop /dev/md0
  cat /proc/mdstat
So by now I upgrade to mdadm 2.5.1 in another session.

  mdadm --stop /dev/media
  dmesg
  cat /proc/mdstat
and it stops.

  mdadm --assemble /dev/media --auto=p /dev/sd[bcdef]1
But now it won't create working devices...

Much messing about with assemble, and I try a kernel upgrade - I can't,
because the driver for my video card won't compile under 2.6.17 yet, so WTF;
I suspect major/minor numbers, so I just reboot it under the same kernel.

All seems well.

I think there's a bug here somewhere. I wonder/suspect that the
superblock should contain the fact that it's a partitioned/able md device?

David



Re: trying to brute-force my RAID 5...

2006-07-18 Thread Sevrin Robstad

Neil Brown wrote:

I have written some posts about this before... My 6 disk RAID 5 broke
down because of hardware failure. When I tried to get it up'n'running 
again

I did a --create without any missing disk, which made it rebuild. I have
also lost all information about how the old RAID was set up..

I got a friend of mine to make a list of all the 6^6 combinations of dev
1 2 3 4 5 missing, and set it up this way :

mdadm --create -n 6 -l 5  dev1 2 3 4 5 missing ; fdisk -l /dev/md0 ;
mdadm --stop /dev/md0 .
But a "cat logfile | grep Linux" of the output of this script tells me
that on no of these combination does it find a valid type 83 partition.

shouldn't this work ???

 No.

 What are you expecting fdisk to tell you?  fdisk lists partitions and
 I suspect you didn't have any partitions on /dev/md0
 More likely you want something like
fsck -n -f /dev/md0

 and see which one produces the least noise.

They all produce

"Bad magic number in super-block while trying to open /dev/md0".

I tried "file -s /dev/md0" also, and with one of the disks as the first disk I 
got "ext3 file data (needs journal recovery) (errors)".


but as fsck -n -f can't do anything with it, there might not be any hope?


Or can it still be that I have some wrong setting?

Chunk size is (and was) default 64k, yes?

Sevrin


Re: issue with internal bitmaps

2006-07-18 Thread Luca Berra

On Tue, Jul 18, 2006 at 09:34:35AM -0400, Bill Davidsen wrote:
Boy, I didn't say that well... what I meant to suggest is that when -E 
or -X are applied to the array as a whole, would it not be useful to 
iterate them over all of the components rather than looking for 
non-existent data in the array itself?

The question, I believe, is to distinguish the case where an md device is
a component of another md device...
L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \


Re: only 4 spares and no access to my data

2006-07-18 Thread Nix
On 18 Jul 2006, Neil Brown moaned:
 The superblock locations for sda and sda1 can only be 'one and the
 same' if sda1 is at an offset in sda which is a multiple of 64K, and
 if sda1 ends near the end of sda.  This certainly can happen, but it
 is by no means certain.
 
 For this reason, version-1 superblocks record the offset of the
 superblock in the device so that if a superblock is written to sda1
 and then read from sda, it will look wrong (wrong offset) and so will
 be ignored (no valid superblock here).

One case where this can happen is Sun slices (and I think BSD disklabels
too), where /dev/sda and /dev/sda1 start at the *same place*.

(This causes amusing problems with LVM vgscan unless the raw devices
are excluded, too.)
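
A hedged way to see this from userspace, assuming version-1 metadata on the
partition:

  mdadm -E /dev/sda1 | grep -i offset    # v1.x superblocks report their recorded offset
  mdadm -E /dev/sda                      # read via the whole disk, the recorded offset no
                                         # longer matches, so no valid superblock is found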

-- 
`We're sysadmins. We deal with the inconceivable so often I can clearly 
 see the need to define levels of inconceivability.' --- Rik Steenwinkel


Re: which disk is the one that data is on?

2006-07-18 Thread Shai

Hi,

Another question on this matter please:
If there is a raid5 with 4 disks and 1 missing, and we add that disk,
while it's doing the resync of that disk, how do we know which disk it
is (if we forgot which one we added)?

Thanks,
Shai

On 7/18/06, Neil Brown [EMAIL PROTECTED] wrote:

On Tuesday July 18, [EMAIL PROTECTED] wrote:
 Hi,

 So if I were to want to stop the resync process on a very large array
 (1.4T), since it is in the middle of the day and makes work slower...
 How can I tell which drive is the one that is being used to check all
 the rest of the data? Or in other words, how can I stop the resync
 process and let it continue later?

There is no one that is being used to check all the rest.  They are
all in this together.

But if you want to stop the resync because it is slowing things down
then write a small number to /proc/sys/dev/raid/speed_limit_min
e.g.

  echo 10 > /proc/sys/dev/raid/speed_limit_min

This won't actually stop it, but it will make it go very slowly so as
not to interfere with anything else.
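
For completeness, a small sketch of throttling the resync and restoring the
limit later (the default minimum is normally 1000 KB/sec):

  cat /proc/sys/dev/raid/speed_limit_min           # note the current value first
  echo 10 > /proc/sys/dev/raid/speed_limit_min     # throttle the resync right down
  # ... later, when the machine is idle again:
  echo 1000 > /proc/sys/dev/raid/speed_limit_min   # let it resume at normal speed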

NeilBrown

