Re: mkraid secret flag

2000-03-19 Thread m.allan noah

WARNING: i am going to type the secret command in the following mail. if you
cannot handle it, delete it now. if you want a rational explanation for why i
would do that, read on. i am not trying to be inflammatory, just make a point.

i have been using software raid under linux for quite a while now. both my
current and prior employers are using software raid boxes that i have set up
over the past two+ years. some of the older ones are running kernel 2.0.3x and
the more recent ones are 2.2.x of some sort. some older ones are using the
0.4x raid code, the newer, 0.90.

with these 20 or so machines, i have found the 0.90 tools and kernel level
driver to be much more robust and user friendly. i can remember when raid had
a single mdtab, then MULTIPLE conf files, and now a single raidtab. the newer
versions, with the inclusion of things like martin bene's failed-disk patches,
have been among the most useful pieces of software i have applied in an x86
server environment. by comparison, the 0.4x raid code, replete with initrds
and weird startup scripts, looks quite pathetic.

while there is some point to protecting a user from system damage, the message
that is printed, and the command name needed to actually get the work done,
make the 0.90 code appear unstable or untested.

there are at least hundreds if not thousands of people on this list who use
linux and linux software raid every day. this code is stable, and it is
tested. it is bulletproof in comparison to the 0.4x code, and if mr cox has
his way, it will finally be in the main kernel.

as such, i feel it is time to rename mkraid's --really-force flag to --force.
to leave it as-is is just great ammunition for those who would slight the
efforts of the developers as incomplete or amateurish.

people know they are taking their system into their own hands. this is WHY
they are using linux, rather than an os that hides the real innards of the
system from them. the extra logic required to make them type the longer
command, however simple, just adds cruft to the binary, to print a message
that is a GIVEN when you are root on your box.

think about it! rm by default does not -i! and rm is potentially MUCH more
dangerous than mkraid. hell, we tell people on this list ALL THE TIME to run
mkraid --really-force on their EXISTING partitions: "it only overwrites the
raid superblock"! we dont tell people to fix problems with rm -rf * ! should
we go back and change rm to always be in -i mode? fdisk does not print such
messages. mke2fs on a clean partition? no.

i feel that mingo/gadi et al have done a fine job, and these utils need to
take the same approach as other system level programs- no convoluted messages
asking for non-disclosure, just the normal warning, and the five second pause.
raid 0.90 is almost grown up. it should act that way.
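
to be concrete, the behavior described above is only a few lines of C. a rough
sketch (illustrative only -- the helper name and message text here are made up,
this is not the actual raidtools code):

/* print a short warning, pause five seconds, then carry on */
#include <stdio.h>
#include <unistd.h>

static void force_warning(const char *mddev)
{
    fprintf(stderr,
        "WARNING: --force will rewrite the RAID superblocks for %s using\n"
        "/etc/raidtab. if raidtab does not match the real array, data can\n"
        "be destroyed. hit ctrl-c within 5 seconds to abort...\n", mddev);
    sleep(5);   /* the five second pause, then the work proceeds */
}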

wow- that was longer than i intended on such a simple subject- sorry :)

allan noah

James Manning [EMAIL PROTECTED] said: 

 [ Wednesday, March 15, 2000 ] root wrote:
   mkraid --**-force /dev/md0
 
 /me attempts to get the Stupid Idea Of The Month award
 
 Motivation: trying to keep the Sekret Flag a secret is a failed effort
 (the number of linux-raid archives, esp. those that are searchable, makes
 this a given), and a different approach could help things tremendously.
 
 *** Idea #1:
 
 How about --force / -f look for $HOME/.md_force_warning_read and
 
 if not exists:
  - print huge warning (and beep thousands of times as desired)
  - creat()/close() the file
 
 if exists:
  - Do the Horrifically Dangerous stuff
 
 Benefit:  everyone has to read it at least once (or at a minimum create a
   file that says they've read it)
 Downside: adds a $HOME/ entry, relies on getenv("HOME"), etc.
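 
 A rough sketch of that check (hypothetical code, not an actual patch; the
 marker file is the one named above, and whether mkraid bails or continues
 on the very first run is up to the caller):
 
 /* sketch of Idea #1 -- hypothetical, not the real mkraid.c */
 #include <stdio.h>
 #include <stdlib.h>
 #include <fcntl.h>
 #include <unistd.h>
 
 static int force_warning_seen(void)
 {
     char path[1024];
     const char *home = getenv("HOME");   /* the downside noted above */
     int fd;
 
     if (!home)
         return 0;
     snprintf(path, sizeof(path), "%s/.md_force_warning_read", home);
     if (access(path, F_OK) == 0)
         return 1;                        /* warning already read once */
 
     fprintf(stderr, "huge warning goes here (beep as desired)...\n");
     fd = creat(path, 0600);              /* remember that it was shown */
     if (fd >= 0)
         close(fd);
     return 0;
 }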
 
 *** Idea #2:
 
 --force / -f prints a warning, prompts for input (no fancy term
 tricks), and continues only on "yes" being entered (read(0,..) so
 we can "echo yes |mkraid --force" in cases we want it automated).
 
 Benefit:  warning always generated
 Downside: slightly more complicated to script
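 
 A sketch of Idea #2 (again hypothetical, not an actual patch; it reads fd 0
 so the "echo yes |" case keeps working):
 
 /* sketch of Idea #2 -- hypothetical, not the real mkraid.c */
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 
 static int confirm_force(void)
 {
     char buf[16];
     ssize_t n;
 
     fprintf(stderr, "This can DESTROY data.  Type \"yes\" to continue: ");
     n = read(0, buf, sizeof(buf) - 1);   /* plain read() on stdin, no term tricks */
     if (n <= 0)
         return 0;
     buf[n] = '\0';
     return strncmp(buf, "yes", 3) == 0;  /* continue only on "yes" */
 }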
 
 Both are fairly trivial patches, so I'll be glad to generate the
 patch for whichever (if either :) people seem to like.
 
 James
 





raid parity and mmx processor

2000-03-19 Thread Michael Robinton

As I understand it, the kernel tests parity calc speed and decides to 
use/not use mmx based on which is faster.

Assuming an otherwise heavily loaded cpu, would it not be better to use 
mmx even though slightly slower just to free up cpu cycles for other tasks.

If I don't have the right picture here, someone please provide an 
explanation of where the cpu cycles devoted to raid actually go.

The reason for my question is the amount of "nice" cycles on one of my
machines that is fairly busy. Only raid5d is niced, and I'm running a 
load factor over one consistently with only about a 20% cpu load. It 
appears most of the cpu cycles in use go to raid5 -- presumably parity 
calculations.

Assuming I'm not too far off base, it would be nice to have a /proc 
option or at least a compile option to force use of mmx for raid parity 
and a way to run the parity calculation test manually to see the difference 
between the two methods.
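
In the meantime, a rough userland approximation of that boot-time test is easy
to write (sketch only, and it times a plain-C xor rather than the kernel's mmx
routines, so it just shows the shape of the comparison):

/* toy version of the calibration idea: time an xor sweep and report MB/s.
 * the kernel does this for each candidate routine and keeps the fastest. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define BUFSZ (1024 * 1024)
#define LOOPS 64

static void xor_block(unsigned long *dst, const unsigned long *src, size_t words)
{
    size_t i;
    for (i = 0; i < words; i++)
        dst[i] ^= src[i];
}

int main(void)
{
    unsigned long *a = malloc(BUFSZ), *b = malloc(BUFSZ);
    struct timeval t0, t1;
    double secs;
    int i;

    if (!a || !b)
        return 1;
    memset(a, 0x55, BUFSZ);
    memset(b, 0xaa, BUFSZ);

    gettimeofday(&t0, NULL);
    for (i = 0; i < LOOPS; i++)
        xor_block(a, b, BUFSZ / sizeof(unsigned long));
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("plain C xor: %.1f MB/s\n", LOOPS * (BUFSZ / 1e6) / secs);
    return 0;
}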

Michael



[PATCHES] Re: mkraid secret flag

2000-03-19 Thread James Manning

Patches attached:

#1: allan noah's suggestion (small warning, 5 seconds, that's it)
#2: untested "it compiles" patch for warning file (with Seth's 2 week
recommendation on time-span)

[ Saturday, March 18, 2000 ] m. allan noah wrote:
 think about it! rm by default does not -i!

true, although most systems (just going by RH's volume) have alias rm="rm
-i" for root (as well as a couple of other possibly-destructive commands)

 i feel that mingo/gadi et al have done a fine job, and these utils need to
 take the same approach as other system level programs- no convoluted messages
 asking for non-disclosure, just the normal warning, and the five second pause.
 raid 0.90 is almost grown up. it should act that way.

raid 0.90 maturity is orthogonal to the issue of whether we want to warn
people about a potentially destructive command.  The motivation "It really
sucks to LOSE DATA!" applies equally well to Bug-Free (tm) kernel code
as to stuff in development (ie, you're willing to destroy what's on disk).

In any case, since the patches are small and it's easy to get almost any
warning behavior desired (or none at all), it'll boil down to distro
preference anyway.

James


--- raidtools-0.90/mkraid.c.orig	Sun Mar 19 03:31:48 2000
+++ raidtools-0.90/mkraid.c Sun Mar 19 03:33:46 2000
@@ -68,7 +68,6 @@
 int version = 0, help = 0, debug = 0;
 char * configFile = RAID_CONFIG;
 int force_flag = 0;
-int old_force_flag = 0;
 int upgrade_flag = 0;
 int no_resync_flag = 0;
 int all_flag = 0;
@@ -79,8 +78,7 @@
 enum mkraidFunc func;
 struct poptOption optionsTable[] = {
{ "configfile", 'c', POPT_ARG_STRING, configFile, 0 },
-   { "force", 'f', 0, old_force_flag, 0 },
-   { "really-force", 'R', 0, force_flag, 0 },
+   { "force", 'f', 0, force_flag, 0 },
{ "upgrade", 'u', 0, upgrade_flag, 0 },
{ "dangerous-no-resync", 'r', 0, no_resync_flag, 0 },
{ "help", 'h', 0, help, 0 },
@@ -116,12 +114,8 @@
}
 } else if (!strcmp (namestart, "raid0run")) {
 func = raid0run;
-   if (old_force_flag) {
-   fprintf (stderr, "--force not possible for raid0run!\n");
-   return (EXIT_FAILURE);
-   }
if (force_flag) {
-   fprintf (stderr, "--really-force not possible for raid0run!\n");
+   fprintf (stderr, "--force not possible for raid0run!\n");
return (EXIT_FAILURE);
}
if (upgrade_flag) {
@@ -167,23 +161,6 @@
 
 if (getMdVersion(&ver)) {
fprintf(stderr, "cannot determine md version: %s\n", strerror(errno));
-   return EXIT_FAILURE;
-}
-
-if (old_force_flag && (func == mkraid)) {
-   fprintf(stderr, 
-
-"--force and the new RAID 0.90 hot-add/hot-remove functionality should be\n"
-" used with extreme care! If /etc/raidtab is not in sync with the real array\n"
-" configuration, then a --force will DESTROY ALL YOUR DATA. It's especially\n"
-" dangerous to use -f if the array is in degraded mode. \n\n"
-" PLEASE dont mention the --really-force flag in any email, documentation or\n"
-" HOWTO, just suggest the --force flag instead. Thus everybody will read\n"
-" this warning at least once :) It really sucks to LOSE DATA. If you are\n"
-" confident that everything will go ok then you can use the --really-force\n"
-" flag. Also, if you are unsure what this is all about, dont hesitate to\n"
-" ask questions on [EMAIL PROTECTED]\n");
-
return EXIT_FAILURE;
 }
 


--- raidtools-0.90/mkraid.c.orig	Sun Mar 19 03:31:48 2000
+++ raidtools-0.90/mkraid.c Sun Mar 19 03:55:19 2000
@@ -68,7 +68,6 @@
 int version = 0, help = 0, debug = 0;
 char * configFile = RAID_CONFIG;
 int force_flag = 0;
-int old_force_flag = 0;
 int upgrade_flag = 0;
 int no_resync_flag = 0;
 int all_flag = 0;
@@ -79,8 +78,7 @@
 enum mkraidFunc func;
 struct poptOption optionsTable[] = {
{ "configfile", 'c', POPT_ARG_STRING, configFile, 0 },
-   { "force", 'f', 0, old_force_flag, 0 },
-   { "really-force", 'R', 0, force_flag, 0 },
+   { "force", 'f', 0, force_flag, 0 },
{ "upgrade", 'u', 0, upgrade_flag, 0 },
{ "dangerous-no-resync", 'r', 0, no_resync_flag, 0 },
{ "help", 'h', 0, help, 0 },
@@ -116,12 +114,8 @@
}
 } else if (!strcmp (namestart, "raid0run")) {
 func = raid0run;
-   if (old_force_flag) {
-   fprintf (stderr, "--force not possible for raid0run!\n");
-   return (EXIT_FAILURE);
-   }
if (force_flag) {
-   fprintf (stderr, "--really-force not possible for raid0run!\n");
+   fprintf (stderr, "--force not possible for raid0run!\n");
return (EXIT_FAILURE);
}
if (upgrade_flag) {
@@ -170,8 +164,17 @@
return EXIT_FAILURE;
 }
 
-if (old_force_flag && (func == mkraid)) {
-   fprintf(stderr, 
+if (force_flag && (func == mkraid)) {

Adding extra disk on Raid1?

2000-03-19 Thread Brian Lavender

I have a running Raid1 with the mingo patches on the 2.2.14 kernel. I
got everything working, so I decided to turn off the machine and remove
one of the disks. I deleted the contents of disk one, and then I
restarted the machine. It successfully started in degraded mode. I am running
root raid. I then took the removed disk, and I recreated the partitions on it. 
I tried sticking it back in and doing a raidhotadd, but it complains. 

everest:~# raidhotadd  /dev/md5 /dev/hda5
trying to hot-add hda5 to md5 ...
/dev/md5: can not hot-add disk: disk busy!
everest:~# raidhotadd   /dev/md6 /dev/hda6
trying to hot-add hda6 to md6 ...
/dev/md6: can not hot-add disk: disk busy!


Here is a snippet of the /etc/raidtab.

raiddev /dev/md5
  raid-level 1
  nr-raid-disks  2
  nr-spare-disks 0
  chunk-size 4
  persistent-superblock 1

  device /dev/hde5
  raid-disk  0

  device /dev/hda5
  raid-disk  1

raiddev /dev/md6
  raid-level 1
  nr-raid-disks  2
  nr-spare-disks 0
  chunk-size 4
  persistent-superblock 1

  device /dev/hde6
  raid-disk  0

  device /dev/hda6
  raid-disk  1

Section from /var/log/syslog

Mar 19 02:30:06 everest kernel: md6: no spare disk to reconstruct array! -- continuing 
in degraded mode
Mar 19 02:30:06 everest kernel: md5: no spare disk to reconstruct array! -- continuing 
in degraded mode

-- 
Brian Lavender
http://www.brie.com/brian/



Re: raid parity and mmx processor

2000-03-19 Thread Martin Eriksson

- Original Message -
From: "Michael Robinton" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, March 19, 2000 9:54 AM
Subject: raid parity and mmx processor


 As I understand it, the kernel tests parity calc speed and decides to
 use/not use mmx based on which is faster.

 Assuming an otherwise heavily loaded cpu, would it not be better to use
 mmx even though slightly slower just to free up cpu cycles for other
tasks.

If you are doing the calculations faster, it takes shorter time before the
CPU is free to do other things. Fastest is fastest.


 If I don't have the right picture here, someone please provide an
 explaination of where the cpu cycles go to raid.

 This reason for my question is the amount of "nice" cycles on one of my
 machines that is fairly busy. Only raid5d is niced, and I'm running a
 load factor over one consistently with only about a 20% cpu load. It
 appears most of the cpu cycles in use go to raid5 -- presumably parity
 calculations.

What kinda disks are you running on? SCSI? IDE? If IDE, do you have all nice
things such as DMA transfers enabled?


 Assuming I'm not too far off base, it would be nice to have a /proc
 option or at least a compile option to force use of mmx for raid parity
 and a way to run the parity calculation test manually to see the
difference
 between the two methods.

Maybe not for speed, but for bug tracing. Anyway, that would be good.


 Michael

_
|  Martin Eriksson [EMAIL PROTECTED]
|  http://www.fysnet.nu/
|  ICQ: 5869159




Re: Adding extra disk on Raid1?

2000-03-19 Thread Daniel Wirth


Did you raidhotremove the disk first?

Daniel

On Sun, 19 Mar 2000, Brian Lavender wrote:

 Date: Sun, 19 Mar 2000 02:44:27 -0800
 From: Brian Lavender [EMAIL PROTECTED]
 To: Linux Raid [EMAIL PROTECTED]
 Subject: Adding extra disk on Raid1?
 
 I have a running Raid1 with the mingo patches on the 2.2.14 kernel. I
 got everything working, so I decided to turn off the machine and remove
 one of the disks. I deleted the contents of the disk one, and then I 
 restarted the machine. It successfully started in degraded mode. I am running
 root raid. I then took the removed disk, and I recreated the partitions on it. 
 I tried sticking it back in and doing a raidhotadd, but it complains. 
 
 everest:~# raidhotadd  /dev/md5 /dev/hda5
 trying to hot-add hda5 to md5 ...
 /dev/md5: can not hot-add disk: disk busy!
 everest:~# raidhotadd   /dev/md6 /dev/hda6
 trying to hot-add hda6 to md6 ...
 /dev/md6: can not hot-add disk: disk busy!
 
 
 Here is a snippet of the /etc/raidtab.
 
 raiddev /dev/md5
   raid-level   1
   nr-raid-disks  2
   nr-spare-disks 0
   chunk-size 4
   persistent-superblock 1
 
   device /dev/hde5
   raid-disk  0
 
   device /dev/hda5
   raid-disk  1
 
 raiddev /dev/md6
   raid-level 1
   nr-raid-disks  2
   nr-spare-disks 0
   chunk-size 4
   persistent-superblock 1
 
   device /dev/hde6
   raid-disk  0
 
   device /dev/hda6
   raid-disk  1
 
 Section from /var/log/syslog
 
 Mar 19 02:30:06 everest kernel: md6: no spare disk to reconstruct array! -- 
continuing in degraded mode
 Mar 19 02:30:06 everest kernel: md5: no spare disk to reconstruct array! -- 
continuing in degraded mode
 
 

-- 


DANIEL WIRTH

[EMAIL PROTECTED]
Tel.: +49 89 890 41 641
Fax.: +49 89 890 41 642




Raid0 error messages

2000-03-19 Thread Ulf Mehlig

Sorry if this is not the appropriate place to ask -- I would
appreciate it if you would direct me to the right site :-)

For the past few days, we have been getting lots of messages like

  Mar 19 15:45:12 server kernel: raid0_map bug: hash->zone0==NULL for block 1895815569
  Mar 19 15:45:12 server kernel: Bad md_map in ll_rw_block
  Mar 19 15:45:12 server kernel: raid0_map bug: hash->zone0==NULL for block 1895815569
  Mar 19 15:45:12 server kernel: Bad md_map in ll_rw_block
  Mar 19 15:45:15 server kernel: raid0_map bug: hash->zone0==NULL for block 1895815569

in our syslogs. We have RedHat 6.0 with kernel 2.2.11 and the
raid-patches for that kernel. /proc/mdstat contains

  Personalities : [raid0] [translucent] 
  read_ahead 1024 sectors
  md0 : active raid0 sdc1[1] sdb1[0] 17847936 blocks 16k chunks
  unused devices: <none>

and our raidtab is

  raiddev /dev/md0
  raid-level0
  persistent-superblock 1
  nr-raid-disks 2
  nr-spare-disks0
  chunk-size   16 
  device/dev/sdb1
  raid-disk 0
  device/dev/sdc1
  raid-disk 1

Do we have to fear something really weird? What are your suggestions?
I would like to reconstruct the raid, but I have to install a backup
server first ...

BTW, should we stay with kernel 2.2.11, or should we upgrade to 2.2.14
and the raid patches at http://people.redhat.com/mingo/?

Many thanks for your attention!
Regards,
Ulf

P.S.: Please CC: me, I'm not on this list at the moment!

-- 
==
Ulf Mehlig[EMAIL PROTECTED]
  Center for Tropical Marine Ecology/ZMT, Bremen, Germany
--



Re: raid parity and mmx processor

2000-03-19 Thread Michael Robinton


On Sun, 19 Mar 2000, Martin Eriksson wrote:

 - Original Message -
 From: "Michael Robinton" [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Sunday, March 19, 2000 9:54 AM
 Subject: raid parity and mmx processor
 
 
  As I understand it, the kernel tests parity calc speed and decides to
  use/not use mmx based on which is faster.
 
  Assuming an otherwise heavily loaded cpu, would it not be better to use
  mmx even though slightly slower just to free up cpu cycles for other
 tasks.
 
 If you are doing the calculations faster, it takes shorter time before the
 CPU is free to do other things. Fastest is fastest.
 
Hmmm not really, if so, why would we have multi-processor systems.
mmx is a distinct processor apart from the fpp or main cpu. It is 
possible for all to be doing separate operations at the same time.
i.e. gzip, raid parity, network traffic


 
  If I don't have the right picture here, someone please provide an
  explaination of where the cpu cycles go to raid.
 
  This reason for my question is the amount of "nice" cycles on one of my
  machines that is fairly busy. Only raid5d is niced, and I'm running a
  load factor over one consistently with only about a 20% cpu load. It
  appears most of the cpu cycles in use go to raid5 -- presumably parity
  calculations.
 
 What kinda disks are you running on? SCSI? IDE? If IDE, do you have all nice
 things such as DMA transfers enabled?
 
IDE 33 +dma

 
  Assuming I'm not too far off base, it would be nice to have a /proc
  option or at least a compile option to force use of mmx for raid parity
  and a way to run the parity calculation test manually to see the
 difference
  between the two methods.
 
 Maybe not for speed, but for bug tracing. Anyway, that would be good.
 
 



Re: raid parity and mmx processor

2000-03-19 Thread m.allan noah

   As I understand it, the kernel tests parity calc speed and decides to
   use/not use mmx based on which is faster.
  
   Assuming an otherwise heavily loaded cpu, would it not be better to use
   mmx even though slightly slower just to free up cpu cycles for other
  tasks.
  
  If you are doing the calculations faster, it takes shorter time before the
  CPU is free to do other things. Fastest is fastest.
  
 Hmmm not really, if so, why would we have multi-processor systems.
 mmx is a distinct processor apart from the fpp or main cpu. It is 
 possible for all to be doing separate operations at the same time.
 i.e. gzip, raid parity, network traffic

untrue. mmx is NOT a separate entity within an intel cpu. go read the data
sheets for any mmx intel cpu. or even better, go read the simd article at
www.arstechnica.com

mmx, due to its cheap, hackish nature, re-uses the fpu registers. you cannot
do an mmx instruction followed by an fpu instruction in the same program
without a painful context switch.

this is not an issue with other processes on the system, however, since you
are going to have some task switching anyway when you go from the raid parity
code back to a userland piece of code and so on.

there is always the debate as well about how well your compiler/kernel will
order your instructions/schedule your procs so that different functional units
within the cpu keep their pipelines full.

afaik, it is better to just write fast code and get it out of the way of the
other procs asap, than try to get two different procs to execute two different
instruction sets on the same cpu at the same time.

allan

 
  
   If I don't have the right picture here, someone please provide an
   explaination of where the cpu cycles go to raid.
  
   This reason for my question is the amount of "nice" cycles on one of my
   machines that is fairly busy. Only raid5d is niced, and I'm running a
   load factor over one consistently with only about a 20% cpu load. It
   appears most of the cpu cycles in use go to raid5 -- presumably parity
   calculations.
  
  What kinda disks are you running on? SCSI? IDE? If IDE, do you have all
nice
  things such as DMA transfers enabled?
  
 IDE 33 +dma
 
  
   Assuming I'm not too far off base, it would be nice to have a /proc
   option or at least a compile option to force use of mmx for raid parity
   and a way to run the parity calculation test manually to see the
  difference
   between the two methods.
  
  Maybe not for speed, but for bug tracing. Anyway, that would be good.
  
  
 





product testimonials

2000-03-19 Thread Seth Vidal

Hi folks,
 I've got a user in my dept who is thinking about using software raid5
(after I explained the advantages to them) - but they want "testimonials"
ie: - people who have used software raid5 under linux and have had it save
their ass or have had it work correctly and keep them from a costly backup
restore. IE: success stories. Also I would like to hear some failure
stories too - sort of horror stories - now the obscure situations I don't
care about - if you got an axe stuck in your drive by accident and it
killed the entire array then I  feel sorry for you but I don't consider
that average use.

Can anyone give some testimonials on the 0.90 raid?
thanks

-sv



Re: product testimonials

2000-03-19 Thread Martin Eriksson

I have not tested RAID5, but RAID1 has spared me a lot of grief. One of the HDs
failed, and I simply tugged it out (it was the master disk, so I had to
change the jumpers).

I copy the MBR and the root partition after every major change (because they
are not raided), so the machine just booted up in degraded mode.

When I got the second HD back, I plugged it in and did a disk-wide dd and
also ran the appropriate raid commands, and lo! it started working like a
charm!

This was on two similar disks though (IBM 7200rpm), run from the standard DMA
controller on a BE6-2.

Was a pain to set up though...

_
|  Martin Eriksson [EMAIL PROTECTED]
|  http://www.fysnet.nu/
|  ICQ: 5869159


- Original Message -
From: "Seth Vidal" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, March 20, 2000 4:16 AM
Subject: product testimonials


 Hi folks,
  I've got a user in my dept who is thinking about using software raid5
 (after I explained the advantages to them) - but they want "testimonials"
 ie: - people who have used software raid5 under linux and have had it save
 their ass or have had it work correctly and keep them from a costly backup
 restore. IE: success stories. Also I would like to hear some failure
 stories too - sort of horror stories - now the obscure situations I don't
 care about - if you got an axe stuck in your drive by accident and it
 killed the entire array then I  feel sorry for you but I don't consider
 that average use.

 Can anyone give some testimonials on the 0.90 raid?
 thanks

 -sv