Bug#575382: nagios-plugins-standard: check_linux_raid does not warn if resync is in process

2012-02-14 Thread Petter Reinholdtsen

[Christoph Martin]
 check_linux_raid should also warn if a md device is in resync mode and
 not only if in recovery mode.

I suspect this is the wrong conclusion to the problem.  I ran into a
similar issue with my Nagios-monitored RAID, where one of the disks
failed and the spare was automatically resynced into the RAID.  But I
do not want a warning because of the resync.  I want a warning because
there is a failing disk.

So in this case:

[Jan Wagner]
 md3 : active raid10 sdd4[4](F) sdc4[1] sdb4[5](F) sda4[0]
   1887974656 blocks 64K chunks 2 near-copies [4/2] [UU__]
   [==..]  recovery = 50.3% (474987648/943987328) 
 finish=4363639.0min speed=1K/sec

I believe the module should report the devices listed with '(F)' as at
least a warning and preferably a critical issue, and ignore the fact
that a sync/recovery is in progress.
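
As a rough sketch of that idea (not the plugin's actual code), the members marked '(F)' can be pulled out of the array line with a single pattern. The function name failed_members is made up for this illustration, and treating any failed member as CRITICAL is only one possible policy.

```perl
use strict;
use warnings;

# Hypothetical helper: return the member devices flagged "(F)" on an
# mdstat array line such as "md3 : active raid10 sdd4[4](F) sdc4[1] ...".
sub failed_members {
    my ($line) = @_;
    my @failed;
    # A member looks like "sdd4[4]", optionally followed by "(F)" or "(S)".
    while ($line =~ /(\S+)\[\d+\]\(F\)/g) {
        push @failed, $1;
    }
    return @failed;
}

my $line   = 'md3 : active raid10 sdd4[4](F) sdc4[1] sdb4[5](F) sda4[0]';
my @failed = failed_members($line);
if (@failed) {
    print "CRITICAL failed members: @failed\n";
} else {
    print "OK no failed members\n";
}
```

For the md3 line quoted above this reports sdd4 and sdb4, regardless of whether a recovery or resync is running.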

It would also be nice if it reported the serial number of the failing
disk, to make it easier to locate the correct disk when replacing
disks.  The serial number can either be discovered using
'hdparm -I /dev/sdd4' (in the example above), or by looking in /sys/.
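
A minimal sketch of the hdparm route, assuming the output contains a "Serial Number:" line. The function name serial_from_hdparm and the sample text are invented for illustration; the sample is not real output from any disk.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the serial number from the text printed
# by `hdparm -I`. Returns undef if no "Serial Number:" line is found.
sub serial_from_hdparm {
    my ($text) = @_;
    return $text =~ /^\s*Serial Number:\s*(\S+)/m ? $1 : undef;
}

# Invented sample text, standing in for real `hdparm -I` output.
my $sample = "ATA device, with non-removable media\n"
           . "\tModel Number:       EXAMPLE DISK\n"
           . "\tSerial Number:      EXAMPLE1234\n";

my $serial = serial_from_hdparm($sample);
print defined $serial ? "serial=$serial\n" : "serial unknown\n";
```

In a plugin, the text would come from running the command itself, for example with qx(), which needs sufficient privileges to query the disk.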
-- 
Happy hacking
Petter Reinholdtsen



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#575382: [Pkg-nagios-devel] Bug#575382: nagios-plugins-standard: check_linux_raid does not warn if resync is in process

2010-03-29 Thread Christoph Martin
Jan Wagner schrieb am 29.03.2010 00:02:
 Hi Christoph,
 
 Okay ... found an example, hopefully equivalent to your problem:
 
 bb:~# cat /tmp/mdstat
 Personalities : [raid1] [raid10]
 md2 : active raid1 sdd2[4] sdc2[1] sdb2[2] sda2[0]
   16386176 blocks [4/3] [UUU_]
 resync=DELAYED
 
 md1 : active raid1 sdd3[4] sdc3[1] sdb3[2] sda3[0]
   6144768 blocks [4/3] [UUU_]
 resync=DELAYED
 
 md3 : active raid10 sdd4[4](F) sdc4[1] sdb4[5](F) sda4[0]
   1887974656 blocks 64K chunks 2 near-copies [4/2] [UU__]
   [==..]  recovery = 50.3% (474987648/943987328) 
 finish=4363639.0min speed=1K/sec
 
 md0 : active raid1 sdd1[4](S) sdc1[5] sdb1[2] sda1[0]
   10241280 blocks [4/2] [U_U_]
 resync=DELAYED
 
 unused devices: <none>
 
 bb:~# /tmp/check_linux_raid md3
 WARNING md3 status=[UU__], recovery=50.3%, finish=4363639.0min.
 bb:~# /tmp/check_linux_raid_new md3
 WARNING md3 status=[UU__], recovery=recovery, finish=4363639.0min.
 bb:~# diff -Nur /tmp/check_linux_raid /tmp/check_linux_raid_new
 --- /tmp/check_linux_raid 2010-03-28 23:19:08.0 +0200
 +++ /tmp/check_linux_raid_new 2010-03-28 21:33:44.0 +0200
 @@ -61,7 +61,7 @@
          if (defined $device) {
                  if (/(\[[_U]+\])/) {
                          $status{$device} = $1;
 -                } elsif (/recovery = (.*?)\s/) {
 +                } elsif (/(recovery|resync) = (.*?)\s/) {
                          $recovery{$device} = $1;
                          ($finish{$device}) = /finish=(.*?min)/;
                  } elsif (/^\s*$/) {
 
 I don't see how that should help. :) Did I miss something?

I see now that the patch is wrong: with $1 it reports the keyword
(recovery=recovery) instead of the percentage. There should be a $2 in
the line below:

} elsif (/(recovery|resync) = (.*?)\s/) {
$recovery{$device} = $2;
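
The effect of the extra capture group can be checked in isolation. The sketch below runs the pattern against the recovery line from the md3 example. The helper progress_of is a name invented here, and it uses a non-capturing group (?:...) as one possible alternative that keeps the percentage in $1.

```perl
use strict;
use warnings;

my $line = '  [==..]  recovery = 50.3% (474987648/943987328) ';

# With a capturing alternation, $1 is the keyword and $2 the percentage.
if ($line =~ /(recovery|resync) = (.*?)\s/) {
    print "group 1: $1\n";
    print "group 2: $2\n";
}

# Hypothetical helper: a non-capturing group leaves the percentage in $1,
# so code written for the old single-group pattern keeps working.
sub progress_of {
    my ($text) = @_;
    return $text =~ /(?:recovery|resync) = (.*?)\s/ ? $1 : undef;
}

print "progress: ", progress_of($line), "\n";
```

This matches the plugin output shown above: the unfixed patch printed recovery=recovery because $1 held the keyword.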



 
 I just had a closer look; what do you think about the following:
 
 --- /tmp/check_linux_raid 2010-03-28 23:19:08.0 +0200
 +++ /tmp/check_linux_raid_new 2010-03-28 23:48:11.0 +0200
 @@ -50,6 +50,7 @@
  my $msg = "";
  my %status;
  my %recovery;
 +my %resync;
  my %finish;
  my %active;
  my %devices;
 @@ -64,6 +65,8 @@
                  } elsif (/recovery = (.*?)\s/) {
                          $recovery{$device} = $1;
                          ($finish{$device}) = /finish=(.*?min)/;
 +                } elsif (/resync=(.*?)\s/) {
 +                        $resync{$device} = $1;
                  } elsif (/^\s*$/) {
                          $device=undef;
                  }
 @@ -84,6 +87,10 @@
                          $msg .= sprintf " %s status=%s, recovery=%s, finish=%s.",
                                  $devices{$k}, $status{$k}, $recovery{$k}, $finish{$k};
                          $code = max_state($code, WARNING);
 +                } elsif (defined $resync{$k}) {
 +                        $msg .= sprintf " %s status=%s, resync=%s.",
 +                                $devices{$k}, $status{$k}, $resync{$k};
 +                        $code = max_state($code, WARNING);
                  } else {
                          $msg .= sprintf " %s status=%s.", $devices{$k}, $status{$k};
                          $code = max_state($code, CRITICAL);
 
 Maybe that's what you are expecting?

This is also not quite correct, because the resync line may also carry
a percentage and a finish= value, just like the recovery line.
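
One way to cover both cases is a single parser that accepts "resync=DELAYED" as well as a line with a percentage and a finish estimate. This is only a sketch: the name parse_sync_line is invented, and it assumes an active resync line follows the same layout as the recovery line in the md3 example.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a sync status line from /proc/mdstat.
# Returns (kind, progress, finish); finish is undef when the line has
# no finish= value (as with "resync=DELAYED"). Returns an empty list
# if the line is not a recovery/resync line.
sub parse_sync_line {
    my ($line) = @_;
    if ($line =~ /(recovery|resync)\s*=\s*(\S+)/) {
        my ($kind, $progress) = ($1, $2);
        my ($finish) = $line =~ /finish=(\S*?min)/;
        return ($kind, $progress, $finish);
    }
    return;
}

for my $line (
    '      resync=DELAYED',
    '  [==..]  recovery = 50.3% (474987648/943987328) finish=4363639.0min speed=1K/sec',
) {
    my ($kind, $progress, $finish) = parse_sync_line($line);
    printf "%s=%s%s\n", $kind, $progress,
        defined $finish ? ", finish=$finish" : '';
}
```

Whether a delayed resync should raise the same WARNING as a running one is a policy question this sketch leaves open.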

Christoph


Bug#575382: [Pkg-nagios-devel] Bug#575382: nagios-plugins-standard: check_linux_raid does not warn if resync is in process

2010-03-28 Thread Jan Wagner
Hi Martin,

On Thursday 25 March 2010, Christoph Martin wrote:
 Thanks for the last patch. There is another one:

 check_linux_raid should also warn if a md device is in resync mode
 and not only if in recovery mode.

 *** /usr/lib/nagios/plugins/check_linux_raid.pl~Fri Mar 19 12:06:24
 2010 --- /usr/lib/nagios/plugins/check_linux_raid.pl Wed Mar 24 00:21:04
 2010 ***
 *** 61,67 
 if (defined $device) {
 if (/(\[[_U]+\])/) {
 $status{$device} = $1;
 !   } elsif (/recovery = (.*?)\s/) {
 $recovery{$device} = $1;
 ($finish{$device}) = /finish=(.*?min)/;
 $device=undef;
 --- 61,67 
 if (defined $device) {
 if (/(\[[_U]+\])/) {
 $status{$device} = $1;
 !   } elsif (/(recovery|resync) = (.*?)\s/) {
 $recovery{$device} = $1;
 ($finish{$device}) = /finish=(.*?min)/;
 $device=undef;

I'm not very familiar with mdadm, but how does your patch help here? Once a 
month /etc/cron.d/mdadm is doing a resync.

# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
[raid10] 
md1 : active raid1 sda3[0] sdb3[1]
  133821376 blocks [2/2] [UU]
  [=...]  check = 29.6% (3924/133821376) 
finish=24.1min speed=64932K/sec

unused devices: <none>

check_linux_raid reports OK md1 status=[UU]. both with and without your 
patch.
Under which conditions does your regex match? Do you have an example, and when 
will this happen?

Thanks and with kind regards, Jan.
-- 
Never write mail to w...@spamfalle.info, you have been warned!
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GIT d-- s+: a C+++ UL P+ L+++ E--- W+++ N+++ o++ K++ w--- O M V- PS PE Y++
PGP++ t-- 5 X R tv- b+ DI D+ G++ e++ h r+++ y 
--END GEEK CODE BLOCK--





Bug#575382: [Pkg-nagios-devel] Bug#575382: nagios-plugins-standard: check_linux_raid does not warn if resync is in process

2010-03-28 Thread Christoph Martin
Hi Jan,

Jan Wagner schrieb am 28.03.2010 22:19:
 Hi Martin,

s/Martin/Christoph/

 
 On Thursday 25 March 2010, Christoph Martin wrote:
 Thanks for the last patch. There is another one:

 check_linux_raid should also warn if a md device is in resync mode
 and not only if in recovery mode.

 *** /usr/lib/nagios/plugins/check_linux_raid.pl~Fri Mar 19 12:06:24
 2010 --- /usr/lib/nagios/plugins/check_linux_raid.pl Wed Mar 24 00:21:04
 2010 ***
 *** 61,67 
 if (defined $device) {
 if (/(\[[_U]+\])/) {
 $status{$device} = $1;
 !   } elsif (/recovery = (.*?)\s/) {
 $recovery{$device} = $1;
 ($finish{$device}) = /finish=(.*?min)/;
 $device=undef;
 --- 61,67 
 if (defined $device) {
 if (/(\[[_U]+\])/) {
 $status{$device} = $1;
 !   } elsif (/(recovery|resync) = (.*?)\s/) {
 $recovery{$device} = $1;
 ($finish{$device}) = /finish=(.*?min)/;
 $device=undef;
 
 I'm not very familiar with mdadm, but how does your patch help here? Once a 
 month /etc/cron.d/mdadm is doing a resync.
 
 # cat /proc/mdstat 
 Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
 [raid10] 
 md1 : active raid1 sda3[0] sdb3[1]
   133821376 blocks [2/2] [UU]
   [=...]  check = 29.6% (3924/133821376) 
 finish=24.1min speed=64932K/sec
 
 unused devices: <none>
 
 check_linux_raid reports OK md1 status=[UU]. both with and without your 
 patch.
 Under which conditions does your regex match? Do you have an example, and when 
 will this happen?

If a disk fails in a raid6, an automatic recovery is done with one
missing disk. With raid6, two disks may fail and the raid can still be
recovered. If you add a good disk, a resync is done after the recovery,
with state UUU_. Only after this resync does the array return to the
fully-up state. This resync should generate a warning.

Christoph


Bug#575382: [Pkg-nagios-devel] Bug#575382: nagios-plugins-standard: check_linux_raid does not warn if resync is in process

2010-03-28 Thread Jan Wagner
Hi Christoph,

On Sunday 28 March 2010, Christoph Martin wrote:
 Jan Wagner schrieb am 28.03.2010 22:19:
  Hi Martin,

 s/Martin/Christoph/

sorry. :/

 If a disk fails in a raid6, an automatic recovery is done with one
 missing disk. With raid6, two disks may fail and the raid can still be
 recovered. If you add a good disk, a resync is done after the recovery,
 with state UUU_. Only after this resync does the array return to the
 fully-up state. This resync should generate a warning.

Okay ... found an example, hopefully equivalent to your problem:

bb:~# cat /tmp/mdstat
Personalities : [raid1] [raid10]
md2 : active raid1 sdd2[4] sdc2[1] sdb2[2] sda2[0]
  16386176 blocks [4/3] [UUU_]
resync=DELAYED

md1 : active raid1 sdd3[4] sdc3[1] sdb3[2] sda3[0]
  6144768 blocks [4/3] [UUU_]
resync=DELAYED

md3 : active raid10 sdd4[4](F) sdc4[1] sdb4[5](F) sda4[0]
  1887974656 blocks 64K chunks 2 near-copies [4/2] [UU__]
  [==..]  recovery = 50.3% (474987648/943987328) 
finish=4363639.0min speed=1K/sec

md0 : active raid1 sdd1[4](S) sdc1[5] sdb1[2] sda1[0]
  10241280 blocks [4/2] [U_U_]
resync=DELAYED

unused devices: <none>

bb:~# /tmp/check_linux_raid md3
WARNING md3 status=[UU__], recovery=50.3%, finish=4363639.0min.
bb:~# /tmp/check_linux_raid_new md3
WARNING md3 status=[UU__], recovery=recovery, finish=4363639.0min.
bb:~# diff -Nur /tmp/check_linux_raid /tmp/check_linux_raid_new
--- /tmp/check_linux_raid   2010-03-28 23:19:08.0 +0200
+++ /tmp/check_linux_raid_new   2010-03-28 21:33:44.0 +0200
@@ -61,7 +61,7 @@
         if (defined $device) {
                 if (/(\[[_U]+\])/) {
                         $status{$device} = $1;
-                } elsif (/recovery = (.*?)\s/) {
+                } elsif (/(recovery|resync) = (.*?)\s/) {
                         $recovery{$device} = $1;
                         ($finish{$device}) = /finish=(.*?min)/;
                 } elsif (/^\s*$/) {

I don't see how that should help. :) Did I miss something?

I just had a closer look; what do you think about the following:

--- /tmp/check_linux_raid   2010-03-28 23:19:08.0 +0200
+++ /tmp/check_linux_raid_new   2010-03-28 23:48:11.0 +0200
@@ -50,6 +50,7 @@
 my $msg = "";
 my %status;
 my %recovery;
+my %resync;
 my %finish;
 my %active;
 my %devices;
@@ -64,6 +65,8 @@
                 } elsif (/recovery = (.*?)\s/) {
                         $recovery{$device} = $1;
                         ($finish{$device}) = /finish=(.*?min)/;
+                } elsif (/resync=(.*?)\s/) {
+                        $resync{$device} = $1;
                 } elsif (/^\s*$/) {
                         $device=undef;
                 }
@@ -84,6 +87,10 @@
                         $msg .= sprintf " %s status=%s, recovery=%s, finish=%s.",
                                 $devices{$k}, $status{$k}, $recovery{$k}, $finish{$k};
                         $code = max_state($code, WARNING);
+                } elsif (defined $resync{$k}) {
+                        $msg .= sprintf " %s status=%s, resync=%s.",
+                                $devices{$k}, $status{$k}, $resync{$k};
+                        $code = max_state($code, WARNING);
                 } else {
                         $msg .= sprintf " %s status=%s.", $devices{$k}, $status{$k};
                         $code = max_state($code, CRITICAL);

Maybe that's what you are expecting?

With kind regards, Jan.
-- 
Never write mail to w...@spamfalle.info, you have been warned!
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GIT d-- s+: a C+++ UL P+ L+++ E--- W+++ N+++ o++ K++ w--- O M V- PS PE Y++
PGP++ t-- 5 X R tv- b+ DI D+ G++ e++ h r+++ y 
--END GEEK CODE BLOCK--





Bug#575382: nagios-plugins-standard: check_linux_raid does not warn if resync is in process

2010-03-25 Thread Christoph Martin
Package: nagios-plugins-standard
Version: 1.4.12-5
Severity: normal


Thanks for the last patch. There is another one:

check_linux_raid should also warn if a md device is in resync mode
and not only if in recovery mode.

*** /usr/lib/nagios/plugins/check_linux_raid.pl~Fri Mar 19 12:06:24 2010
--- /usr/lib/nagios/plugins/check_linux_raid.pl Wed Mar 24 00:21:04 2010
***
*** 61,67 
if (defined $device) {
if (/(\[[_U]+\])/) {
$status{$device} = $1;
!   } elsif (/recovery = (.*?)\s/) {  
$recovery{$device} = $1;
($finish{$device}) = /finish=(.*?min)/;
$device=undef;
--- 61,67 
if (defined $device) {
if (/(\[[_U]+\])/) {
$status{$device} = $1;
!   } elsif (/(recovery|resync) = (.*?)\s/) {  
$recovery{$device} = $1;
($finish{$device}) = /finish=(.*?min)/;
$device=undef;


Christoph

-- System Information:
Debian Release: 5.0.4
  APT prefers stable
  APT policy: (900, 'stable'), (90, 'oldstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-2-686-bigmem (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages nagios-plugins-standard depends on:
ii  dnsutils        1:9.5.1.dfsg.P3-1+lenny1  Clients provided with BIND
ii  fping           2.4b2-to-ipv6-15          sends ICMP ECHO_REQUEST packets to
ii  host            2331-9                    utility for querying DNS servers
ii  libc6           2.7-18lenny2              GNU C Library: Shared libraries
ii  libldap-2.4-2   2.4.11-1+lenny1           OpenLDAP libraries
ii  libmysqlclient1 5.0.51a-24+lenny3         MySQL database client library
ii  libnet-snmp-per 5.2.0-1                   Script SNMP connections
ii  libpq5          8.3.9-0lenny1             PostgreSQL C client library
ii  libradiusclient 0.5.5-1                   Enhanced RADIUS client library
ii  nagios-plugins- 1.4.12-5                  Plugins for the nagios network mon
ii  qstat           2.11-1                    Command-line tool for querying qua
ii  radiusclient1   0.3.2-11.1                /bin/login replacement which uses
ii  smbclient       2:3.2.5-4lenny9           a LanManager-like simple client fo
ii  snmp            5.4.1~dfsg-12             SNMP (Simple Network Management Pr
ii  ucf             3.0016                    Update Configuration File: preserv

nagios-plugins-standard recommends no packages.

Versions of packages nagios-plugins-standard suggests:
pn  nagios3         <none>                    (no description available)
pn  whois           <none>                    (no description available)

-- no debconf information


