On 2016-10-17 12:44, Stefan Malte Schumacher wrote:
> Hello,
>
> I would like to monitor my btrfs filesystem for missing drives. On
> Debian, mdadm uses a script in /etc/cron.daily which calls mdadm and
> sends an email if anything is wrong with the array. I would like to do
> the same with btrfs. In my first attempt, I grepped and cut the
> information from "btrfs fi show" and had the script send an email if
> the number of devices was not equal to the preselected number.
> Then I saw this:
> ubuntu@ubuntu:~$ sudo btrfs filesystem show
> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
>         Total devices 6 FS bytes used 5.47TiB
>         devid    1 size 1.81TiB used 1.71TiB path /dev/sda3
>         devid    2 size 1.81TiB used 1.71TiB path /dev/sdb3
>         devid    3 size 1.82TiB used 1.72TiB path /dev/sdc1
>         devid    4 size 1.82TiB used 1.72TiB path /dev/sdd1
>         devid    5 size 2.73TiB used 2.62TiB path /dev/sde1
>         *** Some devices missing
> on this page:
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
>
> The number of devices is still at 6, despite the fact that one of the
> drives is missing, which means that my first idea doesn't work.
This is actually correct behavior: the filesystem reports that it should
have 6 devices, and that is precisely how it knows a device is missing.
> I have two questions:
>
> 1) Has anybody already written a script like this? After all, there is
> no need to reinvent the wheel.
Not that I know of, but I may be wrong.
> 2) What should I best grep for? In this case I would just grep for
> "missing". Does this cover all possible outputs of btrfs fi show in
> case of a damaged array? What other outputs do I need to consider for
> my script?
That should catch any case of a failed device. It will not catch things
like devices being out of sync or at-rest data corruption. In general,
you should be running scrub regularly to check for those conditions (and
fix them if they have happened).
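
For illustration, here is a minimal cron-style sketch along those lines.
The mount point, recipient address, and script name are assumptions, so
adjust them for your own setup:

    #!/bin/sh
    # Hypothetical /etc/cron.daily/btrfs-check: mail an alert if the
    # filesystem reports any missing devices.
    MAILTO="root"        # assumed recipient
    FS="/mnt/data"       # assumed btrfs mount point

    if btrfs filesystem show "$FS" 2>&1 | grep -q "missing"; then
        btrfs filesystem show "$FS" 2>&1 | \
            mail -s "btrfs: devices missing on $FS" "$MAILTO"
        exit 1   # non-zero exit so a monitoring tool can catch it too
    fi

You could trigger scrub from cron the same way, e.g. running
"btrfs scrub start -Bd $FS" in a weekly job and mailing the output.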
FWIW, what I watch is the filesystem flags (it will go read-only if it
becomes degraded) and the filesystem size (which will change in most
cases as disks are added or removed). I also run regular SMART status
checks on the disks themselves, so even aside from BTRFS, I'll know
pretty quickly if a disk has failed (or thinks it has).
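
In case it helps, a rough sketch of those two checks might look like
this (the mount point and device list are again assumptions):

    #!/bin/sh
    # Hypothetical checks: a degraded filesystem often ends up mounted
    # read-only, and SMART can flag a dying disk before BTRFS notices.
    FS="/mnt/data"

    # findmnt prints the active mount options; look for "ro" among them.
    if findmnt -no OPTIONS "$FS" | grep -qw "ro"; then
        echo "$FS is mounted read-only" | mail -s "btrfs alert: $FS" root
    fi

    for dev in /dev/sda /dev/sdb; do
        # smartctl -H prints the overall health self-assessment
        # (SATA disks say "PASSED"; SAS disks report "OK" instead).
        smartctl -H "$dev" | grep -q "PASSED" || \
            echo "SMART failure reported on $dev" | \
                mail -s "SMART alert: $dev" root
    done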
Also, you may want to look into something like Monit
(https://mmonit.com/monit/) to handle the monitoring. It lets you
define all your monitoring requirements in a single file (or multiple if
you prefer), and provides the infrastructure to handle e-mail
notifications (including the ability to cache messages when the upstream
mail server is down).
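
As an example of how that might look (the script path is an assumption,
and you still need a mail server / "set alert" stanza elsewhere in your
monitrc), Monit can run a check script and alert on a non-zero exit:

    # Hypothetical /etc/monit/conf.d/btrfs: run the check script each
    # cycle and send an alert if it exits non-zero.
    check program btrfs-missing with path "/usr/local/bin/btrfs-check.sh"
        if status != 0 then alert

That pairs naturally with the kind of cron script sketched earlier,
since Monit only needs the exit status.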