Package: smartmontools
Version: 6.5+svn4324-1
Severity: wishlist
Tags: upstream

Dear Maintainer,

Using Areca's official proprietary `cli64` [0] utility to check
raidset/volume status (as done by Nagios/Icinga monitoring tools)
conflicts with `smartctl`. Munin and smartd monitoring fail in rare, but
still annoying, cases where `cli64` has opened the `/dev/sg2` device with
an exclusive lock at the same time.

This is an example email sent by `smartd`:

```
This message was generated by the smartd daemon running on:

   host name:  hostname
      DNS domain: my.tld

      The following warning/error was logged by the smartd daemon:

      Device: /dev/sg2 [areca_disk#01_enc#01], unable to open device

      Device info:
      WDC WD2005FBYZ-01YCBB2, S/N:WD-WMC...,
      WWN:5-0014ee-059..., FW:RR07, 2.00 TB

      For details see host's SYSLOG.

      You can also use the smartctl utility for further investigation.
      Another message will be sent in 24 hours if the problem persists.
```
This is an example of running `cli64` and `smartctl` at the same time:

```
# cli64 rsf info &
[1] 3934
# smartctl -d areca,3 /dev/sg2 -x
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.18.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Smartctl open device: /dev/sg2 [areca_disk#03_enc#01] failed: Input/output error
```
`strace` shows that `cli64` uses `O_EXCL` mode:

```
# strace -t -f -e open cli64 rsf info
...
11:07:40 open("/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:0:16/0:0:16:0/scsi_generic/sg2/dev", O_RDONLY) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
 #  Name             Disks TotalCap  FreeCap DiskChannels       State
 ===============================================================================
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
 1  Gold                 4 8000.0GB    0.0GB 139B               Normal
 ===============================================================================
 GuiErrMsg<0x00>: Success.
```

In the end, this produces syslog noise, email spam, and some missing
columns in Munin graphs. It also causes problems when a configuration
management system uses `smartctl` calls to detect the drives "hidden"
behind the hardware RAID adapter in order to render Munin/smartd
configuration: the results flap because of random lock conflicts.

This would be "fixable" by retrying at least 2-3 times (probably with a
delay of a second or so) when the device inspected by `smartctl` is
locked at that moment, or perhaps by waiting on the device, or by some
similar approach.
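
A minimal sketch of the retry idea, written in Python rather than in
smartmontools' own C++ and not taken from its code. It assumes the lock
conflict surfaces as `EBUSY` (or `EAGAIN`) from a non-blocking `open()`;
that errno set, the helper name `open_with_retry`, and the defaults of 3
attempts with a 1 second delay are all assumptions for illustration.

```python
import errno
import os
import time

# Assumption: a device held with O_EXCL by another process makes a
# non-blocking open() fail with one of these codes.
RETRYABLE = {errno.EBUSY, errno.EAGAIN}


def open_with_retry(path, attempts=3, delay=1.0, opener=os.open, sleep=time.sleep):
    """Open `path`, retrying a few times if it looks temporarily locked.

    Hypothetical helper: `opener` and `sleep` are injectable so the logic
    can be exercised without a real /dev/sg* device.
    """
    for attempt in range(1, attempts + 1):
        try:
            return opener(path, os.O_RDWR | os.O_NONBLOCK)
        except OSError as exc:
            # Give up on non-lock errors, or once the attempts are used up.
            if exc.errno not in RETRYABLE or attempt == attempts:
                raise
            sleep(delay)
```

Errors unrelated to locking (for example a permission error) are raised
immediately, so a real failure is not hidden behind the retries.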

Although it is possible to work around this issue in your own Salt
modules by retrying the `smartctl` call multiple times, `smartd` would
still have this issue.
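
The Salt-side workaround could look roughly like the following sketch. It
is not an existing Salt module; the function name `smartctl_with_retry`
and its parameters are hypothetical. It relies on the exit status
documented in smartctl(8), where bit 1 (value 2) signals that the device
open failed, and retries only in that case.

```python
import subprocess
import time

# Bit 1 of the smartctl exit status: device open failed (see smartctl(8)).
OPEN_FAILED = 1 << 1


def smartctl_with_retry(args, attempts=3, delay=1.0,
                        run=subprocess.run, sleep=time.sleep):
    """Run smartctl, retrying while it reports that the device open failed.

    Hypothetical helper; `run` and `sleep` are injectable for testing.
    Returns the result object of the last attempt.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run(["smartctl", *args], stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT, universal_newlines=True)
        # Stop as soon as the "device open failed" bit is clear.
        if not result.returncode & OPEN_FAILED:
            break
        if attempt < attempts:
            sleep(delay)
    return result
```

For the device from this report, a call would be
`smartctl_with_retry(["-d", "areca,3", "/dev/sg2", "-x"])`. Other bits of
the exit status (for example failing SMART health) are passed through
untouched, since retrying would not change them.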

[0] http://www.areca.us/support/s_linux/driver/cli/linuxcli_V1.15.8_180529.zip


-- System Information:
Debian Release: 9.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-0.bpo.1-amd64 (SMP w/6 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages smartmontools depends on:
ii  debianutils          4.8.1.1
ii  init-system-helpers  1.48
ii  libc6                2.24-11+deb9u3
ii  libcap-ng0           0.7.7-3+b1
ii  libgcc1              1:6.3.0-18+deb9u1
ii  libselinux1          2.6-3+b3
ii  libstdc++6           6.3.0-18+deb9u1
ii  lsb-base             9.20161125

Versions of packages smartmontools recommends:
ii  bsd-mailx [mailx]  8.1.2-0.20160123cvs-4

Versions of packages smartmontools suggests:
pn  gsmartcontrol   <none>
pn  smart-notifier  <none>

-- Configuration Files:
/etc/smartd.conf changed:
/dev/sg2 -d areca,1 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,3 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,9 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,11 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner


-- no debconf information
