Source: diamond Version: 4.0.515-3 Severity: serious In my setup, diamond takes a long time to stop after getting a SIGTERM from systemd, thus slowing down the overall shutdown of the system by about a minute (and also making it slow to perform restarts). When it finally does stop, an error message about QueueHandler is emitted. The issue is reproducible without systemd too -- run diamond in one terminal with --log-stdout --foreground and then run kill from another terminal (Ctrl+C i.e. SIGINT won't reproduce it). This happens even with a relatively minimal config, e.g. without any Collectors defined and only with a single GraphiteHandler.
What I see here is: root@d-i-test:/etc/diamond# diamond --log-stdout --foreground 1486775151.66 [MainProcess:9336:INFO] Changed UID: 0 () GID: 0 (). 1486775152.06 [MainProcess:9336:DEBUG] metric_queue_size: 16384 1486775152.06 [MainProcess:9336:DEBUG] Loading Handler diamond.handler.graphite.GraphiteHandler 1486775152.1 [MainProcess:9336:DEBUG] GraphiteHandler: Established connection to graphite server 10.192.16.33:2003. 1486775152.11 [Handlers:9347:DEBUG] Starting process Handlers 1486775153.12 [TCPCollector:9350:DEBUG] Starting 1486775153.12 [TCPCollector:9350:DEBUG] Interval: 60.0 seconds 1486775153.13 [TCPCollector:9350:DEBUG] Max collection time: 28 seconds 1486775153.16 [TCPCollector:9350:DEBUG] Collection took 32 ms 1486775154.14 [NetworkCollector:9353:DEBUG] Starting 1486775154.14 [NetworkCollector:9353:DEBUG] Interval: 60.0 seconds 1486775154.14 [NetworkCollector:9353:DEBUG] Max collection time: 19 seconds 1486775154.16 [NetworkCollector:9353:DEBUG] Collection took 16 ms 1486775155.15 [DiskUsageCollector:9356:DEBUG] Starting 1486775155.15 [DiskUsageCollector:9356:DEBUG] Interval: 60.0 seconds 1486775155.15 [DiskUsageCollector:9356:DEBUG] Max collection time: 48 seconds 1486775155.15 [DiskUsageCollector:9356:DEBUG] Collection took 1 ms 1486775156.16 [CPUCollector:9359:DEBUG] Starting 1486775156.16 [CPUCollector:9359:DEBUG] Interval: 60.0 seconds 1486775156.16 [CPUCollector:9359:DEBUG] Max collection time: 43 seconds 1486775156.17 [CPUCollector:9359:DEBUG] Collection took 7 ms <send kill from another terminal> 1486775186.96 [MainProcess:9336:INFO] Signal Received: 15 1486775186.96 [NetworkCollector:9353:INFO] Signal Received: 15 1486775186.96 [Handlers:9347:INFO] Signal Received: 15 1486775186.96 [DiskUsageCollector:9356:INFO] Signal Received: 15 1486775186.96 [TCPCollector:9350:INFO] Signal Received: 15 1486775186.96 [CPUCollector:9359:INFO] Signal Received: 15 1486775186.97 [NetworkCollector:9353:INFO] Signal Received: 15 1486775186.97 [CPUCollector:9359:INFO] Signal Received: 15 <long wait> Exception socket.error: error(111, 'Connection refused') in <bound method QueueHandler.__del__ of <diamond.handler.queue.QueueHandler object at 0x7f8fa53bff10>> ignored While this long wait is happening, strace shows diamond is busylooping running this over and over (apparently python multiprocessing related): connect(4, {sa_family=AF_UNIX, sun_path="/tmp/pymp-xq1DKs/listener-8_h02F"}, 34) = -1 ECONNREFUSED (Connection refused) close(4) = 0 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000}) = 0 (Timeout) socket(AF_UNIX, SOCK_STREAM, 0) = 4 fcntl(4, F_GETFL) = 0x2 (flags O_RDWR) fcntl(4, F_SETFL, O_RDWR) = 0 Let me know if I can help in any way. -- System Information: Debian Release: 9.0 APT prefers testing APT policy: (500, 'testing'), (500, 'stable'), (50, 'unstable'), (1, 'experimental') Architecture: amd64 (x86_64) Kernel: Linux 4.9.0-1-amd64 (SMP w/4 CPU cores) Locale: LANG=el_GR.UTF-8, LC_CTYPE=el_GR.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system)