Package: fam
Version: 2.7.0-17.1
Followup-For: Bug #483811

I eventually managed to compile a non-stripped version of famd. In order
to run it, I modified /etc/init.d/fam, which reportbug detected and
automatically copied below (search "-ale:"). I'll attach
/root/famd-wrapper in a moment. It's a useful workaround anyway.

Here is what I found in the dumped core file:

Reading symbols from /usr/sbin/famd...done.
[New LWP 27816]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/famd -v -f -T 0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00000000004128e7 in TCP_Client::unblock_handler (closure=0x1a9f4c0) at TCP_Client.c++:270
#2 0x00000000004103cc in Scheduler::handle_io (fds=0x17438c0, fds@entry=0x7ffd982ee5b0, iotype=&Scheduler::FDInfo::read, iotype@entry=&Scheduler::FDInfo::write) at Scheduler.c++:315
#3 0x0000000000410601 in Scheduler::select () at Scheduler.c++:342
#4 0x0000000000402dc5 in loop () at Scheduler.h:89
#5 main (argc=<optimized out>, argv=0x7ffd982ee828) at main.c++:306

Scheduler::handle_io called the handler, presumably
NetConnection::write_handler(), which called NetConnection::flush(),
which called NetConnection::set_handlers(). All of those functions
return void and call the next function right before returning, so their
stack frames are optimized out.

The relevant code in frame #1 is:

    Interest *ip;
    while (client->ready_for_events() && (ip = client->to_be_scanned.first()))
    {
        client->to_be_scanned.remove(ip);
        ip->scan();
    }

The culprit value of ip:

(gdb) info vtbl ip
vtable for 'Interest' @ 0x1743530 (subobject @ 0x17438c0):
[0]: 0x20
[1]: 0x311
[2]: 0x17431b0
[3]: 0x17438b0
[4]: 0x0
[5]: 0x0
[6]: 0x1743470
[7]: 0x1743396
[8]: 0x0
[9]: 0x0

ip->scan() corresponds to entry [4], callq *0x20(%rdx), and that entry
is 0x0, hence the segv at address 0. I checked that the value of ip is
correct: although it was just removed from the set, it is consistent
with the registers. It simply does not point to an Interest.
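
As a side note on the slot arithmetic: with 8-byte pointers on amd64, vtable entry [n] sits n * 8 bytes past the vtable address, which is why entry [4] appears as offset 0x20 in the callq operand. A minimal sketch of that computation; the function name vtable_offset is made up for illustration, and the 8-byte slot size is an assumption that holds for x86_64 only:

```shell
# Hypothetical helper: byte offset of vtable slot N, assuming 8-byte slots (x86_64).
vtable_offset() {
    idx=$1
    printf '0x%x\n' "$((idx * 8))"
}

vtable_offset 4
```

The same arithmetic maps any other callq offset seen in a disassembly back to the index printed by "info vtbl".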

In addition, I looked at the 6 key entries in the root node of the set,
defined as

    Set<Interest *> to_be_scanned;

I found:

(gdb) set $i = 0
(gdb) while ($i < 6)
 >printf "%s\n",
 >typeid(*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[$i]).__name
 >set $i = $i + 1
 >end
8DirEntry
`
8Interest
8DirEntry
8DirEntry
8DirEntry

So there is a pure abstract element and an unknown pointer. The
diagnosis is garbage in the set. Now this could be a flaw in BTree.h or
something strange in ClientInterest::scan(), which seems to be the only
place where new entries are added to the set. Ideas?

The key[0] entry is actually a mail server temporary file:

(gdb) p (*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[0]).myname
$1 = 0x18479b0 ".5058547.1517362926.M462851P21244V", '0' <repeats 12 times>, "6811I0000000000566211_0.north,S=4981"
(gdb) p (*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[0]).old_stat.st_mtim
$2 = {tv_sec = 1517563957, tv_nsec = 0}

The tv_sec matches the creation of the core dump, four days ago. Note
that it took weeks to get a core dump. Now I wait for the next.

Ale

-- System Information:
Debian Release: 8.10
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: sysvinit (via /sbin/init)

Versions of packages fam depends on:
ii  libc6              2.19-18+deb8u10
ii  libgcc1            1:4.9.2-10
ii  libstdc++6         4.9.2-10
ii  lsb-base           4.1+Debian13+nmu1
ii  rpcbind [portmap]  0.2.1-6+deb8u2
ii  update-inetd       4.43

fam recommends no packages.

fam suggests no packages.

-- Configuration Files:
/etc/init.d/fam changed:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON="/usr/sbin/famd"
NAME="FAM"
DESC="file alteration monitor"
FAMOPTS="-T 0"
test -x $DAEMON || exit 0
egrep -qs "^(sgi_fam|391002)" /etc/inetd.conf
. /lib/lsb/init-functions
set -e
case "$1" in
  start)
    status_of_proc $DAEMON $NAME > /dev/null && exit 0
    log_daemon_msg "Starting $DESC" "$NAME"
    # -ale: was:start-stop-daemon --start --quiet --exec $DAEMON -- $FAMOPTS < /dev/null
    (setsid /root/famd-wrapper& )&
    log_end_msg $?
    ;;
  stop)
    log_daemon_msg "Stopping $DESC" "$NAME"
    start-stop-daemon --stop --oknodo --quiet --exec $DAEMON
    log_end_msg $?
    ;;
  restart|force-reload)
    $0 stop
    sleep 1
    $0 start
    ;;
  status)
    status_of_proc $DAEMON $NAME
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|force-reload|status}" >&2
    exit 1
    ;;
esac
exit 0


-- no debconf information

#! /bin/sh
# called from /etc/init.d/fam, where I replaced the start-stop-daemon
# --start line with `(setsid /root/famd-wrapper& )&`

set -e
out=$(tempfile -d /var/tmp -p FAM- -s .out)
err=$(tempfile -d /var/tmp -p FAM- -s .err)
set +e

exec 2> /dev/null
ulimit -c unlimited

# restart famd whenever it dies, until it is stopped with SIGTERM
while true; do
	# using -v; -d produces really big files
	/usr/sbin/famd -v -f -T 0 >> "$out" 2>> "$err"
	ret=$?
	if [ "$ret" -eq 143 ]; then
		# 143 = 128 + 15: famd was stopped by SIGTERM, a regular stop
		rm -f "$out" "$err"
		break
	else
		printf '%s\n%s: famd exited with code %d\n\n\n' \
			'__________________________________________' \
			"$(date --rfc-3339=seconds)" "$ret" | tee -a "$out" >> "$err"
		logger -p daemon.crit "famd exited with code $ret"
	fi
done
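
The 143 test in the wrapper relies on the shell convention that a child killed by signal N is reported as exit status 128 + N. So 143 is SIGTERM (15), which start-stop-daemon --stop sends by default, while a segfault shows up as 139 (SIGSEGV, 11). A small sketch of that decoding follows; describe_exit is a hypothetical helper, not part of the wrapper, and it assumes that statuses above 128 always mean a signal, although a program can in principle return such a value on its own:

```shell
# Hypothetical helper: turn a shell exit status into a readable message.
# Assumes the 128 + N convention for a child terminated by signal N.
describe_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with code $status"
    fi
}

describe_exit 143
describe_exit 139
```

Something along these lines could make the logger message say directly which signal ended famd, instead of leaving the raw code to be decoded by hand.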