Package: fam
Version: 2.7.0-17.1
Followup-For: Bug #483811

I eventually managed to compile a non-stripped version of famd.
In order to run it, I modified /etc/init.d/fam, which reportbug
detected and automatically copied below (search for "-ale:").  I'll
attach /root/famd-wrapper in a moment.  It's a useful workaround
anyway.

Here is what I found in the dumped core file:

Reading symbols from /usr/sbin/famd...done.
[New LWP 27816]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/famd -v -f -T 0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000000004128e7 in TCP_Client::unblock_handler (closure=0x1a9f4c0) at TCP_Client.c++:270
#2  0x00000000004103cc in Scheduler::handle_io (fds=0x17438c0, fds@entry=0x7ffd982ee5b0, iotype=&Scheduler::FDInfo::read, 
    iotype@entry=&Scheduler::FDInfo::write) at Scheduler.c++:315
#3  0x0000000000410601 in Scheduler::select () at Scheduler.c++:342
#4  0x0000000000402dc5 in loop () at Scheduler.h:89
#5  main (argc=<optimized out>, argv=0x7ffd982ee828) at main.c++:306

Scheduler::handle_io called the handler, presumably
NetConnection::write_handler(), which called NetConnection::flush(),
which called NetConnection::set_handlers().  All of those functions
return void and call the next function right before returning, so
their stack frames were optimized away (tail calls) and do not appear
in the backtrace.

The relevant code in frame #1 is:

    Interest *ip;
    while (client->ready_for_events() && (ip = client->to_be_scanned.first()))
    {   client->to_be_scanned.remove(ip);
        ip->scan();
    }

The culprit value of ip:
(gdb) info vtbl ip
vtable for 'Interest' @ 0x1743530 (subobject @ 0x17438c0):
[0]: 0x20
[1]: 0x311
[2]: 0x17431b0
[3]: 0x17438b0
[4]: 0x0
[5]: 0x0
[6]: 0x1743470
[7]: 0x1743396
[8]: 0x0
[9]: 0x0

ip->scan() is dispatched through entry [4] (callq  *0x20(%rdx), and
0x20 / 8 = 4).  That entry is 0x0, hence the jump to address 0 seen in
frame #0 and the segv.
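
The slot arithmetic can be checked mechanically.  What follows is a
minimal sketch, assuming 8-byte pointers as on amd64; vtable_slot,
dumped_table and call_targets_null are names made up for this
illustration, and the array merely copies the words gdb printed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: which vtable slot does `callq *offset(%reg)` select?
// Assumes every slot is one 8-byte pointer, as on amd64.
constexpr std::size_t vtable_slot(std::size_t byte_offset,
                                  std::size_t pointer_size = 8) {
    return byte_offset / pointer_size;
}

// The ten words gdb printed for the bogus "vtable" at 0x1743530.
constexpr std::uintptr_t dumped_table[10] = {
    0x20, 0x311, 0x17431b0, 0x17438b0, 0x0,
    0x0,  0x1743470, 0x1743396, 0x0,   0x0,
};

// True if a call through the given byte offset would jump to address 0.
inline bool call_targets_null(std::size_t byte_offset) {
    const std::size_t slot = vtable_slot(byte_offset);
    assert(slot < 10);  // stay inside the dumped words
    return dumped_table[slot] == 0;
}
```

With these values, call_targets_null(0x20) is true, which agrees with
the null program counter in frame #0.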

I checked that the value of ip shown by gdb is correct: although it
was just removed from the set, the value is consistent with the
registers.  It is simply not a valid Interest*.  In addition, I looked
at the 6 key entries in the root node of the set, defined as
Set<Interest *> to_be_scanned; I found:

(gdb) set $i = 0
(gdb) while ($i < 6)
 >printf "%s\n", typeid(*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[$i]).__name
 >set $i = $i + 1
 >end

8DirEntry
`
8Interest
8DirEntry
8DirEntry
8DirEntry

So there is one pointer to something unknown, and one element whose
dynamic type is the abstract base class Interest.  The diagnosis is
garbage in the set.  This could be a flaw in BTree.h, or something
strange in ClientInterest::scan(), which seems to be the only place
where new entries are added to the set.  Ideas?
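
One thing that may be worth keeping in mind when reading the typeid
output: in C++, while a base-class destructor runs, the dynamic type of
the object is the base class.  An object that was created as a derived
class and has since been destroyed will therefore typically still carry
the base vtable pointer in memory, and with GCC's name mangling it would
report 8Interest.  This is only one possible explanation, not something
the core file proves.  A minimal sketch, with stand-in classes that
borrow famd's names but none of its code:

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical stand-ins; only the class names are taken from famd.
struct Interest {
    static std::string seen_in_dtor;  // dynamic type observed inside ~Interest()
    virtual void scan() = 0;
    virtual ~Interest() { seen_in_dtor = typeid(*this).name(); }
};
std::string Interest::seen_in_dtor;

struct DirEntry : Interest {
    void scan() override {}
};

// Create and destroy a DirEntry, then report what ~Interest() saw.
inline std::string type_seen_while_destroying_dir_entry() {
    { DirEntry e; }  // the destructor chain runs at the closing brace
    assert(!Interest::seen_in_dtor.empty());
    return Interest::seen_in_dtor;
}
```

The returned name equals typeid(Interest).name(), not the name of
DirEntry, even though the object was constructed as a DirEntry.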

The key[0] entry is actually a mail server temporary file:
(gdb) p (*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[0]).myname
$1 = 0x18479b0 ".5058547.1517362926.M462851P21244V", '0' <repeats 12 times>, "6811I0000000000566211_0.north,S=4981"
(gdb) p (*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[0]).old_stat.st_mtim
$2 = {tv_sec = 1517563957, tv_nsec = 0}

The tv_sec matches the creation of the core dump, four days ago.
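
To compare tv_sec with the timestamp of the core file it helps to print
it as a calendar date.  A minimal sketch follows; utc_string is a
made-up helper, not part of famd, and it formats in UTC, so the local
time zone of the machine still has to be taken into account.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Format a time_t as "YYYY-MM-DD HH:MM:SS" in UTC.
inline std::string utc_string(std::time_t seconds) {
    // std::gmtime uses a static buffer; fine for a one-off check.
    const std::tm* utc = std::gmtime(&seconds);
    assert(utc != nullptr);
    char buf[32];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", utc);
    return std::string(buf, n);
}
```

For the value above, utc_string(1517563957) gives 2018-02-02 09:32:37.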

Note that it took weeks to get a core dump.  Now I'm waiting for the next one.

Ale

-- System Information:
Debian Release: 8.10
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: sysvinit (via /sbin/init)

Versions of packages fam depends on:
ii  libc6              2.19-18+deb8u10
ii  libgcc1            1:4.9.2-10
ii  libstdc++6         4.9.2-10
ii  lsb-base           4.1+Debian13+nmu1
ii  rpcbind [portmap]  0.2.1-6+deb8u2
ii  update-inetd       4.43

fam recommends no packages.

fam suggests no packages.

-- Configuration Files:
/etc/init.d/fam changed:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON="/usr/sbin/famd"
NAME="FAM"
DESC="file alteration monitor"
FAMOPTS="-T 0"
test -x $DAEMON || exit 0
egrep -qs "^(sgi_fam|391002)" /etc/inetd.conf
. /lib/lsb/init-functions
set -e
case "$1" in
  start)
        status_of_proc $DAEMON $NAME > /dev/null && exit 0
        log_daemon_msg "Starting $DESC" "$NAME"
        # -ale: was:start-stop-daemon --start --quiet --exec $DAEMON -- $FAMOPTS < /dev/null
        (setsid /root/famd-wrapper& )&
        log_end_msg $?
        ;;
  stop)
        log_daemon_msg "Stopping $DESC" "$NAME"
        start-stop-daemon --stop --oknodo --quiet --exec $DAEMON
        log_end_msg $?
        ;;
  restart|force-reload)
        $0 stop
        sleep 1
        $0 start
        ;;
  status)
        status_of_proc $DAEMON $NAME
        ;;
  *)
        echo "Usage: $0 {start|stop|restart|force-reload|status}" >&2
        exit 1
        ;;
esac
exit 0


-- no debconf information
#! /bin/sh
# called from /etc/init.d/fam, where I replaced the start-stop-daemon
# --start line with `(setsid /root/famd-wrapper& )&`

set -e
out=$(tempfile -d /var/tmp -p FAM- -s .out)
err=$(tempfile -d /var/tmp -p FAM- -s .err)

set +e
exec 2> /dev/null

ulimit -c unlimited

while true; do
	# using -v; -d produces really big files
	/usr/sbin/famd -v -f -T 0 >> "$out" 2>> "$err"
	ret=$?
	# 143 = 128 + 15: famd was terminated by SIGTERM, i.e. a regular stop
	if [ $ret -eq 143 ]; then
		rm -f "$out" "$err"
		break
	else
		printf '%s\n%s: famd exited with code %d\n\n\n' \
			'__________________________________________' \
			"$(date --rfc-3339=seconds)" "$ret" | tee -a "$out" >> "$err"
		logger -p daemon.crit "famd exited with code $ret"
	fi
done

