Bug#483811: [fam] fam segfaults

2018-09-10 Thread Alessandro Vesely
I experienced more core dumps, but haven't had time to examine them.
In particular, at a time I had three core dumps in a row on Jul 14:

-rw--- 1 ale staff  6180864 Jul 14 17:10 
core_epoch=1531581032_pid=5409_file=!usr!sbin!famd
-rw--- 1 ale staff  9302016 Jul 14 17:09 
core_epoch=1531580943_pid=5317_file=!usr!sbin!famd
-rw--- 1 ale staff 26234880 Jul 14 17:05 
core_epoch=1531580714_pid=26593_file=!usr!sbin!famd

The first of these only run for 1 minute.  The "closure" in
TCP_Client::unblock_handler (closure=0x295ca20) at TCP_Client.c++:270
is always a LocalClient with a bad to_be_scanned element.  In the first (short) 
dump, it had:

to_be_scanned = {
   , std::allocator >> = 
std::set with 251 elements = {

   Elements 0 to 102 were bad memory pointers.
   Elements 103 to 250 were entries of type DirEntry with a common parent 
corresponding to
  a courierimapkeywords directory of a Courier-MTA mail folder.  The parent 
had 1170
  entries, and this batch matched entries 12 to 159 in the parent.  Since 
those file names
  contain Unix epoch, I could check that most --not all-- of them were in 
ascending order.
  One file was listed twice, and was out of order.
  Dates ranged from June to a few hours before crash.

}, }

Courier at times removes a bunch of those files and stores the corresponding 
IMAP keywords
in a single ":list" file.  I have to study the code better to understand how 
parent and
clients can be notified of Interest* deletion.  Maybe next month...



Bug#483811: [fam] fam segfaults

2018-06-16 Thread Alessandro Vesely
On sat 26 may 2018 it segfaulted again.  Identical pattern:
(gdb) bt
#0  0x6811 in ?? ()
#1  0x00412b1c in TCP_Client::unblock_handler (closure=0xdb1870) at 
TCP_Client.c++:270
#2  0x004105dc in Scheduler::handle_io (fds=0xaf3eb0, 
fds@entry=0x7ffc71013620, 
iotype=::FDInfo::read, iotype@entry=::FDInfo::write) at 
Scheduler.c++:315
#3  0x00410811 in Scheduler::select () at Scheduler.c++:342
#4  0x00402fa5 in loop () at Scheduler.h:89
#5  main (argc=, argv=0x7ffc71013898) at main.c++:306

Clearly, the set contains a pointer to an object of type Interest which was 
deleted.

Since ClientInterest issues a dequeue_from_scan when it is deleted, the culprit 
must be a DirEntry.  In fact, a call to its scan() function would not be 
detectable from the core dump, because the code is tail-optimized:

bool
DirEntry::scan(Interest *ip)
{
assert(!ip);
return parent->scan(this);
}

(gdb) info address DirEntry::scan
Symbol "DirEntry::scan(Interest*)" is a function at address 0x404750.
(gdb) disass 0x404750
Dump of assembler code for function DirEntry::scan(Interest*):
   0x00404750 <+0>: test   %rsi,%rsi
   0x00404753 <+3>: jne0x40476b 

   0x00404755 <+5>: mov%rdi,%rax
   0x00404758 <+8>: mov0xc8(%rdi),%rdi
   0x0040475f <+15>:mov%rax,%rsi
   0x00404762 <+18>:mov(%rdi),%rdx
   0x00404765 <+21>:mov0x20(%rdx),%rdx
   0x00404769 <+25>:jmpq   *%rdx <--- this jumps to a deleted 
ClientInterest

   0x0040476b <+27>:push   %rax
   0x0040476c <+28>:mov$0x414c80,%ecx
   0x00404771 <+33>:mov$0x3b,%edx
   0x00404776 <+38>:mov$0x414b8e,%esi
   0x0040477b <+43>:mov$0x414b9b,%edi
   0x00404780 <+48>:callq  0x4024b0 <__assert_fail@plt>
End of assembler dump.

Now I modified ClientInterest.h and Directory.c++, and further modified 
ClientInterest.c++.  I have:

--- orig/src/ClientInterest.h   2018-02-06 18:05:59.0 +0100
+++ ./src/ClientInterest.h  2018-06-16 10:25:18.0 +0200
@@ -73,6 +73,8 @@
 
 virtual FileSystem * get_filesystem()  { return myfilesystem; }
 
+void dequeue_from_scan(Interest *ip);
+
 private:
 
 enum { ACTIVE_STATE = 1 << 0 };
--- orig/src/ClientInterest.c++ 2003-01-18 15:18:12.0 +0100
+++ ./src/ClientInterest.c++2018-06-16 10:29:08.0 +0200
@@ -62,6 +62,8 @@
 ClientInterest::~ClientInterest()
 {
 myfilesystem->cancel(this, fs_request);
+if (myclient)
+myclient->dequeue_from_scan(this);
 }
 
 void
@@ -162,6 +164,13 @@
myclient->dequeue_from_scan(ip);
 }
 
+void
+ClientInterest::dequeue_from_scan(Interest *ip)
+{
+if (myclient)
+myclient->dequeue_from_scan(ip);
+}
+
 bool
 ClientInterest::do_scan()
 {
--- orig/src/Directory.c++  2003-01-18 15:18:12.0 +0100
+++ ./src/Directory.c++ 2018-06-16 10:17:32.0 +0200
@@ -77,6 +77,7 @@
 {   (void) chdir();
while (p)
{   q = p->next;
+   dequeue_from_scan(p);
delete p;
p = q;
}



Waiting for next core dump...



Bug#483811: [fam] fam segfaults

2018-04-13 Thread Alessandro Vesely
On Fri, 30 Mar 2018 12:47:15 +0200 I wrote:
>  That suggests that Set.h doesn't work as expected.

No, that wasn't the case.  A couple of days ago I got a core dump with exactly
the same back trace (bt), even if using std::set.

Core was generated by `/usr/sbin/famd -v -f -T 0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x in ?? ()
(gdb) bt
#0  0x in ?? ()
#1  0x00412aec in TCP_Client::unblock_handler (closure=0x1935d20) at 
TCP_Client.c++:270
#2  0x004105ac in Scheduler::handle_io (fds=0x1c028f0, 
fds@entry=0x7ffd7bd92ef0, iotype=::FDInfo::read, 
iotype@entry=::FDInfo::write) at Scheduler.c++:315
#3  0x004107e1 in Scheduler::select () at Scheduler.c++:342
#4  0x00402fa5 in loop () at Scheduler.h:89
#5  main (argc=, argv=0x7ffd7bd93168) at main.c++:306


Need to look more closely at ip.

(gdb) hd
Undefined command: "hd".  Try "help".
(gdb) define hd
Type commands for definition of "hd".
End with a line saying just "end".
>dump binary memory dump.tmp $arg0 $arg0+$arg1
>shell hd dump.tmp
>end

(gdb) p sizeof(Interest)
$11 = 200

(gdb) p sizeof(File)
$43 = 240

(gdb) hd (char*)ip 240
  30 c1 73 02 00 00 00 00  50 56 02 02 00 00 00 00  |0.s.PV..|
0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
0020  e0 29 c0 01 00 00 00 00  36 30 30 30 36 38 31 31  |.)..60006811|
0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
00c0  7f 00 00 01 01 00 00 00  f0 44 91 01 00 00 00 00  |.D..|
00d0  10 2b c0 01 00 00 00 00  01 00 00 00 00 00 00 00  |.+..|
00e0  f0 00 00 00 00 00 00 00  60 00 00 00 00 00 00 00  |`...|

We have the following layout  (from Interest.h, ClientInterest.h and `p 
&((Interest*)0).'):

offset  Instance Variables  content found

_vtbl   bad (virtual functions)
0008Interest *hashlink; ??
0010dev_t dev;  0
0018ino_t ino;  0
0020char *const myname; mixed buffer (see MYNAME below)
0028ScanState scan_state: 1;"6000?6811I?"
ExecState cur_exec_state: 1;
ExecState old_exec_state: 1;
0029char ci_char;
002achar dir_char;
0030struct stat old_stat;
00c0in_addr myhost; 127.0.0.1
00c4bool mypath_exported_to_host;   1 (remaining 3 bytes could have 
been set before)

00c8Client *myclient;   bad, 0x19144f0 (see CLIENT below)
00d0Request request;
00d8const Cred mycred;
00e0FileSystem *myfilesystem;
00e8Request fs_request;

MYNAME:
(gdb) hd 0x1c029e0 80
  00 00 00 00 00 00 00 00  32 39 2e 4d 38 33 31 38  |29.M8318|
0010  34 37 50 32 37 31 36 38  56 30 30 30 30 30 30 30  |47P27168V000|
0020  30 30 30 30 30 36 38 31  31 49 30 30 30 30 30 30  |06811I00|
0030  30 30 30 30 34 45 35 33  37 32 5f 31 2e 6e 6f 72  |4E5372_1.nor|
0040  74 68 2c 53 3d 32 32 35  36 36 35 00 53 00 00 53  |th,S=225665.S..S|
0050


The same is true for other Interest pointers still in to_be_scanned, $tbs0, ...:


(gdb) hd $tbs0 200
  90 d6 73 02 00 00 00 00  c0 2c c0 01 00 00 00 00  |..s..,..|
0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
0020  40 2a c0 01 00 00 00 00  36 30 30 30 36 38 31 31  |@*..60006811|
0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
00c0  7f 00 00 01 01 35 34 33   |.543|
00c8

(gdb) p ((Interest*)$tbs0)->myname
$13 = 0x1c02a40 ""
(gdb) hd 0x1c02a40 80
  00 00 00 00 00 00 00 00  32 39 2e 4d 38 34 33 38  |29.M8438|
0010  38 38 50 32 37 31 37 30  56 30 30 30 30 30 30 30  |88P27170V000|
0020  30 30 30 30 30 36 38 31  31 49 30 30 30 30 30 30  |06811I00|
0030  30 30 30 30 34 45 35 33  38 43 5f 31 2e 6e 6f 72  |4E538C_1.nor|
0040  74 68 2c 53 3d 32 32 35  36 36 35 00 00 00 00 00  |th,S=225665.|
0050  b0 01 00 00 00 00 00 00  30 00 00 00 00 00 00 00  |0...|


(gdb) hd $tbs1 200
  00 2b c0 01 00 00 00 00  80 2e c0 01 00 00 00 00  |.+..|
0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
0020  00 2c c0 01 00 00 00 00  36 30 30 30 36 38 31 31  |.,..60006811|
0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
00c0  7f 00 00 01 01 35 34 33   |.543|
00c8

(gdb) p ((Interest*)$tbs1)->myname
$14 = 0x1c02c00 ""
(gdb) hd 0x1c02c00 80
  00 00 00 00 00 00 00 00  34 34 2e 4d 34 35 37 31  |44.M4571|
0010  36 50 32 37 32 30 35 56  30 30 30 30 30 30 30 30  |6P27205V|
0020  30 30 30 30 36 38 31 31  49 30 30 30 30 30 30 30  |6811I000|
0030  30 

Bug#483811: [fam] fam segfaults

2018-03-30 Thread Alessandro Vesely
Yesterday I got another core dump and, again, it was like the previous ones:

#1  0x00412927 in TCP_Client::unblock_handler (closure=0x1ffa030) at 
TCP_Client.c++:270

I checked the assertions were indeed compiled in, by (gdb) disass 
((TCP_Client*)$closure)->dequeue_from_scan, which ended with:

  0x004129e4 <+84>:callq  0x402310 <__assert_fail@plt>


Yet, those assertions didn't catch anything.  So I must have extracted from 
client->to_be_scanned an ip with vtbl[4] == 0, which was never put into the 
set.  That suggests that Set.h doesn't work as expected.  To check the latter 
hypothesis, I modified TCP_Client.h so as to use a standard set instead:

--- fam2/fam-2.7.0/build-tree/fam-2.7.0/src/TCP_Client.h2003-01-18 
15:18:12.0 +0100
+++ fam/fam-2.7.0/build-tree/fam-2.7.0/src/TCP_Client.h 2018-03-30 
10:53:03.0 +0200
@@ -25,9 +25,28 @@
 
 #include "ClientConnection.h"
 #include "MxClient.h"
-#include "Set.h"
+
+// use std::set instead
+//#include "Set.h"
+#include 
 #include "Cred.h"
 
+class stdset : public std::set
+{
+public:
+// Inherit insert(), size()
+bool contains(const key_type e) {return find(e) != end();}
+void remove(const key_type e) {erase(e);}
+key_type first() const {return begin() == end()? NULL: *begin();}
+key_type next(const key_type k) const {
+   const_iterator it = find(k);
+   if (it != end()) ++it;
+   if (it == end()) return NULL;
+   return *it;
+}
+// sizeofnode() method is not used
+};
+
 struct sockaddr_un;
 
 //  A TCP_Client is a client that connects to fam using the TCP/IP
@@ -60,7 +79,8 @@ protected:
 
 private:
 
-Set to_be_scanned;
+// Set to_be_scanned;
+stdset to_be_scanned;
 Scanner *my_scanner;
 ClientConnection conn;
 Activity a;// simply declaring it 
activates timer.


Waiting for next core dump...



Bug#483811: [fam] fam segfaults

2018-03-16 Thread Alessandro Vesely
Yesterday it happened again, exactly the same pattern:

Reading symbols from ./build-tree/fam-2.7.0/src/famd...done.
[New LWP 18773]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/famd -v -f -T 0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x in ?? ()
(gdb) bt
#0  0x in ?? ()
#1  0x004128e7 in TCP_Client::unblock_handler (closure=0x10982a0) at 
TCP_Client.c++:270
#2  0x004103cc in Scheduler::handle_io (fds=0x1a4dd70, 
fds@entry=0x7ffce30d7a40, 
iotype=::FDInfo::read, iotype@entry=::FDInfo::write) at 
Scheduler.c++:315
#3  0x00410601 in Scheduler::select () at Scheduler.c++:342
#4  0x00402dc5 in loop () at Scheduler.h:89
#5  main (argc=, argv=0x7ffce30d7cb8) at main.c++:306

This confirms the diagnosis.  I put a couple of assertions like so:

--- src/TCP_Client.c++.orig 2003-01-18 15:18:12.0 +0100
+++ src/TCP_Client.c++  2018-03-15 18:35:24.0 +0100
@@ -287,12 +287,14 @@
 {
 if (!to_be_scanned.size())
conn.ready_for_input(false);
+assert((*(long **)ip)[4] != 0L); // ip func[4] (scan) not virtual 
 to_be_scanned.insert(ip);
 }
 
 void
 TCP_Client::dequeue_from_scan(Interest *ip)
 {
+assert((*(long **)ip)[4] != 0L); // ip func[4] (scan) not virtual 
 to_be_scanned.remove(ip);
 if (!to_be_scanned.size())
conn.ready_for_input(true);

While recompiling, I noticed this warning:
Listener.o: In function `Listener::create_local_client(TCP_Client&, unsigned 
int)':
fam-2.7.0/build-tree/fam-2.7.0/src/Listener.c++:220: warning: the use of 
`tempnam' is dangerous, better use `mkstemp'

I'm not sure how that is related to this bug, if at all.

Waiting for next core dump...



Bug#483811: [fam] fam segfaults

2018-02-06 Thread Alessandro Vesely
Package: fam
Version: 2.7.0-17.1
Followup-For: Bug #483811

I eventually managed to compile a non-stripped version of famd.
In order to run it, I modified /etc/init.d/fam, which reportbug
detected and automatically copied below (search "-ale:").  I'll
attach /root/famd-wrapper in a moment.  It's a useful workaround
anyway.

Now what I found in the dumped core file:

Reading symbols from /usr/sbin/famd...done.
[New LWP 27816]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/famd -v -f -T 0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x in ?? ()
(gdb) bt
#0  0x in ?? ()
#1  0x004128e7 in TCP_Client::unblock_handler (closure=0x1a9f4c0) at 
TCP_Client.c++:270
#2  0x004103cc in Scheduler::handle_io (fds=0x17438c0, 
fds@entry=0x7ffd982ee5b0, iotype=::FDInfo::read, 
iotype@entry=::FDInfo::write) at Scheduler.c++:315
#3  0x00410601 in Scheduler::select () at Scheduler.c++:342
#4  0x00402dc5 in loop () at Scheduler.h:89
#5  main (argc=, argv=0x7ffd982ee828) at main.c++:306

Scheduler::handle_io called the handler, presumably 
NetConnection::write_handler()
which called NetConnection::flush() which called NetConnection::set_handlers().
All of those function return void and call the next function right before 
returning,
so their stack frames are optimized out.

The relevant code in frame #1 is:

Interest *ip;
while (client->ready_for_events() && (ip = client->to_be_scanned.first()))
{   client->to_be_scanned.remove(ip);
ip->scan();
}

The culprit value of ip:
(gdb) info vtbl ip
vtable for 'Interest' @ 0x1743530 (subobject @ 0x17438c0):
[0]: 0x20
[1]: 0x311
[2]: 0x17431b0
[3]: 0x17438b0
[4]: 0x0
[5]: 0x0
[6]: 0x1743470
[7]: 0x1743396
[8]: 0x0
[9]: 0x0

ip->scan() corresponds to entry [4], callq  *0x20(%rdx), hence segv.

I checked the value of ip is correct.  Although it was just removed
from the set, the value is consistent with the registers.  It is not
an Interest*.  In addition, I looked at the 6 key entries in the root
node of the set, defined as Set to_be_scanned; I found:

(gdb) set $i = 0
(gdb) while ($i < 6)
 >printf "%s\n", 
 >typeid(*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[$i]).__name
 >set $i = $i + 1
 >end

8DirEntry
`
8Interest
8DirEntry
8DirEntry
8DirEntry

So there is a pure abstract element and an unknown pointer.
The diagnosis is garbage in the set.  Now this could be a
flaw in BTree.h or something strange in ClientInterest::scan(),
which seems to be the only place where new entries are added
to the set.  Ideas?

The key[0] entry is actually a mail server temporary file:
(gdb) p (*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[0]).myname
$1 = 0x18479b0 ".5058547.1517362926.M462851P21244V", '0' , 
"6811I00566211_0.north,S=4981"
(gdb) p 
(*(*(*(TCP_Client*)0x1a9f4c0).to_be_scanned.root).key[0]).old_stat.st_mtim
$2 = {tv_sec = 1517563957, tv_nsec = 0}

The tv_sec matches the creation of the core dump, four days ago.

Note that it took weeks to get a core dump.  Now I wait for the next.

Ale

-- System Information:
Debian Release: 8.10
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: sysvinit (via /sbin/init)

Versions of packages fam depends on:
ii  libc6  2.19-18+deb8u10
ii  libgcc11:4.9.2-10
ii  libstdc++6 4.9.2-10
ii  lsb-base   4.1+Debian13+nmu1
ii  rpcbind [portmap]  0.2.1-6+deb8u2
ii  update-inetd   4.43

fam recommends no packages.

fam suggests no packages.

-- Configuration Files:
/etc/init.d/fam changed:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON="/usr/sbin/famd"
NAME="FAM"
DESC="file alteration monitor"
FAMOPTS="-T 0"
test -x $DAEMON || exit 0
egrep -qs "^(sgi_fam|391002)" /etc/inetd.conf
. /lib/lsb/init-functions
set -e
case "$1" in
  start)
status_of_proc $DAEMON $NAME > /dev/null && exit 0
log_daemon_msg "Starting $DESC" "$NAME"
# -ale: was:start-stop-daemon --start --quiet --exec $DAEMON -- 
$FAMOPTS < /dev/null
(setsid /root/famd-wrapper& )&
log_end_msg $?
;;
  stop)
log_daemon_msg "Stopping $DESC" "$NAME"
start-stop-daemon --stop --oknodo --quiet --exec $DAEMON
log_end_msg $?
;;
  restart|force-reload)
$0 stop
sleep 1
$0 start
;;
  status)
status_of_proc $DAEMON $NAME
;;
  *)
echo "Usage: $0 {start|stop|restart|force-reload|status}" >&2
exit 1
;;
esac
exit 0


-- no debconf information
#! /bin/sh
# called from /etc/init.d/fam, where I replaced the start-stop-daemon
# 

Bug#483811: [fam] fam segfaults

2008-05-31 Thread Jürgen Richtsfeld
Package: fam
Version: 2.7.0-13.2
Severity: normal

--- Please enter the report below this line. ---
no idea why, but i found this in my /var/log/messages:

kernel: famd[3044]: segfault at e08 rip 414f45 rsp 7fffedd01c80 error 4

sorry, I don't have anything more.

jürgen


--- System information. ---
Architecture: amd64
Kernel:   Linux 2.6.24-5-1

Debian Release: lenny/sid
  500 unstablewww.debian-multimedia.org 
  500 unstablemi.mirror.garr.it 
  500 unstableftp.tuke.sk 
  500 unstableftp.de.debian.org 
  500 unstableftp.at.debian.org 
  500 unstabledebian.inode.at 
1 experimentaldebian.inode.at 

--- Package information. ---
Depends(Version) | Installed
-+-
libc6 (= 2.7-1) | 2.7-11
libgcc1  (= 1:4.1.1-21) | 1:4.3.0-5
libstdc++6 (= 4.1.1-21) | 4.3.0-5
lsb-base  (= 3.0-6) | 3.2-12
portmap  | 6.0-5.1
update-inetd | 4.30




--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]