Hi Mark,

Yes, I have checked in SVN that the code has not changed for many
revisions, so I also suspected something specific to those
installations and our cfengine configuration. What is strange is that I
had not previously seen similar crashes on our machines, which have
identical configuration (and software versions, by the way) because all
the disks were dd'ed from the same source.

I raised it because I was unsure whether some behaviour I don't know of
might lead to those crashes, for instance whether the locale, the
charset, or a special filename pattern has anything to do with it. In
Case 1 the SIGABRT occurred while recursively copying a directory
containing a mix of regular files and symbolic links to another
directory. In Case 2 the SIGSEGV occurred while processing the
"processes" section, apparently while handling the output of ps auxw.
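
To check the filename/charset idea on the source directory, something
like the following rough sketch could be used. It is not cfengine code;
the function names name_has_unusual_bytes and count_unusual_names are
made up for illustration, and treating "unusual" as any byte outside
printable ASCII is only my assumption. It looks at one directory level
only and does not recurse.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of cfengine): returns 1 if the name
 * contains any byte outside printable ASCII (0x20..0x7e), else 0. */
int name_has_unusual_bytes(const char *name)
{
    const unsigned char *p;

    assert(name != NULL);
    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        if (*p < 0x20 || *p > 0x7e) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: prints and counts the entries of one directory
 * whose names contain unusual bytes. Returns -1 if the directory
 * cannot be opened. Subdirectories are not descended into. */
int count_unusual_names(const char *dirpath)
{
    DIR *dirh;
    struct dirent *entry;
    int count = 0;

    dirh = opendir(dirpath);
    if (dirh == NULL) {
        return -1;
    }
    while ((entry = readdir(dirh)) != NULL) {
        if (name_has_unusual_bytes(entry->d_name)) {
            printf("unusual name in %s: %s\n", dirpath, entry->d_name);
            count++;
        }
    }
    closedir(dirh);
    return count;
}
```

Calling count_unusual_names("/mnt/asterisksetup") from a small main()
would list any such entries in the Case 1 source directory. An empty
result would not rule out the theory, it would only make it less likely.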

Yes, they became repeatable on many of our systems only recently, and
were not before. So some of our recent file changes have probably
triggered them. I recall seeing some spurious SIGSEGVs on a few other
systems, but they were not always repeatable there.

I have quickly compiled 2.2.2 and put it in a temporary directory on
some of the systems where crashes were seen with 2.2.1. So far a few
runs have completed without crashing, but since it has not yet seen
enough real usage, I cannot tell for sure whether the problems will
reappear once I replace the system-wide version (2.2.1) with 2.2.2 and
make file changes that require copying and restarting processes.

I will try to migrate more machines to 2.2.2 and check whether the
issue goes away.

Thank you again for your reply.

Regards,
Bernard Chan.

On Sun, 07 Oct 2007 10:25:41 +0200, Mark Burgess wrote
> Hi Bernard,
> 
> thanks for this information. This is a little unusual. In fact this 
> is not a SEG fault but an abort signal, which is software generated. 
> It comes from file operations, which is code that has not changed 
> for several years. This makes me suspect that there could be some 
> site-specific reason for this.
> 
> Does this happen regularly/repeatably? On the same host, or different
> ones? Would you be willing to try compiling 2.2.2 to see if there are
> any differences?
> 
> thanks
> Mark
> 
> Bernard Chan wrote:
> > Hello,
> > 
> > I have experienced various instances of segfaults on some cfengine
> > installations.
> > The two cases I have encountered so far are shown below:
> > 
> > Compiler: gcc 3.4.4
> > Version: cfengine 2.2.1
> > Linux (Distribution: AsteriskNow)
> > 
> > CASE 1
> > 
> > (gdb) run -D forceUpdate
> > Starting program: /usr/local/sbin/cfagent -D forceUpdate
> > Detaching after fork from child process 4679.
> > *** glibc detected *** free(): invalid pointer: 0x081772c8 ***
> > 
> > Program received signal SIGABRT, Aborted.
> > 0xb7f7f410 in ?? ()
> > (gdb) back
> > #0  0xb7f7f410 in ?? ()
> > #1  0xbff88560 in ?? ()
> > #2  0x00000006 in ?? ()
> > #3  0x00001244 in ?? ()
> > #4  0xb7c3b275 in raise () from /lib/tls/libc.so.6
> > #5  0xb7c3ca59 in abort () from /lib/tls/libc.so.6
> > #6  0xb7c6f19a in __fsetlocking () from /lib/tls/libc.so.6
> > #7  0xb7c750a7 in malloc_usable_size () from /lib/tls/libc.so.6
> > #8  0xb7c75abb in free () from /lib/tls/libc.so.6
> > #9  0xb7c97e08 in closedir () from /lib/tls/libc.so.6
> > #10 0x0805fe75 in cfclosedir (dirh=0xb7d29e40) at image.c:1086
> > #11 0x080a162e in RecursiveImage (ip=0x81602f0,
> >     from=0xbff92950 "/mnt/asterisksetup", to=0xbff90950 "/etc/asterisk_bak",
> >     maxrecurse=-99) at expand-image.c:234
> > #12 0x08052c25 in MakeImages () at do.c:2548
> > #13 0x0804de24 in DoTree (passes=3, info=0x80a7afa "Main Tree")
> >     at cfagent.c:1328
> > #14 0x0804ea5f in main (argc=3, argv=0xbff94aa4) at cfagent.c:180
> > 
> > CASE 2
> > 
> > 
> > (gdb) run -q -D forceUpdate
> > Starting program: /usr/local/sbin/cfagent -q -D forceUpdate
> > Detaching after fork from child process 6206.
> > 
> > Detaching after fork from child process 6207.
> > Detaching after fork from child process 6208.
> > Detaching after fork from child process 6209.
> > Detaching after fork from child process 6210.
> > Detaching after fork from child process 6211.
> > 
> > Program received signal SIGSEGV, Segmentation fault.
> > 0xb7c0efee in free () from /lib/tls/libc.so.6
> > (gdb) back
> > #0  0xb7c0efee in free () from /lib/tls/libc.so.6
> > #1  0xb7c10701 in malloc () from /lib/tls/libc.so.6
> > #2  0x0806237d in AppendItem (liststart=0xbfd1b608,
> >     itemstring=0x816d130 "root       362  0.0  0.0      0     0 ?        S<   10:05   0:00 [cifsoplockd]", classes=0x8199b18 "") at item.c:349
> > #3  0x080624fd in CopyList (dest=0xbfd1b608, source=0x8193030) at item.c:210
> > #4  0x0805d513 in LoadProcessTable (procdata=0xbfd1b748,
> >     psopts=0x80b15c1 "auxw") at process.c:78
> > #5  0x0805302d in CheckProcesses () at do.c:2678
> > #6  0x0804ddc9 in DoTree (passes=3, info=0x80a7afa "Main Tree")
> >     at cfagent.c:1348
> > #7  0x0804ea5f in main (argc=4, argv=0xbfd1b834) at cfagent.c:180
> > 
> > 
> > Thanks for creating cfengine.
> > 
> > Regards,
> > Bernard Chan.
> > 
> > _______________________________________________
> > Bug-cfengine mailing list
> > [email protected]
> > https://cfengine.org/mailman/listinfo/bug-cfengine
> 
> -- 
> Mark Burgess
> 
> Professor of Network and System Administration
> Oslo University College
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Work:
+47 22453272            Email:  [EMAIL PROTECTED] Fax : +47 22453205    
       WWW  :  http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


--
PowerAll Networks Ltd (http://www.powerallnetworks.com)
