Re: [Nut-upsuser] [Bug 535583] Excessive logging by apcsmart program

2011-02-15 Thread Arnaud Quette
2011/2/15 Lupe Christoph

 On Monday, 2011-02-14 at 21:54:20 -, Arnaud Quette wrote:
  I definitely need more info!
  please reply to ALL:

  - what is the exact model and date of manufacturing?

 SmartUPS 300I NET. I have the serial number (GS9809283199) but no date.


 it seems to be a recent model.

 - are you sure this unit is ok?

 You can't prove the absence of faults.


this was related to the following question...


  - have you really checked the cabling or made the whole (cable + UPS) work
  somehow (using APC's software or apcupsd)?

 Well, as I said this is working OK for days or weeks. Then something
 happens that triggers a bug in apcsmart.


 quickly reading back the thread, I can't find this info...

 - what is the mean time between occurrences of these issues?

 I don't have enough data. It's in the range of weeks or months.


as per your previous posts, this seemed to be more a matter of minutes or
hours.

 - is the device reachable (using upsc for example) between issues?

 Sure, everything works fine.

  A driver debug output is really needed!

 I'm running it again, but no promises. Reboots are much more frequent
 than this misbehaviour.

  Note that I'm not the developer of this driver, nor have any acquaintance
  with APC.

 Same here. Though I will probably try to locate this bug if we don't
 make progress with the debugging output, either because it does not tell
 us enough or because I don't manage to capture it.

 I would have thought finding the place in the code where it is trying to
 reset the UPS connection wouldn't be this hard.


this is not the problem. This code is in the smartmode() function of
apcsmart.c:
http://svn.debian.org/wsvn/nut/trunk/drivers/apcsmart.c

we see the 5 attempts to go to smart mode ('Y' command), but my aim is to
understand why it is failing, and how to cleanly solve this without
impacting support for other units.
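
For readers who do not have the driver source at hand, the retry pattern described above can be sketched roughly as follows. This is only an illustration, not the code from apcsmart.c: the `ups_io` hooks, `enter_smart_mode()` and the `fake_ups` test double are hypothetical names, and the sketch assumes the UPS acknowledges the 'Y' command with a reply starting with "SM".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical I/O hooks; a real driver would talk to a serial port. */
struct ups_io {
    int (*send_byte)(void *ctx, char c);                /* 0 on success */
    int (*read_line)(void *ctx, char *buf, size_t len); /* bytes read, <0 on timeout */
    void *ctx;
};

/* Try up to max_tries times to switch the UPS into smart mode.
 * Returns the 1-based attempt that succeeded, or 0 if all failed. */
int enter_smart_mode(const struct ups_io *io, int max_tries)
{
    char reply[16];

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (io->send_byte(io->ctx, 'Y') != 0)
            continue;               /* write failed, try again */

        int n = io->read_line(io->ctx, reply, sizeof(reply));
        if (n >= 2 && strncmp(reply, "SM", 2) == 0)
            return attempt;         /* assumed acknowledgement */
    }
    return 0;
}

/* Fake UPS for experiments: stays silent for the first deaf_for requests. */
struct fake_ups {
    int deaf_for;
    int requests;
};

int fake_send(void *ctx, char c)
{
    struct fake_ups *u = ctx;
    (void)c;
    u->requests++;
    return 0;
}

int fake_read(void *ctx, char *buf, size_t len)
{
    struct fake_ups *u = ctx;
    if (u->requests <= u->deaf_for || len < 3)
        return -1;                  /* simulate a timeout */
    memcpy(buf, "SM", 3);
    return 2;
}
```

With a fake device that stays silent for two requests, `enter_smart_mode(&io, 5)` succeeds on the third attempt. A device that never answers makes it return 0 after five tries, which is the situation the log shows.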

Some more questions:
- how are you handling the device's permissions?
Refer to § II, section 3:
http://git.debian.org/?p=collab-maint/nut.git;a=blob_plain;f=debian/nut.README.Debian;hb=HEAD

cheers
Arnaud
-- 
Linux / Unix Expert R&D - Eaton - http://powerquality.eaton.com
Network UPS Tools (NUT) Project Leader - http://www.networkupstools.org/
Debian Developer - http://www.debian.org
Free Software Developer - http://arnaud.quette.free.fr/
--
Conseiller Municipal - Saint Bernard du Touvet
___
Nut-upsuser mailing list
Nut-upsuser@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/nut-upsuser

Re: [Nut-upsuser] upsd crashes with a broken pipe error

2011-02-15 Thread Zach La Celle

Resurrecting this problem, because I finally caught it in the debugger...

Here's the trace, with some GDB prints.  Please excuse the length.

-
...
545815.397326   mainloop: polling 4 filedescriptors
*** glibc detected *** /sbin/upsd: malloc(): memory corruption: 0x0061f300 ***

=== Backtrace: =
/lib/libc.so.6(+0x775b6)[0x776ac5b6]
/lib/libc.so.6(+0x7b6d8)[0x776b06d8]
/lib/libc.so.6(__libc_malloc+0x6e)[0x776b158e]
/sbin/upsd[0x408e91]
/sbin/upsd[0x4091d9]
/sbin/upsd[0x409431]
/sbin/upsd[0x4097e2]
/sbin/upsd[0x409bdb]
/sbin/upsd[0x402a26]
/sbin/upsd[0x403789]
/lib/libc.so.6(__libc_start_main+0xfd)[0x77653c4d]
/sbin/upsd[0x402079]
=== Memory map: 
0040-0040e000 r-xp  fb:00 8806463
/sbin/upsd
0060d000-0060e000 r--p d000 fb:00 8806463
/sbin/upsd
0060e000-0060f000 rw-p e000 fb:00 8806463
/sbin/upsd
0060f000-0063 rw-p  00:00 0  
[heap]

7000-70021000 rw-p  00:00 0
70021000-7400 ---p  00:00 0
76dfd000-76e13000 r-xp  fb:00 9093248
/lib/libgcc_s.so.1
76e13000-77012000 ---p 00016000 fb:00 9093248
/lib/libgcc_s.so.1
77012000-77013000 r--p 00015000 fb:00 9093248
/lib/libgcc_s.so.1
77013000-77014000 rw-p 00016000 fb:00 9093248
/lib/libgcc_s.so.1
77014000-7702 r-xp  fb:00 5104002
/lib/libnss_files-2.11.1.so
7702-7721f000 ---p c000 fb:00 5104002
/lib/libnss_files-2.11.1.so
7721f000-7722 r--p b000 fb:00 5104002
/lib/libnss_files-2.11.1.so
7722-77221000 rw-p c000 fb:00 5104002
/lib/libnss_files-2.11.1.so
77221000-7722b000 r-xp  fb:00 5104004
/lib/libnss_nis-2.11.1.so
7722b000-7742a000 ---p a000 fb:00 5104004
/lib/libnss_nis-2.11.1.so
7742a000-7742b000 r--p 9000 fb:00 5104004
/lib/libnss_nis-2.11.1.so
7742b000-7742c000 rw-p a000 fb:00 5104004
/lib/libnss_nis-2.11.1.so
7742c000-77434000 r-xp  fb:00 5104000
/lib/libnss_compat-2.11.1.so
77434000-77633000 ---p 8000 fb:00 5104000
/lib/libnss_compat-2.11.1.so
77633000-77634000 r--p 7000 fb:00 5104000
/lib/libnss_compat-2.11.1.so
77634000-77635000 rw-p 8000 fb:00 5104000
/lib/libnss_compat-2.11.1.so
77635000-777af000 r-xp  fb:00 5103992
/lib/libc-2.11.1.so
777af000-779ae000 ---p 0017a000 fb:00 5103992
/lib/libc-2.11.1.so
779ae000-779b2000 r--p 00179000 fb:00 5103992
/lib/libc-2.11.1.so
779b2000-779b3000 rw-p 0017d000 fb:00 5103992
/lib/libc-2.11.1.so

779b3000-779b8000 rw-p  00:00 0
779b8000-779c1000 r-xp  fb:00 9093205
/lib/libwrap.so.0.7.6
779c1000-77bc ---p 9000 fb:00 9093205
/lib/libwrap.so.0.7.6
77bc-77bc1000 r--p 8000 fb:00 9093205
/lib/libwrap.so.0.7.6
77bc1000-77bc2000 rw-p 9000 fb:00 9093205
/lib/libwrap.so.0.7.6

77bc2000-77bc3000 rw-p  00:00 0
77bc3000-77bda000 r-xp  fb:00 5103999
/lib/libnsl-2.11.1.so
77bda000-77dd9000 ---p 00017000 fb:00 5103999
/lib/libnsl-2.11.1.so
77dd9000-77dda000 r--p 00016000 fb:00 5103999
/lib/libnsl-2.11.1.so
77dda000-77ddb000 rw-p 00017000 fb:00 5103999
/lib/libnsl-2.11.1.so

77ddb000-77ddd000 rw-p  00:00 0
77ddd000-77dfd000 r-xp  fb:00 9093254
/lib/ld-2.11.1.so

77fee000-77ff1000 rw-p  00:00 0
77ff8000-77ffb000 rw-p  00:00 0
77ffb000-77ffc000 r-xp  00:00 0  
[vdso]
77ffc000-77ffd000 r--p 0001f000 fb:00 9093254
/lib/ld-2.11.1.so
77ffd000-77ffe000 rw-p 0002 fb:00 9093254
/lib/ld-2.11.1.so

77ffe000-77fff000 rw-p  00:00 0
7ffea000-7000 rw-p  00:00 0  
[stack]
ff60-ff601000 r-xp  00:00 0  
[vsyscall]


Program received signal SIGABRT, Aborted.
0x77668a75 in raise () from /lib/libc.so.6
(gdb) up
#1  0x7766c5c0 in abort () from /lib/libc.so.6
(gdb)
#2  0x776a24fb in ?? () from /lib/libc.so.6
(gdb)
#3  0x776ac5b6 in ?? () from 

Re: [Nut-upsuser] upsd crashes with a broken pipe error

2011-02-15 Thread Arjen de Korte

Quoting Zach La Celle lace...@roboticresearch.com:

You can see where the problem happens in parseconf.c, on line 125  
with the code:

/* resize the lists */
ctx->arglist = realloc(ctx->arglist,
   sizeof(char *) * ctx->numargs);


With the given arguments, this boils down to

ctx->arglist = realloc(NULL, sizeof(char *));

This is all normal. Upon the first invocation of add_arg_word,  
ctx->arglist will be a NULL pointer (since there is nothing in the  
list yet). This should then allocate a one element array of a pointer  
to char (to store the


If ptr is a null pointer, realloc() shall be equivalent to malloc()  
for the specified size.


After that, all hell breaks loose, but that's out of our control.
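
As an illustration of why starting from a NULL pointer is fine, here is a minimal sketch of a list that grows through realloc(). It is not the parseconf.c code: `struct arg_list`, `arg_list_add()` and `arg_list_free()` are hypothetical, simplified stand-ins for the parser context.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the parser context. */
struct arg_list {
    char  **args;   /* NULL until the first word is added */
    size_t  count;
};

/* Append a copy of word; returns 0 on success, -1 on allocation failure. */
int arg_list_add(struct arg_list *l, const char *word)
{
    size_t len = strlen(word) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, word, len);

    /* On the first call l->args is NULL, so realloc() acts like malloc(). */
    char **grown = realloc(l->args, sizeof(*grown) * (l->count + 1));
    if (grown == NULL) {
        free(copy);
        return -1;          /* the old list is still valid */
    }
    grown[l->count] = copy;
    l->args = grown;
    l->count++;
    return 0;
}

/* Release every word and the list itself. */
void arg_list_free(struct arg_list *l)
{
    for (size_t i = 0; i < l->count; i++)
        free(l->args[i]);
    free(l->args);
    l->args = NULL;
    l->count = 0;
}
```

Starting from `struct arg_list l = { NULL, 0 };`, the first `arg_list_add()` call allocates a one-element array and later calls extend it. If a call like this aborts inside malloc(), the heap was most likely damaged earlier, somewhere else.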

There is a slight problem in lines 131-132

ctx->argsize = realloc(ctx->argsize, sizeof(int *) * ctx->numargs);

which should really read

ctx->argsize = realloc(ctx->argsize, sizeof(size_t) * ctx->numargs);

but I doubt that sizeof(size_t) will be larger than sizeof(int *), so  
at worst this just wastes a few bytes of memory.
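
One common way to rule out this kind of element-type mismatch is to take sizeof from the pointer being allocated rather than naming a type. A minimal sketch, where `grow_sizes()` is a hypothetical helper and not part of NUT:

```c
#include <stdlib.h>

/* Grow a size_t array to n elements (n must be greater than zero).
 * sizeof(*grown) follows the element type automatically, so the
 * allocation cannot drift out of sync with the declaration.
 * Returns 0 on success, -1 on failure (the old array stays valid). */
int grow_sizes(size_t **sizes, size_t n)
{
    size_t *grown = realloc(*sizes, sizeof(*grown) * n);
    if (grown == NULL)
        return -1;
    *sizes = grown;
    return 0;
}
```

Called with `size_t *sizes = NULL;`, the first `grow_sizes(&sizes, 1)` allocates the array, and later calls keep the existing elements.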



This also might help:
(gdb) p *ctx
$4 = {f = 0x0, state = 5, ch = 9, arglist = 0x0, argsize = 0x0,  
numargs = 1, maxargs = 1, wordbuf = 0x61f2e0 Z, wordptr = 0x61f2fd  
, wordbufsize = 16, linenum = 0, error = 0, errmsg = '\000'  
<repeats 255 times>, errhandler = 0, magic = 7497264, arg_limit =  
32, wordlen_limit = 512}


None of these values is suspect.

If I go up in GDB to the pconf_char function, here is the  
character which is killing it:

(gdb) p ch
$6 = 9 '\t'


This is expected. Any whitespace character ends the collection of  
characters for the current argument and will start a new one. Nothing  
out of the ordinary. If it was, 100% of the NUT installations would  
suffer the same problems as you're seeing 100% of the time they start  
the upsd server. This is not the case and even in your case, the  
problem seems to occur intermittently, which is more an indication  
you're either running out of memory or the system is suffering from  
bad memory. Did you run a memory check lately?
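
To make the whitespace rule concrete, here is a small character-at-a-time sketch. It is not the parseconf.c state machine, which also has to deal with quoting, escapes and comments. `struct word_counter`, `wc_feed()` and `count_words()` are hypothetical names used only for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Tracks whether we are inside a word and how many words were seen. */
struct word_counter {
    int    in_word;
    size_t words;
};

/* Feed one character: whitespace closes the current word, and any
 * other character either continues it or opens a new one. */
void wc_feed(struct word_counter *wc, char ch)
{
    if (isspace((unsigned char)ch)) {
        wc->in_word = 0;
    } else if (!wc->in_word) {
        wc->in_word = 1;
        wc->words++;
    }
}

/* Count the whitespace-separated words in a NUL-terminated line. */
size_t count_words(const char *line)
{
    struct word_counter wc = { 0, 0 };

    for (; *line != '\0'; line++)
        wc_feed(&wc, *line);
    return wc.words;
}
```

A tab is handled exactly like a space here, so a line such as "MONITOR\tups@localhost 1" yields three words.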


Best regards, Arjen
--
Please keep list traffic on the list (off-list replies will be rejected)

