Re: [Ganglia-developers] gmond crashes in pkts_out_func().

Jason A. Smith Mon, 26 Aug 2002 14:55:38 -0700

On Mon, 2002-08-26 at 17:01, Steven Wagner wrote:
> > The real cause of the crash is of course the lone zero at the end of the
> > file buffer, but I have no idea how that got there.  It looks like
> > update_file just calls slurpfile which just does a simple open and read
> > into the buffer.
> 
> Are you sure that the extra values are on a new line?  On my 2.4.3 kernel,
> /proc/net/dev has one line per interface, period.  The transmit section is 
> similar to the receive section but has "colls" and "carrier" instead of 
> "frame" (i.e. one extra field)... (I've tested on 2.4.3 and 2.4.18)


This computer is running a fairly standard RedHat-7.2 system with the
2.4.9-31smp kernel.  The contents of /proc/net/dev look normal to me:
whenever I cat it, the last line is the eth2 line.  The only thing I can
think of is that proc_net_dev.buffer is somehow getting corrupted.
I have checked a few crashes now, and there is usually either an empty
line at the end of the buffer (two newline characters), or a zero
followed by a newline.

> I guess I could put in an "or end-of-file" in there somewhere...

You could try checking for EOF or other errors, but I fear this would
only hide the real problem, which is whatever is corrupting
proc_net_dev.buffer in the first place.

After thinking about this a little more, I think I might know what is
causing it.  At some point the size of the contents of /proc/net/dev
must decrease slightly, maybe because of the way it is formatted.  The
monitor-core/lib/file.c:slurpfile function just calls the system's read
function and checks for errors.  The read call will not terminate your
buffer with a null to mark where the data ends, so if you are
overwriting an existing buffer that still holds older contents, the
only way to know where the data you just read ends is the count
returned by read.  If the new contents are shorter than the old ones,
the tail of the previous read is left in the buffer and gets parsed as
if it were part of the file.
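
To illustrate what I mean, here is a minimal sketch.  It is not ganglia
code: the helper name read_over_old_buffer is made up, and it feeds
made-up text through a pipe instead of opening /proc/net/dev.  It only
shows that read() overwrites the front of the buffer and leaves
everything after the returned count untouched.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not from ganglia: push `text` through a pipe and
 * read() it back into `buf` WITHOUT adding a terminator, the way a bare
 * read-into-buffer does.  Returns the byte count from read(), or -1. */
static ssize_t read_over_old_buffer(char *buf, size_t buflen, const char *text)
{
    int fds[2];
    ssize_t n;

    assert(buf != NULL && text != NULL);

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], text, strlen(text)) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                  /* so the reader sees end-of-file */

    n = read(fds[0], buf, buflen);  /* bytes past buf[n-1] are not touched */
    close(fds[0]);
    return n;
}
```

If you read the 12-byte text "eth2: 100 0\n" into a zeroed buffer and
then the 10-byte text "eth2: 9 0\n" into the same buffer, the buffer
ends up holding "eth2: 9 0\n0\n": the last two bytes of the first read
are still there, which looks a lot like the stray zero and newline I
see in the crashes.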

Because of this, I think the correct fix is to have the slurpfile
function null-terminate the data it just read, at the offset returned
by read, on every successful read.
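
Something along these lines is what I have in mind.  This is only a
sketch under my own assumptions: the name slurp_terminated and its
signature are made up and are not the actual slurpfile in
monitor-core/lib/file.c, and it does a single read like the behaviour
described above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch only: hypothetical name and signature, not the real slurpfile().
 * Reads up to buflen-1 bytes of `path` into `buf` and null-terminates at
 * the count returned by read().  Returns that count, or -1 on error. */
static ssize_t slurp_terminated(const char *path, char *buf, size_t buflen)
{
    int fd;
    ssize_t n;

    assert(path != NULL && buf != NULL);
    if (buflen == 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    n = read(fd, buf, buflen - 1);  /* leave room for the terminator */
    close(fd);
    if (n < 0)
        return -1;

    buf[n] = '\0';  /* cut off anything an earlier, longer read left behind */
    return n;
}
```

With the terminator in place, a parser that stops at the null never
sees the leftover bytes, no matter how the length of the file changes
between reads.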

> It's also possible (but unlikely) that you're using a kernel that has 
> somehow changed the formatting in procfs.  In which case you'll need some 
> linux.c code that behaves differently "if (kernel_version == 'Funky Dev 
> Kernel')" ... :)
> 
> > Anyone have any ideas?  I appended a short gdb log below showing the
> > stack trace and the problem variables.
> 
> See above for ideas.  Thanks for the log, though, it helps to see 
> /proc/net/dev ...


-- 
/------------------------------------------------------------------\
|  Jason A. Smith                         Email:  [EMAIL PROTECTED]  |
|  Atlas Computing Facility               Phone:  (631)344-4226    |
|  Brookhaven National Lab, Bldg. 510M    Fax:    (631)344-7616    |
|  Upton, NY 11973-5000                                            |
\------------------------------------------------------------------/
