On Wed, Feb 23, 2011 at 09:42:56AM -0800, Bernard Li wrote:
> 
> > what second pass?
> >
> >   dummy = proc_sys_kernel_osrelease;
> >   rval.int32 = slurpfile("/proc/sys/kernel/osrelease", &dummy,
> >                          MAX_G_STRING_SIZE);
> >
> > why would anyone call slurpfile in a loop anyway? slurpfile
> > doesn't call itself recursively; it just reads as much data as it
> > can into the buffer provided (second parameter).
> 
> Sorry I wasn't clear, I meant the "goto read" loop:
> 
> read:
>    read_len = read(fd, db, buflen);
>    if (read_len <= 0)
>       {
>          if (errno == EINTR)
>             goto read;
>          err_ret("slurpfile() read() error on file %s", filename);
>          close(fd);
>          return SYNAPSE_FAILURE;
>       }

this code is not relevant: its goto is only taken when read() fails with
EINTR because a signal interrupted the call (very unlikely)

the second conditional, after that code, is the one that continues reading
once the buffer has been resized (when resizing is possible), and it works
fine, as shown by your tests:

   if (read_len == buflen)
      {
         if (dynamic) {
            dynamic += buflen;
            db = realloc(*buffer, dynamic);
            *buffer = db;
            db = *buffer + dynamic - buflen;
            goto read;
         } else {
            --read_len;
            err_msg("slurpfile() read() buffer overflow on file %s", filename);
         }
      }
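
For comparison, here is a minimal sketch of the same "read, grow, read again" idea written as a plain loop. It is not ganglia's slurpfile(); the name read_all and the starting capacity of 64 are hypothetical, chosen only for illustration. It retries only on EINTR and treats a read() return of 0 as end of file rather than as an error.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of ganglia): read a whole file into a heap
 * buffer that doubles whenever it fills up. On success returns the number
 * of bytes read and stores a NUL-terminated buffer in *out (caller frees).
 * Returns -1 on any error. */
ssize_t read_all(const char *path, char **out)
{
    size_t cap = 64, len = 0;
    char *buf, *tmp;
    int fd;

    assert(path != NULL && out != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    buf = malloc(cap);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    for (;;) {
        /* Always leave one byte free for the terminating NUL. */
        ssize_t n = read(fd, buf + len, cap - len - 1);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            free(buf);
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                  /* end of file, not an error */

        len += (size_t)n;
        if (len == cap - 1) {       /* buffer full: grow before reading on */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                close(fd);
                return -1;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    close(fd);
    buf[len] = '\0';
    *out = buf;
    return (ssize_t)len;
}
```

Note that a loop like this still depends on the file delivering the remaining data on the following read(), which is exactly what is in question for the /proc entry discussed here.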

> When I straced the process, the first read() was able to read up to
> MAX_G_STRING, however, the second read() returns 0.  However, if I
> read a regular file (not in /proc filesystem), it was able to read the
> rest of the string in the "second" pass just fine.

this just sounds too strange, but I was able to replicate it after a lot of
guessing in a CentOS 5 VM (both 32-bit and 64-bit), as shown by:

# strace -e read dd if=/proc/sys/kernel/osrelease bs=16 > /dev/null
read(0, "2.6.18-164.9.1.e", 16)         = 16                                    
read(0, "", 16)                         = 0       

so this is not a ganglia problem, just a consequence of the way you were
trying to use slurpfile and the way that specific sysctl handler is
implemented in that version of the kernel.

it makes sense not to worry about partial reads of a value that is meant
to be used whole, but interestingly enough, and as you reported later,
newer kernels no longer behave that way.
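
To illustrate the "used whole" point, here is a rough sketch of fetching such a value with a single read() and flagging possible truncation instead of relying on a second read(). This is not ganglia code; read_value_once and its parameters are hypothetical names made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of ganglia): fetch a short /proc-style
 * value with one read() call. Returns the number of bytes stored in buf
 * (which is NUL-terminated), or -1 on error. *truncated is set to 1 when
 * the read filled all usable space, meaning more data may exist that a
 * second read() is not guaranteed to deliver. */
ssize_t read_value_once(const char *path, char *buf, size_t cap,
                        int *truncated)
{
    ssize_t n;
    int fd;

    assert(buf != NULL && cap > 1 && truncated != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Retry only when a signal interrupted the call. */
    do {
        n = read(fd, buf, cap - 1);
    } while (n < 0 && errno == EINTR);

    close(fd);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    *truncated = ((size_t)n == cap - 1);
    return n;
}
```

A caller would pass a buffer comfortably larger than the expected value and treat truncated == 1 as a reason to log once, not as a cue to read again.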

> Regarding this particular bug -- how should we fix this?  There are
> currently two issues:
> 
> 1) The OS release is truncated in the web frontend

and that is to protect the gmond process against crashes

> 2) The warning "slurpfile() read() buffer overflow on file
> /proc/sys/kernel/osrelease" is displayed multiple times during RPM
> installation (possibly because gmond was called to generate conf files
> etc.)

that was meant to be mostly informative, but the message might need to
be reworked to be more effective.

> Can we potentially increase MAX_G_STRING or have
> proc_sys_kernel_osrelease buffer size resize dynamically?

no

Carlo
