Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-28 Thread Carlo Marcelo Arenas Belon
On Wed, Nov 26, 2008 at 08:40:55AM -0700, Brad Nicholes wrote:
  On 11/26/2008 at 3:45 AM, in message
 [EMAIL PROTECTED], Martin Knoblauch
 [EMAIL PROTECTED] wrote:
  
  From: Brad Nicholes [EMAIL PROTECTED]
  
   On 11/25/2008 at 10:14 AM, in message 
  [EMAIL PROTECTED],
  Ofer Inbar wrote:
   Brad Nicholes wrote:
   It needs a temp directory to get around some issues with libconfuse.
   Libconfuse doesn't actually support wildcard paths or files.  A
   libconfuse include statement must have a full path to the file that
   it is going to include.  So gmond makes up for this problem by creating
   a temp file, resolving all of the file paths and names and then
   writing them as separate includes in a temp file.  Then it tells
   libconfuse to include the temp file directly.  Without the ability
   to resolve the wildcard paths and write them to a temp file, the
   wildcarding feature of gmond wouldn't work.  To solve the problem
   that you are describing, we would have to actually add wildcard
   capability to libconfuse.
   
   Might this be a cleaner workaround that would work for gmond as well?
   
- override libconfuse's include function as you're already doing
- resolve file paths and names as you're already doing
- instead of writing that to a temp file and telling libconfuse to
  include that file, just tell libconfuse to include each individual
  file (the same filenames you're now writing to the temp file)
   
  
  No, libconfuse doesn't work that way.  The include handler can only
  manipulate the file path that it is handed.  So the result of the handler
  has to be a single absolute file path.  There isn't any way to take a
  single file path as input into the handler and return multiple file paths
  back to libconfuse.  The only way to do it was to write all of the
  individual file paths to a file and then hand libconfuse back a single
  file path to the new include file.
  
  
   the question is: can't the handler be rewritten to do the conversions in
  memory, without needing to write a temp file? This would make the process
  more robust. You never know when a disk is full, or goes RO.
 
 No, I tried doing that already but was unsuccessful.  Libconfuse
 is limited in what you can do in this area.

the API libconfuse exports is limited to handling single file includes
(as documented) so it shouldn't be a surprise that it wouldn't handle a
wildcard include with it.

 The problem is that when libconfuse wants to read in the include file,
 it is in the middle of the lexer and needs to continue.  A handler can't
 just read the file and hand it back to libconfuse through some other
 cfg_* call.

an alternative would be to preprocess the configuration file into a
buffer in memory, resolving all includes, and then call libconfuse to
parse and process the buffer instead.
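
A minimal sketch of that preprocessing idea, written in Python for brevity rather than in gmond's C. The function name `expand_includes`, the regular expression for the include syntax, and the rule that relative patterns resolve against the including file are all assumptions made for illustration; none of this is taken from the Ganglia source. The resulting string is what would be handed to a buffer parser (libconfuse offers `cfg_parse_buf()` for that purpose), so no temp file is involved.

```python
import glob
import os
import re

# Matches a gmond-style include line, e.g.: include ('/etc/ganglia/conf.d/*.conf')
# (assumed syntax: one include statement per line, single or double quotes)
INCLUDE_RE = re.compile(r"""^\s*include\s*\(\s*(['"])(.+?)\1\s*\)\s*$""")


def expand_includes(path, depth=0, max_depth=10):
    """Return the text of `path` with each include statement replaced by the
    contents of every file its (possibly wildcard) pattern matches."""
    if depth > max_depth:
        raise RecursionError("includes nested too deeply at %s" % path)
    out = []
    with open(path) as fh:
        for line in fh:
            m = INCLUDE_RE.match(line)
            if m is None:
                out.append(line)
                continue
            pattern = m.group(2)
            # Assumption: relative patterns are relative to the including file.
            if not os.path.isabs(pattern):
                pattern = os.path.join(os.path.dirname(path), pattern)
            # A pattern with no matches simply contributes nothing.
            for included in sorted(glob.glob(pattern)):
                text = expand_includes(included, depth + 1, max_depth)
                if text and not text.endswith("\n"):
                    text += "\n"
                out.append(text)
    return "".join(out)
```

Because everything happens in memory, a full or read-only disk would not affect the expansion; the only file system access is reading the configuration files themselves.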

this would also have the nice side effect of preventing gmond/gmetric from
segfaulting if there is no gmond.conf (hence using the embedded
configuration) and there are files in the include path (the release notes
since 3.1 document that gmond.conf is required when using modpython).

 This may be a design flaw in libconfuse but it is the way it works now
 and we have to live with it.

since AFAIK no libconfuse developer was ever notified of this flaw, it
might just as well be that our implementation is abusing their API.

will check with them and update back with any suggestions.

Carlo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general


Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-26 Thread Carlo Marcelo Arenas Belon
On Tue, Nov 25, 2008 at 04:33:05PM -0700, Brad Nicholes wrote:
 
 The result was that if the wildcard produced more than 10 included files
 (which it easily does even in our default configuration), libconfuse
 choked because it thought it had hit the maximum nesting level

our RPMs for ganglia only install 3 files in /etc/ganglia/conf.d; gentoo
has 2 and fedora 10 (just released) has 4.

even if I agree that 10 is somewhat low and you would expect it to become
problematic soon as more modules are deployed, it would seem that at
least in this case, one problem was traded for another.

Carlo



Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-26 Thread Martin Knoblauch
- Original Message 

 From: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
 To: Ofer Inbar [EMAIL PROTECTED]
 Cc: ganglia-general@lists.sourceforge.net
 Sent: Tuesday, November 25, 2008 9:49:22 AM
 Subject: Re: [Ganglia-general] gmetric fails when disk is unwriteable?
 
 On Fri, Nov 21, 2008 at 11:33:05PM -0500, Ofer Inbar wrote:
  
  What's the dependency that causes gmetric to require that the
  filesystem the CWD is on be writeable?
 
 as explained by Brad it is not the CWD that needs to be writeable but a
 TMPDIR (which for root can also be the current directory) and that is
 detected by APR.
 
 Recent Linux (since around kernel 2.4.16) requires a ramdrive mounted in
 /dev/shm, so one way to work around this problem is to define:
 
   TMPDIR=/dev/shm
 

 Is TMPDIR used only for the include file handler, or also for other things? We
would not want to fill memory with lots of unexpected data.

Cheers
Martin




Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-26 Thread Brad Nicholes
 On 11/26/2008 at 3:45 AM, in message
[EMAIL PROTECTED], Martin Knoblauch
[EMAIL PROTECTED] wrote:
 - Original Message 
 
 From: Brad Nicholes [EMAIL PROTECTED]
 To: Ofer Inbar [EMAIL PROTECTED]
 Cc: ganglia-general@lists.sourceforge.net 
 Sent: Tuesday, November 25, 2008 8:43:08 PM
 Subject: Re: [Ganglia-general] gmetric fails when disk is unwriteable?
 
  On 11/25/2008 at 10:14 AM, in message 
 [EMAIL PROTECTED],
 Ofer Inbar wrote:
  Brad Nicholes wrote:
  It needs a temp directory to get around some issues with libconfuse.
  Libconfuse doesn't actually support wildcard paths or files.  A
  libconfuse include statement must have a full path to the file that
  it is going to include.  So gmond makes up for this problem by creating
  a temp file, resolving all of the file paths and names and then
  writing them as separate includes in a temp file.  Then it tells
  libconfuse to include the temp file directly.  Without the ability
  to resolve the wildcard paths and write them to a temp file, the
  wildcarding feature of gmond wouldn't work.  To solve the problem
  that you are describing, we would have to actually add wildcard
  capability to libconfuse.
  
  Might this be a cleaner workaround that would work for gmond as well?
  
   - override libconfuse's include function as you're already doing
   - resolve file paths and names as you're already doing
   - instead of writing that to a temp file and telling libconfuse to
     include that file, just tell libconfuse to include each individual
     file (the same filenames you're now writing to the temp file)
  
 
 No, libconfuse doesn't work that way.  The include handler can only
 manipulate the file path that it is handed.  So the result of the handler
 has to be a single absolute file path.  There isn't any way to take a
 single file path as input into the handler and return multiple file paths
 back to libconfuse.  The only way to do it was to write all of the
 individual file paths to a file and then hand libconfuse back a single
 file path to the new include file.
 
 
  the question is: can't the handler be rewritten to do the conversions in
 memory, without needing to write a temp file? This would make the process
 more robust. You never know when a disk is full, or goes RO.
 

No, I tried doing that already but was unsuccessful.  Libconfuse is limited in 
what you can do in this area.  The problem is that when libconfuse wants to 
read in the include file, it is in the middle of the lexer and needs to 
continue.  A handler can't just read the file and hand it back to libconfuse 
through some other cfg_* call.  This may be a design flaw in libconfuse but it 
is the way it works now and we have to live with it. 

Brad




Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-26 Thread Brad Nicholes
 On 11/26/2008 at 1:17 AM, in message [EMAIL PROTECTED],
Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
 On Tue, Nov 25, 2008 at 04:33:05PM -0700, Brad Nicholes wrote:
 
 The result was that if the wildcard produced more than 10 included files
 (which it easily does even in our default configuration), libconfuse
 choked because it thought it had hit the maximum nesting level
 
 our RPMs for ganglia only install 3 files in /etc/ganglia/conf.d; gentoo
 has 2 and fedora 10 (just released) has 4.
 
 even if I agree that 10 is somewhat low and you would expect it to become
 problematic soon as more modules are deployed, it would seem that at
 least in this case, one problem was traded for another.
 

The fact is that 10 is low which is why I discovered that last year when I 
implemented the wildcard path support.  In our case we routinely run with 20+ 
modules and configure them using separately included .conf files so that each 
one can be easily turned on or off by simply renaming the included .conf file.  
This is a very valuable feature which isn't unique to ganglia.  Limiting this 
very useful feature now in gmond on the remote chance that a file system might 
go read only and cause an issue for gmetric, isn't a very good trade off.  It 
isn't that one problem was traded for another.  

At the time when I implemented the code to support wildcard paths, nobody knew 
anything about gmetric not being able to run in a read only file system.  There 
was no trade off being made.  The fact is that whether or not gmetric is able 
to run in a read only file system is a much smaller issue than allowing gmond 
or gmetric to run in an undetermined state because the code allows parts of the 
configuration to be ignored.  Introducing a patch that knowingly ignores parts 
of the configuration due to errors in the file system is unacceptable behavior. 
 The bug that this kind of patch introduces is much larger than an issue with 
gmetric not being able to run in a read only environment.  Gmond being able to 
resolve wildcard paths is a standard feature and behavior that is used every 
day, gmetric being able to run in a read only file system is not.  The real 
issue is why did the disk go read only.  There are plenty of gmond metrics that 
provide the administrator with warnings and a metric module that indicates when 
a file system has gone read only is extremely easy to write.   
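
As a rough illustration of how small such a module could be, here is a sketch of a gmond Python metric module that reports whether a file system is mounted read-only. The metric name `fs_readonly`, the `path` parameter, and the group name are hypothetical choices; the descriptor keys follow the usual modpython layout, and the check relies on `os.statvfs()` reporting the `ST_RDONLY` flag, which is assumed to hold after the kernel remounts a file system read-only.

```python
import os


def is_read_only(path):
    """Return 1 if the file system holding `path` is mounted read-only, else 0."""
    flags = os.statvfs(path).f_flag
    return 1 if flags & os.ST_RDONLY else 0


def metric_init(params):
    """Entry point called by gmond's Python module loader."""
    # 'path' is a hypothetical module parameter; it defaults to the root fs.
    path = params.get("path", "/")

    def handler(name):
        # gmond passes the metric name to the callback; it is not needed here.
        return is_read_only(path)

    return [{
        "name": "fs_readonly",
        "call_back": handler,
        "time_max": 90,
        "value_type": "uint",
        "units": "boolean",
        "slope": "both",
        "format": "%u",
        "description": "1 if %s is mounted read-only" % path,
        "groups": "disk",
    }]


def metric_cleanup():
    """Nothing to release for this module."""
    pass
```

Since gmond loads its configuration and modules at startup, a module like this would keep reporting after the disk goes read-only, which is the situation described at the start of this thread.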

A more acceptable solution to the gmetric problem is to provide gmetric with 
its own .conf file that contains just the socket and port information rather 
than pointing gmetric at gmond.conf.  In this case both gmond and gmetric will 
continue to run even in a read only file system.  This solution can be easily 
implemented today without any code changes and especially without a code patch 
that introduces a much more serious bug.  If we need to solve the gmetric being 
able to run in a read only file system, then we need to come up with a better 
patch.  Crippling gmond and gmetric with a patch that allows them to ignore a 
fatal error because parts of the configuration were skipped, is not an 
acceptable patch.
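
A sketch of what that separate file could look like, generated here from Python so the values are easy to vary. The helper names, the file location `/etc/ganglia/gmetric.conf`, and the multicast address and port (the usual Ganglia defaults) are assumptions; whether a file containing only a send channel is sufficient should be checked against the gmetric version in use. gmetric's `--conf` option selects the file.

```python
def render_gmetric_conf(mcast_join="239.2.11.71", port=8649, ttl=1):
    """Return a minimal configuration holding only a send channel and,
    importantly, no wildcard include statement."""
    return (
        "udp_send_channel {\n"
        "  mcast_join = %s\n"
        "  port = %d\n"
        "  ttl = %d\n"
        "}\n" % (mcast_join, port, ttl)
    )


def gmetric_argv(conf_path, name, value, value_type, units):
    """Build the gmetric command line that points at the dedicated file."""
    return [
        "gmetric", "--conf", conf_path,
        "--name", name, "--value", str(value),
        "--type", value_type, "--units", units,
    ]


if __name__ == "__main__":
    # The file has to be written ahead of time, while the disk is writable.
    print(render_gmetric_conf())
    print(" ".join(gmetric_argv("/etc/ganglia/gmetric.conf",
                                "net_smtp_fin_wait2_out", 0,
                                "uint8", "connections")))
```

With no include statement in the file, the include handler never runs, so no temp directory is needed.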

Brad




Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-25 Thread Carlo Marcelo Arenas Belon
On Mon, Nov 24, 2008 at 04:55:42PM -0700, Brad Nicholes wrote:
  On 11/24/2008 at 3:47 PM, in message [EMAIL PROTECTED],
 Ofer Inbar [EMAIL PROTECTED] wrote:
   I tried feeding one of my custom metrics by hand:
   [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
   Parse error for '/etc/ganglia/gmond.conf'
 
 It needs a temp directory to get around some issues with libconfuse.

gmond does; gmetric doesn't need anything more than to know which
channel to use (hence nothing in the includes) and it is getting
blocked by this restriction because of its use of libganglia to
read gmond's configuration through libgmond.

 To solve the problem that you are describing, we would have to actually
 add wildcard capability to libconfuse.

libconfuse is instructed to use our implementation for includes and that
uses a temporary file, so this is fixable in our code.

a fix for the problem reported by Ofer only needs our handler modified
so that failures to create temporary files to handle includes are not
treated as fatal, as done in committed revision 1922.

Carlo



Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-25 Thread Carlo Marcelo Arenas Belon
On Fri, Nov 21, 2008 at 11:33:05PM -0500, Ofer Inbar wrote:
 
 What's the dependency that causes gmetric to require that the
 filesystem the CWD is on be writeable?

as explained by Brad it is not the CWD that needs to be writeable but a
TMPDIR (which for root can also be the current directory) and that is
detected by APR.

Recent Linux (since around kernel 2.4.16) requires a ramdrive mounted in
/dev/shm, so one way to work around this problem is to define:

  TMPDIR=/dev/shm

3.0 gmetric is not affected and so could be also used as an alternative.
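
To make the search for a temp dir concrete, here is a Python sketch of the kind of probing involved. It is not APR's code: the candidate order (the `TMPDIR`, `TMP`, `TEMP` environment variables, then a few fixed directories, then the current directory) is an assumption modelled on common practice, and both function names are hypothetical. It does show why changing to a writable directory, or setting `TMPDIR=/dev/shm`, makes the error go away.

```python
import os
import tempfile


def can_write(directory):
    """True if a scratch file can actually be created in `directory`."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers a missing directory, missing permissions and a read-only fs.
        return False


def find_temp_dir(environ=os.environ):
    """Return the first usable temp directory, or None if there is none."""
    candidates = [environ.get(var) for var in ("TMPDIR", "TMP", "TEMP")]
    candidates += ["/tmp", "/usr/tmp", "/var/tmp", os.getcwd()]
    for directory in candidates:
        if directory and can_write(directory):
            return directory
    # This is the case reported as "failed to determine the temp dir".
    return None
```

If every candidate is on the read-only file system the function returns `None`; pointing `TMPDIR` at a tmpfs mount gives it a first candidate that still accepts writes.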

Carlo

PS. SysVinit workaround for gmond committed as revision 1923.



Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-25 Thread Ofer Inbar
Brad Nicholes [EMAIL PROTECTED] wrote:
 It needs a temp directory to get around some issues with libconfuse.
 Libconfuse doesn't actually support wildcard paths or files.  A
 libconfuse include statement must have a full path to the file that
 it is going to include.  So gmond makes up for this problem by creating
 a temp file, resolving all of the file paths and names and then
 writing them as separate includes in a temp file.  Then it tells
 libconfuse to include the temp file directly.  Without the ability
 to resolve the wildcard paths and write them to a temp file, the
 wildcarding feature of gmond wouldn't work.  To solve the problem
 that you are describing, we would have to actually add wildcard
 capability to libconfuse.

Might this be a cleaner workaround that would work for gmond as well?

 - override libconfuse's include function as you're already doing
 - resolve file paths and names as you're already doing
 - instead of writing that to a temp file and telling libconfuse to
   include that file, just tell libconfuse to include each individual
   file (the same filenames you're now writing to the temp file)

  -- Cos



Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-25 Thread Brad Nicholes
 On 11/25/2008 at 1:08 AM, in message [EMAIL PROTECTED],
Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
 On Mon, Nov 24, 2008 at 04:55:42PM -0700, Brad Nicholes wrote:
  On 11/24/2008 at 3:47 PM, in message 
 [EMAIL PROTECTED],
 Ofer Inbar [EMAIL PROTECTED] wrote:
   I tried feeding one of my custom metrics by hand:
   [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
   Parse error for '/etc/ganglia/gmond.conf'
 
 It needs a temp directory to get around some issues with libconfuse.
 
 gmond does; gmetric doesn't need anything more than to know which
 channel to use (hence nothing in the includes) and it is getting
 blocked by this restriction because of its use of libganglia to
 read gmond's configuration through libgmond.
 

Anything can be included from the main gmond.conf file.  There is nothing that 
says that a user can't put socket and channel information in a separate file 
and then include it from gmond.conf.  So making the assumption that gmetric 
doesn't need includes is false.  If this is a real problem for users, then 
gmetric should be using a different .conf file that only contains the socket 
information rather than using the same gmond.conf file that contains all of the 
metric information and includes.  Also, gmond and gmetric both use the 
same code path for resolving the configuration, so if the code is changed to 
ignore configuration failures for gmetric, it is also changed to ignore 
configuration failures for gmond.  This isn't a good thing.  This problem 
doesn't require a code change to be resolved.  Simple documentation for gmetric 
would solve the problem.

 To solve the problem that you are describing, we would have to actually
 add wildcard capability to libconfuse.
 
 libconfuse is instructed to use our implementation for includes and that
 uses a temporary file, so this is fixable in our code.
 
 a fix for the problem reported by Ofer only needs our handler modified
 so that failures to create temporary files to handle includes are not
 treated as fatal, as done in committed revision 1922.
 

No, libconfuse doesn't work that way.  The include handler only allows gmond to 
manipulate the input into a form that libconfuse can handle.  In this case the 
input is a single wildcard file path that needs to be translated into a single 
absolute file path.  libconfuse cannot handle wildcard paths.  Also 
libconfuse only knows how to get its input from a file.  The gmond include 
handler is only manipulating the wildcard path into an absolute path to a file 
that contains all of the resolved paths.  At that point libconfuse is able to 
read and process all of the included files through absolute paths.  The include 
handler has nothing to do with just translating a single wildcard path into 
multiple absolute paths and then handing them back to libconfuse in memory.  
These include paths have to be written to a file first and then libconfuse has 
to be told where the new file is.  This problem can't be fixed by just changing 
the include handler, otherwise I would have done it that way.
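
The translation described above can be sketched in a few lines of Python. This is an illustration of the technique, not gmond's actual C handler; the function name `write_include_file` and the exact include syntax written to the file are assumptions.

```python
import glob
import os
import tempfile


def write_include_file(pattern, tmpdir=None):
    """Expand the wildcard `pattern`, write one include statement per match
    into a new temp file, and return the absolute path of that file."""
    # mkstemp() raises OSError when no writable temp directory is available,
    # which is the failure a read-only file system triggers.
    fd, tmp_path = tempfile.mkstemp(suffix=".conf", dir=tmpdir)
    with os.fdopen(fd, "w") as fh:
        for match in sorted(glob.glob(pattern)):
            fh.write("include ('%s')\n" % match)
    return tmp_path
```

The single path returned is the only thing handed back to the parser, which then reads the generated file and follows each absolute include in turn.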

Revision 1922 currently breaks the configuration file handling and needs to be 
reverted.

Brad





Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-25 Thread Brad Nicholes
 On 11/25/2008 at 10:14 AM, in message [EMAIL PROTECTED],
Ofer Inbar [EMAIL PROTECTED] wrote:
 Brad Nicholes [EMAIL PROTECTED] wrote:
 It needs a temp directory to get around some issues with libconfuse.
 Libconfuse doesn't actually support wildcard paths or files.  A
 libconfuse include statement must have a full path to the file that
 it is going to include.  So gmond makes up for this problem by creating
 a temp file, resolving all of the file paths and names and then
 writing them as separate includes in a temp file.  Then it tells
 libconfuse to include the temp file directly.  Without the ability
 to resolve the wildcard paths and write them to a temp file, the
 wildcarding feature of gmond wouldn't work.  To solve the problem
 that you are describing, we would have to actually add wildcard
 capability to libconfuse.
 
 Might this be a cleaner workaround that would work for gmond as well?
 
  - override libconfuse's include function as you're already doing
  - resolve file paths and names as you're already doing
  - instead of writing that to a temp file and telling libconfuse to
    include that file, just tell libconfuse to include each individual
    file (the same filenames you're now writing to the temp file)
 

No, libconfuse doesn't work that way.  The include handler can only manipulate 
the file path that it is handed.  So the result of the handler has to be a 
single absolute file path.  There isn't any way to take a single file path as 
input into the handler and return multiple file paths back to libconfuse.  The 
only way to do it was to write all of the individual file paths to a file and 
then hand libconfuse back a single file path to the new include file.

Brad 




Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-25 Thread Brad Nicholes
 On 11/25/2008 at 10:14 AM, in message [EMAIL PROTECTED],
Ofer Inbar [EMAIL PROTECTED] wrote:
 Brad Nicholes [EMAIL PROTECTED] wrote:
 It needs a temp directory to get around some issues with libconfuse.
 Libconfuse doesn't actually support wildcard paths or files.  A
 libconfuse include statement must have a full path to the file that
 it is going to include.  So gmond makes up for this problem by creating
 a temp file, resolving all of the file paths and names and then
 writing them as separate includes in a temp file.  Then it tells
 libconfuse to include the temp file directly.  Without the ability
 to resolve the wildcard paths and write them to a temp file, the
 wildcarding feature of gmond wouldn't work.  To solve the problem
 that you are describing, we would have to actually add wildcard
 capability to libconfuse.
 
 Might this be a cleaner workaround that would work for gmond as well?
 
  - override libconfuse's include function as you're already doing
  - resolve file paths and names as you're already doing
  - instead of writing that to a temp file and telling libconfuse to
    include that file, just tell libconfuse to include each individual
    file (the same filenames you're now writing to the temp file)
 

At one point I had tried to do exactly what is being suggested here.  See 
revision 813:

http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=813

The problem that I ran into was that libconfuse thought that each call to 
cfg_include() meant that the include was nested deeper rather than at the same 
level.  The result was that if the wildcard produced more than 10 included 
files (which it easily does even in our default configuration), libconfuse 
choked because it thought it had hit the maximum nesting level even though we 
were still at a nesting level of one.
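
A toy model of that behaviour, in Python, may make the failure easier to see. It assumes the parser keeps a fixed-size stack of open include files (the limit of 10 and the error text are meant to mirror libconfuse but are assumptions here), that each include call pushes an entry, and that an entry is popped only when the lexer reaches the end of that file. The class and function names are hypothetical.

```python
MAX_INCLUDE_DEPTH = 10  # assumed to mirror libconfuse's compile-time limit


class ToyLexer:
    """Tracks only the stack of files currently being included."""

    def __init__(self):
        self.stack = []

    def include(self, filename):
        if len(self.stack) >= MAX_INCLUDE_DEPTH:
            raise RuntimeError("includes nested too deeply")
        self.stack.append(filename)

    def end_of_file(self):
        self.stack.pop()


def include_all_at_once(lexer, files):
    """What a handler does if it calls include() for every wildcard match
    before returning: nothing is popped in between, so siblings pile up."""
    for name in files:
        lexer.include(name)


def include_one_at_a_time(lexer, files):
    """What happens when each file is fully read before the next include,
    as with the generated list of include statements."""
    for name in files:
        lexer.include(name)
        lexer.end_of_file()
```

In this model eleven sibling files overflow the stack in the first function, while any number of files pass through the second, because the depth never exceeds one.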

Brad




Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-24 Thread Brad Nicholes
 On 11/21/2008 at 9:33 PM, in message [EMAIL PROTECTED],
Ofer Inbar [EMAIL PROTECTED] wrote:
 One of our servers encountered an I/O error that put its root
 filesystem into read only mode.  Both /var and /tmp are on that
 filesystem, so all logging stopped and most everything stopped.
 
 However, gmond kept on running, and reporting metrics.  Great!
 This is yet another way in which Ganglia wins over most other
 monitoring systems that involve scripts that write things to disk or
 otherwise depend on things (such as ssh logins) that need to write to
 disk.
 
 However, a program I have that feeds custom metrics to gmond via
 gmetric stopped working when the / filesystem went read-only.  I
 tried running it in debug mode, and got this error:
 
   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
   Parse error for '/etc/ganglia/gmond.conf'
 
 Line 94 of gmond.conf is:
   include ('/etc/ganglia/conf.d/*.conf') 
 
 We've never had an /etc/ganglia/conf.d directory, it always ignores that.
 
 I tried feeding one of my custom metrics by hand:
 [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
 /etc/ganglia/gmond.conf:94: failed to determine the temp dir
 Parse error for '/etc/ganglia/gmond.conf'
 
 Then, I cd'ed over to a filesystem that is still in read/write mode:
 [root /otherfilesys]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
 
 No error, and it worked.
 
 What's the dependency that causes gmetric to require that the
 filesystem the CWD is on be writeable?  Does it really need that
 dependency?  It's great that Ganglia is so robust in the face of
 failures, but it'd be even better if gmetric were also as robust.
   -- Cos
 

Both gmetric and gmond read the same .conf file.  If the .conf file has an 
include() statement that specifies a wildcard file path, processing the 
wildcard path requires a temp directory.  If you aren't loading any files from 
the wildcard include path (i.e. /etc/ganglia/conf.d/*) then just remove the 
include statement from the .conf file and everything should work fine in a 
read-only environment.  The reason why gmond kept running but you had problems 
with gmetric is that gmond had already processed the wildcard path before 
the filesystem switched to read-only.  Every time gmetric starts, it needs to 
re-read the .conf file and process the wildcard path.  

Brad




Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-24 Thread Ofer Inbar
  I tried feeding one of my custom metrics by hand:
  [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
  /etc/ganglia/gmond.conf:94: failed to determine the temp dir
  Parse error for '/etc/ganglia/gmond.conf'
  
  Then, I cd'ed over to a filesystem that is still in read/write mode:
  [root /otherfilesys]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
  
  No error, and it worked.
  
  What's the dependency that causes gmetric to require that the
  filesystem the CWD is on be writeable?  Does it really need that
  dependency?  It's great that Ganglia is so robust in the face of
  failures, but it'd be even better if gmetric were also as robust.

Someone wrote me to suggest running it with strace, which is an
obvious thing to do but unfortunately I didn't think of it at the time
of the failure (it was late at night).  However, Brad knows the answer:

Brad Nicholes [EMAIL PROTECTED] wrote:
 Both gmetric and gmond read the same .conf file.  If the .conf file
 has an include() statement that specifies a wildcard file path,
 processing the wildcard path requires a temp directory.  If you

Removing the wildcard doesn't seem ideal, since it's something one
might want to use and it's part of the standard config, so removing
it and then forgetting that seems like a likely cause of confusion.
Also, most people would never think to investigate something that's
in the supplied conf file and doesn't seem to cause harm.  If we want
robustness in the face of failure, having gmetric and gmond able to
run without having to write to disk sounds like a better goal.  Is
it doable?

Why does it need to write to a temp directory to process a wildcard?

Are there any other parts of gmond or gmetric that depend on being
able to write to disk?  It seems that both of these programs should be
able to avoid writing to disk entirely (except for swap/paging space
on a memory-starved host).
  -- Cos



Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-24 Thread Brad Nicholes
 On 11/24/2008 at 3:47 PM, in message [EMAIL PROTECTED],
Ofer Inbar [EMAIL PROTECTED] wrote:
  I tried feeding one of my custom metrics by hand:
  [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
  /etc/ganglia/gmond.conf:94: failed to determine the temp dir
  Parse error for '/etc/ganglia/gmond.conf'
  
  Then, I cd'ed over to a filesystem that is still in read/write mode:
  [root /otherfilesys]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
  
  No error, and it worked.
  
  What's the dependency that causes gmetric to require that the
  filesystem the CWD is on be writeable?  Does it really need that
  dependency?  It's great that Ganglia is so robust in the face of
  failures, but it'd be even better if gmetric were also as robust.
 
 Someone wrote me to suggest running it with strace, which is an
 obvious thing to do but unfortunately I didn't think of it at the time
 of the failure (it was late at night).  However, Brad knows the answer:
 
 Brad Nicholes [EMAIL PROTECTED] wrote:
 Both gmetric and gmond read the same .conf file.  If the .conf file
 has an include() statement that specifies a wildcard file path,
 processing the wildcard path requires a temp directory.  If you
 
 Removing the wildcard doesn't seem ideal, since it's something one
 might want to use and it's part of the standard config, so removing
 it and then forgetting that seems like a likely cause of confusion.
 Also, most people would never think to investigate something that's
 in the supplied conf file and doesn't seem to cause harm.  If we want
 robustness in the face of failure, having gmetric and gmond able to
 run without having to write to disk sounds like a better goal.  Is
 it doable?
 
 Why does it need to write to a temp directory to process a wildcard?
 
 Are there any other parts of gmond or gmetric that depend on being
 able to write to disk?  It seems that both of these programs should be
 able to avoid writing to disk entirely (except for swap/paging space
 on a memory-starved host).
   -- Cos

It needs a temp directory to get around some issues with libconfuse.  
Libconfuse doesn't actually support wildcard paths or files.  A libconfuse 
include statement must have a full path to the file that it is going to include.  
So gmond makes up for this problem by creating a temp file, resolving all of 
the file paths and names and then writing them as separate includes in a temp 
file.  Then it tells libconfuse to include the temp file directly.  Without the 
ability to resolve the wildcard paths and write them to a temp file, the 
wildcarding feature of gmond wouldn't work.  To solve the problem that you are 
describing, we would have to actually add wildcard capability to libconfuse.
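
As a rough illustration of the workaround described above, here is a minimal
Python sketch. It is not the gmond code: the function name
expand_wildcard_include and its tmp_dir parameter are hypothetical, and the
include line format is simply copied from the gmond.conf line quoted in this
thread. It resolves the wildcard, writes one include line per matching file
into a temp file, and returns that single path for the parser to include.

```python
import glob
import os
import tempfile


def expand_wildcard_include(pattern, tmp_dir=None):
    """Write one include line per file matching `pattern` into a temp file.

    Returns the temp file's path, which a caller would hand to the config
    parser as a single include target. Hypothetical helper, not gmond's API.
    """
    # Resolve the wildcard into concrete file names first.
    matches = sorted(glob.glob(pattern))

    # mkstemp raises OSError if no writable directory is available,
    # which is the kind of failure discussed in this thread.
    fd, tmp_path = tempfile.mkstemp(suffix=".conf", dir=tmp_dir, text=True)
    with os.fdopen(fd, "w") as tmp:
        for path in matches:
            tmp.write("include ('%s')\n" % os.path.abspath(path))
    return tmp_path
```

Note that in this sketch the temp file is created even when the pattern
matches nothing, so a missing conf.d directory would not avoid the need for a
writable location.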

Brad




[Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-21 Thread Ofer Inbar
One of our servers encountered an I/O error that put its root
filesystem into read only mode.  Both /var and /tmp are on that
filesystem, so all logging stopped, along with most everything else.

However, gmond kept on running, and reporting metrics.  Great!
This is yet another way in which Ganglia wins over most other
monitoring systems that involve scripts that write things to disk or
otherwise depend on things (such as ssh logins) that need to write to
disk.

However, a program I have that feeds custom metrics to gmond via
gmetric stopped working when the / filesystem went read-only.  I
tried running it in debug mode, and got this error:

  /etc/ganglia/gmond.conf:94: failed to determine the temp dir
  Parse error for '/etc/ganglia/gmond.conf'

Line 94 of gmond.conf is:
  include ('/etc/ganglia/conf.d/*.conf') 

We've never had an /etc/ganglia/conf.d directory; it has always just ignored that line.

I tried feeding one of my custom metrics by hand:
[root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
/etc/ganglia/gmond.conf:94: failed to determine the temp dir
Parse error for '/etc/ganglia/gmond.conf'

Then, I cd'ed over to a filesystem that is still in read/write mode:
[root /otherfilesys]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'

No error, and it worked.

What's the dependency that causes gmetric to require that the
filesystem the CWD is on be writeable?  Does it really need that
dependency?  It's great that Ganglia is so robust in the face of
failures, but it'd be even better if gmetric were also as robust.
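
One guess at how a CWD dependency like this could arise (an assumption on my
part, not something confirmed by the error message): the program probes a
list of candidate temp directories for writability and falls back to the
current directory last. The Python sketch below shows that pattern; the
function names and the candidate order are hypothetical, not taken from the
gmetric source.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first directory where a file can be created, else None."""
    for directory in candidates:
        if not directory:
            continue
        try:
            # Creating (and auto-deleting) a scratch file is the writability test.
            with tempfile.TemporaryFile(dir=directory):
                return directory
        except OSError:
            # Missing, read-only, or permission-denied: try the next candidate.
            continue
    return None


def candidate_temp_dirs():
    # Hypothetical search order: environment override, common system
    # directories, then the current working directory as a last resort.
    return [os.environ.get("TMPDIR"), "/tmp", "/var/tmp", os.getcwd()]
```

With / read-only, every candidate on that filesystem would fail, and only a
CWD on another filesystem would pass, which would match what I saw.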
  -- Cos
