Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On Wed, Nov 26, 2008 at 08:40:55AM -0700, Brad Nicholes wrote:
> On 11/26/2008 at 3:45 AM, in message [EMAIL PROTECTED], Martin Knoblauch [EMAIL PROTECTED] wrote:
>> From: Brad Nicholes [EMAIL PROTECTED]
>>> On 11/25/2008 at 10:14 AM, in message [EMAIL PROTECTED], Ofer Inbar wrote:
>>>> Brad Nicholes wrote:
>>>>> It needs a temp directory to get around some issues with libconfuse. Libconfuse doesn't actually support wildcard paths or files. A libconfuse include statement must have a full path to the file that it is going to include. So gmond makes up for this problem by creating a temp file, resolving all of the file paths and names and then writing them as separate includes in a temp file. Then it tells libconfuse to include the temp file directly. Without the ability to resolve the wildcard paths and write them to a temp file, the wildcarding feature of gmond wouldn't work. To solve the problem that you are describing, we would have to actually add wildcard capability to libconfuse.
>>>>
>>>> Might this be a cleaner workaround that would work for gmond as well?
>>>> - override libconfuse's include function as you're already doing
>>>> - resolve file paths and names as you're already doing
>>>> - instead of writing that to a temp file and telling libconfuse to include that file, just tell libconfuse to include each individual file (the same filenames you're now writing to the temp file)
>>>
>>> No, libconfuse doesn't work that way. The include handler can only manipulate the file path that it is handed, so the result of the handler has to be a single absolute file path. There isn't any way to take a single file path as input into the handler and return multiple file paths back to libconfuse. The only way to do it was to write all of the individual file paths to a file and then hand libconfuse back a single file path to the new include file.
>>
>> the question is: can't the handler be rewritten to do the conversions in memory, without needing to write a temp file? This would make the process more robust. You never know when a disk is full, or goes RO.
>
> No, I tried doing that already but was unsuccessful. Libconfuse is limited in what you can do in this area.

the API libconfuse exports is limited to handling single-file includes (as documented), so it shouldn't be a surprise that it can't handle a wildcard include.

> The problem is that when libconfuse wants to read in the include file, it is in the middle of the lexer and needs to continue. A handler can't just read the file and hand it back to libconfuse through some other cfg_* call.

an alternative would be to preprocess the configuration file into a buffer in memory, resolving all includes, and then call libconfuse to parse and process that buffer instead. this would also have the nice side effect of preventing gmond/gmetric from segfaulting when there is no gmond.conf (so the embedded configuration is used) and there are files in the include path (the release notes since 3.1 document that gmond.conf is required when using modpython).

> This may be a design flaw in libconfuse but it is the way it works now and we have to live with it.

since AFAIK no libconfuse developer was ever notified of this flaw, it might as well be that our implementation is abusing their API. will check with them and report back with any suggestions.

Carlo

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
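Carlo's alternative, resolving every include up front into one in-memory buffer and parsing that, might look roughly like the hypothetical Python sketch below. The regex for gmond's `include ('...')` syntax is an assumption and ignores quoting edge cases. In C, the resulting buffer would presumably be handed to libconfuse's `cfg_parse_buf()` instead of `cfg_parse()`; that pairing is an assumption, not something tested against the libconfuse version gmond used.

```python
import glob
import re

# Matches gmond.conf lines such as: include ('/etc/ganglia/conf.d/*.conf')
INCLUDE_RE = re.compile(r"""^\s*include\s*\(\s*['"]([^'"]+)['"]\s*\)\s*$""")


def inline_includes(path, depth=0, max_depth=10):
    """Return the text of `path` with every include line replaced by the
    contents of the file(s) it names, recursively, without writing to disk."""
    if depth > max_depth:
        raise RecursionError("includes nested too deeply at %s" % path)
    out = []
    with open(path) as conf:
        for line in conf:
            m = INCLUDE_RE.match(line)
            if m is None:
                out.append(line)
                continue
            pattern = m.group(1)
            if any(ch in pattern for ch in "*?["):
                # Wildcard: zero matches is fine and contributes no text.
                targets = sorted(glob.glob(pattern))
            else:
                # Plain path: a missing file raises instead of being skipped.
                targets = [pattern]
            for target in targets:
                text = inline_includes(target, depth + 1, max_depth)
                if text and not text.endswith("\n"):
                    text += "\n"  # keep the next line from being glued on
                out.append(text)
    return "".join(out)
```

Because every include is expanded before parsing starts, the parser sees one flat buffer. Only real nesting (a file that includes a file) counts toward the depth limit, and nothing needs to be written to a temp directory.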
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On Tue, Nov 25, 2008 at 04:33:05PM -0700, Brad Nicholes wrote:
> The result was that if the wildcard produced more than 10 included files (which it easily does even in our default configuration), libconfuse choked because it thought it had hit the maximum nesting level

our RPMs for ganglia only install 3 files in /etc/ganglia/conf.d; gentoo has 2 and fedora 10 (just released) has 4.

even if I agree that 10 is somewhat low, and that you would expect it to become problematic soon as more modules are deployed, it would seem that at least in this case one problem was traded for another.

Carlo
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
----- Original Message ----
From: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
To: Ofer Inbar [EMAIL PROTECTED]
Cc: ganglia-general@lists.sourceforge.net
Sent: Tuesday, November 25, 2008 9:49:22 AM
Subject: Re: [Ganglia-general] gmetric fails when disk is unwriteable?

> On Fri, Nov 21, 2008 at 11:33:05PM -0500, Ofer Inbar wrote:
>> What's the dependency that causes gmetric to require that the filesystem the CWD is on be writeable?
>
> as explained by Brad, it is not the CWD that needs to be writeable but a TMPDIR (which for root can also be the current directory), and that is detected by APR.
>
> Recent Linux (since around kernel 2.4.16) requires a ramdrive mounted in /dev/shm, so one way to work around this problem is to define:
>
>   TMPDIR=/dev/shm

Is TMPDIR only used for the include file handler, or also for other stuff? I just want to make sure we don't fill memory with lots of unexpected data.

Cheers
Martin
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On 11/26/2008 at 3:45 AM, in message [EMAIL PROTECTED], Martin Knoblauch [EMAIL PROTECTED] wrote:
> ----- Original Message ----
> From: Brad Nicholes [EMAIL PROTECTED]
> To: Ofer Inbar [EMAIL PROTECTED]
> Cc: ganglia-general@lists.sourceforge.net
> Sent: Tuesday, November 25, 2008 8:43:08 PM
> Subject: Re: [Ganglia-general] gmetric fails when disk is unwriteable?
>
>> On 11/25/2008 at 10:14 AM, in message [EMAIL PROTECTED], Ofer Inbar wrote:
>>> Brad Nicholes wrote:
>>>> It needs a temp directory to get around some issues with libconfuse. Libconfuse doesn't actually support wildcard paths or files. A libconfuse include statement must have a full path to the file that it is going to include. So gmond makes up for this problem by creating a temp file, resolving all of the file paths and names and then writing them as separate includes in a temp file. Then it tells libconfuse to include the temp file directly. Without the ability to resolve the wildcard paths and write them to a temp file, the wildcarding feature of gmond wouldn't work. To solve the problem that you are describing, we would have to actually add wildcard capability to libconfuse.
>>>
>>> Might this be a cleaner workaround that would work for gmond as well?
>>> - override libconfuse's include function as you're already doing
>>> - resolve file paths and names as you're already doing
>>> - instead of writing that to a temp file and telling libconfuse to include that file, just tell libconfuse to include each individual file (the same filenames you're now writing to the temp file)
>>
>> No, libconfuse doesn't work that way. The include handler can only manipulate the file path that it is handed, so the result of the handler has to be a single absolute file path. There isn't any way to take a single file path as input into the handler and return multiple file paths back to libconfuse. The only way to do it was to write all of the individual file paths to a file and then hand libconfuse back a single file path to the new include file.
>
> the question is: can't the handler be rewritten to do the conversions in memory, without needing to write a temp file? This would make the process more robust. You never know when a disk is full, or goes RO.

No, I tried doing that already but was unsuccessful. Libconfuse is limited in what you can do in this area. The problem is that when libconfuse wants to read in the include file, it is in the middle of the lexer and needs to continue. A handler can't just read the file and hand it back to libconfuse through some other cfg_* call. This may be a design flaw in libconfuse, but it is the way it works now and we have to live with it.

Brad
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On 11/26/2008 at 1:17 AM, in message [EMAIL PROTECTED], Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
> On Tue, Nov 25, 2008 at 04:33:05PM -0700, Brad Nicholes wrote:
>> The result was that if the wildcard produced more than 10 included files (which it easily does even in our default configuration), libconfuse choked because it thought it had hit the maximum nesting level
>
> our RPMs for ganglia only install 3 files in /etc/ganglia/conf.d; gentoo has 2 and fedora 10 (just released) has 4.
>
> even if I agree that 10 is somewhat low, and that you would expect it to become problematic soon as more modules are deployed, it would seem that at least in this case one problem was traded for another.

The fact is that 10 is low, which is why I ran into it last year when I implemented the wildcard path support. In our case we routinely run with 20+ modules and configure them using separately included .conf files, so that each one can be easily turned on or off by simply renaming its .conf file. This is a very valuable feature which isn't unique to ganglia. Limiting this very useful feature in gmond now, on the remote chance that a file system might go read only and cause an issue for gmetric, isn't a very good trade-off.

It isn't that one problem was traded for another. At the time when I implemented the code to support wildcard paths, nobody knew anything about gmetric not being able to run in a read only file system. There was no trade-off being made. The fact is that whether or not gmetric is able to run in a read only file system is a much smaller issue than allowing gmond or gmetric to run in an undetermined state because the code allows parts of the configuration to be ignored. Introducing a patch that knowingly ignores parts of the configuration due to errors in the file system is unacceptable behavior. The bug that this kind of patch introduces is much larger than an issue with gmetric not being able to run in a read only environment.

Gmond being able to resolve wildcard paths is a standard feature and behavior that is used every day; gmetric being able to run in a read only file system is not. The real issue is why the disk went read only. There are plenty of gmond metrics that provide the administrator with warnings, and a metric module that indicates when a file system has gone read only is extremely easy to write.

A more acceptable solution to the gmetric problem is to provide gmetric with its own .conf file that contains just the socket and port information, rather than pointing gmetric at gmond.conf. In this case both gmond and gmetric will continue to run even in a read only file system. This solution can be easily implemented today without any code changes, and especially without a code patch that introduces a much more serious bug.

If we need to solve gmetric being unable to run in a read only file system, then we need to come up with a better patch. Crippling gmond and gmetric with a patch that allows them to ignore a fatal error, so that parts of the configuration are silently skipped, is not an acceptable patch.

Brad
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On Mon, Nov 24, 2008 at 04:55:42PM -0700, Brad Nicholes wrote:
> On 11/24/2008 at 3:47 PM, in message [EMAIL PROTECTED], Ofer Inbar [EMAIL PROTECTED] wrote:
>> I tried feeding one of my custom metrics by hand:
>>
>>   [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
>>   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
>>   Parse error for '/etc/ganglia/gmond.conf'
>
> It needs a temp directory to get around some issues with libconfuse.

gmond does; gmetric doesn't need anything more than to know which channel to use (hence nothing in the includes), and it is getting blocked by this restriction because it uses libganglia to read gmond's configuration through libgmond.

> To solve the problem that you are describing, we would have to actually add wildcard capability to libconfuse.

libconfuse is instructed to use our implementation for includes, and that implementation uses a temporary file, so this is fixable in our code. a fix for the problem reported by Ofer only needs our handler modified so that failures to create temporary files to handle includes are not treated as fatal, as committed in revision 1922.

Carlo
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On Fri, Nov 21, 2008 at 11:33:05PM -0500, Ofer Inbar wrote:
> What's the dependency that causes gmetric to require that the filesystem the CWD is on be writeable?

as explained by Brad, it is not the CWD that needs to be writeable but a TMPDIR (which for root can also be the current directory), and that is detected by APR.

Recent Linux (since around kernel 2.4.16) requires a ramdrive mounted in /dev/shm, so one way to work around this problem is to define:

  TMPDIR=/dev/shm

gmetric from 3.0 is not affected, so it could also be used as an alternative.

Carlo

PS. A SysVinit workaround for gmond was committed in revision 1923.
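To see why `cd`-ing to a writable filesystem or setting `TMPDIR` helps, here is a simplified Python model of a temp-directory search. The candidate order (TMPDIR, TMP, TEMP, then /tmp, /var/tmp, /usr/tmp, and finally the current directory) is my understanding of APR's `apr_temp_dir_get()` and should be treated as an assumption. `find_temp_dir` is a hypothetical name, and this is not APR code.

```python
import os
import tempfile


def find_temp_dir(env=None):
    """Return the first candidate directory in which a file can be created.

    Simplified model: environment variables first, then the usual system
    directories, and finally the current working directory.
    """
    env = os.environ if env is None else env
    candidates = [env[name] for name in ("TMPDIR", "TMP", "TEMP") if env.get(name)]
    candidates += ["/tmp", "/var/tmp", "/usr/tmp", os.getcwd()]
    for directory in candidates:
        try:
            # Probe writability the reliable way: actually create a file
            # (it is removed again when the with-block exits).
            with tempfile.TemporaryFile(dir=directory):
                return directory
        except OSError:
            continue  # missing or read-only: try the next candidate
    raise RuntimeError("failed to determine the temp dir")
```

Under this model, Carlo's `TMPDIR=/dev/shm` workaround makes the first candidate a tmpfs that stays writable even after `/` goes read-only. When nothing else works, the current directory is the last resort, which matches Ofer's observation that running gmetric from a still-writable filesystem succeeded.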
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
Brad Nicholes [EMAIL PROTECTED] wrote:
> It needs a temp directory to get around some issues with libconfuse. Libconfuse doesn't actually support wildcard paths or files. A libconfuse include statement must have a full path to the file that it is going to include. So gmond makes up for this problem by creating a temp file, resolving all of the file paths and names and then writing them as separate includes in a temp file. Then it tells libconfuse to include the temp file directly. Without the ability to resolve the wildcard paths and write them to a temp file, the wildcarding feature of gmond wouldn't work. To solve the problem that you are describing, we would have to actually add wildcard capability to libconfuse.

Might this be a cleaner workaround that would work for gmond as well?
- override libconfuse's include function as you're already doing
- resolve file paths and names as you're already doing
- instead of writing that to a temp file and telling libconfuse to include that file, just tell libconfuse to include each individual file (the same filenames you're now writing to the temp file)

-- Cos
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On 11/25/2008 at 1:08 AM, in message [EMAIL PROTECTED], Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
> On Mon, Nov 24, 2008 at 04:55:42PM -0700, Brad Nicholes wrote:
>> On 11/24/2008 at 3:47 PM, in message [EMAIL PROTECTED], Ofer Inbar [EMAIL PROTECTED] wrote:
>>> I tried feeding one of my custom metrics by hand:
>>>
>>>   [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
>>>   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
>>>   Parse error for '/etc/ganglia/gmond.conf'
>>
>> It needs a temp directory to get around some issues with libconfuse.
>
> gmond does; gmetric doesn't need anything more than to know which channel to use (hence nothing in the includes), and it is getting blocked by this restriction because it uses libganglia to read gmond's configuration through libgmond.

Anything can be included from the main gmond.conf file. There is nothing that says that a user can't put socket and channel information in a separate file and then include it from gmond.conf, so the assumption that gmetric doesn't need includes is false. If this is a real problem for users, then gmetric should be using a different .conf file that only contains the socket information, rather than the same gmond.conf file that contains all of the metric information and includes. Also, gmond and gmetric both use the same code path for resolving the configuration, so if the code is changed to ignore configuration failures for gmetric, it is also changed to ignore configuration failures for gmond. This isn't a good thing. This problem doesn't require a code change to be resolved; simple documentation for gmetric would solve it.

>> To solve the problem that you are describing, we would have to actually add wildcard capability to libconfuse.
>
> libconfuse is instructed to use our implementation for includes, and that implementation uses a temporary file, so this is fixable in our code. a fix for the problem reported by Ofer only needs our handler modified so that failures to create temporary files to handle includes are not treated as fatal, as committed in revision 1922.

No, libconfuse doesn't work that way. The include handler only allows gmond to manipulate the input into a form that libconfuse can handle. In this case the input is a single wildcard file path that needs to be translated into a single absolute file path, because libconfuse cannot handle wildcard paths. Also, libconfuse only knows how to get its input from a file. The gmond include handler only turns the wildcard path into an absolute path to a file that contains all of the resolved paths. At that point libconfuse is able to read and process all of the included files through absolute paths. The include handler has no way to translate a single wildcard path into multiple absolute paths and then hand them back to libconfuse in memory. These include paths have to be written to a file first, and then libconfuse has to be told where the new file is. This problem can't be fixed by just changing the include handler; otherwise I would have done it that way.

Revision 1922 currently breaks the configuration file handling and needs to be reverted.

Brad
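Brad's objection to revision 1922 is that gmond and gmetric share one configuration code path, so making the temp-file failure non-fatal silently drops whatever the wildcard would have pulled in. The hypothetical Python sketch below makes that trade-off explicit. `resolve_include`, its `strict` flag, and the warning text are inventions for illustration, not gmond code, and `os.access` stands in for "a usable temp directory exists".

```python
import glob
import os
import sys


def resolve_include(pattern, tmpdir, strict=True):
    """Hypothetical sketch of the two policies debated around revision 1922.

    strict=True:  no usable temp dir is a fatal configuration error.
    strict=False: warn and continue as if the pattern matched nothing.
    """
    if not os.access(tmpdir, os.W_OK):
        if strict:
            raise RuntimeError("failed to determine the temp dir")
        # Lenient mode: the caller cannot tell this apart from "no files".
        print("warning: skipping include %r" % pattern, file=sys.stderr)
        return []
    return sorted(glob.glob(pattern))
```

In lenient mode the caller gets back an empty list both when the directory really is empty (Ofer's case, with no conf.d directory) and when 20+ module files were skipped (Brad's case). The rest of the configuration code cannot tell these apart, which is the "undetermined state" Brad warns about.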
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On 11/25/2008 at 10:14 AM, in message [EMAIL PROTECTED], Ofer Inbar [EMAIL PROTECTED] wrote:
> Brad Nicholes [EMAIL PROTECTED] wrote:
>> It needs a temp directory to get around some issues with libconfuse. Libconfuse doesn't actually support wildcard paths or files. A libconfuse include statement must have a full path to the file that it is going to include. So gmond makes up for this problem by creating a temp file, resolving all of the file paths and names and then writing them as separate includes in a temp file. Then it tells libconfuse to include the temp file directly. Without the ability to resolve the wildcard paths and write them to a temp file, the wildcarding feature of gmond wouldn't work. To solve the problem that you are describing, we would have to actually add wildcard capability to libconfuse.
>
> Might this be a cleaner workaround that would work for gmond as well?
> - override libconfuse's include function as you're already doing
> - resolve file paths and names as you're already doing
> - instead of writing that to a temp file and telling libconfuse to include that file, just tell libconfuse to include each individual file (the same filenames you're now writing to the temp file)

No, libconfuse doesn't work that way. The include handler can only manipulate the file path that it is handed, so the result of the handler has to be a single absolute file path. There isn't any way to take a single file path as input into the handler and return multiple file paths back to libconfuse. The only way to do it was to write all of the individual file paths to a file and then hand libconfuse back a single file path to the new include file.

Brad
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On 11/25/2008 at 10:14 AM, in message [EMAIL PROTECTED], Ofer Inbar [EMAIL PROTECTED] wrote:
> Brad Nicholes [EMAIL PROTECTED] wrote:
>> It needs a temp directory to get around some issues with libconfuse. Libconfuse doesn't actually support wildcard paths or files. A libconfuse include statement must have a full path to the file that it is going to include. So gmond makes up for this problem by creating a temp file, resolving all of the file paths and names and then writing them as separate includes in a temp file. Then it tells libconfuse to include the temp file directly. Without the ability to resolve the wildcard paths and write them to a temp file, the wildcarding feature of gmond wouldn't work. To solve the problem that you are describing, we would have to actually add wildcard capability to libconfuse.
>
> Might this be a cleaner workaround that would work for gmond as well?
> - override libconfuse's include function as you're already doing
> - resolve file paths and names as you're already doing
> - instead of writing that to a temp file and telling libconfuse to include that file, just tell libconfuse to include each individual file (the same filenames you're now writing to the temp file)

At one point I had tried to do exactly what is being suggested here. See revision http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=813

The problem that I ran into was that libconfuse thought that each call to cfg_include() meant that the include was nested deeper rather than at the same level. The result was that if the wildcard produced more than 10 included files (which it easily does even in our default configuration), libconfuse choked because it thought it had hit the maximum nesting level, even though we were still at a nesting level of one.

Brad
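The nesting problem Brad hit in r813 can be modeled with a simple include stack. This is a hypothetical Python sketch: the limit of 10 comes from Brad's description, and `IncludeStack` and the two helper functions are invented names, not libconfuse's lexer.

```python
MAX_INCLUDE_DEPTH = 10  # limit as described in the thread


class IncludeStack:
    """Toy model of a lexer's stack of currently open include files."""

    def __init__(self):
        self.stack = []

    def push(self, path):
        if len(self.stack) >= MAX_INCLUDE_DEPTH:
            raise RecursionError("includes nested too deeply at %r" % path)
        self.stack.append(path)

    def pop(self):
        self.stack.pop()


def include_each_without_popping(paths):
    """What one include call per wildcard match looked like to the lexer:
    each file is pushed while the previous one is still 'open', so depth
    grows with the number of files instead of staying at one."""
    lexer = IncludeStack()
    for path in paths:
        lexer.push(path)
    return len(lexer.stack)


def include_each_sequentially(paths):
    """Sibling includes as intended: push, read, pop; depth stays at one."""
    lexer = IncludeStack()
    deepest = 0
    for path in paths:
        lexer.push(path)
        deepest = max(deepest, len(lexer.stack))
        lexer.pop()
    return deepest
```

With the 2 to 4 files the distributions ship (per Carlo), the first variant never reaches the limit. With Brad's 20+ module files, it fails on the eleventh include even though every file is logically at nesting level one.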
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On 11/21/2008 at 9:33 PM, in message [EMAIL PROTECTED], Ofer Inbar [EMAIL PROTECTED] wrote:
> One of our servers encountered an I/O error that put its root filesystem into read only mode. Both /var and /tmp are on that filesystem, so all logging stopped and almost everything else stopped. However, gmond kept on running, and reporting metrics. Great! This is yet another way in which Ganglia wins over most other monitoring systems that involve scripts that write things to disk or otherwise depend on things (such as ssh logins) that need to write to disk.
>
> However, a program I have that feeds custom metrics to gmond via gmetric stopped working when the / filesystem went read-only. I tried running it in debug mode, and got this error:
>
>   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
>   Parse error for '/etc/ganglia/gmond.conf'
>
> Line 94 of gmond.conf is:
>
>   include ('/etc/ganglia/conf.d/*.conf')
>
> We've never had an /etc/ganglia/conf.d directory, so it always ignores that.
>
> I tried feeding one of my custom metrics by hand:
>
>   [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
>   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
>   Parse error for '/etc/ganglia/gmond.conf'
>
> Then, I cd'ed over to a filesystem that is still in read/write mode:
>
>   [root /otherfilesys]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
>
> No error, and it worked.
>
> What's the dependency that causes gmetric to require that the filesystem the CWD is on be writeable? Does it really need that dependency? It's great that Ganglia is so robust in the face of failures, but it'd be even better if gmetric were also as robust.
> -- Cos

Both gmetric and gmond read the same .conf file. If the .conf file has an include() statement that specifies a wildcard file path, processing the wildcard path requires a temp directory. If you aren't loading any files from the wildcard include path (i.e. /etc/gmond/conf.d/*), then just remove the include statement from the .conf file and everything should work fine in a read-only environment.

The reason gmond kept running but you had problems with gmetric is that gmond had already processed the wildcard path before the filesystem switched to read-only. Every time gmetric starts, it needs to re-read the .conf and process the wildcard path.

Brad
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
> I tried feeding one of my custom metrics by hand:
>
>   [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
>   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
>   Parse error for '/etc/ganglia/gmond.conf'
>
> Then, I cd'ed over to a filesystem that is still in read/write mode:
>
>   [root /otherfilesys]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
>
> No error, and it worked.
>
> What's the dependency that causes gmetric to require that the filesystem the CWD is on be writeable? Does it really need that dependency? It's great that Ganglia is so robust in the face of failures, but it'd be even better if gmetric were also as robust.

Someone wrote me to suggest running it with strace, which is an obvious thing to do, but unfortunately I didn't think of it at the time of the failure (it was late at night). However, Brad knows the answer:

Brad Nicholes [EMAIL PROTECTED] wrote:
> Both gmetric and gmond read the same .conf file. If the .conf file has an include() statement that specifies a wildcard file path, processing the wildcard path requires a temp directory. If you

Removing the wildcard doesn't seem ideal, since it's something one might want to use and it's part of the standard config; removing it and then forgetting about it seems like a likely cause of confusion. Also, most people would never think to investigate something that's in the supplied conf file and doesn't seem to cause harm.

If we want robustness in the face of failure, having gmetric and gmond able to run without having to write to disk sounds like a better goal. Is it doable? Why does it need to write to a temp directory to process a wildcard? Are there any other parts of gmond or gmetric that depend on being able to write to disk? It seems that both of these programs should be able to avoid writing to disk entirely (except for swap/paging space on a memory-starved host).

-- Cos
Re: [Ganglia-general] gmetric fails when disk is unwriteable?
On 11/24/2008 at 3:47 PM, in message [EMAIL PROTECTED], Ofer Inbar [EMAIL PROTECTED] wrote:
>> I tried feeding one of my custom metrics by hand:
>>
>>   [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
>>   /etc/ganglia/gmond.conf:94: failed to determine the temp dir
>>   Parse error for '/etc/ganglia/gmond.conf'
>>
>> Then, I cd'ed over to a filesystem that is still in read/write mode:
>>
>>   [root /otherfilesys]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
>>
>> No error, and it worked.
>>
>> What's the dependency that causes gmetric to require that the filesystem the CWD is on be writeable? Does it really need that dependency? It's great that Ganglia is so robust in the face of failures, but it'd be even better if gmetric were also as robust.
>
> Someone wrote me to suggest running it with strace, which is an obvious thing to do, but unfortunately I didn't think of it at the time of the failure (it was late at night). However, Brad knows the answer:
>
> Brad Nicholes [EMAIL PROTECTED] wrote:
>> Both gmetric and gmond read the same .conf file. If the .conf file has an include() statement that specifies a wildcard file path, processing the wildcard path requires a temp directory. If you
>
> Removing the wildcard doesn't seem ideal, since it's something one might want to use and it's part of the standard config; removing it and then forgetting about it seems like a likely cause of confusion. Also, most people would never think to investigate something that's in the supplied conf file and doesn't seem to cause harm.
>
> If we want robustness in the face of failure, having gmetric and gmond able to run without having to write to disk sounds like a better goal. Is it doable? Why does it need to write to a temp directory to process a wildcard? Are there any other parts of gmond or gmetric that depend on being able to write to disk? It seems that both of these programs should be able to avoid writing to disk entirely (except for swap/paging space on a memory-starved host).
>
> -- Cos

It needs a temp directory to get around some issues with libconfuse. Libconfuse doesn't actually support wildcard paths or files. A libconfuse include statement must have a full path to the file that it is going to include. So gmond makes up for this problem by creating a temp file, resolving all of the file paths and names, and then writing them as separate includes in the temp file. Then it tells libconfuse to include the temp file directly. Without the ability to resolve the wildcard paths and write them to a temp file, the wildcarding feature of gmond wouldn't work. To solve the problem that you are describing, we would have to actually add wildcard capability to libconfuse.

Brad
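The temp-file workaround Brad describes can be modeled in a few lines of Python. This is a hypothetical sketch of the idea, not gmond's C code: `expand_wildcard_include` is an invented name, and sorting the matches is an assumption made for a deterministic order. The `include ('...')` syntax written to the temp file is the one shown on line 94 of Ofer's gmond.conf.

```python
import glob
import os
import tempfile


def expand_wildcard_include(pattern):
    """Hypothetical model of gmond's temp-file workaround for wildcard includes.

    The parser can only include one concrete file, so each match of the
    wildcard is written as its own include line into a temp file, and the
    path of that single temp file is what gets handed back to the parser.
    """
    matches = sorted(glob.glob(pattern))  # sorted: an assumption, for determinism
    # mkstemp() always creates a real file on disk. If no writable temp
    # directory exists (e.g. the root filesystem went read-only), it raises
    # OSError, analogous to gmond's "failed to determine the temp dir".
    fd, path = tempfile.mkstemp(suffix=".conf")
    with os.fdopen(fd, "w") as tmp:
        for match in matches:
            tmp.write("include ('%s')\n" % match)
    return path
```

The temp file is created even when nothing matches, as in Ofer's setup with no conf.d directory. The result is only an empty include list, but producing it still requires a writable directory, which is why gmetric failed on the read-only root filesystem.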
[Ganglia-general] gmetric fails when disk is unwriteable?
One of our servers encountered an I/O error that put its root filesystem into read only mode. Both /var and /tmp are on that filesystem, so all logging stopped and almost everything else stopped. However, gmond kept on running, and reporting metrics. Great! This is yet another way in which Ganglia wins over most other monitoring systems that involve scripts that write things to disk or otherwise depend on things (such as ssh logins) that need to write to disk.

However, a program I have that feeds custom metrics to gmond via gmetric stopped working when the / filesystem went read-only. I tried running it in debug mode, and got this error:

  /etc/ganglia/gmond.conf:94: failed to determine the temp dir
  Parse error for '/etc/ganglia/gmond.conf'

Line 94 of gmond.conf is:

  include ('/etc/ganglia/conf.d/*.conf')

We've never had an /etc/ganglia/conf.d directory, so it always ignores that.

I tried feeding one of my custom metrics by hand:

  [root ~]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'
  /etc/ganglia/gmond.conf:94: failed to determine the temp dir
  Parse error for '/etc/ganglia/gmond.conf'

Then, I cd'ed over to a filesystem that is still in read/write mode:

  [root /otherfilesys]$ gmetric --name net_smtp_fin_wait2_out --value 0 --type uint8 --units 'connections'

No error, and it worked.

What's the dependency that causes gmetric to require that the filesystem the CWD is on be writeable? Does it really need that dependency? It's great that Ganglia is so robust in the face of failures, but it'd be even better if gmetric were also as robust.

-- Cos