Re: [Ganglia-developers] RE: First prerelease of ganglia-3.0.3 ready for testing

Ian Cunningham Fri, 17 Mar 2006 10:43:45 -0800

Martin,

Now I have played around with delay code; this version has a quadratic back-off (the delay grows with the square of the loop count). The mode for the number of loops was 1, which implies your assumption is correct. The next most frequent loop count was 22, so waiting a very short bit either works, or it doesn't and you have to wait a lot longer.

One idea we have been throwing around is to not call send() for each metric, but instead gather all the data for one host and then send it as one huge chunk. We may run benchmarks on that later and see what's better in terms of wall clock time.
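A minimal sketch of the batching idea, not code from gmond: the names `host_buf` and `host_buf_append` and the 64 KB capacity are hypothetical. Instead of one send() per metric, each metric's XML is appended to a per-host buffer, which the caller would then hand to a single send call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HOST_BUF_CAP 65536  /* hypothetical fixed capacity for one host's XML */

/* Hypothetical per-host accumulation buffer; not a gmond structure. */
struct host_buf {
    char   data[HOST_BUF_CAP];
    size_t used;
};

/* Append one metric's text. Returns 0 on success, -1 if it would not fit
   (the caller could then flush with one send and start over). */
int host_buf_append(struct host_buf *hb, const char *text)
{
    size_t n;

    assert(hb != NULL && text != NULL);
    n = strlen(text);
    if (n > HOST_BUF_CAP - hb->used)
        return -1;
    memcpy(hb->data + hb->used, text, n);
    hb->used += n;
    return 0;
}
```

Once every metric for a host has been appended, one call that sends `hb.data` with a length of `hb.used` would take the place of the per-metric sends.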


Ian

/* This function wraps calls to apr_socket_send to handle EAGAIN. */
apr_status_t socket_send_full(apr_socket_t *sock, const char *buf, apr_size_t *len)
{
 apr_status_t rv;
 int loop = 0;
 apr_size_t start_len;
 apr_interval_time_t t;

 /* Remember the requested length: apr_socket_send overwrites *len
    with the number of bytes actually sent. */
 start_len = (*len);
 rv = apr_socket_send(sock, buf, len);

 /* Retry up to 33 times, sleeping loop * loop * 100 microseconds
    before each retry. */
 while (loop++ < 33 && APR_STATUS_IS_EAGAIN(rv))
 {
   t = loop * loop * 100;
   apr_sleep(t);
   (*len) = start_len;
   rv = apr_socket_send(sock, buf, len);
 }
 return rv;
}
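
For a sense of how long the loop above can block, here is a small standalone sketch. The helper names `backoff_delay_us` and `backoff_total_us` are made up for illustration and are not in the patch. It reproduces the `loop * loop * 100` schedule, assuming the value passed to apr_sleep() is in microseconds, as it is for apr_interval_time_t.

```c
#include <assert.h>

/* Delay before retry number `attempt` (1-based), in microseconds.
   Mirrors t = loop * loop * 100 from the patch. */
long backoff_delay_us(int attempt)
{
    assert(attempt >= 1);
    return (long)attempt * attempt * 100L;
}

/* Total time slept if all `max_attempts` retries are used. */
long backoff_total_us(int max_attempts)
{
    long total = 0;
    int i;

    for (i = 1; i <= max_attempts; i++)
        total += backoff_delay_us(i);
    return total;
}
```

With the limit of 33 retries, the first sleep is 100 microseconds, the last is 108,900 microseconds, and the worst-case total is 1,252,900 microseconds, or roughly 1.25 seconds per send.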


Martin Knoblauch wrote:

Hi Ian,

thanks for updating the patch.

Puuhhh. That behaviour you describe is bad indeed. Seems either Cygwin
or M$ are doing something stupid.

One thought - you are calling apr_socket_send() at a high frequency in
that loop. Have you played with inserting some delay code in the loop?
Maybe waiting a ms or so would increase the chance of success?

Cheers
Martin

--- Ian Cunningham <[EMAIL PROTECTED]> wrote:

Martin,
Non-scientific numbers here for you. Connecting to the TCP port 600 times, print_host_metric() called apr_socket_send() at least 90,624 times. Of those 90,624 times, we got stuck in an EAGAIN while loop 1,190 times. On average that while loop looped 29,116.66 times, with a maximum of 525,705 loops.
Pretty bad in my opinion. But the workaround... works :/

I have refactored all of the apr_socket_sends to use the workaround.
I have it error out if it loops more than 750,000 times *shakes head*.
I've posted a patch to the bug that seems to work; it only bombed out
once in 600 tries.

http://bugzilla.ganglia.info/cgi-bin/bugzilla/attachment.cgi?id=27&action=view

Ian

Martin Knoblauch wrote:

Hi Richard,

correct. I was waiting for a comment from Ian on my concerns about
possible endless loops before committing the patch.

Ian: what do you think? Do you have any data on how often you iterate
those EAGAIN loops?

Cheers
Martin


--- [EMAIL PROTECTED] wrote:

Gee,

I thought that was fixed with this patch:
http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=50

Actually, looking at 3.0.3 gmond.c, it looks like the patch did not make it
into the release - that's a shame.

Even looking at the patch, it looks as if it is a partial fix, because while
the patched metric printing is protected like this (gmond.c,
process_tcp_accept_channel):
<snip>
      rv = print_host_metric(client, metric, now);
      while(rv == EAGAIN)
      {
        rv = print_host_metric(client, metric, now);
      }
        if(rv != APR_SUCCESS)
          {
            goto close_accept_socket;
          }
      }
</snip>

the gmetric printing in the same function is not protected:
<snip>

    /* Send the gmetric info for this particular host */
    for(metric_hi = apr_hash_first(client_context, ((Ganglia_host *)val)->gmetrics);
        metric_hi;
        metric_hi = apr_hash_next(metric_hi))
      {
        void *metric;
        apr_hash_this(metric_hi, NULL, NULL, &metric);

        /* Print each of the metrics from gmetric for this host... */
        if(print_host_gmetric(client, metric, now) != APR_SUCCESS)
          {
            goto close_accept_socket;
          }
      }
</snip>

It may be best to talk to the original owner of the patch.
I'm not confident enough to submit a patch myself, although I will try
to submit a Bugzilla entry.

kind regards,
Richard

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gilad Raphaelli
Sent: 14 March 2006 18:35
To: ganglia-developers@lists.sourceforge.net
Subject: [Ganglia-developers] RE: First prerelease of ganglia-3.0.3 ready for testing

I have tried the new release 3.0.3.200602231926
without success on FreeBSD 4.11 - the xml is still
truncated when attempting to access the data from a
remote host.  Interestingly, this is not the case when
trying from the host running gmond.  Based on the
strace, my colleague commented:

Default socket buffer is 64K.  It appears that
socket is non-blocking.  That last write is failing
(EAGAIN) because the socket buffer is full.  The
application is ignoring that fact and shutting down
the socket.  Looks to me like an application bug that
just accidentally works on rhel.
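
One way to honour EAGAIN instead of closing the socket, sketched with plain POSIX calls rather than the APR calls gmond uses (the function name `send_all` and its timeout parameter are hypothetical): wait with poll() until the socket is writable again, then resume from where the last send() stopped.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send all `len` bytes on a (possibly non-blocking) socket. On
   EAGAIN/EWOULDBLOCK, wait up to `timeout_ms` for the socket to become
   writable. Returns 0 on success, -1 on error or timeout. */
int send_all(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t off = 0;

    assert(buf != NULL);
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);

        if (n > 0) {
            off += (size_t)n;          /* partial send: keep going */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd pfd;
            int rc;

            pfd.fd = fd;
            pfd.events = POLLOUT;
            pfd.revents = 0;
            rc = poll(&pfd, 1, timeout_ms);
            if (rc <= 0)
                return -1;             /* timeout (0) or poll error (-1) */
        } else if (n < 0 && errno == EINTR) {
            continue;                  /* interrupted: retry the send */
        } else {
            return -1;                 /* hard error */
        }
    }
    return 0;
}
```

Unlike a fixed sleep, poll() returns as soon as the kernel has drained enough of the socket buffer, so the wait is no longer than it needs to be.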

Please let me know if you need any more information.

Thank you,

Gil
-----------------------------------------------------

Running an strace on gmond (on the target host) while
trying to retrieve the data shows:
71160 write(10, "<METRIC NAME=\"swap_free\" VAL=\"41"..., 124) = 124
71160 write(10, "<METRIC NAME=\"bytes_in\" VAL=\"608"..., 129) = -1 EAGAIN (Resource temporarily unavailable)
71160 shutdown(10, 0 /* receive */)     = 0

What this looks like from the requester (not the
exact same transaction):

<METRIC NAME="mem_buffers" VAL="204096" TYPE="uint32" UNITS="KB"
TN="119" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>  <METRIC
NAME="swap_free" VAL="4194136" TYPE="uint32" UNITS="KB" TN="119"
TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
Connection closed by foreign host.

A normal transaction closes with a closing tag: </GANGLIA_XML>
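
A requester could detect the truncation described above by checking for that closing tag. A minimal sketch, with a hypothetical helper name `xml_is_complete`; it ignores trailing whitespace and does no real XML parsing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the received text ends with </GANGLIA_XML>
   (ignoring trailing whitespace), 0 otherwise. */
int xml_is_complete(const char *buf, size_t len)
{
    static const char closing[] = "</GANGLIA_XML>";
    const size_t clen = sizeof(closing) - 1;

    assert(buf != NULL);
    while (len > 0 && isspace((unsigned char)buf[len - 1]))
        len--;
    if (len < clen)
        return 0;
    return memcmp(buf + len - clen, closing, clen) == 0;
}
```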




_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers



------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



