Funny you mention it. I am seeing that exact issue even though I am not running grid of grids. About a week ago we added a bunch more machines, and the top grid __SummaryInfo__ is now updated only occasionally, i.e. I may see data once an hour. This happens even with 3.5.0, so I suspect there is something much deeper than this. For a while now I have suspected that the summarization code causes major slowdowns. For us this was particularly exacerbated by having a couple thousand unique metric names. I have eliminated the bulk of the metrics from being summarized using the

unsummarized_metrics

which has radically improved performance; however, I suspect the problem is much deeper. Unfortunately I have been busy with other stuff, but I am hoping to spend some time next week working on that.

Vladimir

On 11/17/2013 07:17 PM, Nicholas Satterly wrote:
Hi Adam,

Our experience was that the summary RRDs were actually generated but then rarely updated. Only very occasionally would we see metrics suddenly get written to the RRD and only for a few intervals and then there would be large gaps again.

Do graphs based on the RRDs you are getting in your tests look right?
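
One rough way to check this without relying on graphs is to look at how recently each summary RRD file was last modified. The sketch below is illustrative only: the helper name stale_rrds and the 120-second threshold are assumptions, the directory is the one quoted elsewhere in this thread, and file modification time is only a proxy for whether gmetad actually wrote a new data point.

```python
import glob
import os
import time


def stale_rrds(rrd_dir, max_age_seconds, now=None):
    """Return (path, age_in_seconds) for each .rrd file not modified recently.

    stale_rrds is a hypothetical helper, not part of Ganglia.
    """
    now = time.time() if now is None else now
    stale = []
    for path in sorted(glob.glob(os.path.join(rrd_dir, "*.rrd"))):
        # mtime changes whenever the file is written, so a large age
        # suggests the RRD has not been updated for that long.
        age = now - os.path.getmtime(path)
        if age > max_age_seconds:
            stale.append((path, age))
    return stale


if __name__ == "__main__":
    # Directory as quoted in this thread; the threshold is an assumption
    # and should be set to a few multiples of your polling interval.
    for path, age in stale_rrds("/var/lib/ganglia/rrds/__SummaryInfo__", 120):
        print(f"{path}: last written {age:.0f}s ago")
```

Running this periodically (for example from cron) should show whether the summary files go long stretches without being touched, which would match the gaps described above.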

Regards,
Nick


On Fri, Nov 15, 2013 at 8:15 PM, Adam Compton <acomp...@quantcast.com> wrote:
Nicholas, I'm the person who submitted #92. I've attempted to replicate the problem and I'm still seeing summary RRDs being written for the top grid in a grid-of-grids configuration (assuming you mean "/var/lib/ganglia/rrds/__SummaryInfo__/*.rrd").

Can you please share the configs you used to reproduce this issue? I'd like to fix the bug and submit a patch, but I don't know how to replicate the problem.

Thanks,
Adam



On 11/3/13 2:04 PM, Nicholas Satterly wrote:
Hi Bernard,

I think this is the bug in federation that you might be thinking of as I've mentioned it before. I don't have a fix for this. It's quite a large patch and I've never looked at this part of the codebase before.

Regards,
Nick


On Sun, Nov 3, 2013 at 5:10 PM, Bernard Li <bern...@vanhpc.org> wrote:
My $0.02 is that Grid of Grids (federation) is still a widely used feature so we should attempt to fix it.

Nick -- do you still have another outstanding pull request to fix a bug in federation?  If so, what's the hold up?  Just waiting for someone with authorization to accept it?

Thanks!

Bernard


On Sat, Nov 2, 2013 at 5:14 PM, Nicholas Satterly <nfsatte...@gmail.com> wrote:
I have confirmed that this patch [1] broke writing of the root summaries for the top-level gmetad when in a grid-of-grids setup. What should we do? Revert the patch, attempt to debug it, or just log a github issue to track it for now?

Regards,
Nick



On Tue, Sep 24, 2013 at 12:40 PM, Nicholas Satterly <nfsatte...@gmail.com> wrote:
Hi Illydth,

You might have missed that the pull request that added the break back also added more logic to the endElement_GRID() function to fix double-writing of the last cluster. So yes, that break is meant to be there again. See https://github.com/ganglia/monitor-core/pull/73

However, what isn't clear is why there is a new grid-of-grids problem. I suspect that it relates to this pull request but I haven't been able to confirm this yet. See https://github.com/ganglia/monitor-core/pull/92

Regards,
Nick


On Fri, Sep 20, 2013 at 7:41 PM, Douglas Wagner <dougla...@gmail.com> wrote:
So the last time I tried this upgrade thing (3.1.7 -> 3.4.0) I was getting no grid of grids information.  Ran across the fix with the help of others on the list and documented it here:

http://sourceforge.net/apps/phpbb/ganglia/viewtopic.php?f=4&t=16&p=28

So now I've upgraded from 3.4.0 to 3.6.0.  I have 2 new clients (RHEL6) that I'm implementing.  Went through the build process and built out RPMs for RHEL6.

Turned on GMOND and I'm not seeing either of the two systems reporting into the associated GMETAD.  The Web Interface isn't updating with the new boxes.

As I start going back through some of my past issues, I ran back across this where in 3.4.0 Grid of Grids was broken.  And when I check the reported file and problem again I see the same old code (the "break;" at the end of the first switch block).

Is this broken again in 3.6?  Or is this the correct code, and I should be looking somewhere else for why my new RHEL6 clients aren't reporting to my GMETAD system?

--Illydth

_______________________________________________
Ganglia-general mailing list
ganglia-gene...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general




--
gpg: using PGP trust model
pub   4096R/1EE38BD9 2013-01-06 [expires: 2018-01-06]
      Key fingerprint = 3EE9 550D D9D8 DB65 58C2  B58D CE78 EC6C 1EE3 8BD9
uid                  Nicholas Satterly (Debian Key) <nfsatte...@gmail.com>
sub   4096R/23804EE9 2013-01-06 [expires: 2018-01-06]






_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers







