Greetings,

I have developed a shell script that runs on top of the collectl code and
can be used to aggregate metrics at the node/cluster level. Details:
https://github.com/saurabhska/collectl-cluster-monitoring
Regards,
-Saurabh

On Fri, Jun 19, 2015 at 11:09 AM, Saurabh Agrawal <[email protected]> wrote:

> I see. Let me study things and evaluate the options. I will reach out to
> you soon (if at all I am able to figure it out).
>
> On Fri, Jun 19, 2015 at 10:57 AM, Mark Seger <[email protected]> wrote:
>
>> I do hear what you're saying; I'm just saying I don't know when I might
>> get around to this, and that it would be easier to pipe the output
>> through something else. It would also potentially keep colmux a little
>> cleaner. One of the added complexities is knowing which variables out of
>> hundreds are counters vs. absolute values, keeping in mind that colmux
>> also has to deal with plugins it may know nothing about. If you had your
>> own script that only dealt with collectl-native variables, that would be
>> easiest; I'm still wondering how to deal with plugins, and nothing
>> immediately comes to mind.
>> -mark
>>
>> On Fri, Jun 19, 2015 at 1:32 PM, Saurabh Agrawal <[email protected]> wrote:
>>
>>> Greetings,
>>>
>>> Thanks, Mark, for your reply, and Vishal for your input (that's exactly
>>> the next thing I'll need :)).
>>>
>>> @Mark,
>>> As of now, I don't know Perl scripting, but I am taking this as an
>>> opportunity to learn Perl and contribute back to the open-source
>>> community, so I am still trying to figure things out. I believe that
>>> instead of using pipes and then doing the totals, a better way would be
>>> to modify the original scripts, or maybe add a new switch.
>>>
>>> Not sure if I am right, but logically you should be getting these
>>> numbers in some variables in your scripts before you write them to the
>>> file/terminal. So I believe that aggregating those numbers to compute
>>> the totals should be the right way to do this. Please let me know your
>>> thoughts and any pointers that you think could be helpful.
>>>
>>> Again, thanks for writing this great tool and making our lives easier!
>>>
>>> Regards,
>>> -Saurabh
>>>
>>> On Fri, Jun 19, 2015 at 7:04 AM, Mark Seger <[email protected]> wrote:
>>>
>>>> I agree, that would be useful. I'll add it to my 'todo' list and maybe
>>>> some day you'll even see it. The biggest challenge I see (and when I
>>>> do something I want to cover all cases, which can be difficult) is how
>>>> to deal with potential holes in the data. Another thing on my todo
>>>> list is to report missing data, which almost never happens, but when
>>>> there's a problem collectl sometimes gets starved by higher-priority
>>>> processes like the OOM killer, and you get gaps. So if you do get
>>>> gaps, what does one do? Simply leave those stats out of the
>>>> calculations, or fill in the blanks with the last known values? Hmm,
>>>> maybe another switch ;)
>>>>
>>>> Hmm, but wait - if Saurabh were to write a totaller for colmux, you
>>>> could then simply write the data to a file and plot it, couldn't you?
>>>> After all, colmux can play back data from multiple logs as well as in
>>>> real time, so it sounds like all the pieces may already be there.
>>>>
>>>> -mark
>>>>
>>>> On Fri, Jun 19, 2015 at 9:16 AM, Vishal Gupta <[email protected]> wrote:
>>>>
>>>>> Mark,
>>>>>
>>>>> Along with what Saurabh is talking about (displaying the total in
>>>>> colmux), it would also be cool to add aggregation of stats in colplot
>>>>> for a cluster of servers, so that one could see one graph for an
>>>>> entire cluster. This is something I could have used with my Oracle
>>>>> Exadata clusters, with 22 servers in each; when you have a couple of
>>>>> racks clustered together, that's 44 servers.
>>>>> collectl/colmux/colplot are such great utilities - they have been a
>>>>> lifesaver for me. Aggregation of stats in colmux and colplot would
>>>>> have been so much more useful, though.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Vishal Gupta
>>>>>
>>>>> On 19 June 2015 at 12:38, Mark Seger <[email protected]> wrote:
>>>>>
>>>>>> That's a cool idea.
So if I understand you correctly, you might see
>>>>>> some sort of total line at the bottom? I can always add to my wish
>>>>>> list, but no promises. But I think I may also have a solution if you
>>>>>> don't mind doing a little extra work on your own ;) btw - can I
>>>>>> assume you've installed ReadKey so you can change sort columns with
>>>>>> the arrow keys?
>>>>>>
>>>>>> If you run colmux with --noesc, it will take it out of full-screen
>>>>>> mode and simply print everything as scrolling output. If you then
>>>>>> also include "--lines 99999" (or some big number), it will print all
>>>>>> the output from all the remote systems so you don't miss anything.
>>>>>> Finally, you can pipe the output through perl, python, bash, or
>>>>>> whatever your favorite scripting tool might be and do the totals
>>>>>> yourself. Then, whenever you see a new header fly by, print the
>>>>>> totals and reset the counters to 0. You could even add timestamps
>>>>>> and maybe even ultimately make it your own open-source project. I
>>>>>> bet others would find it useful too.
>>>>>>
>>>>>> -mark
>>>>>>
>>>>>> On Thu, Jun 18, 2015 at 9:30 PM, Saurabh Agrawal <[email protected]> wrote:
>>>>>>
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I am evaluating various system monitoring tools to pick one to
>>>>>>> monitor my Hadoop cluster. One of the tools I am impressed by is
>>>>>>> collectl; I have been playing around with it for a couple of days.
>>>>>>>
>>>>>>> I am struggling to find out how we can aggregate the metrics
>>>>>>> captured by collectl when using colmux.
>>>>>>>
>>>>>>> Say I have 10 nodes in my Hadoop cluster, each running collectl as
>>>>>>> a service. Using colmux, I can see the performance metrics of each
>>>>>>> node in a single view (in single- and multi-line formats). Great!
>>>>>>>
>>>>>>> But what if I want an aggregate of CPU, I/O, etc. across all the
>>>>>>> nodes in the cluster?
That is, I want to find out
>>>>>>> how my cluster as a whole is performing by aggregating the
>>>>>>> performance metrics from each node into corresponding numbers,
>>>>>>> thereby giving me cluster-level metrics instead of node-level ones.
>>>>>>>
>>>>>>> Any help is greatly appreciated.
>>>>>>>
>>>>>>> Regards,
>>>>>>> -Saurabh

--
Regards,
Saurabh S. Agrawal
Memoir <http://saurabhska.wordpress.com/>
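[Editor's note] Mark's pipe-through recipe above (run colmux with --noesc
and "--lines 99999", total the columns yourself, and reset whenever a new
header flies by) could be sketched roughly as follows. This is only an
illustration, not part of colmux: it assumes header lines begin with '#'
(as collectl's own headers do), that each data line is a hostname followed
by whitespace-separated numeric columns, and it deliberately sidesteps
Mark's counters-vs-absolute-values caveat by blindly summing every column.

```python
#!/usr/bin/env python3
"""Hypothetical totaller for colmux scrolling output (illustration only)."""
import sys

def is_number(field):
    """True if the field parses as a number."""
    try:
        float(field)
        return True
    except ValueError:
        return False

def total_lines(lines):
    """Sum the numeric columns of the data lines, emitting a TOTAL line
    each time a new header is seen (and once more at end of input)."""
    totals = None
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith('#'):       # header line: flush and reset
            if totals is not None:
                out.append('TOTAL ' + ' '.join(f'{t:g}' for t in totals))
            totals = None
            continue
        # Data line: skip the leading hostname, keep the numeric columns.
        nums = [float(f) for f in fields[1:] if is_number(f)]
        totals = nums if totals is None else [a + b for a, b in zip(totals, nums)]
    if totals is not None:                  # flush the final interval
        out.append('TOTAL ' + ' '.join(f'{t:g}' for t in totals))
    return out

if __name__ == '__main__':
    for total in total_lines(sys.stdin):
        print(total)
```

It might then be hooked up along the lines of
`colmux -addr host1,host2 -command "-scm" --noesc --lines 99999 | ./totaller.py`,
where the -command switches are just an example; the exact columns depend
on your own -s selection and would need checking against real output.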
------------------------------------------------------------------------------
_______________________________________________
Collectl-interest mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/collectl-interest
