Greetings,

I have developed a shell script that runs on top of the collectl code and
can be used to aggregate metrics at the node/cluster level.
Details: https://github.com/saurabhska/collectl-cluster-monitoring

Regards,
-Saurabh

On Fri, Jun 19, 2015 at 11:09 AM, Saurabh Agrawal <[email protected]>
wrote:

> I see. Let me study things and evaluate the options. I will reach out to
> you soon (if I manage to figure it out).
>
> On Fri, Jun 19, 2015 at 10:57 AM, Mark Seger <[email protected]> wrote:
>
>> I do hear what you're saying; I'm just saying I don't know when I might
>> get around to this, and it would be easier to pipe the output through
>> something else.  It would also potentially keep colmux a little cleaner.
>> One of the added complexities is knowing which of the hundreds of variables
>> are counters vs. absolute values, keeping in mind that colmux also has to
>> deal with plugins it may know nothing about.  If you had your own script
>> that only dealt with collectl's native variables, that would be easiest;
>> I'm still wondering how to deal with plugins, and nothing immediately
>> comes to mind.
>> -mark
>>
>> On Fri, Jun 19, 2015 at 1:32 PM, Saurabh Agrawal <[email protected]>
>> wrote:
>>
>>> Greetings,
>>>
>>> Thanks, Mark, for your reply, and Vishal for your input (that's exactly
>>> the next thing I'll need :)).
>>>
>>> @Mark,
>>> As of now, I don't know Perl, but I am taking this as an opportunity to
>>> learn it and contribute back to the open-source community, so I am still
>>> trying to figure things out. I believe that instead of using pipes and
>>> then computing the totals, a better approach would be to modify the
>>> original scripts or perhaps add a new switch.
>>>
>>> I may be wrong, but logically you should already have these numbers in
>>> some variables in your scripts before you write them to the file/terminal,
>>> so aggregating those numbers to compute the totals seems like the right
>>> way to do this. Please let me know your thoughts and any pointers you
>>> think might be helpful.
>>>
>>> Again, thanks for writing this great tool and making our life easier!
>>>
>>> Regards,
>>> -Saurabh
>>>
>>>
>>> On Fri, Jun 19, 2015 at 7:04 AM, Mark Seger <[email protected]> wrote:
>>>
>>>> I agree, that would be useful.  I'll add it to my 'todo' list, and maybe
>>>> some day you'll even see it.  The biggest challenge I see (and when I do
>>>> something I want to cover all cases, which can be difficult) is how to
>>>> deal with potential holes in the data.  Another thing on my todo list is
>>>> to report missing data, which almost never happens, but when there's a
>>>> problem collectl sometimes gets starved by higher-priority processes like
>>>> the OOM killer, and you get gaps.  So if you do get gaps, what does one
>>>> do?  Simply leave those stats out of the calculations, or fill in the
>>>> blanks with the last known values?  hmm, maybe another switch  ;)
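The fill-in-the-blanks option described above amounts to a forward fill. Here is a minimal sketch; the `forward_fill` name and the use of `None` to mark a missing interval are illustrative assumptions, not part of collectl:

```python
def forward_fill(samples):
    """Replace missing samples (None) with the last known value.

    A sketch of the 'fill in the blanks' option: a missing interval is
    represented here as None, which is an assumption for illustration.
    The alternative is to simply drop those intervals from the totals.
    """
    filled, last = [], None
    for sample in samples:
        if sample is None:
            sample = last      # repeat the last known value
        else:
            last = sample
        filled.append(sample)
    return filled
```

Note that a gap before the first real sample has no "last known" value to fall back on, so it stays missing.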
>>>>
>>>> hmm, but wait - if Saurabh were to write a totaller for colmux, you
>>>> could then simply write the data to a file and plot it, couldn't you?
>>>> After all, colmux can play back data from multiple logs as well as in
>>>> real time, so it sounds like all the pieces may already be there.
>>>>
>>>> -mark
>>>>
>>>> On Fri, Jun 19, 2015 at 9:16 AM, Vishal Gupta <[email protected]>
>>>> wrote:
>>>>
>>>>> Mark,
>>>>>
>>>>> Along with with Saurabh is talking about display the total in colmux,
>>>>> It would also be cool to add the aggregation of stats in colplot for a
>>>>> cluster of servers. So that one could see 1 graph for entire cluster. This
>>>>> is something i could have used with my Oracle Exadata clusters with 22
>>>>> servers in each. When you have coupld of racks clustered together, thats 
>>>>> 44
>>>>> servers. collect/colmux/colplot are such great utilities, its has been 
>>>>> life
>>>>> saver for me. Though aggregation of stats in colmux and colplot, would 
>>>>> have
>>>>> be so much useful.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Vishal Gupta
>>>>>
>>>>> On 19 June 2015 at 12:38, Mark Seger <[email protected]> wrote:
>>>>>
>>>>>> That's a cool idea.  So if I understand you correctly, you might see
>>>>>> some sort of total line at the bottom?  I can always add it to my wish
>>>>>> list, but no promises.  But I think I may also have a solution, if you
>>>>>> don't mind doing a little extra work on your own  ;)  btw - can I
>>>>>> assume you've installed ReadKey so you can change sort columns with
>>>>>> the arrow keys?
>>>>>>
>>>>>> If you run colmux with --noesc, it will take it out of full-screen
>>>>>> mode and simply print everything as scrolling output.  If you then
>>>>>> also include "--lines 99999" (or some big number), it will print all
>>>>>> the output from all the remote systems so you don't miss anything.
>>>>>> Finally, you can pipe the output through perl, python, bash, or
>>>>>> whatever your favorite scripting tool might be and do the totals
>>>>>> yourself.  Then whenever you see a new header fly by, print the totals
>>>>>> and reset the counters to 0.  You could even add timestamps, and maybe
>>>>>> even ultimately make it your own open-source project.  I bet others
>>>>>> would find it useful too.
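The print-totals-on-each-header recipe above could be sketched roughly as follows. The parsing assumptions (hostname in the first column, numeric columns after it, header lines starting with '#') are illustrative only; the real column layout depends on which collectl subsystems colmux is showing:

```python
def _is_num(field):
    """Return True if the field parses as a number."""
    try:
        float(field)
        return True
    except ValueError:
        return False

def interval_totals(lines):
    """Yield one list of per-column totals per header-delimited interval.

    Assumes colmux scrolling output where header lines start with '#'
    and data lines look like 'hostname col1 col2 ...'. That layout is an
    assumption for illustration, not taken from the colmux docs.
    """
    totals = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):            # a new header flew by:
            if totals:
                yield totals                # emit totals and reset
            totals = []
            continue
        nums = [float(f) for f in line.split()[1:] if _is_num(f)]
        if totals and len(nums) == len(totals):
            totals = [t + n for t, n in zip(totals, nums)]
        else:
            totals = nums                   # first data line of the interval
    if totals:
        yield totals                        # flush the final interval
```

Hooked up to a live run (e.g. `colmux ... --noesc --lines 99999 | python totals.py`, where everything except the two flags quoted above is hypothetical), a small driver would feed `sys.stdin` to `interval_totals` and print a timestamped TOTAL line for each yielded interval.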
>>>>>>
>>>>>> -mark
>>>>>>
>>>>>> On Thu, Jun 18, 2015 at 9:30 PM, Saurabh Agrawal <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I am evaluating various system monitoring tools to pick one to
>>>>>>> monitor my Hadoop cluster. One of the tools I am impressed by is
>>>>>>> collectl; I have been playing around with it for a couple of days.
>>>>>>>
>>>>>>> I am struggling to find out how we can aggregate the metrics captured
>>>>>>> by collectl when using colmux.
>>>>>>>
>>>>>>> Say I have 10 nodes in my Hadoop cluster, each running collectl as a
>>>>>>> service. Using colmux I can see the performance metrics of each node
>>>>>>> in a single view (in single- and multi-line formats). Great!
>>>>>>>
>>>>>>> But what if I want the aggregate of CPU, I/O, etc. across all the
>>>>>>> nodes in the cluster? That is, I want to see how my cluster as a
>>>>>>> whole is performing by aggregating the performance metrics from each
>>>>>>> node, thereby giving me cluster-level metrics instead of node-level
>>>>>>> ones.
>>>>>>>
>>>>>>> Any help is greatly appreciated.
>>>>>>>
>>>>>>> Regards,
>>>>>>> -Saurabh
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Saurabh S. Agrawal
>>>>>>> Memoir <http://saurabhska.wordpress.com/>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Collectl-interest mailing list
>>>>>>> [email protected]
>>>>>>> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Saurabh S. Agrawal
>>> Memoir <http://saurabhska.wordpress.com/>
>>>
>>
>>
>
>
> --
> Regards,
> Saurabh S. Agrawal
> Memoir <http://saurabhska.wordpress.com/>
>



-- 
Regards,
Saurabh S. Agrawal
Memoir <http://saurabhska.wordpress.com/>
