Re: [Ganglia plugin] Next steps

Rajith Siriwardana Thu, 27 Jun 2013 07:29:51 -0700

Hi all,

The linked diagram [1] shows how the GangliaResourceMonitorFactory will be
integrated with AssignmentMonitor to calculate load.
AssignmentMonitor keeps each node's load in a static HashMap
(<nodeId, load>), so I guess the *loadMap should be updated periodically*
(e.g., at a 1 min interval) by parsing the Ganglia XML, right?
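
A minimal sketch of such a periodic update, under stated assumptions: the
class name LoadMapRefresher and the Supplier that stands in for "download
and parse the Ganglia XML" are hypothetical and are not existing OODT code.
It only illustrates refreshing a <nodeId, load> map at a fixed interval.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper, not part of OODT: refreshes a <nodeId, load> map on a timer.
class LoadMapRefresher {
    // Thread-safe map, because the scheduler thread writes while callers read.
    private final Map<String, Integer> loadMap = new ConcurrentHashMap<>();
    // Assumed stand-in for fetching and parsing the Ganglia XML.
    private final Supplier<Map<String, Integer>> source;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    LoadMapRefresher(Supplier<Map<String, Integer>> source) {
        this.source = source;
    }

    // One refresh cycle: copy the freshly parsed loads into the shared map.
    void refreshOnce() {
        loadMap.putAll(source.get());
    }

    // Run a refresh immediately, then once per interval.
    void start(long intervalSeconds) {
        scheduler.scheduleAtFixedRate(this::refreshOnce, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    // Cancel the periodic refresh.
    void stop() {
        scheduler.shutdownNow();
    }

    // Returns the last known load, or null if the node has not been seen.
    Integer getLoad(String nodeId) {
        return loadMap.get(nodeId);
    }
}
```

Calling start(60) would give the 1 min interval mentioned above. Note that
scheduleAtFixedRate stops rescheduling if a run throws, so a real refresh
would probably need to catch parsing errors inside refreshOnce.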


The load we need is not a traditional load value; it is a value that says
how many of these jobs can fit on a machine. As I understand it, the load
calculation should take the most relevant metrics, apply weights to their
values, and then normalize the resulting load value to the range 0 to 1.
I guess the following default Ganglia metrics are the most relevant ones
for the calculation.

load_one = one minute load average
load_five = five minute load average
load_fifteen = fifteen minute load average

mem_free = amount of available memory
swap_free = amount of available swap memory

These are the models I currently have in mind:
(I). Weight the 1 min, 5 min and 15 min load numbers and normalize the
value.
(II). Add the mem_free and swap_free metrics to the calculation in
model I.
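
The two models can be sketched roughly as follows. Everything here is an
assumption for illustration: the class and method names are hypothetical,
the weights are left as parameters, the weighted average divides by the sum
of the weights, and the load averages are normalized by dividing by cpu_num
and clamping to [0, 1]. The memory and swap terms use mem_total and
swap_total from the default metrics to turn the free amounts into used
fractions.

```java
// Hypothetical sketch, not OODT code: combines Ganglia metrics into a load in [0, 1].
final class WeightedLoad {
    private WeightedLoad() {}

    // Clamp a value into [0, 1].
    static double clamp01(double v) {
        return Math.max(0.0, Math.min(1.0, v));
    }

    // Fraction of a resource in use; returns 0 when total is not positive
    // (assumption: e.g. a host with no swap configured adds no pressure).
    static double usedFraction(double free, double total) {
        if (total <= 0.0) {
            return 0.0;
        }
        return clamp01(1.0 - free / total);
    }

    // Model I: weighted average of the three load averages, divided by the
    // number of CPUs and clamped to [0, 1].
    static double modelOne(double loadOne, double loadFive, double loadFifteen,
                           int cpuNum, double w1, double w5, double w15) {
        double weighted =
                (w1 * loadOne + w5 * loadFive + w15 * loadFifteen) / (w1 + w5 + w15);
        return clamp01(weighted / cpuNum);
    }

    // Model II: weighted average of the Model I value with the used
    // fractions of memory and swap.
    static double modelTwo(double cpuLoad, double memUsed, double swapUsed,
                           double wCpu, double wMem, double wSwap) {
        double combined =
                (wCpu * cpuLoad + wMem * memUsed + wSwap * swapUsed) / (wCpu + wMem + wSwap);
        return clamp01(combined);
    }
}
```

For example, modelOne(load_one, load_five, load_fifteen, cpu_num, 1, 2, 1)
would put the most weight on the 5 min value; the weights themselves are
placeholders, not recommendations.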

According to [3], more weight should go to either the 5 min or the 15 min
value.
#1. *But how can I justify the weights I give?*
#2. Furthermore, what is the capacity of a node? Since we are talking
about normalization, *what is the role of this capacity*, and how does it
affect this calculation? (When assigning load to a particular node, it
checks something like "if (loadValue <= (loadCap - curLoad))", where
loadCap = node.getCapacity() and
curLoad = loadMap.get(node.getNodeId()).intValue().)

Other considerations
#3. What should the value be if the node is offline?

We can tell that a particular node is offline from its TN and TMAX values.
In gmetad, a host is considered offline and is ignored if TN > 4 * TMAX [2].

(TN: the number of seconds since the metric was last updated.
TMAX: the maximum time in seconds between gmetric calls.)
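
A small sketch of the offline rule from [2], plus one possible (assumed)
answer to #3: report an offline node as fully loaded, so that a check like
"loadValue <= (loadCap - curLoad)" never assigns work to it. The class and
method names are hypothetical and not part of OODT or Ganglia.

```java
// Hypothetical sketch, not OODT or Ganglia code.
final class HostLiveness {
    private HostLiveness() {}

    // Factor from the gmetad rule quoted in [2]: offline if TN > 4 * TMAX.
    static final int OFFLINE_FACTOR = 4;

    // tn: seconds since the metric was last updated.
    // tmax: maximum expected seconds between updates.
    static boolean isOffline(long tn, long tmax) {
        return tn > OFFLINE_FACTOR * tmax;
    }

    // Assumption: an offline node is reported as being at full capacity,
    // which leaves no room for new jobs in the capacity check.
    static int effectiveLoad(long tn, long tmax, int curLoad, int capacity) {
        return isOffline(tn, tmax) ? capacity : curLoad;
    }
}
```

Another option would be to drop offline nodes from the loadMap entirely;
which of the two fits better depends on how AssignmentMonitor treats
missing entries.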

*The default Ganglia metrics are listed below; your thoughts are welcome.*
disk_free = Disk Space Available
machine_type = System architecture
bytes_out = Number of bytes out per second
gexec = gexec available
proc_total = Total number of processes
cpu_nice = Percentage of CPU utilization that occurred while executing at
the user level with nice priority
pkts_in = Packets in per second
cpu_speed = CPU Speed in terms of MHz
boottime = The last time that the system was started
cpu_wio = Percentage of time that the CPU or CPUs were idle during which
the system had an outstanding disk I/O request
os_name = Operating system name
load_one = One minute load average
os_release = Operating system release date
disk_total = Total available disk space
cpu_user = Percentage of CPU utilization that occurred while executing at
the user level
cpu_idle = Percentage of time that the CPU or CPUs were idle and the system
did not have an outstanding disk I/O request
swap_free = Amount of available swap memory
mem_cached = Amount of cached memory
pkts_out = Packets out per second
load_five = Five minute load average
cpu_num = Total number of CPUs
load_fifteen = Fifteen minute load average
mem_free = Amount of available memory
cpu_system = Percentage of CPU utilization that occurred while executing at
the system level
proc_run = Total number of running processes
mem_total = Total amount of memory displayed in KBs
cpu_aidle = Percent of time since boot idle CPU
bytes_in = Number of bytes in per second
mem_buffers = Amount of buffered memory
mem_shared = Amount of shared memory
swap_total = Total amount of swap space displayed in KBs
part_max_used = Maximum percent used for all partitions

[1] https://issues.apache.org/jira/secure/attachment/12589911/diagram1.png
[2] http://entropy.gforge.inria.fr/ganglia.html
[3] http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages


Cheers,
Rajith



On Fri, Jun 21, 2013 at 7:22 PM, Rajith Siriwardana <
rajithsiriward...@gmail.com> wrote:

>
> Moving the conversation to dev.
>
> Cheers,
> Rajith
>
> On Thu, Jun 20, 2013 at 11:10 AM, Chris Mattmann <chris.mattm...@gmail.com
> > wrote:
>
>> Hi Rajith,
>>
>> RE: #1 yep that's the next step.
>>
>> RE: #2, I would create a pluggable function/class that allows
>> different "Besting" algorithms to be plugged in. One simple one
>> would be AverageLoad (avg between the 3 load values). Another
>> simple would be FiveMinuteLoad; another OneMinLoad; etc. I would
>> also imagine allowing ArbitraryMetricWeightedAvgLoad where it takes
>> in maybe a List<String> specifying the metric names, and then also
>> maybe a HashMap<String, Double> that identifies the metric name,
>> and then the weight to apply in the weighted average, e.g., maybe
>> {{"1minload", "3.0"}, {"5minload", "10.0"}, {"15minload", "1.0"}}
>>
>> indicating that the final load should be calculated as:
>>
>> 3*[val of 1minLoad] + 10*[val of 5minLoad] + 1*[val of 15minLoad]
>> -----------------------------------------------------------------
>>                           3
>> Or something like the above
>>
>> for #3 (use casting and maybe Math.max)?
>>
>> for #4, see above.
>>
>> Also this should all probably go on dev@oodt.apache.org so can
>> you move the conversation there?
>>
>> Cheers,
>> Chris
>>
>> ------------------------
>> Chris Mattmann
>> chris.mattm...@gmail.com
>>
>>
>>
>>
>> -----Original Message-----
>> From: Rajith Siriwardana <rajithsiriward...@gmail.com>
>> Date: Wednesday, June 19, 2013 11:32 AM
>> To: jpluser <chris.a.mattm...@jpl.nasa.gov>, jpluser
>> <chris.mattm...@gmail.com>
>> Subject: [Ganglia plugin] Next steps
>>
>> >Hi Chris,
>> >My next steps would be
>> >
>> >Adding the capability of creating a GangliaAssignmentMonitor from the
>> >GangliaAssignmentMonitorFactory to AssignmentMonitor.
>> >
>> >In that case I have a few questions:
>> >
>> >1. GangliaAssignmentMonitor should download and parse the XML when
>> >the AssignmentMonitor requests a node's current load, right? And
>> >this should update the loadMap in AssignmentMonitor?
>> >
>> >2. About the current load: should it be the 15 min, 5 min or 1 min
>> >value, or should it be an average load? (Since the requirement is the
>> >current load, I guess this should be a weighted average of these three
>> >load values.)
>> >
>> >3. Ganglia provides the load values as percentage values. loadMap uses
>> >Integer, so how should the mapping happen?
>> >
>> >4. I couldn't find anything that requires any metric other than the load
>> >of a resource node.
>> >
>> >
>> >
>> >Thank you,
>> >Rajith
>> >
>>
>>
>>
>
