Re: [Pacemaker] Resource capacity limit

2010-03-04 Thread Andrew Beekhof
On Fri, Oct 30, 2009 at 12:41 PM, Yan Gao  wrote:
> Hi Andrew and Lars,
> The attachment is a first attempt to implement "Resource capacity limit",
> which was filed by Lars at:
> https://fate.novell.com/303384
>
> Description:
> We need a mechanism for the PE to take resource weight into account to
> prevent nodes from being overloaded.
>
> Resources would require certain minimal values for node attributes
> (this is available right now); however, they would also "consume" them,
> reducing the value of the node attributes for further resource placement.
> (This could be a special flag in the rsc_location rule, for example.)
> If a node does not have enough capacity available, it is not considered.
> ..
>
> Use case:
> Xen guests have memory requirements; nodes cannot host more guests than
> the node has physical memory installed.
>
>
> Configuration example:
>
> node yingying \
>        attributes capacity="100"
> primitive dummy0 ocf:heartbeat:Dummy \
>        meta weight="90" priority="2"
> primitive dummy1 ocf:heartbeat:Dummy \
>        meta weight="60" priority="1"
> ..
> property $id="cib-bootstrap-options" \
>        limit-capacity="true"
> ..
>
> Because dummy0 has the higher priority, it will run on node "yingying".
> Since that node then has only "10" (100 - 90) capacity remaining, dummy1
> cannot run on it. If there is no other node it can run on, dummy1 will be
> stopped.
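The accounting described in this example can be sketched as follows (a minimal sketch with hypothetical names; the actual patch tracks capacities per node attribute rather than as a single integer):

```c
/* Minimal sketch of the capacity accounting described above.
 * Hypothetical names; the real patch stores capacities as node
 * attributes and utilization values per resource. */
#include <assert.h>

static int node_remaining = 100;   /* node yingying: capacity="100" */

/* Try to place a resource consuming 'weight' capacity on the node.
 * Returns 1 if it fits (and reserves the capacity), 0 if the node
 * must not be considered for this resource. */
static int try_place(int weight)
{
    if (weight > node_remaining)
        return 0;
    node_remaining -= weight;
    return 1;
}
```

With the example's values, placing dummy0 (weight 90) first leaves 10, so dummy1 (weight 60) is rejected and, lacking another node, would be stopped.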
>
> If we don't want to enable the capacity limit, we can set the property
> "limit-capacity" to "false", or leave it at its default.
>
>
> What do you think about the way it's implemented? Did I do it right?

Just one question, why the new cluster property?
Didn't we already have placement-strategy for that purpose?

>
> I also noticed a likely similar planned feature described in
> http://clusterlabs.org/wiki/Planned_Features
>
> "Implement adaptive service placement (based on the RAM, CPU etc.
> required by the service and made available by the nodes) "
>
> Indeed, this attempt only supports a single kind of capacity, and it's not
> adaptive... Have you already given this feature thorough consideration?
> Any comments or suggestions are appreciated. Thanks!
>
> Regards,
>  Yan
> --
> y...@novell.com
> Software Engineer
> China Server Team, OPS Engineering
>
> Novell, Inc.
> Making IT Work As One™
>
>
>
>
> ___
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>



Re: [Pacemaker] Resource capacity limit

2010-03-04 Thread Yan Gao
Hi Andrew,
I've added utilization support to crm_attribute and crm_resource.
The patch is attached. Please let me know if you have any comments or
suggestions.

BTW, LF#2351 has been fixed. I attached the patch there:
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2351

Thanks,
  Yan
-- 
Yan Gao 
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.
# HG changeset patch
# User Yan Gao 
# Date 1267696081 -28800
# Node ID 692f8d2fa65b1e956f450ccb0664fc50f6f8b7bb
# Parent  a6b66ac53fa969658860964b186548bc514c9455
Dev: Tools: Add utilization support for crm_attribute and crm_resource

diff -r a6b66ac53fa9 -r 692f8d2fa65b crmd/control.c
--- a/crmd/control.c	Thu Mar 04 08:40:17 2010 +0100
+++ b/crmd/control.c	Thu Mar 04 17:48:01 2010 +0800
@@ -86,7 +86,7 @@
 		} else {
 		int rc = update_attr(
 			fsa_cib_conn, cib_quorum_override|cib_scope_local|cib_inhibit_notify,
-			XML_CIB_TAG_CRMCONFIG, NULL, NULL, NULL, XML_ATTR_EXPECTED_VOTES, votes, FALSE);
+			XML_CIB_TAG_CRMCONFIG, NULL, NULL, NULL, NULL, XML_ATTR_EXPECTED_VOTES, votes, FALSE);
 
 		crm_info("Setting expected votes to %s", votes);
 		if(cib_ok > rc) {
diff -r a6b66ac53fa9 -r 692f8d2fa65b crmd/election.c
--- a/crmd/election.c	Thu Mar 04 08:40:17 2010 +0100
+++ b/crmd/election.c	Thu Mar 04 17:48:01 2010 +0800
@@ -444,10 +444,10 @@
 	add_cib_op_callback(fsa_cib_conn, rc, FALSE, NULL, feature_update_callback);
 
 	update_attr(fsa_cib_conn, cib_none, XML_CIB_TAG_CRMCONFIG,
-		NULL, NULL, NULL, "dc-version", VERSION"-"BUILD_VERSION, FALSE);
+		NULL, NULL, NULL, NULL, "dc-version", VERSION"-"BUILD_VERSION, FALSE);
 
 	update_attr(fsa_cib_conn, cib_none, XML_CIB_TAG_CRMCONFIG,
-		NULL, NULL, NULL, "cluster-infrastructure", cluster_type, FALSE);
+		NULL, NULL, NULL, NULL, "cluster-infrastructure", cluster_type, FALSE);
 	
 	mainloop_set_trigger(config_read);
 	free_xml(cib);
diff -r a6b66ac53fa9 -r 692f8d2fa65b crmd/lrm.c
--- a/crmd/lrm.c	Thu Mar 04 08:40:17 2010 +0100
+++ b/crmd/lrm.c	Thu Mar 04 17:48:01 2010 +0800
@@ -1156,7 +1156,7 @@
 	  from_sys, rsc->id);
 
 update_attr(fsa_cib_conn, cib_none, XML_CIB_TAG_CRMCONFIG,
-	NULL, NULL, NULL, "last-lrm-refresh", now_s, FALSE);
+	NULL, NULL, NULL, NULL, "last-lrm-refresh", now_s, FALSE);
 crm_free(now_s);
 			}
 			
diff -r a6b66ac53fa9 -r 692f8d2fa65b include/crm/cib_util.h
--- a/include/crm/cib_util.h	Thu Mar 04 08:40:17 2010 +0100
+++ b/include/crm/cib_util.h	Thu Mar 04 17:48:01 2010 +0800
@@ -50,21 +50,21 @@
 
 extern enum cib_errors update_attr(
 	cib_t *the_cib, int call_options,
-	const char *section, const char *node_uuid, const char *set_name,
+	const char *section, const char *node_uuid, const char *set_type, const char *set_name,
 	const char *attr_id, const char *attr_name, const char *attr_value, gboolean to_console);
 
 extern enum cib_errors find_nvpair_attr(
-cib_t *the_cib, const char *attr, const char *section, const char *node_uuid, const char *set_name,
-const char *attr_id, const char *attr_name, gboolean to_console, char **value);
+cib_t *the_cib, const char *attr, const char *section, const char *node_uuid, const char *set_type,
+const char *set_name, const char *attr_id, const char *attr_name, gboolean to_console, char **value);
 
 extern enum cib_errors read_attr(
 	cib_t *the_cib,
-	const char *section, const char *node_uuid, const char *set_name,
+	const char *section, const char *node_uuid, const char *set_type, const char *set_name,
 	const char *attr_id, const char *attr_name, char **attr_value, gboolean to_console);
 
 extern enum cib_errors delete_attr(
 	cib_t *the_cib, int options, 
-	const char *section, const char *node_uuid, const char *set_name,
+	const char *section, const char *node_uuid, const char *set_type, const char *set_name,
 	const char *attr_id, const char *attr_name, const char *attr_value, gboolean to_console);
 
 extern enum cib_errors query_node_uuid(
diff -r a6b66ac53fa9 -r 692f8d2fa65b lib/cib/cib_attrs.c
--- a/lib/cib/cib_attrs.c	Thu Mar 04 08:40:17 2010 +0100
+++ b/lib/cib/cib_attrs.c	Thu Mar 04 17:48:01 2010 +0800
@@ -47,8 +47,8 @@
 
 extern enum cib_errors
 find_nvpair_attr(
-cib_t *the_cib, const char *attr, const char *section, const char *node_uuid, const char *set_name,
-const char *attr_id, const char *attr_name, gboolean to_console, char **value)
+cib_t *the_cib, const char *attr, const char *section, const char *node_uuid, const char *attr_set_type,
+const char *set_name, const char *attr_id, const char *attr_name, gboolean to_console, char **value)
 {
 int offset = 0;
 static int xpath_max = 1024;
@@ -56,7 +56,13 @@
 
 char *xpath_string = NULL;
 xmlNode *xml_search = NULL;
-const char *set_type = XML_TAG_ATTR_SETS;
+const char *set_type = NULL;
+   
+if (attr_set_type) {
+	set_type = attr_set_type;
+} else {
+	set_type = XML_TAG_ATTR_SETS;
+}
 
 CRM_ASSERT(value != NULL);
 *value = NULL;
@@ 

Re: [Pacemaker] Resource capacity limit

2009-12-13 Thread Yan Gao
Hi Andrew,

Yan Gao wrote:
> On 12/10/09 12:56, Yan Gao wrote:
>> Hi Andrew,
>> Attached the hg export patch against the devel branch for that. Hope
>> that's easier to be merged:-)
> And the patch including the test cases.
The *.score files were given the wrong file name extension; it should have
been "scores". :-\
Attached are the patches to rename them. Sorry about that!

Thanks,
  Yan

-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™

# HG changeset patch
# User Yan Gao 
# Date 1260773202 -28800
# Node ID ab416ebb0734839d6e0e51e7f2fb9dac4832a50f
# Parent  f182beaeedab79278301ffb1bb2207e20f25f87f
Low: PE: Repair the file name extensions for several test scores files

diff -r f182beaeedab -r ab416ebb0734 pengine/test10/balanced.score
--- a/pengine/test10/balanced.score	Fri Dec 11 20:19:24 2009 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +
@@ -1,5 +0,0 @@
-Allocation scores:
-native_color: rsc1 allocation score on host1: 0
-native_color: rsc1 allocation score on host2: 0
-native_color: rsc2 allocation score on host1: 0
-native_color: rsc2 allocation score on host2: 0
diff -r f182beaeedab -r ab416ebb0734 pengine/test10/balanced.scores
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/pengine/test10/balanced.scores	Mon Dec 14 14:46:42 2009 +0800
@@ -0,0 +1,5 @@
+Allocation scores:
+native_color: rsc1 allocation score on host1: 0
+native_color: rsc1 allocation score on host2: 0
+native_color: rsc2 allocation score on host1: 0
+native_color: rsc2 allocation score on host2: 0
diff -r f182beaeedab -r ab416ebb0734 pengine/test10/minimal.score
--- a/pengine/test10/minimal.score	Fri Dec 11 20:19:24 2009 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +
@@ -1,5 +0,0 @@
-Allocation scores:
-native_color: rsc1 allocation score on host1: 0
-native_color: rsc1 allocation score on host2: 0
-native_color: rsc2 allocation score on host1: 0
-native_color: rsc2 allocation score on host2: 0
diff -r f182beaeedab -r ab416ebb0734 pengine/test10/minimal.scores
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/pengine/test10/minimal.scores	Mon Dec 14 14:46:42 2009 +0800
@@ -0,0 +1,5 @@
+Allocation scores:
+native_color: rsc1 allocation score on host1: 0
+native_color: rsc1 allocation score on host2: 0
+native_color: rsc2 allocation score on host1: 0
+native_color: rsc2 allocation score on host2: 0
diff -r f182beaeedab -r ab416ebb0734 pengine/test10/utilization.score
--- a/pengine/test10/utilization.score	Fri Dec 11 20:19:24 2009 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +
@@ -1,5 +0,0 @@
-Allocation scores:
-native_color: rsc2 allocation score on host1: 0
-native_color: rsc2 allocation score on host2: 0
-native_color: rsc1 allocation score on host1: 0
-native_color: rsc1 allocation score on host2: 0
diff -r f182beaeedab -r ab416ebb0734 pengine/test10/utilization.scores
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/pengine/test10/utilization.scores	Mon Dec 14 14:46:42 2009 +0800
@@ -0,0 +1,5 @@
+Allocation scores:
+native_color: rsc2 allocation score on host1: 0
+native_color: rsc2 allocation score on host2: 0
+native_color: rsc1 allocation score on host1: 0
+native_color: rsc1 allocation score on host2: 0


Re: [Pacemaker] Resource capacity limit

2009-12-10 Thread Yan Gao
Andrew Beekhof wrote:
> On Thu, Dec 10, 2009 at 9:52 AM, Yan Gao  wrote:
>> On 12/10/09 12:56, Yan Gao wrote:
>>> Hi Andrew,
>>> Attached the hg export patch against the devel branch for that. Hope
>>> that's easier to be merged:-)
>> And the patch including the test cases.
> 
> Done.  Thanks for your efforts!
Thanks for taking care of them!

-- 
Regards,
Yan Gao

y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™



Re: [Pacemaker] Resource capacity limit

2009-12-10 Thread Andrew Beekhof
On Thu, Dec 10, 2009 at 9:52 AM, Yan Gao  wrote:
> On 12/10/09 12:56, Yan Gao wrote:
>> Hi Andrew,
>> Attached the hg export patch against the devel branch for that. Hope
>> that's easier to be merged:-)
> And the patch including the test cases.

Done.  Thanks for your efforts!



Re: [Pacemaker] Resource capacity limit

2009-12-10 Thread Yan Gao
On 12/10/09 12:56, Yan Gao wrote:
> Hi Andrew,
> Attached the hg export patch against the devel branch for that. Hope
> that's easier to be merged:-)
And the patch including the test cases.

Thanks,
  Yan
-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™
# HG changeset patch
# User Yan Gao 
# Date 1260434891 -28800
# Node ID c8013f5b53f018eb5ca2667e0170810e45257489
# Parent  456f25dc72b805e12e5dc32fb23ea8dbe5b8103c
Low: PE: Add regression tests for the new placement strategies

diff -r 456f25dc72b8 -r c8013f5b53f0 pengine/regression.sh
--- a/pengine/regression.sh	Thu Dec 10 12:37:46 2009 +0800
+++ b/pengine/regression.sh	Thu Dec 10 16:48:11 2009 +0800
@@ -328,5 +328,10 @@
 do_test systemhealthp3 "System Health (Progessive) #3"
 
 echo ""
+do_test utilization "Placement Strategy - utilization"
+do_test minimal "Placement Strategy - minimal"
+do_test balanced "Placement Strategy - balanced"
+
+echo ""
 
 test_results
diff -r 456f25dc72b8 -r c8013f5b53f0 pengine/test10/balanced.dot
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/pengine/test10/balanced.dot	Thu Dec 10 16:48:11 2009 +0800
@@ -0,0 +1,19 @@
+digraph "g" {
+"probe_complete host1" -> "probe_complete" [ style = bold]
+"probe_complete host1" [ style=bold color="green" fontcolor="black"  ]
+"probe_complete host2" -> "probe_complete" [ style = bold]
+"probe_complete host2" [ style=bold color="green" fontcolor="black"  ]
+"probe_complete" -> "rsc1_start_0 host2" [ style = bold]
+"probe_complete" -> "rsc2_start_0 host1" [ style = bold]
+"probe_complete" [ style=bold color="green" fontcolor="orange"  ]
+"rsc1_monitor_0 host1" -> "probe_complete host1" [ style = bold]
+"rsc1_monitor_0 host1" [ style=bold color="green" fontcolor="black"  ]
+"rsc1_monitor_0 host2" -> "probe_complete host2" [ style = bold]
+"rsc1_monitor_0 host2" [ style=bold color="green" fontcolor="black"  ]
+"rsc1_start_0 host2" [ style=bold color="green" fontcolor="black"  ]
+"rsc2_monitor_0 host1" -> "probe_complete host1" [ style = bold]
+"rsc2_monitor_0 host1" [ style=bold color="green" fontcolor="black"  ]
+"rsc2_monitor_0 host2" -> "probe_complete host2" [ style = bold]
+"rsc2_monitor_0 host2" [ style=bold color="green" fontcolor="black"  ]
+"rsc2_start_0 host1" [ style=bold color="green" fontcolor="black"  ]
+}
diff -r 456f25dc72b8 -r c8013f5b53f0 pengine/test10/balanced.exp
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/pengine/test10/balanced.exp	Thu Dec 10 16:48:11 2009 +0800
@@ -0,0 +1,110 @@
diff -r 456f25dc72b8 -r c8013f5b53f0 pengine/test10/balanced.score
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/pengine/test10/balanced.score	Thu Dec 10 16:48:11 2009 +0800
@@ -0,0 +1,5 @@
+Allocation scores:
+native_color: rsc1 allocation score on host1: 0
+native_color: rsc1 allocation score on host2: 0
+native_color: rsc2 allocation score on host1: 0
+native_color: rsc2 allocation score on host2: 0
diff -r 456f25dc72b8 -r c8013f5b53f0 pengine/test10/balanced.xml
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/pengine/test10/balanced.xml	Thu Dec 10 16:48:11 2009 +0800
@@ -0,0 +1,44 @@
diff -r 456f25dc72b8 -r c8013f5b53f0 pengine/test10/minimal.dot
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/pengine/test10/minimal.dot	Thu Dec 10 16:48:11 2009 +0800
@@ -0,0 +1,19 @@
+digraph "g" {
+"probe_complete host1" -> "probe_complete" [ style = bold]
+"probe_complete host1" [ style=bold color="green" fontcolor="black"  ]
+"probe_complete host2" -> "probe_complete" [ style = bold]
+"probe_complete host2" [ style=bold color="green" fontcolor="black"  ]
+"probe_complete" -> "rsc1_start_0 host1" [ style = bold]
+"probe_complete" -> "rsc2_start_0 host1" [ style = bold]
+"probe_complete" [ style=bold color="green" fontcolor="orange"  ]
+"rsc1_monitor_0 host1" -> "probe_complete host1" [ style = bold]

Re: [Pacemaker] Resource capacity limit

2009-12-09 Thread Yan Gao
Hi Andrew,
Attached is the hg export patch against the devel branch for that. Hope
that's easier to merge :-)

Thanks,
  Yan
-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™
# HG changeset patch
# User Yan Gao 
# Date 1260419866 -28800
# Node ID 456f25dc72b805e12e5dc32fb23ea8dbe5b8103c
# Parent  0fbf9c62b0555d4d105f6a038f6846af93c1d9e3
Dev: PE: Implement more resource placement strategies: utilization, minimal and balanced

diff -r 0fbf9c62b055 -r 456f25dc72b8 include/crm/msg_xml.h
--- a/include/crm/msg_xml.h	Mon Aug 10 13:57:42 2009 +0200
+++ b/include/crm/msg_xml.h	Thu Dec 10 12:37:46 2009 +0800
@@ -130,6 +130,7 @@
 #define XML_TAG_ATTRS			"attributes"
 #define XML_TAG_PARAMS			"parameters"
 #define XML_TAG_PARAM			"param"
+#define XML_TAG_UTILIZATION		"utilization"
 
 #define XML_TAG_RESOURCE_REF		"resource_ref"
 #define XML_CIB_TAG_RESOURCE	  	"primitive"
diff -r 0fbf9c62b055 -r 456f25dc72b8 include/crm/pengine/status.h
--- a/include/crm/pengine/status.h	Mon Aug 10 13:57:42 2009 +0200
+++ b/include/crm/pengine/status.h	Thu Dec 10 12:37:46 2009 +0800
@@ -68,6 +68,7 @@
 		char *dc_uuid;
 		node_t *dc_node;
 		const char *stonith_action;
+		const char *placement_strategy;
 
 		unsigned long long flags;
 
@@ -116,6 +117,8 @@
 		
 		GHashTable *attrs;	/* char* => char* */
 		enum node_type type;
+
+		GHashTable *utilization;
 }; 
 
 struct node_s { 
@@ -186,6 +189,7 @@
 
 		GHashTable *meta;	   
 		GHashTable *parameters;
+		GHashTable *utilization;
 
 		GListPtr children;	  /* resource_t* */	
 };
diff -r 0fbf9c62b055 -r 456f25dc72b8 lib/pengine/common.c
--- a/lib/pengine/common.c	Mon Aug 10 13:57:42 2009 +0200
+++ b/lib/pengine/common.c	Thu Dec 10 12:37:46 2009 +0800
@@ -80,6 +80,24 @@
 	return FALSE;
 }
 
+static gboolean
+check_placement_strategy(const char *value)
+{
+	if(safe_str_eq(value, "default")) {
+		return TRUE;
+
+	} else if(safe_str_eq(value, "utilization")) {
+		return TRUE;
+
+	} else if(safe_str_eq(value, "minimal")) {
+		return TRUE;
+
+	} else if(safe_str_eq(value, "balanced")) {
+		return TRUE;
+	}
+	return FALSE;
+}
+
 pe_cluster_option pe_opts[] = {
 	/* name, old-name, validate, default, description */
 	{ "no-quorum-policy", "no_quorum_policy", "enum", "stop, freeze, ignore, suicide", "stop", &check_quorum,
@@ -147,6 +165,10 @@
 	{ "node-health-red", NULL, "integer", NULL, "-INFINITY", &check_number,
 	  "The score 'red' translates to in rsc_location constraints",
 	  "Only used when node-health-strategy is set to custom or progressive." },
+
+	/*Placement Strategy*/
+	{ "placement-strategy", NULL, "enum", "default, utilization, minimal, balanced", "default", &check_placement_strategy,
+	  "The strategy to determine resource placement", NULL},
 };
 
 void
diff -r 0fbf9c62b055 -r 456f25dc72b8 lib/pengine/complex.c
--- a/lib/pengine/complex.c	Mon Aug 10 13:57:42 2009 +0200
+++ b/lib/pengine/complex.c	Thu Dec 10 12:37:46 2009 +0800
@@ -371,6 +371,12 @@
 	if(safe_str_eq(class, "stonith")) {
 	set_bit_inplace(data_set->flags, pe_flag_have_stonith_resource);
 	}
+
+	(*rsc)->utilization = g_hash_table_new_full(
+		g_str_hash, g_str_equal, g_hash_destroy_str, g_hash_destroy_str);
+
+	unpack_instance_attributes(data_set->input, (*rsc)->xml, XML_TAG_UTILIZATION, NULL,
+   (*rsc)->utilization, NULL, FALSE, data_set->now);
 	
 /* 	data_set->resources = g_list_append(data_set->resources, (*rsc)); */
 	return TRUE;
@@ -451,6 +457,9 @@
 	if(rsc->meta != NULL) {
 		g_hash_table_destroy(rsc->meta);
 	}
+	if(rsc->utilization != NULL) {
+		g_hash_table_destroy(rsc->utilization);
+	}
 	if(rsc->parent == NULL && is_set(rsc->flags, pe_rsc_orphan)) {
 		free_xml(rsc->xml);
 	}
diff -r 0fbf9c62b055 -r 456f25dc72b8 lib/pengine/status.c
--- a/lib/pengine/status.c	Mon Aug 10 13:57:42 2009 +0200
+++ b/lib/pengine/status.c	Thu Dec 10 12:37:46 2009 +0800
@@ -159,6 +159,9 @@
 			if(details->attrs != NULL) {
 g_hash_table_destroy(details->attrs);
 			}
+			if(details->utilization != NULL) {
+g_hash_table_destroy(details->utilization);
+			}
 			pe_free_shallow_adv(details->running_rsc, FALSE);
 			pe_free_shallow_adv(details->allocated_rsc, FALSE);
 			crm_free(details);
diff -r 0fbf9c62b055 -r 456f25dc72b8 lib/pengine/unpack.c
--- a/lib/pengine/unpack.c	Mon Aug 10 13:57:42 2009 +0200
+++ b/lib/pengine/unpack.c	Thu Dec 10 12:37:46 2009 +0800
@@ -165,6 +165,9 @@
 	crm_info("Node scores: 'red' = %s, 'yellow' = %s, 'green' = %s",
 		 score2char(node_score_red),score2char(node_score_yellow),
 		 score2char(node_score_green));
+
+	data_set->placement_strategy = pe_pref(data_set->config_hash, "placement-strategy");
+	crm_debug_2("Placement strategy: %s", data_set->placement_strategy);	
 	
 	return TRUE;
 }
@@ -233,6 +236,9 @@
 		new_node->details->attrs= g_hash_table_new_full(
 			g_str_hash, g_str_equal,
 			g_hash_destroy_str, g_hash_destroy_str);
+		new_node->details->utilization  = g_hash_table_new_full(
+			g_str_hash, g_str_equal,

Re: [Pacemaker] Resource capacity limit

2009-11-20 Thread Yan Gao
Hi Andrew,

On 11/20/09 04:10, Andrew Beekhof wrote:
> Btw. You're still missing some test cases ;-)
Oh, right:-) I created some. Hope I created them in the correct way.
Sorry for so many attachments...

Thanks,
  Yan
-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™
digraph "g" {
"probe_complete host1" -> "probe_complete" [ style = bold]
"probe_complete host1" [ style=bold color="green" fontcolor="black"  ]
"probe_complete host2" -> "probe_complete" [ style = bold]
"probe_complete host2" [ style=bold color="green" fontcolor="black"  ]
"probe_complete" -> "rsc1_start_0 host2" [ style = bold]
"probe_complete" -> "rsc2_start_0 host1" [ style = bold]
"probe_complete" [ style=bold color="green" fontcolor="orange"  ]
"rsc1_monitor_0 host1" -> "probe_complete host1" [ style = bold]
"rsc1_monitor_0 host1" [ style=bold color="green" fontcolor="black"  ]
"rsc1_monitor_0 host2" -> "probe_complete host2" [ style = bold]
"rsc1_monitor_0 host2" [ style=bold color="green" fontcolor="black"  ]
"rsc1_start_0 host2" [ style=bold color="green" fontcolor="black"  ]
"rsc2_monitor_0 host1" -> "probe_complete host1" [ style = bold]
"rsc2_monitor_0 host1" [ style=bold color="green" fontcolor="black"  ]
"rsc2_monitor_0 host2" -> "probe_complete host2" [ style = bold]
"rsc2_monitor_0 host2" [ style=bold color="green" fontcolor="black"  ]
"rsc2_start_0 host1" [ style=bold color="green" fontcolor="black"  ]
}


Allocation scores:
native_color: rsc1 allocation score on host1: 0
native_color: rsc1 allocation score on host2: 0
native_color: rsc2 allocation score on host1: 0
native_color: rsc2 allocation score on host2: 0


digraph "g" {
"probe_complete host1" -> "probe_complete" [ style = bold]
"probe_complete host1" [ style=bold color="green" fontcolor="black"  ]
"probe_complete host2" -> "probe_complete" [ style = bold]
"probe_complete host2" [ style=bold color="green" fontcolor="black"  ]
"probe_complete" -> "rsc1_start_0 host1" [ style = bold]
"probe_complete" -> "rsc2_start_0 host1" [ style = bold]
"probe_complete" [ style=bold color="green" fontcolor="orange"  ]
"rsc1_monitor_0 host1" -> "probe_complete host1" [ style = bold]
"rsc1_monitor_0 host1" [ style=bold color="green" fontcolor="black"  ]
"rsc1_monitor_0 host2" -> "probe_complete host2" [ style = bold]
"rsc1_monitor_0 host2" [ style=bold color="green" fontcolor="black"  ]
"rsc1_start_0 host1" [ style=bold color="green" fontcolor="black"  ]
"rsc2_monitor_0 host1" -> "probe_complete host1" [ style = bold]
"rsc2_monitor_0 host1" [ style=bold color="green" fontcolor="black"  ]
"rsc2_monitor_0 host2" -> "probe_complete host2" [ style = bold]
"rsc2_monitor_0 host2" [ style=bold color="green" fontcolor="black"  ]
"rsc2_start_0 host1" [ style=bold color="green" fontcolor="black"  ]
}


Allocation scores:
native_color: rsc1 allocation score on host1: 0
native_color: rsc1 allocation score on host2: 0
native_color: rsc2 allocation score on host1: 0
native_color: rsc2 allocation score on host2: 0


Re: [Pacemaker] Resource capacity limit

2009-11-19 Thread Andrew Beekhof
Btw. You're still missing some test cases ;-)

On Fri, Nov 13, 2009 at 8:23 AM, Yan Gao  wrote:
> Hi Andrew, Lars,
>
> Andrew Beekhof wrote:
>> I'd like to see the while-block from native_color() be a function that
>> is called from native_assign_node().
> It seems to be too late to filter out nodes without enough capacity in
> native_assign_node(). I wrote a have_enough_capacity() function, which is
> called from native_choose_node(), to achieve that.
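The filtering that have_enough_capacity() performs can be sketched roughly as below (hypothetical names and simplified types; the patch itself walks GHashTables of utilization attributes):

```c
/* Rough sketch of the node filter described above: a node qualifies
 * for a resource only if every utilization attribute the resource
 * declares is available in sufficient quantity on the node.
 * Hypothetical names; plain arrays stand in for GHashTables. */
#include <assert.h>
#include <string.h>

struct util { const char *name; int value; };

static int have_enough_capacity(const struct util *node, int n_node,
                                const struct util *rsc, int n_rsc)
{
    for (int i = 0; i < n_rsc; i++) {
        int satisfied = 0;
        for (int j = 0; j < n_node; j++) {
            if (strcmp(rsc[i].name, node[j].name) == 0
                && rsc[i].value <= node[j].value) {
                satisfied = 1;
                break;
            }
        }
        if (!satisfied)
            return 0;   /* capacity missing: node is filtered out */
    }
    return 1;
}
```

Calling this while choosing a node lets the placement code skip candidates before scores are compared, which is the point of doing it in native_choose_node() rather than after assignment.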
>
>> And instead of a limit-utilization option, we'd have
>> placement-strategy=(default|utilization|minimal)
> Done. And added a "balanced" option as Lars advised.
>
>>
>> Default ::= what we do now
>> Utilization ::= what you've implemented
>> Minimal ::= what you've implemented _without_ the load balancing we
>> currently do.
>>
>> (Maybe the names could be improved, but hopefully you get the idea).
>>
>> The last one is interesting because it allows us to concentrate
>> services on the minimum number of required nodes (and potentially
>> power some of the others down).
> Done.
>
> Minimal:
> Consider the utilization of nodes and resources. If a resource has
> the same score for several available nodes, do _not_ balance the load.
> This implies that the resources will be concentrated on the minimal
> number of nodes.
>
> Balanced:
> Consider the utilization of nodes and resources. If a resource has
> the same score for several available nodes:
> * First, balance the load according to the remaining capacity of the
> nodes (implemented in compare_capacity()).
> * If the nodes still have equal remaining capacity, balance the load
> according to the number of resources each node will run.
>
> The strategies are determined mainly in sort_node_weight(), so I changed
> the prototypes of some functions a bit.
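The two-level tie-break of the Balanced strategy can be sketched as a comparison function (hypothetical names; the patch implements this logic inside sort_node_weight() via compare_capacity()):

```c
/* Sketch of the Balanced tie-break described above: among equally
 * scored nodes, prefer the one with more remaining capacity; if that
 * is also equal, prefer the one that will run fewer resources.
 * Returns < 0 if node a is preferred, > 0 if node b is, 0 if tied. */
#include <assert.h>

static int compare_balanced(int a_remaining, int a_num_resources,
                            int b_remaining, int b_num_resources)
{
    if (a_remaining != b_remaining)
        return b_remaining - a_remaining;     /* more capacity wins */
    return a_num_resources - b_num_resources; /* fewer resources wins */
}
```

Dropping the first comparison and keeping only the capacity filter would give the Minimal behavior: nothing pulls resources toward emptier nodes, so they pile onto as few nodes as possible.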
>
> Please help to review and test it. Any comments and suggestions are welcome:-)
>
> Thanks,
>  Yan
>
> --
> y...@novell.com
> Software Engineer
> China Server Team, OPS Engineering
>
> Novell, Inc.
> Making IT Work As One™
>



Re: [Pacemaker] Resource capacity limit

2009-11-17 Thread Lars Marowsky-Bree
On 2009-11-13T15:23:20, Yan Gao  wrote:

> Minimal:
> Consider the utilization of nodes and resources. If a resource has
> the same score for several available nodes, do _not_ balance the load.
> This implies that the resources will be concentrated on the minimal
> number of nodes.
> 
> Balanced:
> Consider the utilization of nodes and resources. If a resource has
> the same score for several available nodes:
> * First, balance the load according to the remaining capacity of the
> nodes (implemented in compare_capacity()).
> * If the nodes still have equal remaining capacity, balance the load
> according to the number of resources each node will run.
> 
> The strategies are determined mainly in sort_node_weight(), so I changed
> the prototypes of some functions a bit.

Hi Yan Gao,

great work!

But Minimal and Balanced don't quite do what is described above. A linear
assignment doesn't come anywhere close to an optimal solution, in
particular when combined with (anti-)colocation rules; solving this
optimally is NP-complete (the knapsack problem for the Minimal policy, for
example), though heuristics exist that get close in reasonable time.

At least this is worth understanding and describing as a limitation of
the current algorithm.


Regards,
Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [Pacemaker] Resource capacity limit

2009-11-13 Thread Andrew Beekhof
On Thu, Nov 12, 2009 at 11:58 PM, Lars Marowsky-Bree  wrote:
> On 2009-11-12T14:53:24, Andrew Beekhof  wrote:
>
>> At this point in time, I can't see us going back to the way heartbeat
>> releases were done.
>> If there was a single thing that I'd credit Pacemaker's current
>> reliability to, it would be our release strategy.
>
> Well, exactly, and that's what pacemaker has been doing, right? Phasing
> in features over time? Successfully? ;-)
>
>> > With increasing coverage of the regression tests, the existing
>> > functionality is protected; which is really the important bit. This
>> > encourages a smooth forward transition.
>> One simply can't test everything.
>
> True, but we do a pretty good job of it.
>
> Or are there any fundamental changes you've queued up?

Yes, stonith and possibly the lrmd will be seeing some changes in the
near future.
There are also a number of configuration changes I want to make.

>
>> > There's a point in having a devel tree (similar to linux-next) before
>> > merging back major features into the trunk, but I don't really subscribe
>> > to the major version flow. That just means that there's a lot of testing
>> > that needs to happen at once, which means more things slip through than
>> > with incremental testing. In my experience, major updates make them a
>> > royal PITA for users.
>> Noted. But for now, I don't think we'll go in that direction.
>
> So you want to change away from a successful model (as in the 1.0.x
> series so far) to a more disruptive one? ;-)

No, I'm suggesting that we won't be changing from what we do now.
I'd just document it.


> If you're saying we don't have resources for people to test a
> development tree, that's true either for one that periodically gets
> merged back into "mainline", as well as for one that gets merged back in
> much larger intervals. In fact, I'd predict it'll be worse for the
> latter model.

Except that no-one's putting a gun to people's heads making them use the
new stuff.
That's the point of cutting off development at some point, so that
there is always something stable to use while we (and other people
that must have whatever cool new features we added) get the next
series into shape.

You'd have a point if 0.6 had been deleted the second 1.0 came out, but it's
been a year and I still haven't turned away a 0.6 bug yet.



Re: [Pacemaker] Resource capacity limit

2009-11-13 Thread Andrew Beekhof
On Fri, Nov 13, 2009 at 8:23 AM, Yan Gao  wrote:
> Hi Andrew, Lars,
>
> Andrew Beekhof wrote:
>> I'd like to see the while-block from native_color() be a function that
>> is called from native_assign_node().
> It seems to be too late to filter out the nodes without enough capacity from
> native_assign_node(). I wrote a have_enough_capacity() function which is
> called from native_choose_node() to achieve that.

Ah, yes, that's what I meant.
Well done interpreting my vague design :-)

>
>> And instead of a limit-utilization option, we'd have
>> placement-strategy=(default|utilization|minimal)
> Done. And added a "balanced" option as Lars advised.
>
>>
>> Default ::= what we do now
>> Utilization ::= what you've implemented
>> Minimal ::= what you've implemented _without_ the load balancing we
>> currently do.
>>
>> (Maybe the names could be improved, but hopefully you get the idea).
>>
>> The last one is interesting because it allows us to concentrate
>> services on the minimum number of required nodes (and potentially
>> power some of the others down).
> Done.
>
> Minimal:
> Consider the utilization of nodes and resources, but if a resource has
> the same score on several available nodes, do _not_ balance the load.
> That implies the resources will be concentrated on a minimal number of
> nodes.
>
> Balanced:
> Consider the utilization of nodes and resources. If a resource has
> the same score on several available nodes:
> * First, balance the load according to the remaining capacity of the
> nodes (implemented in compare_capacity()).
> * If the nodes still have equal remaining capacity, balance the load
> according to the number of resources each node will run.
>
> The strategies are determined mainly in sort_node_weight(), so I changed
> the prototypes of some functions a bit.
>
> Please help to review and test it. Any comments and suggestions are welcome :-)

Will do.  Thanks!



Re: [Pacemaker] Resource capacity limit

2009-11-12 Thread Yan Gao
Hi Andrew, Lars,

Andrew Beekhof wrote:
> I'd like to see the while-block from native_color() be a function that
> is called from native_assign_node().
It seems to be too late to filter out the nodes without enough capacity from
native_assign_node(). I wrote a have_enough_capacity() function which is
called from native_choose_node() to achieve that.

> And instead of a limit-utilization option, we'd have
> placement-strategy=(default|utilization|minimal)
Done. And added a "balanced" option as Lars advised.

> 
> Default ::= what we do now
> Utilization ::= what you've implemented
> Minimal ::= what you've implemented _without_ the load balancing we
> currently do.
> 
> (Maybe the names could be improved, but hopefully you get the idea).
> 
> The last one is interesting because it allows us to concentrate
> services on the minimum number of required nodes (and potentially
> power some of the others down).
Done.

Minimal:
Consider the utilization of nodes and resources, but if a resource has
the same score on several available nodes, do _not_ balance the load.
That implies the resources will be concentrated on a minimal number of
nodes.

Balanced:
Consider the utilization of nodes and resources. If a resource has
the same score on several available nodes:
* First, balance the load according to the remaining capacity of the
nodes (implemented in compare_capacity()).
* If the nodes still have equal remaining capacity, balance the load
according to the number of resources each node will run.

The strategies are determined mainly in sort_node_weight(), so I changed
the prototypes of some functions a bit.

Please help to review and test it. Any comments and suggestions are welcome :-)

Thanks,
  Yan

-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™
diff -r f49a0cab20aa include/crm/msg_xml.h
--- a/include/crm/msg_xml.h	Thu Nov 12 12:18:10 2009 +0100
+++ b/include/crm/msg_xml.h	Fri Nov 13 14:08:16 2009 +0800
@@ -130,6 +130,7 @@
 #define XML_TAG_ATTRS			"attributes"
 #define XML_TAG_PARAMS			"parameters"
 #define XML_TAG_PARAM			"param"
+#define XML_TAG_UTILIZATION		"utilization"
 
 #define XML_TAG_RESOURCE_REF		"resource_ref"
 #define XML_CIB_TAG_RESOURCE	  	"primitive"
diff -r f49a0cab20aa include/crm/pengine/status.h
--- a/include/crm/pengine/status.h	Thu Nov 12 12:18:10 2009 +0100
+++ b/include/crm/pengine/status.h	Fri Nov 13 14:08:16 2009 +0800
@@ -68,6 +68,7 @@
 		char *dc_uuid;
 		node_t *dc_node;
 		const char *stonith_action;
+		const char *placement_strategy;
 
 		unsigned long long flags;
 
@@ -116,6 +117,8 @@
 		
 		GHashTable *attrs;	/* char* => char* */
 		enum node_type type;
+
+		GHashTable *utilization;
 }; 
 
 struct node_s { 
@@ -186,6 +189,7 @@
 
 		GHashTable *meta;	   
 		GHashTable *parameters;
+		GHashTable *utilization;
 
 		GListPtr children;	  /* resource_t* */	
 };
diff -r f49a0cab20aa lib/pengine/common.c
--- a/lib/pengine/common.c	Thu Nov 12 12:18:10 2009 +0100
+++ b/lib/pengine/common.c	Fri Nov 13 14:08:16 2009 +0800
@@ -80,6 +80,24 @@
 	return FALSE;
 }
 
+static gboolean
+check_placement_strategy(const char *value)
+{
+	if(safe_str_eq(value, "default")) {
+		return TRUE;
+
+	} else if(safe_str_eq(value, "utilization")) {
+		return TRUE;
+
+	} else if(safe_str_eq(value, "minimal")) {
+		return TRUE;
+
+	} else if(safe_str_eq(value, "balanced")) {
+		return TRUE;
+	}
+	return FALSE;
+}
+
 pe_cluster_option pe_opts[] = {
 	/* name, old-name, validate, default, description */
 	{ "no-quorum-policy", "no_quorum_policy", "enum", "stop, freeze, ignore, suicide", "stop", &check_quorum,
@@ -147,6 +165,10 @@
 	{ "node-health-red", NULL, "integer", NULL, "-INFINITY", &check_number,
 	  "The score 'red' translates to in rsc_location constraints",
 	  "Only used when node-health-strategy is set to custom or progressive." },
+
+	/*Placement Strategy*/
+	{ "placement-strategy", NULL, "enum", "default, utilization, minimal, balanced", "default", &check_placement_strategy,
+	  "The strategy to determine resource placement", NULL},
 };
 
 void
diff -r f49a0cab20aa lib/pengine/complex.c
--- a/lib/pengine/complex.c	Thu Nov 12 12:18:10 2009 +0100
+++ b/lib/pengine/complex.c	Fri Nov 13 14:08:16 2009 +0800
@@ -371,6 +371,12 @@
 	if(safe_str_eq(class, "stonith")) {
 	set_bit_inplace(data_set->flags, pe_flag_have_stonith_resource);
 	}
+
+	(*rsc)->utilization = g_hash_table_new_full(
+		g_str_hash, g_str_equal, g_hash_destroy_str, g_hash_destroy_str);
+
+	unpack_instance_attributes(data_set->input, (*rsc)->xml, XML_TAG_UTILIZATION, NULL,
+   (*rsc)->utilization, NULL, FALSE, data_set->now);
 	
 /* 	data_set->resources = g_list_append(data_set->resources, (*rsc)); */
 	return TRUE;
@@ -451,6 +457,9 @@
 	if(rsc->meta != NULL) {
 		g_hash_table_destroy(rsc->meta);
 	}
+	if(rsc->utilization != NULL) {
+		g_hash_table_destroy(rsc->utilization);
+	}
 	if(rsc->parent == NULL && is_set(rsc->flags, pe_rsc_orphan)) {
 		free_xm

Re: [Pacemaker] Resource capacity limit

2009-11-12 Thread Steven Dake
On Thu, 2009-11-12 at 14:53 +0100, Andrew Beekhof wrote:
> On Wed, Nov 11, 2009 at 1:36 PM, Lars Marowsky-Bree  wrote:
> > On 2009-11-05T14:45:36, Andrew Beekhof  wrote:
> >
> >> Lastly, I would really like to defer this for 1.2
> >> I know I've bent the rules a bit for 1.0 in the past, but its really
> >> late in the game now.
> >
> > Personally, I think the Linux kernel model works really well. ie, no
> > "major releases" any more, but bugfixes and features alike get merged
> > over time and constantly.
> 
> That's a great model if you've got hordes of developers and testers.
> We have neither.
> 
> At this point in time, I can't see us going back to the way heartbeat
> releases were done.
> If there was a single thing that I'd credit Pacemaker's current
> reliability to, it would be our release strategy.

Maintaining corosync and openais, I'd surely like to have only one tree
where all work is done and never have a "stable" branch.  Andrew is
right, though: this model only works if there is large downstream
adoption and support, and distros take on the work of stabilizing the
efforts of trunk development.

Talking with distros, I know this is generally not the case with any
package other than kernel.org and maybe some related bits like xen/kvm
(which have had this model forced upon them).

Regards
-steve






Re: [Pacemaker] Resource capacity limit

2009-11-12 Thread Lars Marowsky-Bree
On 2009-11-12T14:53:24, Andrew Beekhof  wrote:

> At this point in time, I can't see us going back to the way heartbeat
> releases were done.
> If there was a single thing that I'd credit Pacemaker's current
> reliability to, it would be our release strategy.

Well, exactly, and that's what Pacemaker has been doing, right? Phasing
in features over time? Successfully? ;-)

> > With increasing coverage of the regression tests, the existing
> > functionality is protected; which is really the important bit. This
> > encourages a smooth forward transition.
> One simply can't test everything.

True, but we do a pretty good job of it.

Or are there any fundamental changes you've queued up?

> > There's a point in having a devel tree (similar to linux-next) before
> > merging back major features into the trunk, but I don't really subscribe
> > to the major version flow. That just means that there's a lot of testing
> > that needs to happen at once, which means more things slip through than
> > with incremental testing. In my experience, major updates make them a
> > royal PITA for users.
> Noted. But for now, I don't think we'll go in that direction.

So you want to change away from a successful model (as in the 1.0.x
series so far) to a more disruptive one? ;-)

If you're saying we don't have resources for people to test a
development tree, that's true either for one that periodically gets
merged back into "mainline", as well as for one that gets merged back in
much larger intervals. In fact, I'd predict it'll be worse for the
latter model.

I mean, sure, it's your project, but I really wonder if it's a good
direction to go. Having done this for over a decade, I can honestly say
that major upgrades are always a pain. They are never smooth. Many small
steps over time are better. Just consider that and make the best
choice ;-)


Regards,
Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [Pacemaker] Resource capacity limit

2009-11-12 Thread Andrew Beekhof
On Wed, Nov 11, 2009 at 1:36 PM, Lars Marowsky-Bree  wrote:
> On 2009-11-05T14:45:36, Andrew Beekhof  wrote:
>
>> Lastly, I would really like to defer this for 1.2
>> I know I've bent the rules a bit for 1.0 in the past, but its really
>> late in the game now.
>
> Personally, I think the Linux kernel model works really well. ie, no
> "major releases" any more, but bugfixes and features alike get merged
> over time and constantly.

That's a great model if you've got hordes of developers and testers.
We have neither.

At this point in time, I can't see us going back to the way heartbeat
releases were done.
If there was a single thing that I'd credit Pacemaker's current
reliability to, it would be our release strategy.

>
> With increasing coverage of the regression tests, the existing
> functionality is protected; which is really the important bit. This
> encourages a smooth forward transition.

One simply can't test everything.

> There's a point in having a devel tree (similar to linux-next) before
> merging back major features into the trunk, but I don't really subscribe
> to the major version flow. That just means that there's a lot of testing
> that needs to happen at once, which means more things slip through than
> with incremental testing. In my experience, major updates make them a
> royal PITA for users.

Noted. But for now, I don't think we'll go in that direction.



Re: [Pacemaker] Resource capacity limit

2009-11-12 Thread Andrew Beekhof
On Wed, Nov 11, 2009 at 1:42 PM, Lars Marowsky-Bree  wrote:
> On 2009-11-06T12:45:17, Andrew Beekhof  wrote:
>
>> And instead of a limit-utilization option, we'd have
>> placement-strategy=(default|utilization|minimal)
>>
>> Default ::= what we do now
>> Utilization ::= what you've implemented
>
> These two are obvious, since we can already do them with existing code.
>
> The following:
>
>> Minimal ::= what you've implemented _without_ the load balancing we
>> currently do.
>
> (Basically, concentrate load on as few nodes as possible. The knapsack
> problem.)
>
> To this I'd like to add
>
> Balanced ::= try to spread the load as evenly as possible. This is hard
> to define - perhaps "maximise average free resources on nodes".
>
> These latter two are harder, and basically require a linear optimization
> engine to be integrated. But I'd, of course, love to see them.

No question there; I'm just trying to be prepared for it so that we
don't have to change the option name(s).



Re: [Pacemaker] Resource capacity limit

2009-11-11 Thread Lars Marowsky-Bree
On 2009-11-09T10:52:03, Michael Schwartzkopff  wrote:

> I just think it would be a cool solution to have the cluster itself
> do the work if configured to do so. The CRM (or the RAs) should have
> the ability to monitor the resource consumption of resources
> dynamically. This automation would make the lives of admins much easier,
> and they would not be forced to do the scripting on their own.

Automatically, and possibly dynamically, figuring out the load incurred
by a specific resource, and its min/avg/peak limits, is an extremely
hard problem.

Yes, it is very cool, but outside the scope of Pacemaker itself. With
these patches to take the load into account, Pacemaker is equipped to
take such input from monitoring frameworks, but I don't think Pacemaker
itself should be this monitoring tool.

For VMs, this is somewhat easier to monitor (compared to individual
resources), since the hypervisor/Dom0 has access to this data: memory
consumption, CPU utilization over N minutes, disk I/O, network, etc. I'd
very much love to see this added.

Perhaps the "monitor" op for the RA could handle this.



Regards,
Lars





Re: [Pacemaker] Resource capacity limit

2009-11-11 Thread Lars Marowsky-Bree
On 2009-11-06T12:45:17, Andrew Beekhof  wrote:

> And instead of a limit-utilization option, we'd have
> placement-strategy=(default|utilization|minimal)
> 
> Default ::= what we do now
> Utilization ::= what you've implemented

These two are obvious, since we can already do them with existing code.

The following:

> Minimal ::= what you've implemented _without_ the load balancing we
> currently do.

(Basically, concentrate load on as few nodes as possible. The knapsack
problem.)

To this I'd like to add 

Balanced ::= try to spread the load as evenly as possible. This is hard
to define - perhaps "maximise average free resources on nodes".

These latter two are harder, and basically require a linear optimization
engine to be integrated. But I'd, of course, love to see them. 

(Automatically powering down nodes is not that trivial, since we'd need
some way to wake them up in time; STONITH actually can do that, but it
needs some thinking to get right. At least those nodes could go into
power-saving mode, though, so it'd definitely help.)

With those, Pacemaker would be a full-scale replacement for certain data
center management and automation frameworks ;-)


Regards,
Lars





Re: [Pacemaker] Resource capacity limit

2009-11-11 Thread Lars Marowsky-Bree
On 2009-11-05T14:45:36, Andrew Beekhof  wrote:

> Lastly, I would really like to defer this for 1.2
> I know I've bent the rules a bit for 1.0 in the past, but its really
> late in the game now.

Personally, I think the Linux kernel model works really well: no
"major releases" any more, but bugfixes and features alike get merged
over time, constantly.

With increasing coverage of the regression tests, the existing
functionality is protected; which is really the important bit. This
encourages a smooth forward transition.

There's a point in having a devel tree (similar to linux-next) before
merging back major features into the trunk, but I don't really subscribe
to the major version flow. That just means that there's a lot of testing
that needs to happen at once, which means more things slip through than
with incremental testing. In my experience, major updates make them a
royal PITA for users.

Just my few euro cents ;-)


Regards,
Lars





Re: [Pacemaker] Resource capacity limit

2009-11-09 Thread Michael Schwartzkopff
On Friday, 6 November 2009 08:49:06, Andrew Beekhof wrote:
> On Fri, Nov 6, 2009 at 8:31 AM, Michael Schwartzkopff  
wrote:
> > On Thursday, 5 November 2009 21:37:23, Andrew Beekhof wrote:
> >> On Thu, Nov 5, 2009 at 8:50 PM, Michael Schwartzkopff
> >> 
> >
> > wrote:
> >> > Hi,
> >> >
> >> > on the list was a discussion about resource capacity limits. Yan Gao
> >> > also implemented it.
> >> >
> >> > As far as I understood the discussion, the solution is to attach
> >> > capacity limits to nodes and resources. Resources are distributed on
> >> > the nodes of a cluster according to their capacity needs. They would
> >> > be migrated or shut down if the capacity limits on the node are not
> >> > met.
> >> >
> >> > My question is: Can the capacity figures of the resources be made
> >> > dynamically?
> >>
> >> You can, but you probably don't want to.
> >> For example, free RAM and CPU load are two things that absolutely make
> >> no sense to include in such calculations.
> >>
> >> Consider how it works:
> >>
> >> - The node starts and reports 2Gb of RAM
> >> - We place a service there that reserves 512Mb
> >> - The cluster knows there is 1.5Gb remaining
> >> - We place two more services there that also reserve 512Mb each
> >>
> >> If the amount of RAM at the beginning was the amount free, then when
> >> you updated it to be 512Mb the PE would run and stop two of the
> >> resources!
> >
> > Stop. I do not want to make the capacity of the nodes dynamic, but
> > the actual resource consumption of the resources (i.e., the database).
>
> Ah, OK, that makes more sense.
> Well, like any part of the configuration, it can be changed at any time.
> What we don't have, though, is a nice CLI tool like crm_attribute for
> doing so.

I know the crm_attribute command. Of course, someone can always do some
scripting to achieve the aim I described above.

I just think it would be a cool solution to have the cluster itself do
the work if configured to do so. The CRM (or the RAs) should have the
ability to monitor the resource consumption of resources dynamically.
This automation would make the lives of admins much easier, and they
would not be forced to do the scripting on their own.

Greetings,

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: mi...@multinet.de
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42



Re: [Pacemaker] Resource capacity limit

2009-11-08 Thread Yan Gao
Hi,
Andrew Beekhof wrote:
> 
> I've been thinking about this more and while this will work, I think
> we can make it better.
> 
> I'd like to see the while-block from native_color() be a function that
> is called from native_assign_node().
Okay.

> And instead of a limit-utilization option, we'd have
> placement-strategy=(default|utilization|minimal)
> 
> Default ::= what we do now
> Utilization ::= what you've implemented
> Minimal ::= what you've implemented _without_ the load balancing we
> currently do.
> 
> (Maybe the names could be improved, but hopefully you get the idea).
Great! There would be more policy options for users.

> 
> The last one is interesting because it allows us to concentrate
> services on the minimum number of required nodes (and potentially
> power some of the others down).
Right. I'll look into it.

Thanks,
  Yan




Re: [Pacemaker] Resource capacity limit

2009-11-06 Thread Dejan Muhamedagic
Hi,

On Thu, Nov 05, 2009 at 07:28:16PM +0100, Andrew Beekhof wrote:
> On Thu, Nov 5, 2009 at 5:25 PM, Dejan Muhamedagic  wrote:
> > Hi,
> >> Which reminds me, I need to get devel sorted out...
> >
> > While you're at it, perhaps it would be good to rethink the
> > release policy. For me in particular it would be great to know at
> > least one week in advance when there'll be a release. For the
> > general public as well, since they'll have a chance to do some
> > testing of the new code.
> 
> The idea is that people can always pull from stable-1.0, in theory it
> should never be broken.
> If you're in the middle of stuff, keep it locally until you're done.
> 
> Generally though, I start testing on the 15th of every month.
> I thought I said that somewhere... I know the releases page indicates
> the month (if it's delayed, as it was due to my move).
> 
> But I'm thinking of moving to a bi-monthly cycle.  Thoughts?

Agreed. In principle, a somewhat slower release process should
result in better releases.

> > I know that you do test before
> > releasing, but the more people test in various environments, the
> > more bugs found. Also, it may be good to introduce and announce
> > the feature freeze point, after which only bug fixes will be
> > accepted.
> 
> Well in theory that point is x.y.0
> I've been turning a blind eye to your changes in the shell because it's
> still very immature (I don't mean that negatively, it's just new code).

That should of course change as soon as the shell supports all
CIB constructs (which is not far away). But we all understood
that it made no sense to keep those changes out :)

Thanks,

Dejan

> Though I have a history of allowing isolated, non-invasive features if
> we've not yet planned the next stable series (basically what happened
> for the node health stuff from IBM). It's a case-by-case thing, but
> I'd agree that we could do with documenting this.
> 



Re: [Pacemaker] Resource capacity limit

2009-11-05 Thread Andrew Beekhof
On Fri, Nov 6, 2009 at 8:31 AM, Michael Schwartzkopff  wrote:
> On Thursday, 5 November 2009 21:37:23, Andrew Beekhof wrote:
>> On Thu, Nov 5, 2009 at 8:50 PM, Michael Schwartzkopff 
> wrote:
>> > Hi,
>> >
>> > on the list was a discussion about resource capacity limits. Yan Gao also
>> > implemented it.
>> >
>> > As far as I understood the discussion, the solution is to attach
>> > capacity limits to nodes and resources. Resources are distributed on
>> > the nodes of a cluster according to their capacity needs. They would
>> > be migrated or shut down if the capacity limits on the node are not met.
>> >
>> > My question is: Can the capacity figures of the resources be made
>> > dynamically?
>>
>> You can, but you probably don't want to.
>> For example, free RAM and CPU load are two things that absolutely make
>> no sense to include in such calculations.
>>
>> Consider how it works:
>>
>> - The node starts and reports 2Gb of RAM
>> - We place a service there that reserves 512Mb
>> - The cluster knows there is 1.5Gb remaining
>> - We place two more services there that also reserve 512Mb each
>>
>> If the amount of RAM at the beginning was the amount free, then when
>> you updated it to be 512Mb the PE would run and stop two of the
>> resources!
>
> Stop. I do not want to make the capacity of the nodes dynamic, but the
> actual resource consumption of the resources (i.e., the database).

Ah, OK, that makes more sense.
Well, like any part of the configuration, it can be changed at any time.
What we don't have, though, is a nice CLI tool like crm_attribute for doing so.

>
> It sometimes happens that the resource (i.e., RAM) consumption of
> resources varies, and I wanted to make that number dynamic. Then, after
> some kind of damping, a node that started swapping could migrate
> resources to a node with free capacity.
>
>> You always want to feed the cluster the total amount of RAM installed
>> which, at most, you'd query when the cluster starts on that node.
>
> The capacity of a node for any resource (RAM, CPU, ...) should be
> fixed. That makes sense because a node does not gain more resources
> while it is switched on.
>
> Greetings,
>
> Michael.
>
>
>



Re: [Pacemaker] Resource capacity limit

2009-11-05 Thread Michael Schwartzkopff
On Thursday, 5 November 2009 21:37:23, Andrew Beekhof wrote:
> On Thu, Nov 5, 2009 at 8:50 PM, Michael Schwartzkopff  
wrote:
> > Hi,
> >
> > on the list was a discussion about resource capacity limits. Yan Gao also
> > implemented it.
> >
> > As far as I understood the discussion, the solution is to attach
> > capacity limits to nodes and resources. Resources are distributed on
> > the nodes of a cluster according to their capacity needs. They would be
> > migrated or shut down if the capacity limits on the node are not met.
> >
> > My question is: Can the capacity figures of the resources be made
> > dynamically?
>
> You can, but you probably don't want to.
> For example, free RAM and CPU load are two things that absolutely make
> no sense to include in such calculations.
>
> Consider how it works:
>
> - The node starts and reports 2Gb of RAM
> - We place a service there that reserves 512Mb
> - The cluster knows there is 1.5Gb remaining
> - We place two more services there that also reserve 512Mb each
>
> If the amount of RAM at the beginning was the amount free, then when
> you updated it to be 512Mb the PE would run and stop two of the
> resources!

Stop. I do not want to make the capacity of the nodes dynamic, but the
actual resource consumption of the resources (i.e., the database).

It sometimes happens that the resource (i.e., RAM) consumption of
resources varies, and I wanted to make that number dynamic. Then, after
some kind of damping, a node that started swapping could migrate
resources to a node with free capacity.

> You always want to feed the cluster the total amount of RAM installed
> which, at most, you'd query when the cluster starts on that node.

The capacity of a node for any resource (RAM, CPU, ...) should be
fixed. That makes sense because a node does not gain more resources
while it is switched on.

Greetings,

Michael.




Re: [Pacemaker] Resource capacity limit

2009-11-05 Thread Yan Gao
Hi Andrew,

Andrew Beekhof wrote:
> On Tue, Nov 3, 2009 at 12:15 PM, Yan Gao  wrote:
>> Hi again,
>>
>> Yan Gao wrote:
>>
>> XML sample:
>> ..
>> [utilization XML sample garbled in the archive; element tags were stripped]
>> ..
>>
>> Please kindly review it...
>> Any suggestions are appreciated!
> 
> Whats the behavior if a node has either no utilization block or no
> value for that attribute?
If so, the node is regarded as having no capacity at all, or none of
that specific capacity.

> Can a resource with a utilization block still be placed there?
No, unless the utilization block is blank. As long as any attribute is set
in resource utilization, which means the resource requests some kind of
capacity, while a node has no that capacity, the node would not be considered.

An interesting case: if a resource has no utilization block, it is
regarded as consuming no capacity, so it could be placed on any node,
even a node that has no utilization block (no capacity at all).

Do you think this behavior is reasonable?

Thanks,
  Yan




Re: [Pacemaker] Resource capacity limit

2009-11-05 Thread Yan Gao
Hi Andrew,
Thanks for your reply!

Andrew Beekhof wrote:
> On Wed, Nov 4, 2009 at 5:41 PM, Lars Marowsky-Bree  wrote:
>> On 2009-11-03T19:15:59, Yan Gao  wrote:
>>
>>> XML sample:
>>> ..
>>> [utilization XML sample garbled in the archive; element tags were stripped]
>>> ..
>>>
>>> Please kindly review it...
>>> Any suggestions are appreciated!
>> I think this is exactly what we need. Great job!
>>
>> Code looks good too.
>>
>> Andrew?
> 
> Four things...
> 
> Do we still need the limit-utilization option?
> I guess it might be nice to be able to turn it off globally... was
> that the intention here?
Sorry, I missed it in the sample, but it has been implemented in the
code :-)
Yes, it's the "limit-utilization" property, which defaults to "false".
So the working XML sample should be:
..
[utilization XML sample garbled in the archive; element tags were stripped]
..
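The sample's markup did not survive the archive conversion. Based on the utilization syntax that Pacemaker eventually shipped, and on the `XML_TAG_UTILIZATION` ("utilization") tag plus the `unpack_instance_attributes()` call in the attached patch, it presumably looked something like this (all ids and values here are illustrative, not Yan's originals):

```xml
<nodes>
  <node id="node1" uname="node1" type="normal">
    <utilization id="node1-utilization">
      <nvpair id="node1-utilization-memory" name="memory" value="2048"/>
      <nvpair id="node1-utilization-cpu" name="cpu" value="2"/>
    </utilization>
  </node>
</nodes>
...
<primitive id="dummy0" class="ocf" provider="heartbeat" type="Dummy">
  <utilization id="dummy0-utilization">
    <nvpair id="dummy0-utilization-memory" name="memory" value="512"/>
    <nvpair id="dummy0-utilization-cpu" name="cpu" value="1"/>
  </utilization>
</primitive>
```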

> 
> The next one is minor, there should at least be a debug message when
> we filter out a node in native_color()
> That's the sort of thing that's going to mess with people :-)
Indeed :-) Added one and attached the revised patch.
> 
> There also needs to be some PE regression tests for this (and be sure
> to run the existing ones to make sure they don't break).
Right.

> 
> Lastly, I would really like to defer this for 1.2
Agreed.

> I know I've bent the rules a bit for 1.0 in the past, but its really
> late in the game now.
> 
> Which reminds me, I need to get devel sorted out...
:-)

Thanks again!

Best regards,
  Yan
diff -r c81e55653fba include/crm/msg_xml.h
--- a/include/crm/msg_xml.h	Fri Oct 16 14:26:27 2009 +0200
+++ b/include/crm/msg_xml.h	Fri Nov 06 15:20:33 2009 +0800
@@ -130,6 +130,7 @@
 #define XML_TAG_ATTRS			"attributes"
 #define XML_TAG_PARAMS			"parameters"
 #define XML_TAG_PARAM			"param"
+#define XML_TAG_UTILIZATION		"utilization"
 
 #define XML_TAG_RESOURCE_REF		"resource_ref"
 #define XML_CIB_TAG_RESOURCE	  	"primitive"
diff -r c81e55653fba include/crm/pengine/status.h
--- a/include/crm/pengine/status.h	Fri Oct 16 14:26:27 2009 +0200
+++ b/include/crm/pengine/status.h	Fri Nov 06 15:20:33 2009 +0800
@@ -58,6 +58,8 @@
 #define pe_flag_start_failure_fatal	0x1000ULL
 #define pe_flag_remove_after_stop	0x2000ULL
 
+#define pe_flag_limit_utilization	0x0001ULL
+
 
 typedef struct pe_working_set_s 
 {
@@ -116,6 +118,8 @@
 		
 		GHashTable *attrs;	/* char* => char* */
 		enum node_type type;
+
+		GHashTable *utilization;
 }; 
 
 struct node_s { 
@@ -186,6 +190,7 @@
 
 		GHashTable *meta;	   
 		GHashTable *parameters;
+		GHashTable *utilization;
 
 		GListPtr children;	  /* resource_t* */	
 };
diff -r c81e55653fba lib/pengine/common.c
--- a/lib/pengine/common.c	Fri Oct 16 14:26:27 2009 +0200
+++ b/lib/pengine/common.c	Fri Nov 06 15:20:33 2009 +0800
@@ -147,6 +147,10 @@
 	{ "node-health-red", NULL, "integer", NULL, "-INFINITY", &check_number,
 	  "The score 'red' translates to in rsc_location constraints",
 	  "Only used when node-health-strategy is set to custom or progressive." },
+
+	/*Resource utilization*/
+	{ "limit-utilization", NULL, "boolean", NULL, "false", &check_boolean,
+	  "Limit the resource utilization of nodes to avoid being overloaded", NULL},
 };
 
 void
diff -r c81e55653fba lib/pengine/complex.c
--- a/lib/pengine/complex.c	Fri Oct 16 14:26:27 2009 +0200
+++ b/lib/pengine/complex.c	Fri Nov 06 15:20:33 2009 +0800
@@ -371,6 +371,12 @@
 	if(safe_str_eq(class, "stonith")) {
 	set_bit_inplace(data_set->flags, pe_flag_have_stonith_resource);
 	}
+
+	(*rsc)->utilization = g_hash_table_new_full(
+		g_str_hash, g_str_equal, g_hash_destroy_str, g_hash_destroy_str);
+
+	unpack_instance_attributes(data_set->input, (*rsc)->xml, XML_TAG_UTILIZATION, NULL,
+   (*rsc)->utilization, NULL, FALSE, data_set->now);
 	
 /* 	data_set->resources = g_list_append(data_set->resources, (*rsc)); */
 	return TRUE;
@@ -451,6 +457,9 @@
 	if(rsc->meta != NULL) {
 		g_hash_table_destroy(rsc->meta);
 	}
+	if(rsc->utilization != NULL) {
+		g_hash_table_destroy(rsc->utilization);
+	}
 	if(rsc->parent == NULL && is_set(rsc->flags, pe_rsc_orphan)) {
 		free_xml(rsc->xml);
 	}
diff -r c81e55653fba lib/pengine/status.c
--- a/lib/pengine/status.c	Fri Oct 16 14:26:27 2009 +0200
+++ b/lib/pengine/status.c	Fri Nov 06 15:20:33 2009 +0800
@@ -159,6 +159,9 @@
 			if(details->attrs != NULL) {
 g_hash_table_destroy(details->attrs);
 			}
+			if(details->utilization != NULL) {
+g_hash_table_destroy(details->utilization);
+			}
 			pe_free_shallow_adv(details->running_rsc, FALSE);
 			pe_free_shallow_adv(details->allocated_rsc, FALSE);
 			crm_free(details);

Re: [Pacemaker] Resource capacity limit

2009-11-05 Thread Andrew Beekhof
On Thu, Nov 5, 2009 at 8:50 PM, Michael Schwartzkopff  wrote:
> Hi,
>
> there was a discussion on the list about resource capacity limits. Yan Gao
> has also implemented it.
>
> As far as I understood the discussion, the solution is to attach capacity
> limits to nodes and resources. Resources are distributed across the nodes
> of a cluster according to their capacity needs. They would be migrated or
> shut down if the capacity limits on a node are not met.
>
> My question is: Can the capacity figures of the resources be made dynamic?

You can, but you probably don't want to.
For example, free RAM and CPU load are two things that absolutely make
no sense to include in such calculations.

Consider how it works:

- The node starts and reports 2GB of RAM
- We place a service there that reserves 512MB
- The cluster knows there is 1.5GB remaining
- We place two more services there that also reserve 512MB each

If the amount of RAM at the beginning was the amount free, then when
you updated it to be 512MB the PE would run and stop two of the
resources!

You always want to feed the cluster the total amount of RAM installed,
which, at most, you'd query when the cluster starts on that node.
>
> So, e.g., for every monitor operation the CRM would update the capacity
> usage figures of the resource, and the cluster could react dynamically to
> the actual capacity of a resource.
>
> Greetings,
>
> Michael.
>



[Pacemaker] Resource capacity limit

2009-11-05 Thread Michael Schwartzkopff
Hi,

there was a discussion on the list about resource capacity limits. Yan Gao
has also implemented it.

As far as I understood the discussion, the solution is to attach capacity
limits to nodes and resources. Resources are distributed across the nodes
of a cluster according to their capacity needs. They would be migrated or
shut down if the capacity limits on a node are not met.

My question is: Can the capacity figures of the resources be made dynamic?

So, e.g., for every monitor operation the CRM would update the capacity
usage figures of the resource, and the cluster could react dynamically to
the actual capacity of a resource.

Greetings,

Michael.



Re: [Pacemaker] Resource capacity limit

2009-11-04 Thread Lars Marowsky-Bree
On 2009-11-03T19:15:59, Yan Gao  wrote:

> XML sample:
> ..
> <node id="node1" uname="hex-0" type="normal">
>   <utilization id="nodes-node1-utilization">
>     <nvpair id="nodes-node1-memory" name="memory" value="4096"/>
>     <nvpair id="nodes-node1-cpu" name="cpu" value="8"/>
>   </utilization>
> </node>
> ..
> <primitive id="dummy0" class="ocf" provider="heartbeat" type="Dummy">
>   <meta_attributes id="dummy0-meta">
>     <nvpair id="dummy0-priority" name="priority" value="2"/>
>   </meta_attributes>
>   <utilization id="dummy0-utilization">
>     <nvpair id="dummy0-memory" name="memory" value="2048"/>
>     <nvpair id="dummy0-cpu" name="cpu" value="2"/>
>   </utilization>
> </primitive>
> <primitive id="dummy1" class="ocf" provider="heartbeat" type="Dummy">
>   <utilization id="dummy1-utilization">
>     <nvpair id="dummy1-memory" name="memory" value="3012"/>
>   </utilization>
> </primitive>
> ..
> 
> Please kindly review it...
> Any suggestions are appreciated!

I think this is exactly what we need. Great job!

Code looks good too.

Andrew?


For added kicks, which may be something Andrew can add more readily, I
wonder if utilization should also be subject to time-based evaluation.
Think of a database needing more horsepower on weekdays - but perhaps
that is something that should wait until dynamic load balancing
happens.


Regards,
Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [Pacemaker] Resource capacity limit

2009-11-03 Thread Yan Gao
Hi again,

Yan Gao wrote:
> Hi Lars,
> Thanks for the great suggestions!
> 
> Lars Marowsky-Bree wrote:
>> On 2009-10-30T19:41:35, Yan Gao  wrote:
>>> Configuration example:
>>>
>>> node yingying \
>>> attributes capacity="100"
>>> primitive dummy0 ocf:heartbeat:Dummy \
>>> meta weight="90" priority="2"
>>> primitive dummy1 ocf:heartbeat:Dummy \
>>> meta weight="60" priority="1"
>>> ..
>>> property $id="cib-bootstrap-options" \
>>> limit-capacity="true"
>> First, I would prefer not to contaminate the regular node attribute
>> namespace; the word "capacity" might already be used. Second, the
>> "weight" is just one dimension, which is somewhat difficult.
>>
>> I'd propose to introduce a new XML element, "resource_utilization" (name
>> to be decided ;-) containing a "nvset", and which can be used in a node
>> element or a resource primitive.
>>
>> This creates a new namespace, avoiding clashes, and distinguishes the
>> utilization parameters from the other various attributes.
>>
>> Further, it trivially allows for several user-defined metrics.
> Right, great idea! I'll try to implement it if Andrew is OK with it too :-)
> 
Done and attached.

XML sample:
..
<node id="node1" uname="hex-0" type="normal">
  <utilization id="nodes-node1-utilization">
    <nvpair id="nodes-node1-memory" name="memory" value="4096"/>
    <nvpair id="nodes-node1-cpu" name="cpu" value="8"/>
  </utilization>
</node>
..
<primitive id="dummy0" class="ocf" provider="heartbeat" type="Dummy">
  <meta_attributes id="dummy0-meta">
    <nvpair id="dummy0-priority" name="priority" value="2"/>
  </meta_attributes>
  <utilization id="dummy0-utilization">
    <nvpair id="dummy0-memory" name="memory" value="2048"/>
    <nvpair id="dummy0-cpu" name="cpu" value="2"/>
  </utilization>
</primitive>
<primitive id="dummy1" class="ocf" provider="heartbeat" type="Dummy">
  <utilization id="dummy1-utilization">
    <nvpair id="dummy1-memory" name="memory" value="3012"/>
  </utilization>
</primitive>
..

Please kindly review it...
Any suggestions are appreciated!

Thanks,
  Yan
-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™
diff -r c81e55653fba include/crm/msg_xml.h
--- a/include/crm/msg_xml.h	Fri Oct 16 14:26:27 2009 +0200
+++ b/include/crm/msg_xml.h	Tue Nov 03 19:02:22 2009 +0800
@@ -130,6 +130,7 @@
 #define XML_TAG_ATTRS			"attributes"
 #define XML_TAG_PARAMS			"parameters"
 #define XML_TAG_PARAM			"param"
+#define XML_TAG_UTILIZATION		"utilization"
 
 #define XML_TAG_RESOURCE_REF		"resource_ref"
 #define XML_CIB_TAG_RESOURCE	  	"primitive"
diff -r c81e55653fba include/crm/pengine/status.h
--- a/include/crm/pengine/status.h	Fri Oct 16 14:26:27 2009 +0200
+++ b/include/crm/pengine/status.h	Tue Nov 03 19:02:22 2009 +0800
@@ -58,6 +58,8 @@
 #define pe_flag_start_failure_fatal	0x1000ULL
 #define pe_flag_remove_after_stop	0x2000ULL
 
+#define pe_flag_limit_utilization	0x0001ULL
+
 
 typedef struct pe_working_set_s 
 {
@@ -116,6 +118,8 @@
 		
 		GHashTable *attrs;	/* char* => char* */
 		enum node_type type;
+
+		GHashTable *utilization;
 }; 
 
 struct node_s { 
@@ -186,6 +190,7 @@
 
 		GHashTable *meta;	   
 		GHashTable *parameters;
+		GHashTable *utilization;
 
 		GListPtr children;	  /* resource_t* */	
 };
diff -r c81e55653fba lib/pengine/common.c
--- a/lib/pengine/common.c	Fri Oct 16 14:26:27 2009 +0200
+++ b/lib/pengine/common.c	Tue Nov 03 19:02:22 2009 +0800
@@ -147,6 +147,10 @@
 	{ "node-health-red", NULL, "integer", NULL, "-INFINITY", &check_number,
 	  "The score 'red' translates to in rsc_location constraints",
 	  "Only used when node-health-strategy is set to custom or progressive." },
+
+	/*Resource utilization*/
+	{ "limit-utilization", NULL, "boolean", NULL, "false", &check_boolean,
+	  "Limit the resource utilization of nodes to avoid being overloaded", NULL},
 };
 
 void
diff -r c81e55653fba lib/pengine/complex.c
--- a/lib/pengine/complex.c	Fri Oct 16 14:26:27 2009 +0200
+++ b/lib/pengine/complex.c	Tue Nov 03 19:02:22 2009 +0800
@@ -371,6 +371,12 @@
 	if(safe_str_eq(class, "stonith")) {
 	set_bit_inplace(data_set->flags, pe_flag_have_stonith_resource);
 	}
+
+	(*rsc)->utilization = g_hash_table_new_full(
+		g_str_hash, g_str_equal, g_hash_destroy_str, g_hash_destroy_str);
+
+	unpack_instance_attributes(data_set->input, (*rsc)->xml, XML_TAG_UTILIZATION, NULL,
+   (*rsc)->utilization, NULL, FALSE, data_set->now);
 	
 /* 	data_set->resources = g_list_append(data_set->resources, (*rsc)); */
 	return TRUE;
@@ -451,6 +457,9 @@
 	if(rsc->meta != NULL) {
 		g_hash_table_destroy(rsc->meta);
 	}
+	if(rsc->utilization != NULL) {
+		g_hash_table_destroy(rsc->utilization);
+	}
 	if(rsc->parent == NULL && is_set(rsc->flags, pe_rsc_orphan)) {
 		free_xml(rsc->xml);
 	}
diff -r c81e55653fba lib/pengine/status.c
--- a/lib/pengine/status.c	Fri Oct 16 14:26:27 2009 +0200
+++ b/lib/pengine/status.c	Tue Nov 03 19:02:22 2009 +0800
@@ -159,6 +159,9 @@
 			if(details->attrs != NULL) {
 g_hash_table_destroy(details->attrs);
 			}
+			if(details->utilization != NULL) {
+g_hash_table_destroy(details->utilization);
+			}
 			pe_free_shallow_adv(details->running_rsc, FALSE);
 			pe_free_shallow_adv(details->allocated_rsc, FALSE);
 			crm_free(details);
diff -r c81e55653fba lib/pengine/unpack.c
--- a/lib/pengine/unpack.c	Fri Oct 16 14:26:27 2009 +0200
+++ b/lib/pengine/unpack.c	Tue Nov 03 19:02:22 2009 +0800
@@ -165,6 +165,10 @@
 	crm_info("Node scores: 'red' = %s, 'yellow' = %s, 'green' = %s",
 		 score2char(node_score_red),score2char(node_score_yellow),
 		 score2char(node_score_green));

Re: [Pacemaker] Resource capacity limit

2009-11-01 Thread Yan Gao
Hi Lars,
Thanks for the great suggestions!

Lars Marowsky-Bree wrote:
> On 2009-10-30T19:41:35, Yan Gao  wrote:
>>
>> Configuration example:
>>
>> node yingying \
>>  attributes capacity="100"
>> primitive dummy0 ocf:heartbeat:Dummy \
>>  meta weight="90" priority="2"
>> primitive dummy1 ocf:heartbeat:Dummy \
>>  meta weight="60" priority="1"
>> ..
>> property $id="cib-bootstrap-options" \
>>  limit-capacity="true"
> 
> First, I would prefer not to contaminate the regular node attribute
> namespace; the word "capacity" might already be used. Second, the
> "weight" is just one dimension, which is somewhat difficult.
> 
> I'd propose to introduce a new XML element, "resource_utilization" (name
> to be decided ;-) containing a "nvset", and which can be used in a node
> element or a resource primitive.
> 
> This creates a new namespace, avoiding clashes, and distinguishes the
> utilization parameters from the other various attributes.
> 
> Further, it trivially allows for several user-defined metrics.
Right, great idea! I'll try to implement it if Andrew is OK with it too :-)

> 
> node hex-0 \
>   utilization memory="4096" cpu="8"
> ...
> primitive dummy0 ocf:heartbeat:Dummy \
>   meta priority="2"
>   utilization memory="2048" cpu="2"
> primitive dummy1 ocf:heartbeat:Dummy \
>   utilization memory="3012"
> primitive dummy2 ocf:heartbeat:Dummy \
>   utilization cpu="6"
> 
> dummy0 + dummy2 could both be placed on hex-0, or dummy1+dummy2, but not
> dummy0 + dummy1.
> 
> "Placement allowed where none of the utilization parameters would become
> negative." (ie, iterate over the utilization attributes specified for
> the resource.)
> 
>> I also noticed a likely similar planned feature described in
>> http://clusterlabs.org/wiki/Planned_Features
>>
>> "Implement adaptive service placement (based on the RAM, CPU etc.
>> required by the service and made available by the nodes) "
>>
>> Indeed, this attempt only supports a single kind of capacity, and it's
>> not adaptive... Have you already given this feature thorough
>> consideration?
> 
> I think this is a two phase feature for the PE: The first phase is what
> you propose - make sure we do not overload any given node, basically
> implementing hard limits.
> 
> The second phase would be for the PE to actually try to "optimize"
> placement, and try to solve the constraints imposed by the utilization
> versus capacity scores to a) place as many resources as possible
> successfully, and b) to either spread them thinly (load distribution) or
> condensed (load concentration, think power savings by being able to put
> some nodes to sleep).
> 
> The first phase should, IMHO, be quite easy to implement. The second one
> is significantly more difficult, and we'd need to pull in an
> optimization library to solve this for us. It's conceivable that for
> this to happen, we'd need to disable the normal "rsc_location" rules
> altogether because they'd interfere badly. (And interesting to note that
> the rsc_collocation constraints can be mapped into this scheme and
> entirely handled by this solver.)
> 
> There is the "adaptive" bit, of course, where the utilization of the
> resources and the nodes is automatically determined and adjusted based
> on utilization monitoring. This is even more challenging and frequently
> considered a research problem.
> 
> 
> In summary, I think phase one is urgently needed; thankfully, it is
> straightforward to solve too, and the admin can influence placement with
> priorities and scoring sufficiently to avoid resources being offlined
> due to resource collisions too frequently.
> 
> Phase two is a "solved problem" from an algorithmic point of view, but
> implementing it is probably not quite as trivial. I'd welcome seeing
> this happen too.
> 
Thanks for the information!

Best regards,
  Yan

-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Resource capacity limit

2009-10-30 Thread Raoul Bhatia [IPAX]
Hi,

On 10/30/2009 01:20 PM, Lars Marowsky-Bree wrote:
> I think this is a two phase feature for the PE: The first phase is what
> you propose - make sure we do not overload any given node, basically
> implementing hard limits.
> 
> The second phase would be for the PE to actually try to "optimize"
> placement, and try to solve the constraints imposed by the utilization
> versus capacity scores to a) place as many resources as possible
> successfully, and b) to either spread them thinly (load distribution) or
> condensed (load concentration, think power savings by being able to put
> some nodes to sleep).

I just want to let you know that I think this is a marvelous
addition to Pacemaker!

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15


___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Resource capacity limit

2009-10-30 Thread Lars Marowsky-Bree
On 2009-10-30T19:41:35, Yan Gao  wrote:

Hi Yan Gao,

excellent!

Before reviewing the code, let's review the interface/configuration
though.

> Use case:
> Xen guests have memory requirements; nodes cannot host more guests than
> the node has physical memory installed.
> 
> 
> Configuration example:
> 
> node yingying \
>   attributes capacity="100"
> primitive dummy0 ocf:heartbeat:Dummy \
>   meta weight="90" priority="2"
> primitive dummy1 ocf:heartbeat:Dummy \
>   meta weight="60" priority="1"
> ..
> property $id="cib-bootstrap-options" \
>   limit-capacity="true"

First, I would prefer not to contaminate the regular node attribute
namespace; the word "capacity" might already be used. Second, the
"weight" is just one dimension, which is somewhat difficult.

I'd propose to introduce a new XML element, "resource_utilization" (name
to be decided ;-) containing a "nvset", and which can be used in a node
element or a resource primitive.

This creates a new namespace, avoiding clashes, and distinguishes the
utilization parameters from the other various attributes.

Further, it trivially allows for several user-defined metrics.

node hex-0 \
utilization memory="4096" cpu="8"
...
primitive dummy0 ocf:heartbeat:Dummy \
meta priority="2"
utilization memory="2048" cpu="2"
primitive dummy1 ocf:heartbeat:Dummy \
utilization memory="3012"
primitive dummy2 ocf:heartbeat:Dummy \
utilization cpu="6"

dummy0 + dummy2 could both be placed on hex-0, or dummy1+dummy2, but not
dummy0 + dummy1.

"Placement allowed where none of the utilization parameters would become
negative." (ie, iterate over the utilization attributes specified for
the resource.)

> If we don't want to enable the capacity limit, we can set the property
> "limit-capacity" to "false", or simply leave it at its default.

Right, a cluster property to globally disable/enable this is a very good
idea.

> I also noticed a likely similar planned feature described in
> http://clusterlabs.org/wiki/Planned_Features
> 
> "Implement adaptive service placement (based on the RAM, CPU etc.
> required by the service and made available by the nodes) "
> 
> Indeed, this attempt only supports a single kind of capacity, and it's
> not adaptive... Have you already given this feature thorough
> consideration?

I think this is a two phase feature for the PE: The first phase is what
you propose - make sure we do not overload any given node, basically
implementing hard limits.

The second phase would be for the PE to actually try to "optimize"
placement, and try to solve the constraints imposed by the utilization
versus capacity scores to a) place as many resources as possible
successfully, and b) to either spread them thinly (load distribution) or
condensed (load concentration, think power savings by being able to put
some nodes to sleep).

The first phase should, IMHO, be quite easy to implement. The second one
is significantly more difficult, and we'd need to pull in an
optimization library to solve this for us. It's conceivable that for
this to happen, we'd need to disable the normal "rsc_location" rules
altogether because they'd interfere badly. (And interesting to note that
the rsc_collocation constraints can be mapped into this scheme and
entirely handled by this solver.)

There is the "adaptive" bit, of course, where the utilization of the
resources and the nodes is automatically determined and adjusted based
on utilization monitoring. This is even more challenging and frequently
considered a research problem.


In summary, I think phase one is urgently needed; thankfully, it is
straightforward to solve too, and the admin can influence placement with
priorities and scoring sufficiently to avoid resources being offlined
due to resource collisions too frequently.

Phase two is a "solved problem" from an algorithmic point of view, but
implementing it is probably not quite as trivial. I'd welcome seeing
this happen too.

Adaptive placement ... anyone want to write a master's or PhD thesis
on it? ;-)


Best,
Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




[Pacemaker] Resource capacity limit

2009-10-30 Thread Yan Gao
Hi Andrew and Lars,
The attachment is a first attempt to implement "Resource capacity limit",
which was filed by Lars at:
https://fate.novell.com/303384

Description:
We need a mechanism for the PE to take resource weight into account to
prevent nodes from being overloaded.

Resources would require certain minimal values for node attributes
(this is available right now); however, they would also "consume" them,
reducing the value of the node attributes for further resource placement.
(This could be a special flag in the rsc_location rule, for example.)
If a node does not have enough capacity available, it is not considered.
..

Use case:
Xen guests have memory requirements; nodes cannot host more guests than
the node has physical memory installed.


Configuration example:

node yingying \
attributes capacity="100"
primitive dummy0 ocf:heartbeat:Dummy \
meta weight="90" priority="2"
primitive dummy1 ocf:heartbeat:Dummy \
meta weight="60" priority="1"
..
property $id="cib-bootstrap-options" \
limit-capacity="true"
..

Because dummy0 has the higher priority, it will run on node "yingying".
That node then has only "10" (100-90) capacity remaining, so dummy1 cannot
run on it. If there is no other node it can run on, dummy1 will be stopped.

If we don't want to enable the capacity limit, we can set the property
"limit-capacity" to "false", or simply leave it at its default.


What do you think about the way it's implemented? Did I do it right?

I also noticed a likely similar planned feature described in
http://clusterlabs.org/wiki/Planned_Features

"Implement adaptive service placement (based on the RAM, CPU etc.
required by the service and made available by the nodes) "

Indeed, this attempt only supports a single kind of capacity, and it's
not adaptive... Have you already given this feature thorough
consideration?
Any comments or suggestions are appreciated. Thanks!

Regards,
  Yan
-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™



diff -r 462f1569a437 include/crm/msg_xml.h
--- a/include/crm/msg_xml.h	Mon Aug 10 16:42:41 2009 +0200
+++ b/include/crm/msg_xml.h	Fri Oct 30 19:06:34 2009 +0800
@@ -155,6 +155,7 @@
 #define XML_RSC_ATTR_FAIL_TIMEOUT	"failure-timeout"
 #define XML_RSC_ATTR_MULTIPLE		"multiple-active"
 #define XML_RSC_ATTR_PRIORITY		"priority"
+#define XML_RSC_ATTR_WEIGHT		"weight"
 #define XML_OP_ATTR_ON_FAIL		"on-fail"
 #define XML_OP_ATTR_START_DELAY		"start-delay"
 #define XML_OP_ATTR_ALLOW_MIGRATE	"allow-migrate"
diff -r 462f1569a437 include/crm/pengine/status.h
--- a/include/crm/pengine/status.h	Mon Aug 10 16:42:41 2009 +0200
+++ b/include/crm/pengine/status.h	Fri Oct 30 19:06:34 2009 +0800
@@ -58,6 +58,8 @@
 #define pe_flag_start_failure_fatal	0x1000ULL
 #define pe_flag_remove_after_stop	0x2000ULL
 
+#define pe_flag_limit_capacity		0x0001ULL
+
 
 typedef struct pe_working_set_s 
 {
@@ -111,6 +113,7 @@
 		gboolean expected_up;
 		gboolean is_dc;
 		int	 num_resources;
+		int	 remain_capacity;
 		GListPtr running_rsc;	/* resource_t* */
 		GListPtr allocated_rsc;	/* resource_t* */
 		
@@ -168,6 +171,7 @@
 		int	 failure_timeout;
 		int	 effective_priority; 
 		int	 migration_threshold;
+		int	 weight;
 
 		unsigned long long flags;
 	
diff -r 462f1569a437 lib/pengine/common.c
--- a/lib/pengine/common.c	Mon Aug 10 16:42:41 2009 +0200
+++ b/lib/pengine/common.c	Fri Oct 30 19:06:34 2009 +0800
@@ -147,6 +147,10 @@
 	{ "node-health-red", NULL, "integer", NULL, "-INFINITY", &check_number,
 	  "The score 'red' translates to in rsc_location constraints",
 	  "Only used when node-health-strategy is set to custom or progressive." },
+
+	/*Capacity*/
+	{ "limit-capacity", NULL, "boolean", NULL, "false", &check_boolean,
+	  "Limit the capacity of nodes to avoid being overloaded", NULL},
 };
 
 void
diff -r 462f1569a437 lib/pengine/complex.c
--- a/lib/pengine/complex.c	Mon Aug 10 16:42:41 2009 +0200
+++ b/lib/pengine/complex.c	Fri Oct 30 19:06:34 2009 +0800
@@ -352,6 +352,11 @@
 	/* call crm_get_msec() and convert back to seconds */
 	(*rsc)->failure_timeout = (crm_get_msec(value) / 1000);
 	}
+
+	value = g_hash_table_lookup((*rsc)->meta, XML_RSC_ATTR_WEIGHT);
+	if(value != NULL) {
+	(*rsc)->weight = crm_parse_int(value, "0");
+	}
 	
 	value = g_hash_table_lookup((*rsc)->meta, XML_RSC_ATTR_TARGET_ROLE);
 	if(is_set(data_set->flags, pe_flag_stop_everything)) {
diff -r 462f1569a437 lib/pengine/unpack.c
--- a/lib/pengine/unpack.c	Mon Aug 10 16:42:41 2009 +0200
+++ b/lib/pengine/unpack.c	Fri Oct 30 19:06:34 2009 +0800
@@ -165,6 +165,10 @@
 	crm_info("Node scores: 'red' = %s, 'yellow' = %s, 'green' = %s",
 		 score2char(node_score_red),score2char(node_score_yellow),
 		 score2char(node_score_green));
+
+	set_config_flag(data_set, "limit-capacity", pe_flag_limit_capacity);
+	crm_debug_2("Limit capacity: %s",
+		is_set(data_set->flags, pe_flag_limit_capacity)?"true":"false");	
 	
 	retur