[Pacemaker] Pacemaker failover delays (followup)

2013-03-08 Thread Michael Powell
Andrew,

Thanks for the feedback to my earlier questions from March 6th.  I've done some 
further investigation wrt the timing of what I'd call the "simple" failover 
case:   where an SSID that is master on the DC node is killed, and it takes 
10-12 seconds before the slave SSID on the other node transitions to master.  
(Recall that "SSID" is a SliceServer app instance, each of which is abstracted 
as a Pacemaker resource.)

Before going into my findings, I want to clear up a couple of misstatements on 
my part.

* WRT my mention of "notifications" in my earlier e-mail, I misused the 
term.  I was simply referring to the "notify" events passed from the DC to the 
other node.

* I also misspoke when I said that the failed SSID was subsequently 
restarted as a result of a monitor event.  In fact, the SSID process is 
restarted by the "ss" resource agent script in response to a "start" event 
from lrmd.

The key issue, however, is the time required - 10 to 12 seconds - from the time 
the master SSID is killed until the slave fails over to become master.  You 
opined that the time required would largely depend upon the behavior of the 
resource agent, which in our case is a script called "ss".  To determine the 
effect of the ss script's execution, I modified it to log the current 
monotonic system clock value each time it starts, and again just before it 
exits.  The log messages specify the clock value in ms.
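For reference, the instrumentation can be sketched roughly like this (an illustrative sketch only, not the actual "ss" agent; it assumes a Linux /proc/uptime for the monotonic clock):

```shell
# Illustrative sketch of the timing instrumentation described above,
# NOT the actual "ss" resource agent.
now_ms() {
  # Column 1 of /proc/uptime is seconds since boot (monotonic on Linux);
  # convert to milliseconds.
  awk '{printf "%d", $1 * 1000}' /proc/uptime
}

start=$(now_ms)
echo "ss agent: enter at ${start} ms"
# ... the agent's real work (start/stop/monitor/notify) would run here ...
end=$(now_ms)
echo "ss agent: exit at ${end} ms (elapsed $((end - start)) ms)"
```

Subtracting the logged enter/exit values per invocation gives the per-event agent runtime that the spreadsheet columns D/G report.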

From this, I did find several instances where the ss script would take just 
over a second to complete execution.  In each such case, the "culprit" is an 
exec of "crm_node -p", which is called to determine how many nodes are 
presently in the cluster.  (I've verified this timing independently by 
executing "crm_node -p" from a command line when the cluster is quiescent.)  
This seems like a rather long time for a simple objective.  What would 
"crm_node -p" do that would take so long?

That notwithstanding, from the POV of the slave during the failover, there are 
delays of several hundred ms to about 1400 ms between the completion of the ss 
script and its invocation for the next event.  To explain, I've attached an 
Excel spreadsheet (which I've verified is virus-free) that documents two 
experiments.  In each case, there's an SSID instance that's master on node-0, 
the DC, and which is killed.  The spreadsheet includes a synopsis of the log 
messages that follow on both nodes, interleaved into a timeline.

By way of explanation, columns B-D contain timestamp information for node-0 and 
columns E-G for node 1.  Columns B/E show the current time of day, C/F show the 
monotonic clock value when the ss script begins execution (in ms, truncated to 
the least 5 digits), and D/G show the duration of the ss script execution for 
the relevant event.  Column H is text extracted from the log, showing the key 
text.  In some cases there is a significant amount of information in the log 
file relating to pengine behavior, but I omitted such information from the 
spreadsheet.  Column I contains explanatory comments.

Realizing that we need to look forward to upgrading our Pacemaker version (from 
1.0.9), I wonder if you can clear up a couple of questions.  We are presently 
using Heartbeat, which I believe restricts our upgrade to the 1.0 branch, 
correct?  In other words, if we want to upgrade to the 1.1 branch, are we 
required to replace Heartbeat with Corosync?  Secondly, when upgrading, are 
there kernel dependencies to worry about?  We are presently running on the open 
source kernel version 2.6.18.  We plan to migrate to the most current 2.8 or 
3.0 version later this year, at which time it would probably make sense to 
bring Pacemaker up to date.

I apologize for the length of this posting, and again appreciate any assistance 
you can offer.

Regards,
  Michael Powell


Michael Powell
Staff Engineer

15220 NW Greenbrier Pkwy
Suite 290
Beaverton, OR   97006
T 503-372-7327  M 503-789-3019  H 503-625-5332

www.harmonicinc.com


07MarTimeline.xlsx
Description: 07MarTimeline.xlsx
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] placement-strategy=minimal - placing and logging

2013-03-08 Thread Vladimir
On Fri, 8 Mar 2013 19:48:11 +0100
Lars Marowsky-Bree  wrote:

> > What is your experience? Is it possible to combine Resource Groups
> > and colocations? Or do I have to give up Resource Groups when using
> > colocations? If so, I may have to restructure my resource setup to
> > use colocations only.
> 
> You can combine them fine.
> 
> > That could be an approach. Sorry if this is a beginner question, but
> > I know the policy engine only in the form of the ptest command. Do
> > you mean parsing the output of ptest and crm_mon?
> 
> Yes.
> 
> > Or do you see a more straightforward way to monitor the cluster
> > state?
> 
> It depends on what you want to do. You can add a log message to
> pengine source code to indicate when a resource wasn't placed based on
> utilization; or similarly, indicate the reason why a resource is "not
> running" (as far as the PE can tell) in crm_mon (and then possibly
> hawk). That makes perfect sense, but none of the tools do that today.
> 
> So it's either modifying the code or writing some code that deduces it
> from ptest output.

Thanks for the quick help.



Re: [Pacemaker] Remove a preferential location of a service

2013-03-08 Thread Michael Schwartzkopff
On Friday, 8 March 2013 at 16:37:26, Cristiane França wrote:
> Hi,
> I have a problem when one of the cluster servers starts.
> How do I remove a preferential location of a service?
> I want to remove these two configurations shown below:
> 
> 
> ...
> [location constraint XML stripped by the list archive; the surviving
> fragments show two rules with operation="eq" value="primario"]
> ...
> 
> 
> 
> Regards,
> Cristiane

crm resource unmigrate home_fs
crm resource unmigrate postgresql

-- 
Kind regards,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Franziskanerstraße 15, 81669 München

Registered office: Munich, Munich District Court: HRB 199263
Executive board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Chairman of the supervisory board: Joerg Heidrich


[Pacemaker] Remove a preferential location of a service

2013-03-08 Thread Cristiane França
Hi,
I have a problem when one of the cluster servers starts.
How do I remove a preferential location of a service?
I want to remove these two configurations shown below:


...
[location constraint XML stripped by the list archive]
...



Regards,
Cristiane


Re: [Pacemaker] Pacemaker is initializing the service before mounting the partition

2013-03-08 Thread Cristiane França
Hi Emmanuel,

Thank you!

Cristiane.


On Fri, Mar 8, 2013 at 2:55 PM, emmanuel segura  wrote:

> You need an order constraint:
>
> order fs_after_ms inf: drbd_sistema:promote sistema_fs:start
> order pgsql_after_fs inf: sistema_fs postgresql
>
> Or you can put the fs and pgsql resources in a group; that way you can
> use a single constraint like this:
>
> order foo inf: drbd_sistema:promote myservicegroup:start
>
>
>
> 2013/3/8 Cristiane França 
>
>> Hi,
>> My cluster reports an error on startup of the postgresql service
>> because the service is being started before the partition /sistema
>> is mounted.
>>
>> How can I configure Pacemaker to start the Postgresql only after mounting
>> the partition /sistema?
>>
>>
>> My server configuration:
>>
>> primitive drbd_sistema ocf:linbit:drbd \
>> params drbd_resource="sistema" \
>> op monitor interval="15s"
>> primitive sistema_fs ocf:heartbeat:Filesystem \
>> params device="/dev/drbd2" directory="/sistema" fstype="ext4"
>>
>>
>> ...
>>
>> primitive postgresql ocf:heartbeat:pgsql \
>> params pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" start_opt=""
>> pgdata="/sistema/pgsql/data/" config="/sistema/pgsql/data/postgresql.conf" pgdba="postgres" \
>> op start interval="0" timeout="120s" \
>> op stop interval="0" timeout="120s" \
>> op monitor interval="30s" timeout="30s" depth="0" \
>> meta target-role="Started"
>>
>>
>>
>> Regards,
>> Cristiane
>>
>
>
> --
> this is my life and I live it as long as God wills


Re: [Pacemaker] placement-strategy=minimal - placing and logging

2013-03-08 Thread Lars Marowsky-Bree
On 2013-03-08T17:43:54, Vladimir  wrote:

> I already had to work around utilization by defining utilization for
> the first resource in the Resource Group; defining utilization for
> Resource Groups didn't work for me. Furthermore, I remember that there
> were also problems with colocating Resource Groups. I would like to
> stay with the given resource layout.

Allocation of resource groups with utilization has been improved in
1.1.9. (i.e., it now sums up the resources properly and allocates the
group as a whole; even though it internally works just as you describe
by transferring the whole load to the first resource.)

> What is your experience? Is it possible to combine Resource Groups and
> colocations? Or do I have to give up Resource Groups when using
> colocations? If so, I may have to restructure my resource setup to use
> colocations only.

You can combine them fine.

> That could be an approach. Sorry if this is a beginner question, but I
> know the policy engine only in the form of the ptest command. Do you
> mean parsing the output of ptest and crm_mon?

Yes.

> Or do you see a more straightforward way to monitor the cluster
> state?

It depends on what you want to do. You can add a log message to pengine
source code to indicate when a resource wasn't placed based on
utilization; or similarly, indicate the reason why a resource is "not
running" (as far as the PE can tell) in crm_mon (and then possibly
hawk). That makes perfect sense, but none of the tools do that today.

So it's either modifying the code or writing some code that deduces it
from ptest output.
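For what it's worth, the crm_mon-parsing half of that could be sketched like this (a rough sketch only; the status-line format assumed here may differ between Pacemaker versions):

```shell
# Rough sketch: list resources whose `crm_mon -1` status line says Stopped.
# The "name (ocf::provider:type): Stopped" line format is an assumption.
list_stopped() {
  awk '/\):[[:space:]]+Stopped/ {print $1}'
}

# Demo with canned crm_mon-style output:
printf '%s\n' \
  ' res-network-01     (ocf::heartbeat:IPaddr2):   Started node-1' \
  ' res-application-01 (ocf::heartbeat:Dummy):     Stopped' \
  | list_stopped
```

In practice one would feed it `crm_mon -1` and then filter out resources whose target-role is explicitly "Stopped", as discussed above.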


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [Pacemaker] Pacemaker is initializing the service before mounting the partition

2013-03-08 Thread emmanuel segura
You need an order constraint:

order fs_after_ms inf: drbd_sistema:promote sistema_fs:start
order pgsql_after_fs inf: sistema_fs postgresql

Or you can put the fs and pgsql resources in a group; that way you can use a
single constraint like this:

order foo inf: drbd_sistema:promote myservicegroup:start
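Spelled out, the group variant could look like this (the group name is illustrative; the resource names are from Cristiane's config):

```
# group the filesystem and the database so one order constraint covers both;
# group members start in listed order (filesystem before postgresql)
group myservicegroup sistema_fs postgresql
order foo inf: drbd_sistema:promote myservicegroup:start
```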



2013/3/8 Cristiane França 

> Hi,
> My cluster reports an error on startup of the postgresql service
> because the service is being started before the partition /sistema
> is mounted.
>
> How can I configure Pacemaker to start the Postgresql only after mounting
> the partition /sistema?
>
>
> My server configuration:
>
> primitive drbd_sistema ocf:linbit:drbd \
> params drbd_resource="sistema" \
> op monitor interval="15s"
> primitive sistema_fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd2" directory="/sistema" fstype="ext4"
>
>
> ...
>
> primitive postgresql ocf:heartbeat:pgsql \
> params pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" start_opt=""
> pgdata="/sistema/pgsql/data/" config="/sistema/pgsql/data/postgresql.conf" pgdba="postgres" \
> op start interval="0" timeout="120s" \
> op stop interval="0" timeout="120s" \
> op monitor interval="30s" timeout="30s" depth="0" \
> meta target-role="Started"
>
>
>
> Regards,
> Cristiane
>


-- 
this is my life and I live it as long as God wills


[Pacemaker] Pacemaker is initializing the service before mounting the partition

2013-03-08 Thread Cristiane França
Hi,
My cluster reports an error on startup of the postgresql service because
the service is being started before the partition /sistema is mounted.

How can I configure Pacemaker to start the Postgresql only after mounting
the partition /sistema?


My server configuration:

primitive drbd_sistema ocf:linbit:drbd \
params drbd_resource="sistema" \
op monitor interval="15s"
primitive sistema_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd2" directory="/sistema" fstype="ext4"


...

primitive postgresql ocf:heartbeat:pgsql \
params pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" start_opt=""
pgdata="/sistema/pgsql/data/" config="/sistema/pgsql/data/postgresql.conf" pgdba="postgres" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op monitor interval="30s" timeout="30s" depth="0" \
meta target-role="Started"



Regards,
Cristiane


Re: [Pacemaker] placement-strategy=minimal - placing and logging

2013-03-08 Thread Vladimir
On Fri, 8 Mar 2013 12:08:05 +0100
Lars Marowsky-Bree  wrote:

> On 2013-03-08T11:59:33, Vladimir  wrote:
> 
> > Collocations were exactly what I'm trying to avoid. The setup is
> > planned to get >15 resources (and an upper limit is not defined). I
> > think it would get pretty hard to consider all possible collocations,
> > especially if some kind of automated deployment is intended. Using
> > larger sets of collocations makes the configuration more difficult
> > to read and especially to maintain.
> 
> I see your point. But the collocations don't really get more difficult
> with the number of resources, but mostly node size.

I think node size means the number of nodes. Sorry, I didn't describe it
properly. The >15 resources I mentioned are distributed over at least 3
nodes (probably more, plus 1 failover node). The problem I see is that my
setup uses Resource Groups because in my opinion the config stays much
cleaner than colocating multiple resources. The setup looks something
like:

Resource Group: group-01
res-storage-01
res-filesystem-01
res-network-01
res-application-01
Resource Group: group-02
res-storage-02
res-filesystem-02
res-network-02
res-application-02
Resource Group: group-03
res-storage-03
res-filesystem-03
res-network-03
res-application-03

Maybe Collocation Sets could decrease the number of config entries.
Especially for getting an overview of the resources, I think crm_mon
output is much clearer when using Resource Groups. I haven't found an
efficient way to show the relations between colocations besides wrapper
scripts that parse "crm configure show" output.

I already had to work around utilization by defining utilization for
the first resource in the Resource Group; defining utilization for
Resource Groups didn't work for me. Furthermore, I remember that there
were also problems with colocating Resource Groups. I would like to
stay with the given resource layout.
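For reference, the workaround described above (utilization on the first group member) would look roughly like this in crm syntax (node name, resource types, and core counts are illustrative):

```
# capacity advertised by the node
node node-1 \
        utilization cores="4"
# the whole group's load hung on its first member, as described above
primitive res-storage-01 ocf:heartbeat:Dummy \
        utilization cores="2"
group group-01 res-storage-01 res-filesystem-01 res-network-01 res-application-01
property placement-strategy="minimal"
```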

What is your experience? Is it possible to combine Resource Groups and
colocations? Or do I have to give up Resource Groups when using
colocations? If so, I may have to restructure my resource setup to use
colocations only.

> > OK, I see, but I'm looking for a possibility to monitor such states,
> > to be informed if a resource can't be started because of a lack of
> > provided utilization.
> > 
> > Does anybody have an idea about that issue?
> 
> Running the PE is the only choice right now. I think with crm_mon
> you'll also be informed about stopped resources; basically you'd want
> to be told about everything not explicitly stopped (e.g.,
> target-role != stopped), right?

That could be an approach. Sorry if this is a beginner question, but I
know the policy engine only in the form of the ptest command. Do you
mean parsing the output of ptest and crm_mon? Or do you see a more
straightforward way to monitor the cluster state?







Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-03-08 Thread Dejan Muhamedagic
Hi Hideo-san,

On Thu, Mar 07, 2013 at 10:18:09AM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi Dejan,
> 
> The problem was settled with your patch.
> 
> However, I have a question.
> I want to use the "resource_set" that Andrew proposed, but I do not
> understand how to use it with the crm shell.
> 
> I read the two cib.xml examples below and confirmed them with the crm shell.
> 
> Case 1) sequential="false". 
> (snip)
> [XML stripped by the list archive]
> (snip)
>  * When I confirm it with crm shell ...
> (snip)
> group master-group vip-master vip-rep
> order test-order : _rsc_set_ ( vip-master vip-rep )
> (snip)

Yes. All size two resource sets get the _rsc_set_ keyword,
otherwise it's not possible to distinguish them from "normal"
constraints. Resource sets are supposed to help cases when it is
necessary to express relation between three or more resources.
Perhaps this case should be an exception.
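For example, the three-or-more-resource case that sets are meant for would be written like this in the crm shell (resource names are hypothetical):

```
# three resources ordered as one sequential set
order o-seq inf: rscA rscB rscC
# parentheses mark an unordered (sequential="false") subset
order o-par inf: rscA ( rscB rscC )
```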

> Case 2) sequential="true"
> (snip)
> [XML stripped by the list archive]
> (snip)
>  * When I confirm it with crm shell ...
> (snip)
>group master-group vip-master vip-rep
>xml [raw XML constraint representation stripped by the list archive]
> (snip)
> 
> Does specifying "sequential=true" have to be done in XML?

sequential=true is the default. In that case it's not possible to
have an unequivocal representation for the same construct and, in
this particular case, the conversion XML->CLI->XML yields a
different XML. There's a later commit which helps here, I think
that it should be possible to backport it to 1.0:

changeset:   789:916d1b15edc3
user:Dejan Muhamedagic 
date:Thu Aug 16 17:01:24 2012 +0200
summary: Medium: cibconfig: drop attributes set to default on cib import

> Is there a proper way to specify an attribute of "resource_set" with the crm 
> shell?
> Or is "resource_set" perhaps not usable with the crm shell of Pacemaker 1.0.13?

Should work. It's just that using it with two resources is, well, sort of
an unusual use case.

Cheers,

Dejan

> Best Regards,
> Hideo Yamauchi.
> 
> --- On Thu, 2013/3/7, renayama19661...@ybb.ne.jp  
> wrote:
> 
> > Hi Dejan,
> > Hi Andrew,
> > 
> > Thank you for comment.
> > I confirm the movement of the patch and report it.
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > --- On Wed, 2013/3/6, Dejan Muhamedagic  wrote:
> > 
> > > Hi Hideo-san,
> > > 
> > > On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp 
> > > wrote:
> > > > Hi Dejan,
> > > > Hi Andrew,
> > > > 
> > > > As for the crm shell, the check of the meta attribute was revised with 
> > > > the next patch.
> > > > 
> > > >  * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
> > > > 
> > > > This patch was backported in Pacemaker1.0.13.
> > > > 
> > > >  * 
> > > >https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
> > > > 
> > > > However, the ordered/colocated attributes of a group resource are 
> > > > treated as an error when I use a crm shell that includes this patch.
> > > > 
> > > > --
> > > > (snip)
> > > > ### Group Configuration ###
> > > > group master-group \
> > > >         vip-master \
> > > >         vip-rep \
> > > >         meta \
> > > >                 ordered="false"
> > > > (snip)
> > > > 
> > > > [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
> > > > INFO: building help index
> > > > crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: 
> > > > not fencing unseen nodes
> > > > WARNING: vip-master: specified timeout 60s for start is smaller than 
> > > > the advised 90
> > > > WARNING: vip-master: specified timeout 60s for stop is smaller than the 
> > > > advised 100
> > > > WARNING: vip-rep: specified timeout 60s for start is smaller than the 
> > > > advised 90
> > > > WARNING: vip-rep: specified timeout 60s for stop is smaller than the 
> > > > advised 100
> > > > ERROR: master-group: attribute ordered does not exist  -> WHY?
> > > > Do you still want to commit? y
> > > > --
> > > > 
> > > > If it chooses `yes` by a confirmation message, it is reflected, but it 
> > > > is a problem that error message is displayed.
> > > >  * The error occurs in the same way when I appoint colocated attribute.
> > > > And I noticed that there was no explanation of ordered/colocated for 
> > > > the group resource in the online help of Pacemaker.
> > > > 
> > > > I think that specifying the ordered/colocated attributes should 
> > > > not produce an error for a group resource.
> > > > In addition, I think that ordered/colocated should be added to the 
> > > > online help.
> > > 
> > > These attributes are not listed in crmsh. Does the attached patch
> > > help?
> > > 
> > > Thanks,
> > > 
> > > Dejan
> > > > 
> > > > Best Regards,
> > > 

Re: [Pacemaker] placement-strategy=minimal - placing and logging

2013-03-08 Thread Lars Marowsky-Bree
On 2013-03-08T11:59:33, Vladimir  wrote:

> Collocations were exactly what I'm trying to avoid. The setup is planned
> to get >15 resources (and an upper limit is not defined). I think it
> would get pretty hard to consider all possible collocations, especially
> if some kind of automated deployment is intended. Using larger sets of
> collocations makes the configuration more difficult to read and
> especially to maintain.

I see your point. But the collocations don't really get more difficult
with the number of resources, but mostly node size.

And the way you described your setup (N nodes + 1 standby), chunking
them up accordingly isn't that difficult. Assuming identical nodes.

But yes, of course a fully automated redistribution would be
desirable.

Like I said, we take patches, and I think people could be found to
convert cash to patches ;-)

> OK, I see, but I'm looking for a possibility to monitor such states, to
> be informed if a resource can't be started because of a lack of provided
> utilization.
> 
> Does anybody have an idea about that issue?

Running the PE is the only choice right now. I think with crm_mon you'll
also be informed about stopped resources; basically you'd want to be
told about everything not explicitly stopped (e.g., target-role !=
stopped), right?


Good luck,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [Pacemaker] placement-strategy=minimal - placing and logging

2013-03-08 Thread Vladimir
On Fri, 8 Mar 2013 11:05:01 +0100
Lars Marowsky-Bree  wrote:

> On 2013-03-07T21:34:47, Vladimir  wrote:
> 
> > All resources are only able to run if they are distributed in the
> > right combination. A working example could look like:
> 
> The algorithm is somewhat simplistic, which has the advantage of being
> fast. It works "quite well" in scenarios where there's a number of
> nodes available and differences between resources are not so large,
> but it won't always find the "optimal" solution.
> 
> Optimal placement (rucksack problem) is NP-complete, so all realistic
> algorithms implement heuristics.
> 
> In theory, the case you describe is probably one that can be solved
> easily enough, but the current one doesn't. We do accept patches ;-)
> 
> (If someone is looking for a master's thesis topic, transforming the
> constraints into linear equations and applying an appropriate
> optimization function for location scores/utilization and finding a
> discrete solution using one of the available libraries is probably the
> way to go.)

I understand the main problem. I just hoped there was another approach.

> > Is there a possibility to configure it in another way?
> 
> You can influence the placement of resources using resource
> priorities, but of course that gets a bit hackish for larger
> configurations and not exactly automatic.
> 
> If you know the work packages exactly, you can also use collocation
> sets. That's quite feasible for a low node count, which is where the
> current algorithm is least effective.

Collocations were exactly what I'm trying to avoid. The setup is planned
to get >15 resources (and an upper limit is not defined). I think it
would get pretty hard to consider all possible collocations, especially
if some kind of automated deployment is intended. Using larger sets of
collocations makes the configuration more difficult to read and
especially to maintain.

> > Furthermore if there is a lack of configured cores and thus a
> > resource cannot be started I don't see any log messages or crm_mon
> > output.
> 
> This part is normal, yes. You don't get an error message if -inf
> colocation constraints prevent a resource from being placed, either.

OK, I see, but I'm looking for a possibility to monitor such states, to
be informed if a resource can't be started because of a lack of provided
utilization.

Does anybody have an idea about that issue?

crm_simulate was mentioned by Michael, but I'm not sure whether it is the
right tool for monitoring purposes.

> >dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> 
> There have been some fixes/improvements to utilization based placement
> since. But I'm not sure if they'd help you.

I'm not sure either. Furthermore, I "have to" use packages provided by
the "regular" Ubuntu 12.04 repository.




Re: [Pacemaker] placement-strategy=minimal - placing and logging

2013-03-08 Thread Lars Marowsky-Bree
On 2013-03-07T21:34:47, Vladimir  wrote:

> All resources are only able to run if they are distributed in the right
> combination. A working example could look like:

The algorithm is somewhat simplistic, which has the advantage of being
fast. It works "quite well" in scenarios where there's a number of nodes
available and differences between resources are not so large, but it
won't always find the "optimal" solution.

Optimal placement (rucksack problem) is NP-complete, so all realistic
algorithms implement heuristics.

In theory, the case you describe is probably one that can be solved
easily enough, but the current one doesn't. We do accept patches ;-)

(If someone is looking for a master's thesis topic, transforming the
constraints into linear equations and applying an appropriate
optimization function for location scores/utilization and finding a
discrete solution using one of the available libraries is probably the
way to go.)
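The limitation can be illustrated with a toy first-fit sketch (this is NOT the PE's actual algorithm, just an analogous greedy heuristic; node names and core counts are made up):

```shell
# Toy illustration: a greedy first-fit placement can miss a feasible
# assignment. Two nodes with 5 "cores" each; demands 2 2 3 3 fit pairwise
# (2+3 per node), but first-fit packs both 2s onto node-0 and strands the
# last resource.
first_fit() {
  # $1, $2 = node capacities; remaining args = per-resource demands
  n0=$1; n1=$2; shift 2
  for d in "$@"; do
    if [ "$n0" -ge "$d" ]; then n0=$((n0 - d)); echo "node-0"
    elif [ "$n1" -ge "$d" ]; then n1=$((n1 - d)); echo "node-1"
    else echo "unplaced"
    fi
  done
}

first_fit 5 5 2 2 3 3
```

An exact solver (the linear-programming approach suggested above) would find the 2+3/2+3 split that the greedy pass misses.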

> Is there a possibility to configure it in another way?

You can influence the placement of resources using resource priorities,
but of course that gets a bit hackish for larger configurations and not
exactly automatic.

If you know the work packages exactly, you can also use collocation sets.
That's quite feasible for a low node count, which is where the current
algorithm is least effective.

> Furthermore if there is a lack of configured cores and thus a resource
> cannot be started I don't see any log messages or crm_mon output.

This part is normal, yes. You don't get an error message if -inf
colocation constraints prevent a resource from being placed, either.

>dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \

There have been some fixes/improvements to utilization based placement
since. But I'm not sure if they'd help you.


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

