Re: [ClusterLabs] Pacemaker with Zookeeper??

2016-06-14 Thread Andrew Beekhof
tl;dr - don't port Pacemaker, use pacemaker-remote instead

On Wed, May 18, 2016 at 5:20 PM, Jan Friesse  wrote:
> Ken Gaillot wrote:
>
>> On 05/17/2016 09:54 AM, Digimer wrote:
>>>
>>> On 16/05/16 04:35 AM, Bogdan Dobrelya wrote:

 On 05/16/2016 09:23 AM, Jan Friesse wrote:
>>
>> Hi,
>>
>> I have an idea: use Pacemaker with Zookeeper (instead of Corosync). Is
>> it possible?
>> Has anyone investigated this before?


 Indeed, it would be *great* to have a Pacemaker-based control plane on top
 of other "pluggable" distributed KVS & messaging systems, for example
 etcd as well :)
 I'm looking forward to joining any dev efforts around that, although I'm
 not a Java or Go developer.
>>>
>>>
>>> Part of open source is the freedom to do whatever you want, of course.
>>>
>>> Let me ask, though: what problems would zookeeper, etcd or other systems
>>> solve that can't be solved in corosync?
>>>
>>> I ask because the HA community just finished a multi-year effort to
>>> merge different projects into one common HA stack. This has a lot of
>>> benefits to the user base, not least of which is lack of confusion.
>>>
>>> Strikes me that the significant time investment in supporting a new
>>> comms layer would be much more beneficially spent on improving the
>>> existing stack.
>>>
>>> Again, anyone is free to do whatever they want... I just don't see the
>>> motivation personally.
>>>
>>> digimer
>>
>>
>> I see one big difference that is both a strength and a weakness: these
>> other packages have a much wider user base beyond the HA cluster use
>> case. The strength is that there will be many more developers working to
>> fix bugs, add features, etc. The weakness is that most of those
>
>
> This is exactly what I was thinking about during 2.x development: whether
> replacing Corosync would make more sense than continuing to develop it. I
> could have accepted implementing some features. Sadly, there was
> exactly ONE project which would be able to replace corosync (the Spread
> toolkit), and it is even less widespread than Corosync.
>
> From my point of view, a replacement for corosync must (at least) be able to:
> - Work without quorum

Agreed. It's a non-starter if messaging stops when quorum is lost.

> - Support 2 node clusters
> - Allow multiple links (something like RRP)

Doesn't bonding (and the imminent arrival of knet) make this somewhat optional?
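
(For reference, a hedged sketch of how those two items look in corosync 2.x
today -- two_node mode plus redundant rings via RRP.  The host names are
made up and this is not a drop-in config:)

    totem {
        version: 2
        cluster_name: example
        transport: udpu
        rrp_mode: passive        # use both rings, fail over between them
    }
    nodelist {
        node {
            ring0_addr: node1-net1
            ring1_addr: node1-net2
            nodeid: 1
        }
        node {
            ring0_addr: node2-net1
            ring1_addr: node2-net2
            nodeid: 2
        }
    }
    quorum {
        provider: corosync_votequorum
        two_node: 1              # special 2-node mode (implies wait_for_all)
    }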

> - Not include a SPOF (so nothing like configuration stored on only one node
> and/or on a different machine on the network)

There can be subtle variations on this.  The pattern in OpenStack is
to have a "management node", which sounds like a SPOF, but they also
require that the service be able to function without it.  So it's a grey area.

> - Provide EVS/VS

Pacemaker could live without this.  Heartbeat didn't provide it either.

> - Provide something like qdevice

Or the ability to create it.  In fairness, Pacemaker has gotten by for
a long long time without it :-)
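
(As a rough illustration of the qdevice side with pcs -- the arbitrator host
name "qnetd-host" is invented, and the exact options depend on your pcs
version, so treat this as a sketch rather than a recipe:)

    # on the arbitrator machine, outside the cluster
    pcs qdevice setup model net --enable --start

    # on one cluster node, point the cluster at the arbitrator
    pcs quorum device add model net host=qnetd-host algorithm=ffsplit

    # check that corosync-qdevice is participating in quorum
    pcs quorum device status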


It would be nice to be considered for the kinds of scaled deployments
that Kubernetes and etcd are built for, because that's where all the
excitement and mindshare is.  Zookeeper was one of the options I
thought of too, but realistically Pacemaker is not what those
folks are looking for.  At those scales, our stack's smarts take a back
seat to the idea that there are so many copies that dozens can die and
the only recovery you need is to maybe start some more copies (because
with so many, there is always a master around somewhere).


For those of us with a need to scale _and_ an appreciation of "real"
resource orchestration, I would argue that a better architecture is a
small traditional cluster managing a much larger pool of
pacemaker-remote nodes.  Putting effort into making that really shine
(especially since it's pretty solid already) is likely to have a better
payoff than porting to another messaging layer.
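
(A minimal sketch of that architecture with pcs, assuming the
pacemaker_remote package is installed on the remote node and
/etc/pacemaker/authkey is shared between it and the cluster; the host
name "remote1" is made up:)

    # on the remote node
    systemctl enable pacemaker_remote
    systemctl start pacemaker_remote

    # on a full cluster node: integrate it as a remote node
    pcs resource create remote1 ocf:pacemaker:remote server=remote1.example.com

    # remote1 now appears in crm_mon as a node and can run resources,
    # without adding another corosync member to the core cluster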


>
> Both zookeeper and etcd build on top of a membership mechanism that is quite
> simple to understand (zookeeper = elected master, something like amoeba,
> etcd = raft), which is nice, because it means more contributors. Sadly,
> bare-metal HA must work even in situations where "simple" quorum is not enough.
>
>
>
>> developers are ignorant of HA clustering and could easily cause more
>> problems for the HA use case than they fix.
>>
>> Another potential benefit is the familiarity factor -- people are more
>> comfortable with things they recognize from somewhere else. So it might
>> help Pacemaker adoption, especially in the communities that already use
>> these packages.
>>
>> I'm not aware of any technical advantages, and I wouldn't expect any,
>> given corosync's long HA focus.
>>
>  From my point of view (and yes, I'm biased), the biggest problem of
> Zookeeper
> is the need to have quorum
>
> (https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_designing).
> A direct consequence is the inability to 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-14 Thread Andrew Beekhof
On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers  wrote:
> Andrew Beekhof  wrote:
>> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers  wrote:
>> > Andrew Beekhof  wrote:
>> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers  wrote:
>> >> > Ken Gaillot  wrote:
>> >> >> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> >> >> > Maybe your point was that if the expected start never happens (so
>> >> >> > never even gets a chance to fail), we still want to do a nova
>> >> >> > service-disable?
>> >> >>
>> >> >> That is a good question, which might mean it should be done on every
>> >> >> stop -- or could that cause problems (besides delays)?
>> >> >
>> >> > No, the whole point of adding this feature is to avoid a
>> >> > service-disable on every stop, and instead only do it on the final
>> >> > stop.  If there are corner cases where we never reach the final stop,
>> >> > that's not a disaster because nova will eventually figure it out and
>> >> > do the right thing when the server-agent connection times out.
>> >> >
>> >> >> Another aspect of this is that the proposed feature could only look at a
>> >> >> single transition. What if stop is called with start_expected=false, but
>> >> >> then Pacemaker is able to start the service on the same node in the next
>> >> >> transition immediately afterward? Would having called service-disable
>> >> >> cause problems for that start?
>> >> >
>> >> > We would also need to ensure that service-enable is called on start
>> >> > when necessary.  Perhaps we could track the enable/disable state in a
>> >> > local temporary file, and if the file indicates that we've previously
>> >> > done service-disable, we know to run service-enable on start.  This
>> >> > would avoid calling service-enable on every single start.
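
(To make the idea above concrete, a rough shell sketch of the state-file
bookkeeping inside a hypothetical nova-compute RA.  The function names, the
state-file path, and especially the start_expected variable are assumptions,
not existing Pacemaker or OCF interfaces:)

    STATE_FILE="/var/run/resource-agents/nova-compute.disabled"

    maybe_disable_on_stop() {
        # called at the end of the RA's stop action; only disable the nova
        # service when no restart is expected (i.e. a "final" stop)
        if [ "${OCF_RESKEY_CRM_meta_start_expected:-true}" = "false" ]; then
            nova service-disable "$(hostname)" nova-compute && touch "$STATE_FILE"
        fi
    }

    maybe_enable_on_start() {
        # called at the beginning of the RA's start action; re-enable only if
        # we previously disabled, avoiding a nova API call on every start
        if [ -f "$STATE_FILE" ]; then
            nova service-enable "$(hostname)" nova-compute && rm -f "$STATE_FILE"
        fi
    }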
>> >>
>> >> Feels like an over-optimization.
>> >> In fact, the whole thing feels like that, if I'm honest.
>> >
>> > Huh ... You didn't seem to think that when we discussed automating
>> > service-disable at length in Austin.
>>
>> I didn't feel the need to push back because RH uses the systemd agent
>> instead so you're only hanging yourself, but more importantly because
>> the proposed implementation to facilitate it wasn't leading RA writers
>> down a hazardous path :-)
>
> I'm a bit confused by that statement, because the only proposed
> implementation we came up with in Austin was adding this new feature
> to Pacemaker.

_A_ new feature, not _this_ new feature.
The one we discussed was far less prone to being abused but, as it
turns out, also far less useful for what you were trying to do.

> Prior to that, AFAICR, you, Dawid, and I had a long
> afternoon discussion in the sun where we tried to figure out a way to
> implement it just by tweaking the OCF RAs, but every approach we
> discussed turned out to have fundamental issues.  That's why we
> eventually turned to the idea of this new feature in Pacemaker.
>
> But anyway, it's water under the bridge now :-)
>
>> > What changed?  Can you suggest a better approach?
>>
>> Either always or never disable the service would be my advice.
>> "Always" specifically getting my vote.
>
> OK, thanks.  We discussed that at the meeting this morning, and it
> looks like we'll give it a try.
>
>> >> why are we trying to optimise the projected performance impact
>> >
>> > It's not really "projected"; we know exactly what the impact is.  And
>> > it's not really a performance impact either.  If nova-compute (or a
>> > dependency) is malfunctioning on a compute node, there will be a
>> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
>> > which nova-scheduler could still schedule VMs onto that compute node,
>> > and then of course they'll fail to boot.
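
(For reference, that is the rpc_response_timeout option in nova.conf; a
minimal sketch of where it lives, with the value shown being only the
commonly cited default rather than a recommendation:)

    [DEFAULT]
    # how long nova waits for an RPC reply before treating the peer as gone
    rpc_response_timeout = 60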
>>
>> Right, but that window exists regardless of whether the node is or is
>> not ever coming back.
>
> Sure, but the window's a *lot* bigger if we don't do service-disable.
> Although perhaps your question "why are we trying to optimise the
> projected performance impact" was actually "why are we trying to avoid
> extra calls to service-disable" rather than "why do we want to call
> service-disable" as I initially assumed.  Is that right?

Exactly.  I assumed it was to limit the noise we'd be generating in doing so.

>
>> And as we already discussed, the proposed feature still leaves you
>> open to this window because we can't know if the expected restart will
>> ever happen.
>
> Yes, but as I already said, the perfect should not become the enemy of
> the good.  Just because an approach doesn't solve all cases, it
> doesn't necessarily mean it's not suitable for solving some of them.
>
>> In this context, trying to avoid the disable call under certain
>> circumstances, to avoid repeated and frequent flip-flopping of the
>> state, seems ill-advised.  At the point nova compute is bouncing up
>> and down like that, you have a more 

Re: [ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

2016-06-14 Thread Jeremy Voisin
Hi,

Every action on httpd is very slow due to ModSecurity 2.9. The reload in
postrotate may take a while.

Here is the output from the messages log this morning:
Jun 14 03:43:05 mail-px-** crmd[2685]:  notice: State transition S_IDLE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: On loss of CCM Quorum:
Ignore
Jun 14 03:43:05 mail-px-** pengine[2684]: warning: Processing failed op
monitor for WebSite on node1: not running (7)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Recover
WebSite#011(Started node1)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Calculated Transition
367: /var/lib/pacemaker/pengine/pe-input-173.bz2
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: On loss of CCM Quorum:
Ignore
Jun 14 03:43:05 mail-px-** pengine[2684]: warning: Processing failed op
monitor for WebSite on node1: not running (7)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Recover
WebSite#011(Started node1)
Jun 14 03:43:05 mail-px-** crmd[2685]:  notice: Initiating action 4: stop
WebSite_stop_0 on node1 (local)
Jun 14 03:43:05 mail-px-** systemd: Reloading.
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Calculated Transition
368: /var/lib/pacemaker/pengine/pe-input-174.bz2
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/fusioninventory-agent.service is marked executable.
Please remove executable permission bits. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/auditd.service is marked world-inaccessible. This
has no effect as configuration data is accessible via APIs without
restrictions. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/ebtables.service is marked executable. Please remove
executable permission bits. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Removed slice user-0.slice.
Jun 14 03:43:05 mail-px-** systemd: Stopping user-0.slice.
Jun 14 03:44:35 mail-px-** systemd: httpd.service stop-sigterm timed out.
Killing.
Jun 14 03:44:35 mail-px-** systemd: httpd.service: main process exited,
code=killed, status=9/KILL
Jun 14 03:44:35 mail-px-** systemd: Stopped The Apache HTTP Server.
Jun 14 03:44:35 mail-px-** systemd: Unit httpd.service entered failed state.
Jun 14 03:44:35 mail-px-** systemd: httpd.service failed.
Jun 14 03:44:37 mail-px-** crmd[2685]:  notice: Operation WebSite_stop_0: ok
(node=node1, call=29, rc=0, cib-update=464, confirmed=true)
Jun 14 03:44:37 mail-px-** crmd[2685]:  notice: Initiating action 10: start
WebSite_start_0 on node1 (local)
Jun 14 03:44:37 mail-px-** systemd: Reloading.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/fusioninventory-agent.service is marked executable.
Please remove executable permission bits. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/auditd.service is marked world-inaccessible. This
has no effect as configuration data is accessible via APIs without
restrictions. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/ebtables.service is marked executable. Please remove
executable permission bits. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/run/systemd/system/httpd.service.d/50-pacemaker.conf is marked
world-inaccessible. This has no effect as configuration data is accessible
via APIs without restrictions. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Starting Cluster Controlled httpd...
Jun 14 03:44:55 mail-px-** puppet-agent[1645]: Did not receive certificate
Jun 14 03:44:57 mail-px-** systemd: Started Cluster Controlled httpd.
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Operation WebSite_start_0:
ok (node=node1, call=30, rc=0, cib-update=465, confirmed=true)
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Initiating action 3: monitor
WebSite_monitor_30 on node1 (local)
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Transition 368 (Complete=4,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-174.bz2): Complete
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
origin=notify_crmd ]

The strange thing is that the problem does not occur on every logrotate...

Jérémy



-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: Tuesday, June 14, 2016 16:40
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Processing failed op monitor for WebSite on node1:
not running (7)

On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
> 
>  
> 
> We currently have a 2-node cluster with corosync and pacemaker for 
> httpd. We have 2 VIPs configured.
> 
>  
> 
> Since we added ModSecurity 2.9, httpd restarts have been very slow, so I 
> increased the start/stop timeouts. But sometimes, after logrotate, the 
> following 

Re: [ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

2016-06-14 Thread Ken Gaillot
On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
> 
>  
> 
> We currently have a 2-node cluster with corosync and pacemaker for
> httpd. We have 2 VIPs configured.
> 
>  
> 
> Since we added ModSecurity 2.9, httpd restarts have been very slow, so I
> increased the start/stop timeouts. But sometimes, after logrotate, the
> following error occurs:
> 
>  
> 
> Failed Actions:
> 
> * WebSite_monitor_30 on node1 'not running' (7): call=26,
> status=complete, exitreason='none',
> 
> last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
> 
>  
> 
> Here is the full output of crm_mon :
> 
> Last updated: Tue Jun 14 07:22:28 2016  Last change: Fri Jun 10
> 09:28:03 2016 by root via cibadmin on node1
> 
> Stack: corosync
> 
> Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with
> quorum
> 
> 2 nodes and 4 resources configured
> 
>  
> 
> Online: [ node1 node2 ]
> 
>  
> 
> WebSite (systemd:httpd):Started node1
> 
> Resource Group: WAFCluster
> 
>  VirtualIP  (ocf::heartbeat:IPaddr2):   Started node1
> 
>  MailMon(ocf::heartbeat:MailTo):Started node1
> 
>  VirtualIP2 (ocf::heartbeat:IPaddr2):   Started node1
> 
>  
> 
> Failed Actions:
> 
> * WebSite_monitor_30 on node1 'not running' (7): call=26,
> status=complete, exitreason='none',
> 
> last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
> 
>  
> 
> # pcs resource --full
> 
> Resource: WebSite (class=systemd type=httpd)
> 
>   Attributes: configfile=/etc/httpd/conf/httpd.conf
> statusurl=http://127.0.0.1/server-status monitor=1min
> 
>   Operations: monitor interval=300s (WebSite-monitor-interval-300s)
> 
>   start interval=0s timeout=300s (WebSite-start-interval-0s)
> 
>   stop interval=0s timeout=300s (WebSite-stop-interval-0s)
> 
> Group: WAFCluster
> 
>   Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
> 
>Attributes: ip=195.70.7.74 cidr_netmask=27
> 
>Operations: start interval=0s timeout=20s (VirtualIP-start-interval-0s)
> 
>stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
> 
>monitor interval=30s (VirtualIP-monitor-interval-30s)
> 
>   Resource: MailMon (class=ocf provider=heartbeat type=MailTo)
> 
>Attributes: email=sys...@dfi.ch
> 
>Operations: start interval=0s timeout=10 (MailMon-start-interval-0s)
> 
>stop interval=0s timeout=10 (MailMon-stop-interval-0s)
> 
>monitor interval=10 timeout=10 (MailMon-monitor-interval-10)
> 
>   Resource: VirtualIP2 (class=ocf provider=heartbeat type=IPaddr2)
> 
>Attributes: ip=195.70.7.75 cidr_netmask=27
> 
>Operations: start interval=0s timeout=20s (VirtualIP2-start-interval-0s)
> 
>stop interval=0s timeout=20s (VirtualIP2-stop-interval-0s)
> 
>monitor interval=30s (VirtualIP2-monitor-interval-30s)
> 
>  
> 
>  
> 
> If I run crm_resource -P, the Failed Actions disappear.
> 
>  
> 
> How can I fix the monitor “not running” error?
> 
>  
> 
> Thanks,
> 
> Jérémy

Why does logrotate cause the site to stop responding? Normally it's a
graceful restart, which shouldn't cause any interruptions.

Any solution will have to be in logrotate, to keep it from interrupting
service.
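
For example, a logrotate stanza along these lines sidesteps the reload
entirely (a sketch -- compare it against the httpd logrotate file shipped
by your distribution before using it):

    /var/log/httpd/*log {
        weekly
        missingok
        notifempty
        copytruncate    # copy then truncate in place, so httpd is never
                        # signalled; a few lines can be lost during the copy
    }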

Personally, my preferred configuration is to make apache log to syslog
instead of its usual log file. You can even configure syslog to log it
to the usual file, so there's no major difference. Then you don't need
a separate logrotate script for apache; it gets rotated with the system
log. That avoids having to restart apache, which for a busy site can be
a big deal. It also gives you the option of tying into syslog tools such
as remote logging.
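
A hedged sketch of that setup (the facility and file names are illustrative;
the access log has to go through a pipe because CustomLog has no native
syslog target):

    # httpd.conf: send logs to the local1 syslog facility
    ErrorLog  "syslog:local1"
    CustomLog "|/usr/bin/logger -t httpd -p local1.info" combined

    # /etc/rsyslog.d/httpd.conf: write them back out to a conventional file,
    # which then gets rotated along with the rest of the system logs
    local1.*    /var/log/httpd/httpd.log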
