On 05/27/14 16:44, Bartosz Kupidura wrote:
> Hello,
> Responses inline.
>
>
> Message written by Vladimir Kuklin <vkuk...@mirantis.com> on 27 May
> 2014, at 15:12:
>
>> Hi, Bartosz
>>
>> First of all, we are using openstack-dev for such discussions.
>>
>> Second, there is also Percona's RA for Percona XtraDB Cluster, which looks
>> pretty similar, although it is written in Perl. Maybe we could derive
>> something useful from it.
>>
>> Next, if you are working on this stuff, let's make it as open to the
>> community as possible. There is a blueprint for the Galera OCF script:
>> https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script. It
>> would be awesome if you wrote down the specification and sent the newer Galera
>> OCF change request to the fuel-library Gerrit.
>
> Sure, I will update this blueprint.
> Change request in fuel-library: https://review.openstack.org/#/c/95764/
That is a really nice catch, Bartosz, thank you. I believe we should review the new OCF script thoroughly and consider dropping cs_commit/cs_shadow as well. What would the downsides be?

>
>>
>> Speaking of the crm_attribute stuff: I am very surprised that you are saying
>> that node attributes are altered by the crm shadow commit. We are using a similar
>> approach in our scripts and have never faced this issue.
>
> This is probably because you update crm_attribute very rarely. With my
> approach, the GTID attribute is updated every 60s on every node (3 updates in 60s
> in a standard HA setup).
>
> You can try updating any attribute in a loop while the cluster is being
> deployed to trigger the failure with the corosync diff.

That sounds reasonable, and we should verify it. I've updated the statuses of the related bugs and attached them to the aforementioned blueprint as well:
https://bugs.launchpad.net/fuel/+bug/1283062/comments/7
https://bugs.launchpad.net/fuel/+bug/1281592/comments/6

>
>>
>> Corosync 2.x support is on our roadmap, but we are not sure that we will use
>> Corosync 2.x before the 6.x release series starts.
>
> Yeah, moreover, corosync CMAP is not synced between cluster nodes (or maybe I'm
> doing something wrong?). So we need another solution for this...
>

Perhaps we should use CMAN for Corosync 1.x.

>>
>>
>> On Tue, May 27, 2014 at 3:08 PM, Bartosz Kupidura <bkupid...@mirantis.com>
>> wrote:
>> Hello guys!
>> I would like to start a discussion on a new resource agent for
>> galera/pacemaker.
>>
>> Main features:
>> * Support cluster bootstrap
>> * Support rebooting any node in the cluster
>> * Support rebooting the whole cluster
>> * To determine which node has the latest DB version, we should use the Galera
>> GTID (Global Transaction ID)
>> * The node with the latest GTID is the Galera PC (primary component) in case
>> of reelection
>> * The administrator can manually set a node as the PC
>>
>> GTID:
>> * get the GTID from mysqld --wsrep-recover or the SQL query "SHOW STATUS LIKE
>> 'wsrep_local_state_uuid'"
>> * store the GTID as a crm_attribute for the node (crm_attribute --node $HOSTNAME
>> --lifetime $LIFETIME --name gtid --update $GTID)
>> * on every monitor/stop/start action, update the GTID for the given node
>> * the GTID can have 3 formats:
>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:123 - standard cluster-id:commit-id
>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:-1 - standard non-initialized
>> cluster, 00000000-0000-0000-0000-000000000000:-1
>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:INF - commit-id manually set to INF,
>> forces the RA to create a new cluster, with the master on the given node
>>
>> Check if reelection of the PC is needed:
>> * (the node is located in a partition with quorum OR we have only 1 node
>> configured in the cluster) AND the galera resource is not running on any node
>> * the GTID is manually set to INF on the given node
>>
>> Check if the given node is the PC:
>> * it has the highest GTID in the cluster; if more than one node has the
>> "highest" GTID, we use CRC32 to choose the proper PC
>> * the GTID is manually set to INF
>> * if the node with the highest GTID does not come back after a cluster reboot
>> (for example, disk failure), the administrator should set the GTID to INF on
>> another node
>>
>> I have an almost-ready RA: http://zynzel.spof.pl/mysql-wss
>>
>> Tested with vanilla CentOS galera/pacemaker/corosync - OK
>> Tested with Fuel 4.1 - Fail
>>
>>
>> Fuel 4.1 with that RA will not deploy correctly, because we use
>> crm_attribute to store the GTID, and in the manifests we use cs_shadow/cs_commit
>> for every pacemaker resource.
>> This leads to a cs_commit problem, because the configuration in the shadow copy
>> differs from the running configuration (the running config is changed by the RA):
>> "Could not commit shadow instance [..] to the CIB: Application of an update
>> diff failed"
>>
>> To solve this, we can go 2 ways:
>> 1) don't use cs_commit/cs_shadow in the manifests
>> 2) store the GTID in some other way than crm_attribute
>>
>> IMHO 2) is better (less invasive), and we can store the GTID in corosync CMAP
>> (http://www.polarhome.com/service/man/generic.php?qf=corosync-cmapctl), but
>> this requires corosync 2.x
>>
>>
>> --
>> Mailing list: https://launchpad.net/~fuel-dev
>> Post to : fuel-...@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~fuel-dev
>> More help : https://help.launchpad.net/ListHelp
>>
>>
>>
>> --
>> Yours Faithfully,
>> Vladimir Kuklin,
>> Fuel Library Tech Lead,
>> Mirantis, Inc.
>> +7 (495) 640-49-04
>> +7 (926) 702-39-68
>> Skype kuklinvv
>> 45bk3, Vorontsovskaya Str.
>> Moscow, Russia,
>> www.mirantis.com
>> www.mirantis.ru
>> vkuk...@mirantis.com
>
>

--
Best regards,
Bogdan Dobrelya,
Skype #bogdando_at_yahoo.com
Irc #bogdando

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
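For reference, the GTID rules in Bartosz's proposal quoted above can be expressed as plain decision logic. The Python below is a hypothetical sketch, not code from the mysql-wss RA or fuel-library. The function names are made up. Quorum and resource state are passed in as arguments instead of being queried from Pacemaker. Cluster UUIDs are not compared (only commit ids). The CRC32 tie-break is assumed to hash the node name, with the highest checksum winning, since the proposal does not say what is hashed.

```python
import math
import zlib
from typing import Dict, Tuple

# Seqno used when an administrator forced a node to become the PC (":INF").
FORCED_SEQNO = math.inf


def parse_gtid(gtid: str) -> Tuple[str, float]:
    """Split '<cluster-uuid>:<seqno>' into (uuid, seqno).

    seqno is -1 for a non-initialized node and math.inf for ':INF'.
    """
    uuid, sep, seqno = gtid.strip().rpartition(":")
    if not sep or not uuid or not seqno:
        raise ValueError(f"malformed GTID: {gtid!r}")
    if seqno.upper() == "INF":
        return uuid, FORCED_SEQNO
    return uuid, int(seqno)  # int() raises ValueError for anything else


def needs_reelection(has_quorum: bool, configured_nodes: int,
                     galera_running_anywhere: bool, local_gtid: str) -> bool:
    """PC reelection is needed if the local GTID was forced to INF, or if
    (we have quorum or only 1 configured node) and Galera runs nowhere."""
    if parse_gtid(local_gtid)[1] == FORCED_SEQNO:
        return True
    return (has_quorum or configured_nodes == 1) and not galera_running_anywhere


def choose_primary(gtids: Dict[str, str]) -> str:
    """Pick the node that should bootstrap the primary component.

    The highest seqno wins and ':INF' beats everything. ASSUMPTIONS: cluster
    UUIDs are ignored (only seqnos are compared), and ties are broken by the
    CRC32 of the node name, highest checksum first.
    """
    if not gtids:
        raise ValueError("no GTIDs reported")

    def rank(node: str) -> Tuple[float, int]:
        _, seqno = parse_gtid(gtids[node])
        return seqno, zlib.crc32(node.encode("utf-8"))

    return max(gtids, key=rank)


if __name__ == "__main__":
    # Hypothetical per-node attribute values.
    reported = {
        "node-1": "6c8b5c52-e5a6-11e3-a0a4-4bd1bd8a3f8a:1520",
        "node-2": "6c8b5c52-e5a6-11e3-a0a4-4bd1bd8a3f8a:1523",
        "node-3": "00000000-0000-0000-0000-000000000000:-1",
    }
    print(choose_primary(reported))  # node-2
```

In an actual agent, the inputs would come from the per-node `gtid` attributes written with `crm_attribute`. Gathering them is left out of this sketch.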