Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?

2021-04-11 Thread Andrei Borzenkov
On 11.04.2021 21:47, Eric Robinson wrote:
>> -----Original Message-----
>> From: Users On Behalf Of Andrei Borzenkov
>> Sent: Sunday, April 11, 2021 1:20 PM
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?
>>
>> On 11.04.2021 20:07, Eric Robinson wrote:
>>> We're writing a custom RA for a multi-tenant MySQL cluster that runs in
>> active/standby mode. I've read the RA documentation about what exit codes
>> should be returned for various outcomes, but something is still unclear to
>> me.
>>>
>>> We run multiple instances of MySQL from one filesystem, like this:
>>>
>>> /app_root
>>> /mysql1
>>> /mysql2
>>> /mysql3
>>> ...etc.
>>>
>>> The /app_root filesystem lives on a DRBD volume, which is only mounted
>> on the active node.
>>>
>>> When the RA performs a "start," "stop," or "monitor" action on the standby
>> node, the filesystem is not mounted so the mysql instances are not present.
>>
>> You are not supposed to do it in the first place. You are supposed to have
>> ordering constraint that starts MySQL instances after filesystem is
>> available.
>>
> 
> That is what we have. The colocation constraints require mysql -> filesystem 
> -> drbd master. The ordering constraints promote drbd, then start the 
> filesystem, then start mysql.
> 

So how is it possible for the agent to execute "start" or "stop" on the
wrong node?

>>> What should the return codes for those actions be? Fail? Not installed?
>> Unknown error?
>>>
>>
>> I believe that "not installed" is considered hard error and bans resource
>> from this node. As missing filesystem is probably transient it does not
>> look appropriate. There is no "fail" return code.
>>
>> In any case return code depends on action. For monitor you obviously are
>> expected to return "not running" in this case. "stop" should probably return
>> success (after all, instance is not running, right?) And "start"
>> should return error indication, but I am not sure what is better - generic
>> error or not running.
>>
> 
> That's a big part of my question. I'm just trying to avoid a condition where 
> the mysql resource is running on node A, and Pacemaker thinks there is a 
> "problem" with it on Node B.
> 

I am not sure I understand the problem. By default nothing will run on
node B after the initial probe. If you have also configured monitoring in
the stopped state, your monitor obviously has to return the truth: that
the application is not running.
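
As an illustration of a monitor that "returns the truth", here is a minimal
sketch in Python. It assumes each instance keeps a pid file inside its own
directory under /app_root; the file name "mysqld.pid" and the function name
are hypothetical and not taken from the RA discussed in this thread. When the
filesystem is not mounted the pid file cannot be read, so the check reports
"not running" rather than an error.

```python
import os

# OCF exit codes used by this sketch.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def monitor(instance_dir: str, pid_file_name: str = "mysqld.pid") -> int:
    """Report whether the instance under instance_dir is running.

    pid_file_name is a hypothetical default; adjust it to the real layout.
    """
    pid_path = os.path.join(instance_dir, pid_file_name)
    try:
        with open(pid_path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        # Directory or pid file is absent (for example, the filesystem is
        # not mounted on this node) or unreadable: nothing is running here.
        return OCF_NOT_RUNNING

    if pid <= 0:
        return OCF_NOT_RUNNING

    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return OCF_NOT_RUNNING
    except PermissionError:
        pass  # the process exists but belongs to another user
    return OCF_SUCCESS
```

On the standby node, a call such as monitor("/app_root/mysql1") would take
the first branch and return 7 (OCF_NOT_RUNNING), which is what a probe
expects for a cleanly stopped resource.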
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?

2021-04-11 Thread Eric Robinson
> -----Original Message-----
> From: Users On Behalf Of Andrei Borzenkov
> Sent: Sunday, April 11, 2021 1:20 PM
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?
>
> On 11.04.2021 20:07, Eric Robinson wrote:
> > We're writing a custom RA for a multi-tenant MySQL cluster that runs in
> active/standby mode. I've read the RA documentation about what exit codes
> should be returned for various outcomes, but something is still unclear to
> me.
> >
> > We run multiple instances of MySQL from one filesystem, like this:
> >
> > /app_root
> > /mysql1
> > /mysql2
> > /mysql3
> > ...etc.
> >
> > The /app_root filesystem lives on a DRBD volume, which is only mounted
> on the active node.
> >
> > When the RA performs a "start," "stop," or "monitor" action on the standby
> node, the filesystem is not mounted so the mysql instances are not present.
>
> You are not supposed to do it in the first place. You are supposed to have
> ordering constraint that starts MySQL instances after filesystem is available.
>

That is what we have. The colocation constraints require mysql -> filesystem -> 
drbd master. The ordering constraints promote drbd, then start the filesystem, 
then start mysql.

> > What should the return codes for those actions be? Fail? Not installed?
> Unknown error?
> >
>
> I believe that "not installed" is considered hard error and bans resource from
> this node. As missing filesystem is probably transient it does not look
> appropriate. There is no "fail" return code.
>
> In any case return code depends on action. For monitor you obviously are
> expected to return "not running" in this case. "stop" should probably return
> success (after all, instance is not running, right?) And "start"
> should return error indication, but I am not sure what is better - generic
> error or not running.
>

That's a big part of my question. I'm just trying to avoid a condition where 
the mysql resource is running on node A, and Pacemaker thinks there is a 
"problem" with it on Node B.

-Eric




Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?

2021-04-11 Thread Andrei Borzenkov
On 11.04.2021 20:07, Eric Robinson wrote:
> We're writing a custom RA for a multi-tenant MySQL cluster that runs in 
> active/standby mode. I've read the RA documentation about what exit codes 
> should be returned for various outcomes, but something is still unclear to me.
> 
> We run multiple instances of MySQL from one filesystem, like this:
> 
> /app_root
> /mysql1
> /mysql2
> /mysql3
> ...etc.
> 
> The /app_root filesystem lives on a DRBD volume, which is only mounted on the 
> active node.
> 
> When the RA performs a "start," "stop," or "monitor" action on the standby 
> node, the filesystem is not mounted so the mysql instances are not present.

You are not supposed to do that in the first place. You are supposed to
have an ordering constraint that starts the MySQL instances after the
filesystem is available.

> What should the return codes for those actions be? Fail? Not installed? 
> Unknown error?
> 

I believe that "not installed" is considered a hard error and bans the
resource from this node. As a missing filesystem is probably transient, it
does not look appropriate. There is no "fail" return code.

In any case, the return code depends on the action. For monitor you are
obviously expected to return "not running" in this case. "stop" should
probably return success (after all, the instance is not running, right?).
And "start" should return an error indication, but I am not sure which is
better: generic error or not running.
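
The per-action mapping above can be sketched as follows. This is only an
illustration in Python, not the RA from this thread: the function name is
hypothetical, and returning the generic error for "start" is an assumption,
since the choice between generic error and not running is left open above.
The numeric values are the standard OCF exit codes.

```python
# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_INSTALLED = 5  # "not installed": hard error, deliberately unused here
OCF_NOT_RUNNING = 7


def exit_code_when_instance_missing(action: str) -> int:
    """Exit code to return when the instance directory is not present.

    Hypothetical helper: it covers only the case where the filesystem is
    not mounted, so the MySQL instance cannot be running on this node.
    """
    if action == "monitor":
        # Report the truth: the instance is not running here.
        return OCF_NOT_RUNNING
    if action == "stop":
        # Nothing to stop, so the stop has effectively succeeded.
        return OCF_SUCCESS
    if action == "start":
        # Assumption: a soft (generic) error, not "not installed",
        # because a missing filesystem is probably transient.
        return OCF_ERR_GENERIC
    return OCF_ERR_UNIMPLEMENTED
```

A real agent would call something like this only after finding that the
instance directory is absent; otherwise it would go on to its normal
start, stop, or monitor logic.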
