Re: [lustre-discuss] BCP for High Availability?

2023-01-19 Thread Cameron Harr via lustre-discuss
We (LLNL) were probably that lab using pacemaker-remote, and we still
are, as it generally works and is what we're used to. That said, on an
upcoming system, we may end up trying 2-node HA clusters due to the
vendor's preference. I'm not sure what specifics you're interested in,
but as you mention, the PM-remote option lets one cluster bring up or
down the entire file system and can handle fencing and resource
management for everyone. The biggest caveat with this method (learned
the hard way by numerous folks) is not to do 'systemctl stop pacemaker'
on that central node unless you really want to take down the entire
file system.
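
A minimal sketch of one way to guard against that caveat, assuming a
small wrapper is used in place of typing the command directly. The node
name "mgmt1", the CENTRAL_NODES set, and the function name are
hypothetical and are not taken from any real site's tooling. The function
only builds the command; it does not run it.

```python
# Hypothetical short hostname(s) of the central Pacemaker node(s).
CENTRAL_NODES = {"mgmt1"}


def stop_pacemaker_command(hostname, central_nodes=CENTRAL_NODES, force=False):
    """Return the argv for stopping pacemaker on this host.

    Refuses on a central node unless force=True, because stopping
    pacemaker there can take down the entire file system.
    """
    short_name = hostname.split(".")[0]  # compare on the short hostname
    if short_name in central_nodes and not force:
        raise RuntimeError(
            f"{short_name} is a central Pacemaker node; stopping pacemaker "
            "here may take down the entire file system (pass force=True "
            "if that is really intended)"
        )
    return ["systemctl", "stop", "pacemaker"]
```

The result could then be passed to subprocess.run() with the value of
socket.gethostname() as the hostname, so that an accidental stop on the
central node fails loudly instead of proceeding.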


On 1/15/23 18:37, Andrew Elwell via lustre-discuss wrote:

Hi Folks,

I'm just rebuilding my testbed and have got to the "sort out all the
pacemaker stuff" part. What's the best current practice for the
current LTS (2.15.x) release tree?

I've always done this as multiple individual HA clusters covering each
pair of servers with common dual connected drive array(s), but I
remember seeing a talk some years ago where one of the US labs was
using "pacemaker-remote" and bringing them all up from a central node.

I note there are a few (old) crib notes on the wiki, referenced from
the Lustre manual, but nothing updated in the last couple of years.

What are people out there doing?


Many thanks

Andrew

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] BCP for High Availability?

2023-01-15 Thread Andrew Elwell via lustre-discuss
Hi Folks,

I'm just rebuilding my testbed and have got to the "sort out all the
pacemaker stuff" part. What's the best current practice for the
current LTS (2.15.x) release tree?

I've always done this as multiple individual HA clusters covering each
pair of servers with common dual connected drive array(s), but I
remember seeing a talk some years ago where one of the US labs was
using "pacemaker-remote" and bringing them all up from a central node.

I note there are a few (old) crib notes on the wiki, referenced from
the Lustre manual, but nothing updated in the last couple of years.

What are people out there doing?


Many thanks

Andrew