Thank you, Stefan, for your comments.

On 11 June 2024 at 13:51, "Stefan Solbrig" <[email protected]> wrote:


> 
> Hi,
> 
>  
> 
>  The method depends a bit on whether you use a distributed-only system (like 
> me) or a replicated setting.
> 
>  I'm using a distributed-only setting (many bricks on different servers, but 
> no replication).  All my servers boot via network, i.e., on a start, it's 
> like a new host.
> 

We have a distributed-replicated setup with 6 hosts: distribute 2, replicate 3. 
Each host has 4 bricks.

Specifically:

- Data is distributed across hosts “gluster1” and “gluster2” (using two 
volumes; each volume uses 2 bricks per host, residing on separate disks, so 
every host has 4 bricks on 4 separate disks. These are all undamaged and can 
be reused.)
- This setup is replicated on hosts “gluster3/gluster4” and “gluster5/gluster6”.
- gluster1 is the machine with the broken root disk (all bricks undamaged).
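
(For completeness: the brick-to-replica-set mapping can be read off any of the 
healthy peers before gluster1 is touched, e.g. with the commands below; volume 
names and output are omitted here.)

    # run on any healthy peer, e.g. gluster2, to confirm which bricks of
    # gluster1 belong to which replica set
    gluster volume info
    gluster volume status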


> To rescue the old bricks, just set up a new server with the same OS, the same 
> IP and the same hostname (very important!).  The simplest thing would be 
> if you could retrieve the files in /var/lib/glusterd 

No data can be retrieved from the original root disk of gluster1; we tried every 
bit of magic we could summon.

 
>  If you install a completely new server (but with the same IP and the same 
> hostname), _then_ restore the files in /var/lib/glusterd,  you can just use 
> it as before. It will be recognised as the previous peer, without any 
> additional commands.

This sounds like the way to go, then.
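
For the record, here is how I picture that step, as a rough sketch only (I am 
assuming the configuration can simply be copied from a healthy peer such as 
gluster2 and then adjusted; hostnames and paths are ours):

    # on the rebuilt gluster1 (same hostname and IP), keep glusterd stopped
    # while the configuration is restored
    systemctl stop glusterd

    # pull the shared configuration (volume definitions, peer files) from a
    # healthy peer, e.g. gluster2
    rsync -a gluster2:/var/lib/glusterd/ /var/lib/glusterd/

    # note: /var/lib/glusterd/peers/ is peer-specific. The copy from gluster2
    # lists gluster1 but not gluster2 itself, so the entry for this host has
    # to be removed and one for gluster2 added. glusterd.info must also carry
    # gluster1's old UUID (see the sketch further below).

    systemctl start glusterd

Please correct me if only parts of /var/lib/glusterd should be copied over.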


>  In fact, I think that /var/lib/glusterd/*...   should be identical on all 
> servers, except
>  /var/lib/glusterd/glusterd.info
>  which holds the UUID of the server. However, you should be able to retrieve 
> the UUID from the command:
>  gluster pool list

I believe so. With 5 working nodes we should be fine retrieving the 
configuration data.
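
Concretely, I assume the UUID part would look roughly like this (placeholders 
in angle brackets; the operating-version value would be copied from a healthy 
peer's glusterd.info):

    # on any healthy node: note the UUID listed for gluster1
    gluster pool list

    # on the rebuilt gluster1, before starting glusterd, recreate
    # /var/lib/glusterd/glusterd.info with exactly that UUID
    printf 'UUID=%s\noperating-version=%s\n' \
        '<uuid-of-old-gluster1>' '<value-from-a-healthy-peer>' \
        > /var/lib/glusterd/glusterd.info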


>  This is your scenario 2a)
> 
>  Note that if it's __not__ a distributed-only system, other steps might be 
> necessary.

This worries me a bit. What other steps do you mean?



>  Your 2b) scenario should also work, but slightly differently. (Again, only 
> distributed-only)
> 
>  I use it occasionally for failover mode, but I haven't tested it extensively:
> 
>  
> 
>      gluster v reset-brick NameOfVolume FailedServer:/path/to/brick start
> 
>      gluster v add-brick NameOfVolume NewServer:/path/to/brick force
> 
>      # Order is important!
>      # If the brick is removed before the other brick is added,
>      # it will lead to duplicate files.
> 
>      gluster v remove-brick NameOfVolume FailedServer:/path/to/brick force
> 
>      gluster v rebalance NameOfVolume fix-layout start
> 
>  
> 
>  If it's also replicated or striped or using sharding, then other steps might 
> be necessary.

See above: what steps are you thinking of?
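
For what it's worth, my own guess (just an assumption on my part, not something 
you wrote) is that on a replicated volume the main extra step is letting 
self-heal bring the restored bricks back in sync, roughly:

    # after the node is back in the pool, per volume (names are ours)
    gluster volume heal vol1 info    # show pending heals
    gluster volume heal vol1 full    # trigger a full heal if needed

But I would be glad to hear if there is more to it.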


>  best wishes,
> 
>  Stefan Solbrig


Best regards to Regensburg,
R. Kupper


> 
>  -- 
> 
>  Dr. Stefan Solbrig
>  Universität Regensburg
>  Fakultät für Informatik und Data Science
>  93040 Regensburg
> 
> 
>  On 09.06.2024 at 15:00, [email protected] wrote:
> 
>  
> 
>  Hi all,
> 
>  
> 
>  I know there are many tutorials on how to replace a gluster host that has 
> become unusable. But they all seem to assume that the bricks of the 
> respective host are gone, too.
> 
>  
> 
>  My problem is different and (I hope) more easily solved: the disk with the 
> host’s root file system died and cannot be recovered. However, all of its 
> bricks are on separate disks and completely undamaged.
> 
>  
> 
>  I'm seeking your advice on what is best practice for replacing such a host.
> 
>  
> 
>  My notion is that it should be possible to set up a new root system, 
> configure it, and have it use the existing bricks.
> 
>  
> 
>  My questions are:
> 
>  
> 
>  1) Is this a good idea at all, or am I missing anything? Would it be better to 
> format the existing bricks and start over with a completely clean new host, 
> like most of the tutorials do?
> 
>  
> 
>  2) If it is feasible to use the existing bricks, two scenarios come to my 
> mind:
> 
>  
> 
>   a) Set up a new root file system for a gluster host and copy/adjust the 
> gluster configuration from one of the existing hosts, so that the newly 
> set-up host actually thinks it is the old host (that died), i.e., copying over 
> the gluster UUID, volume configurations, hostname, IP, etc. (What else would 
> it need?)
> 
>  
> 
>      The pool would then recognize the new host as identical to the old one 
> that died and accept it, just as if the old host had come online again.
> 
>  
> 
>   b) Set up a new root file system for a gluster host and probe it into the 
> trusted pool, with a new name and new gluster UUID. Transfer the bricks of the 
> old host that died to the new one using "replace-brick". There would be no need 
> for lengthy syncing, as most of the data already exists and is up to date on the 
> new host (which has the bricks of the old host); only self-heal would take place.
> 
>  
> 
>  Do these scenarios sound sane to you and which one would be best practice in 
> this situation? This is a production system, so safety is relevant.
> 
>  
> 
>  Thanks for any helpful comments and opinions!
> 
>  
> 
>  Best, R. Kupper
> 

________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
