Thank you, Stefan, for your comments.
On 11 June 2024 at 13:51, "Stefan Solbrig" <[email protected]> wrote:
> Hi,
>
> The method depends a bit on whether you use a distributed-only system (like
> me) or a replicated setting.
>
> I'm using a distributed-only setting (many bricks on different servers, but
> no replication). All my servers boot via network, i.e., on a start, it's
> like a new host.

We have a distributed-replicated setup with 6 hosts: distribute 2, replicate 3.
Each host has 4 bricks. Specifically:

- Data is distributed across hosts "gluster1" and "gluster2". (We use two
  volumes; each volume has 2 bricks per host, residing on separate disks. So
  every host has 4 bricks on 4 separate disks. These are all undamaged and can
  be reused.)
- This setup is replicated on hosts "gluster3/gluster4" and "gluster5/gluster6".
- gluster1 is the machine with the broken root disk (all bricks undamaged).

> To rescue the old bricks, just set up a new server with the same OS, the same
> IP and the same hostname (!very important). The simplest thing would be if
> you could retrieve the files in /var/lib/glusterd

No data can be retrieved from the original root disk of gluster1; we tried all
the magic we could summon.

> If you install a completely new server (but with the same IP and the same
> hostname), _then_ restore the files in /var/lib/glusterd, you can just use
> it as before. It will be recognised as the previous peer, without any
> additional commands.

This sounds like the way to go, then. (I have put a rough sketch of the steps,
as I understand them, at the end of this mail.)

> In fact, I think that /var/lib/glusterd/* should be identical on all servers,
> except /var/lib/glusterd/glusterd.info, which holds the UUID of the server.
> However, you should be able to retrieve the UUID from the command:
>
> gluster pool list

I believe so. With 5 working nodes left, we should be fine retrieving the
configuration data.

> This is your scenario 2a)
>
> Note that if it's __not__ a distributed-only system, other steps might be
> necessary.

This worries me a bit. What other steps do you mean?

> Your 2b) scenario should also work, but slightly differently. (Again,
> distributed-only.) I use it occasionally for failover mode, but I haven't
> tested it extensively:
>
> gluster v reset-brick NameOfVolume FailedServer:/path/to/brick start
> gluster v add-brick NameOfVolume NewServer:/path/to/brick force
>
> # Order is important!
> # If the brick is removed before the other brick is added,
> # it will lead to duplicate files.
>
> gluster v remove-brick NameOfVolume FailedServer:/path/to/brick force
> gluster v rebalance NameOfVolume fix-layout start
>
> If it's also replicated or striped or using sharding, then other steps might
> be necessary.

See above: which steps are you thinking of? (A second sketch at the end of this
mail lists the checks I would run once the node has rejoined; please let me
know if that would not be enough.)

> best wishes,
> Stefan Solbrig

Best regards to Regensburg,
R. Kupper

> --
> Dr. Stefan Solbrig
> Universität Regensburg
> Fakultät für Informatik und Data Science
> 93040 Regensburg
>
> On 09.06.2024 at 15:00, [email protected] wrote:
>
>> Hi all,
>>
>> I know there are many tutorials on how to replace a gluster host that has
>> become unusable. But they all seem to assume that the bricks of the
>> respective host are gone, too.
>>
>> My problem is different and (I hope) more easily solved: the disk with the
>> host's root file system died and cannot be recovered. However, all of its
>> bricks are on separate disks and completely undamaged.
>>
>> I'm seeking your advice on what is best practice for replacing such a host.
>> My notion is that it should be possible to set up a new root system,
>> configure it and have it use the existing bricks.
>>
>> My questions are:
>>
>> 1) Is this a good idea at all, or am I missing anything? Would it be better
>> to format the existing bricks and start over with a completely clean new
>> host, like most of the tutorials do?
>>
>> 2) If it is feasible to use the existing bricks, two scenarios come to my
>> mind:
>>
>> a) Set up a new root file system for a gluster host and copy/adjust the
>> gluster configuration from one of the existing hosts, so that the newly set
>> up host actually thinks it is the old host that died. I.e., copying over
>> the gluster UUID, volume configurations, hostname, IP, etc. (What else
>> would it need?)
>>
>> The pool would then recognize the new host as identical to the old one that
>> died and accept it just as if the old host had come online again.
>>
>> b) Set up a new root file system for a gluster host and probe it into the
>> trusted pool, with a new name and a new gluster UUID. Transfer the bricks of
>> the old host that died to the new one using „replace-brick“. There would be
>> no need for lengthy syncing, as most of the data already exists and is up to
>> date on the new host (which has the bricks of the old host); only self-heal
>> would take place.
>>
>> Do these scenarios sound sane to you, and which one would be best practice
>> in this situation? This is a production system, so safety is relevant.
>>
>> Thanks for any helpful comments and opinions!
>>
>> Best, R. Kupper
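PS: For the record, here is the rough sequence I have in mind for scenario a),
pieced together from Stefan's advice and the docs. It is only a sketch and
untested on our side: "gluster2" stands for any healthy peer, the brick mount
points are placeholders for our real paths, and I'm assuming a systemd-based
distro with the service named "glusterd". Corrections welcome.

# 1) On a healthy peer, note the UUID the pool still has on record for gluster1:
gluster pool list

# 2) Reinstall gluster1 with the same OS, hostname and IP as before, install
#    the same gluster version, and mount the old (undamaged) brick disks at
#    their previous mount points (e.g. via /etc/fstab).

# 3) With glusterd stopped on the new gluster1, copy the configuration tree
#    from a healthy peer:
systemctl stop glusterd
rsync -a gluster2:/var/lib/glusterd/ /var/lib/glusterd/

# 4) Put gluster1's *old* UUID back into /var/lib/glusterd/glusterd.info
#    (the copied file contains gluster2's UUID, so it must be edited).

# 5) If I understand the layout correctly, /var/lib/glusterd/peers/ must then
#    be adjusted as well: it should contain one entry for every *other* peer
#    (including gluster2) and none for gluster1 itself.

# 6) Start glusterd again and check that the node is recognised as the old peer:
systemctl start glusterd
gluster peer status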
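PPS: And these are the checks I would run after the node has rejoined,
regardless of which scenario we end up using. Again just a sketch; "myvol" is
a placeholder for each of our two volume names.

gluster peer status            # all 6 peers connected?
gluster volume status myvol    # are all brick processes up?
gluster volume heal myvol info # any entries still pending heal?

# and, if pending entries do not drain on their own, trigger a full heal:
gluster volume heal myvol full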
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users