The GlusterFS volume configuration lives in /var/lib/glusterd. You'll
need to back up and restore the complete directory to get your
volumes back.

/etc/glusterfs contains a simple volfile needed to start glusterd,
which would be installed by the glusterfs package in most cases.
You'll need this as well.
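
If it helps, here is a rough sketch of that backup/restore; /mnt/persistent is just a
placeholder for your mounted disk:

  # back up the config while the node is running
  tar czf /mnt/persistent/glusterd-backup.tar.gz /var/lib/glusterd /etc/glusterfs

  # after a reboot, restore it before starting glusterd
  tar xzf /mnt/persistent/glusterd-backup.tar.gz -C /
  service glusterd start    # or however glusterd is started on your system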

~kaushal

On Wed, Nov 12, 2014 at 4:02 PM, Andreas Hollaus
<andreas.holl...@ericsson.com> wrote:
> Hi,
>
> As I previously described, my root file system is located in RAM, so I'll lose the
> gluster volume definition(s) in case of a reboot. However, I would like to back up
> the required files to a mounted disk so that they can be restored to /etc after the
> reboot. Which files would I have to backup/restore to be able to run 'gluster volume
> start' without first re-creating the volume?
>
> Regards
> Andreas
>
> On 11/05/14 12:23, Ravishankar N wrote:
>> On 11/05/2014 03:18 PM, Andreas Hollaus wrote:
>>> Hi,
>>>
>>> I'm curious about the 5-phase transaction scheme that is described in the document
>>> (lock, pre-op, op, post-op, unlock).
>>> Are these stage switches all triggered from the client, or can the server do it
>>> without notifying the client, for instance switching from 'op' to 'post-op'?
>>
>> All stages are performed by the AFR translator in the client graph, where it is
>> loaded, in the sequence you listed.
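>>
>> If you ever want to watch those pending counters yourself, you can dump the AFR
>> changelog xattrs (trusted.afr.<volname>-client-<N>) straight off a brick; the path
>> below is just an example, and getfattr only reads them:
>>
>>   getfattr -d -m trusted.afr -e hex /export/brick1/path/to/file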
>>> Decreasing the counter for the local pending operations could be done without
>>> talking to the client, even though I realize a message has to be sent to the other
>>> server(s), possibly through the client.
>>>
>>> The reason I ask is that I'm trying to estimate the risk of ending up in a split
>>> brain situation, or at least understand if our servers will 'accuse' each other
>>> temporarily during this 5-phase transaction under normal circumstances. If I
>>> understand who sends messages to whom and in what order, I'll have a better chance
>>> to see if we require any solution to split brain situations. As I've experienced
>>> problems setting up the 'favorite-child' option, I want to know if it's required
>>> or not. In our use case, quorum is not a solution, but losing some data is
>>> acceptable as long as the bricks are in sync.
>> If a file is split-brained, AFR does not allow modifications by clients on it
>> until the split-brain is resolved. The afr xattrs and heal mechanisms ensure that
>> the bricks are in sync, so no worries on that front.
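>>
>> You can check this for yourself at any time; 'myvol' is just an example volume name:
>>
>>   gluster volume heal myvol info               # entries still needing heal
>>   gluster volume heal myvol info split-brain   # entries in split-brain, if any
>>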
>> Thanks,
>> Ravi
>>>
>>> Regards
>>> Andreas
>>>
>>> On 10/31/14 15:37, Ravishankar N wrote:
>>>> On 10/30/2014 07:23 PM, Andreas Hollaus wrote:
>>>>> Hi,
>>>>>
>>>>> Thanks! Seems like an interesting document. Although I've read blogs about how
>>>>> extended attributes are used as a change log, this seems like a more
>>>>> comprehensive document.
>>>>>
>>>>> I won't write directly to any brick. That's the reason I first have to create a
>>>>> volume which consists of only one brick, until the other server is available,
>>>>> and then add that second brick. I don't want to delay the file system clients
>>>>> until the second server is available, hence the reason for add-brick.
>>>>>
>>>>> I guess that this procedure is only needed the first time the volume is
>>>>> configured, right? If any of these bricks should fail later on, the change log
>>>>> would keep track of all changes to the file system even though only one of the
>>>>> bricks is available(?).
>>>> Yes, if one brick of a replica pair goes down, the other one keeps track of file
>>>> modifications by the client, and would sync them back to the first one when it
>>>> comes back up.
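>>>>
>>>> The self-heal daemon picks up those pending entries on its own once the brick is
>>>> back; if you want to check on it or give it a nudge ('myvol' is just an example):
>>>>
>>>>   gluster volume status myvol   # bricks and self-heal daemon should be online
>>>>   gluster volume heal myvol     # optionally trigger a heal of pending entries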
>>>>
>>>>> After a restart, volume settings stored in the configuration file would be
>>>>> accepted even though not all servers were up and running yet at that time,
>>>>> wouldn't they?
>>>> glusterd running on all nodes ensures that the volume configurations stored on
>>>> each node are in sync.
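>>>>
>>>> You can verify this from any one of the servers ('myvol' is just an example):
>>>>
>>>>   gluster peer status         # every other node should show up as connected
>>>>   gluster volume info myvol   # the same volume definition on every server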
>>>>> Speaking about configuration files. When are these copied to each server?
>>>>> If I create a volume which consists of two bricks, I guess that those servers
>>>>> will create the configuration files, independently of each other, from the
>>>>> information sent from the client (gluster volume create...).
>>>> All volume config/management commands must be run on one of the servers that
>>>> make up the volume, not on the client (unless both happen to be the same
>>>> machine). As mentioned above, when any of the volume commands are run on any one
>>>> server, glusterd orchestrates the necessary action on all servers and keeps them
>>>> in sync.
>>>>> In case I later on add a brick, I guess that the settings have to be copied to
>>>>> the new brick after they have been modified on the first one, right (or will
>>>>> they be recreated on all servers from the information specified by the client,
>>>>> like in the previous case)?
>>>>>
>>>>> Will configuration files be copied in other situations as well, for instance in
>>>>> case one of the servers which is part of the volume would, for some reason, be
>>>>> missing those files? In my case, the root file system is recreated from an image
>>>>> at each reboot, so everything created in /etc will be lost. Will GlusterFS
>>>>> settings be restored from the other server automatically
>>>> No, it is expected that servers have persistent file-systems. There are ways to
>>>> restore such bricks; see
>>>> http://gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
>>>>
>>>>
>>>> -Ravi
>>>>> or do I need to back up and restore those myself? Even though the brick doesn't
>>>>> know that it is part of a volume in case it loses the configuration files, both
>>>>> the other server(s) and the client(s) will probably recognize it as being part
>>>>> of the volume. I therefore believe that such a self-healing would actually be
>>>>> possible, even though it may not be implemented.
>>>>>
>>>>>
>>>>> Regards
>>>>> Andreas
>>>>> On 10/30/14 05:21, Ravishankar N wrote:
>>>>>> On 10/28/2014 03:58 PM, Andreas Hollaus wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm curious about how GlusterFS manages to sync the bricks in the initial
>>>>>>> phase, when the volume is created or extended.
>>>>>>>
>>>>>>> I first create a volume consisting of only one brick, which clients will start
>>>>>>> to read and write. After a while I add a second brick to the volume to create
>>>>>>> a replicated volume.
>>>>>>>
>>>>>>> If this new brick is empty, I guess that files will be copied from the first
>>>>>>> brick to get the bricks in sync, right?
>>>>>>>
>>>>>>> However, if the second brick is not empty but rather contains a subset of the
>>>>>>> files on the first brick, I don't see how GlusterFS will solve the problem of
>>>>>>> syncing the bricks.
>>>>>>>
>>>>>>> I guess that all files which lack extended attributes could be removed in this
>>>>>>> scenario, because they were created when the disk was not part of a GlusterFS
>>>>>>> volume. However, in case the brick was used in the volume previously, for
>>>>>>> instance before that server restarted, there will be extended attributes for
>>>>>>> the files on the second brick which weren't updated during the downtime (when
>>>>>>> the volume consisted of only one brick). There could be multiple changes to
>>>>>>> the files during this time. In this case I don't understand how the extended
>>>>>>> attributes could be used to determine which of the bricks contains the most
>>>>>>> recent file.
>>>>>>>
>>>>>>> Can anyone explain how this works? Is it only allowed to add empty bricks to a
>>>>>>> volume?
>>>>>>>
>>>>>>>
>>>>>> It is allowed to add only empty bricks to the volume. Writing directly to
>>>>>> bricks is not supported. One needs to access the volume only from a mount point
>>>>>> or using libgfapi.
>>>>>> After adding a brick to increase the distribute count, you need to run the
>>>>>> volume rebalance command so that some of the existing files are hashed (moved)
>>>>>> to the newly added brick.
>>>>>> After adding a brick to increase the replica count, you need to run the volume
>>>>>> heal full command to sync the files from the other replica into the newly added
>>>>>> brick.
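>>>>>>
>>>>>> For your one-brick-then-replicate scenario that would look roughly like this
>>>>>> (host and brick names are only examples):
>>>>>>
>>>>>>   # grow the volume from one brick to a 2-way replica, then sync everything over
>>>>>>   gluster volume add-brick myvol replica 2 server2:/export/brick1
>>>>>>   gluster volume heal myvol full
>>>>>>
>>>>>>   # whereas after an add-brick that only increases the distribute count:
>>>>>>   gluster volume rebalance myvol start
>>>>>>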
>>>>>> https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md will
>>>>>> give you an idea of how the replicate translator uses xattrs to keep files in
>>>>>> sync.
>>>>>>
>>>>>> HTH,
>>>>>> Ravi
>>>
>>
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
