Re: [Gluster-users] cluster.min-free-disk separate for each, brick

Dan Bretherton Sat, 19 Nov 2011 06:12:36 -0800


On 29/09/11 12:28, Dan Bretherton wrote:


On 08/09/11 23:51, Dan Bretherton wrote:

On Wed, Sep 7, 2011 at 4:27 PM, Dan Bretherton <d.a.brether...@reading.ac.uk <mailto:d.a.brether...@reading.ac.uk>> wrote:



    On 17/08/11 16:19, Dan Bretherton wrote:





            Dan Bretherton wrote:


                On 15/08/11 20:00, gluster-users-requ...@gluster.org
                <mailto:gluster-users-requ...@gluster.org> wrote:

                    Message: 1
                    Date: Sun, 14 Aug 2011 23:24:46 +0300
                    From: "Deyan Chepishev -
                    SuperHosting.BG"<dchepis...@superhosting.bg
                    <mailto:dchepis...@superhosting.bg>>
                    Subject: [Gluster-users] cluster.min-free-disk
                     separate for each
                       brick
                    To: gluster-users@gluster.org
                    <mailto:gluster-users@gluster.org>
                    Message-ID:<4e482f0e.3030...@superhosting.bg
                    <mailto:4e482f0e.3030...@superhosting.bg>>
                    Content-Type: text/plain; charset=UTF-8;
                    format=flowed

                    Hello,

                    I have a gluster set up with very different
                    brick sizes.

                    brick1: 9T
                    brick2: 9T
                    brick3: 37T

                    with this configuration if I set the parameter
                    cluster.min-free-disk to 10% it
                    applies to all bricks which is quite
                    uncomfortable with these brick sizes,
                    because 10% for the small bricks are ~ 1T but
                    for the big brick it is ~3.7T and
                    what happens at the end is that if all bricks go
                    to 90% usage and I continue
                    writing, the small ones eventually fill up to
                    100% while the big one has enough
                    free space.

                    My question is, is there a way to set
                    cluster.min-free-disk per brick instead of
                    setting it for the entire volume or any other
                    way to work around this problem ?

                    Thank you in advance

                    Regards,
                    Deyan

                Hello Deyan,

                I have exactly the same problem and I have asked
                about it before - see links below.

                
http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/

                http://gluster.org/pipermail/gluster-users/2011-May/007788.html

                My understanding is that the patch referred to in
                Amar's reply in the May thread prevents a
                "migrate-data" rebalance operation failing by
                running out of space on smaller bricks, but that
                doesn't solve the problem we are having.  Being able
                to set min-free-disk for each brick separately would
                be useful, as would being able to set this value as
                a number of bytes rather than a percentage.
                 However, even if these features were present we
                would still have a problem when the amount of free
                space becomes less than min-free-disk, because this
                just results in a warning message in the logs and
                doesn't actually prevent more files from being
                written.  In other words, min-free-disk is a soft
                limit rather than a hard limit.  When a volume is
                more than 90% full there may still be hundreds of
                gigabytes of free space spread over the large
                bricks, but the small bricks may each only have a
                few gigabytes left or even less.  Users do "df" and
                see lots of free space in the volume so they
                continue writing files.  However, when GlusterFS
                chooses to write a file to a small brick, the write
                fails with "device full" errors if the file grows
                too large, which is often the case here with files
                typically several gigabytes in size for some
                applications.

                I would really like to know if there is a way to
                make min-free-disk a hard limit.  Ideally, GlusterFS
                would choose a brick on which to write a file based
                on how much free space it has left rather than
                choosing a brick at random (or however it is done
                now).  That would solve the problem of non-uniform
                brick sizes without the need for a hard
                min-free-disk limit.

                Amar's comment in the May thread about QA testing
                being done only on volumes with uniform brick sizes
                prompted me to start standardising on a uniform
                brick size for each volume in my cluster.  My
                impression is that implementing the features needed
                for users with non-uniform brick sizes is not a
                priority for Gluster, and that users are all
                expected to use uniform brick sizes.  I really think
                this fact should be stated clearly in the GlusterFS
                documentation, in the sections on creating volumes
                in the Administration Guide for example.  That would
                stop other users from going down the path that I did
                initially, which has given me a real headache
                because I am now having to move tens of terabytes of
                data off bricks that are larger than the new
                standard size.

                Regards
                Dan.

            Hello,

            This is really bad news, because I already migrated my
            data and I just realized that I am screwed because
            Gluster just does not care about the brick sizes.
            It is impossible to move to uniform brick sizes.

            Currently we use 2TB  HDDs, but the disks are growing
            and soon we will probably use 3TB hdds or whatever other
            larger sizes appear on the market. So if we choose to
            use raid5 and some level of redundancy (for example
            6hdds in raid5, no matter what their size is) this
            sooner or later will lead us to non uniform bricks which
            is a problem and it is not correct to expect that we
            always can or want to provide uniform size bricks.

            With this way of thinking if we currently have 10T from
            6x2T in raid5, at some point when there is a 10T on a
            single disk we will have to use no raid just because
            gluster can not handle non uniform bricks.

            Regards,
            Deyan


        I think Amar might have provided the answer in his posting
        to the thread yesterday, which has just appeared in my
        autospam folder.

        http://gluster.org/pipermail/gluster-users/2011-August/008579.html

            With size option, you can have a hardbound on min-free-disk

        This means that you can set a hard limit on min-free-disk,
        and set a value in GB that is bigger than the biggest file
        that is ever likely to be written.  This looks likely to
        solve our problem and make non-uniform brick sizes a
        practical proposition.  I wish I had known about this back
        in May when I embarked on my cluster restructuring exercise;
        the issue was discussed in this thread in May as well:
        http://gluster.org/pipermail/gluster-users/2011-May/007794.html

        Once I have moved all the data off the large bricks and
        standardised on a uniform brick size, it will be relatively
        easy to stick to this because I use LVM.  I create logical
        volumes for new bricks when a volume needs extending.  The
        only problem with this approach is what happens when the
        amount of free space left on a server is less than the size
        of the brick you want to create.  The only option then would
        be to use new servers, potentially wasting several TB of
        free space on existing servers.  The standard brick size for
        most of my volumes is 3TB, which allows me to use a mixture
        of small servers and large servers in a volume and limits
        the amount of free space that would be wasted if there
        wasn't quite enough free space on a server to create another
        brick.  Another consequence of having 3TB bricks is that a
        single server typically has two or more bricks belonging
        to the same volume, although I do my best to distribute
        the volumes across different servers in order to spread the
        load.  I am not aware of any problems associated with
        exporting multiple bricks from a single server and it has
        not caused me any problems so far that I am aware of.

        -Dan.

    Hello Deyan,

    Have you tried giving min-free-disk a value in gigabytes, and if
    so does it prevent new files being written to your bricks when
    they are nearly full?  I recently tried it myself and found that
    min-free-disk had no effect at all.  I deliberately filled my
    test/backup volume and most of the bricks became 100% full.  I
    set min-free-disk to "20GB", as reported in "gluster volume ...
    info" below.

    cluster.min-free-disk: 20GB

    Unless I am doing something wrong it seems as though we can not
    "have a hardbound on min-free-disk" after all, and uniform brick
    size is therefore an essential requirement.  It still doesn't
    say that in the documentation, at least not in the volume
    creation sections.


    -Dan.

On 08/09/11 06:35, Raghavendra Bhat wrote:
> This is how it is supposed to work.
>

> Suppose a distribute volume is created with 2 bricks. The 1st brick has 25GB of free space and the 2nd brick has 35GB of free space. If one sets a minimum-free-disk of 30GB through volume set (gluster volume set <volname> min-free-disk 30GB), then whenever files are created, if a file is hashed to the 1st brick (which has 25GB of free space), the actual file will be created in the 2nd brick, and a linkfile will be created in the 1st brick. So the linkfile points to the actual file. A warning message indicating that the minimum free disk limit has been crossed, and suggesting adding more nodes, will be printed in the glusterfs log file. So any file which is hashed to the 1st brick will be created in the 2nd brick.

> Once the free space of the 2nd brick also comes below 30GB, then the files will be created in their respective hashed bricks only. There will be a warning message in the log file about the 2nd brick also crossing the minimum free disk limit.

>
> Regards,
> Raghavendra Bhat
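
The placement rule described above can be sketched as a toy model. This is only an illustration of the behaviour as explained in the message, not GlusterFS's actual distribute code. The function name choose_brick and the rule for picking among several eligible bricks (the one with the most free space) are assumptions; the message only covers the two-brick case.

```python
def choose_brick(hashed, free_gb, min_free_gb):
    """Toy model of the min-free-disk placement rule described above.

    hashed      -- name of the brick the file name hashes to
    free_gb     -- dict mapping brick name to free space in GB
    min_free_gb -- the min-free-disk threshold in GB

    Returns (data_brick, linkfile_brick). linkfile_brick is None when
    the file is stored directly on its hashed brick.
    """
    # Hashed brick still has enough free space: store the file there.
    if free_gb[hashed] >= min_free_gb:
        return hashed, None

    # Hashed brick is below the threshold: look for a brick that is not.
    candidates = [b for b in free_gb
                  if b != hashed and free_gb[b] >= min_free_gb]
    if candidates:
        # Assumption: prefer the candidate with the most free space.
        target = max(candidates, key=lambda b: free_gb[b])
        # Data goes to the target; a linkfile stays on the hashed brick.
        return target, hashed

    # Every brick is below the threshold: fall back to the hashed brick.
    return hashed, None


# The two-brick example from the message: 25GB and 35GB free, 30GB limit.
print(choose_brick("brick1", {"brick1": 25, "brick2": 35}, 30))
```

Note that in this model the limit stops having any effect once every brick is below it, which matches the "soft limit" behaviour discussed earlier in the thread.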

Dear Raghavendra,

Thanks for explaining this to me. This mechanism should allow a volume to function correctly with non-uniform brick sizes even though min-free-disk is not a hard limit. I can understand now why I had so many problems with the default value of 10% for min-free-disk. 10% of a large brick can be very large compared to 10% of a small brick, so when they started filling up at the same rate after all had less than 10% free space the small bricks usually filled up long before large ones, giving "device full" errors even when df still showed a lot of free space in the volume. At least now we can minimise this effect by setting min-free-disk to a value in GB.


-Dan.
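
To make the arithmetic concrete, here is a small sketch using the brick sizes from Deyan's original message (9T, 9T and 37T). The function name reserve_per_brick is made up for illustration; it only multiplies each brick size by the percentage.

```python
# Brick sizes in terabytes, taken from Deyan's original message.
BRICK_SIZES_TB = {"brick1": 9, "brick2": 9, "brick3": 37}


def reserve_per_brick(sizes_tb, min_free_percent):
    """Return the space (in TB) that a percentage threshold covers on each brick."""
    return {name: size * min_free_percent / 100
            for name, size in sizes_tb.items()}


reserved = reserve_per_brick(BRICK_SIZES_TB, 10)
for name, tb in reserved.items():
    print(f"{name}: {tb:.1f} TB left when the 10% threshold is reached")
```

With a 10% threshold the small bricks reach it with about 0.9 TB left while the large brick still has about 3.7 TB, so once all bricks are below the threshold the small ones have far less headroom.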

Dear Raghavendra,

Unfortunately I am still having problems with some bricks filling up completely, despite having "cluster.min-free-disk: 20GB". In one case I am still seeing warnings about bricks being nearly full in percentage terms in the client logs, so I am wondering if the volume is still using cluster.min-free-disk: 10%, and ignoring the 20GB setting I changed it to. When I changed cluster.min-free-disk, should this have taken effect immediately, or is there something else I should have done to activate the change?

In your example above, suppose there are 9 bricks instead of 2 bricks (as in my volume), and they all have less than 30GB free space except for one which is nearly empty. Is GlusterFS clever enough to find that nearly empty brick every time when creating new files? I expected all new files to be created in my nearly empty brick but that has not happened. Some files have gone in there but most have gone to nearly full bricks, one of which has now filled up completely. I have done rebalance...fix-layout a number of times. What can I do to fix this problem? The volumes with one or more full bricks are unusable because users are getting "device full" errors for some writes even though both volumes are showing several TB free space.


Regards
-Dan Bretherton.


Dear All,

If anyone is interested, I managed to produce the expected behaviour by setting min-free-disk to 300GB rather than 30GB. 300GB is approximately 10% of the size of most of the bricks in the volume. I don't understand why setting min-free-disk to 30GB (about 1% of the brick) didn't work; maybe it is too close to the limit for some reason. I wonder if the default value of min-free-disk=10% is significant. It seems that for non-uniform brick sizes, the correct approach is to set min-free-disk to a value in GB that is approximately 10% of the brick size in each case.


-Dan
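
For anyone wanting to work out such a value, a minimal sketch follows. The helper name min_free_disk_gb is made up, and it assumes 1 TB = 1000 GB; it only does the arithmetic and does not touch the volume.

```python
def min_free_disk_gb(brick_size_tb, fraction=0.10):
    """Return a min-free-disk value in GB as a string such as "300GB".

    brick_size_tb -- brick size in TB (assumes 1 TB = 1000 GB)
    fraction      -- share of the brick to keep free, 0.10 for 10%
    """
    gb = round(brick_size_tb * 1000 * fraction)
    return f"{gb}GB"


# 3TB bricks, as used for most of the volumes described in this thread.
print(min_free_disk_gb(3))
```

The resulting string is in the same form as the "20GB" and "30GB" values used earlier in the thread.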

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
