Re: [Gluster-users] cluster.min-free-disk separate for each, brick

Deyan Chepishev - SuperHosting.BG Wed, 17 Aug 2011 16:55:28 -0700

Hello,


Dan Bretherton wrote:

On 15/08/11 20:00, gluster-users-requ...@gluster.org wrote:
Message: 1
Date: Sun, 14 Aug 2011 23:24:46 +0300
From: "Deyan Chepishev - SuperHosting.BG"<dchepis...@superhosting.bg>
Subject: [Gluster-users] cluster.min-free-disk  separate for each
    brick
To: gluster-users@gluster.org
Message-ID:<4e482f0e.3030...@superhosting.bg>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hello,

I have a gluster set up with very different brick sizes.

brick1: 9T
brick2: 9T
brick3: 37T

With this configuration, if I set the parameter cluster.min-free-disk to 10% it
applies to all bricks, which is quite uncomfortable with these brick sizes,
because 10% for the small bricks is ~1T but for the big brick it is ~3.7T, and
what happens at the end is that if all bricks go to 90% usage and I continue
writing, the small ones eventually fill up to 100% while the big one has enough
free space.
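
The arithmetic above can be checked with a short sketch. The brick sizes are the ones listed above; the function names are made up for illustration, and filesystem overhead is ignored.

```python
# Free space implied by a percentage-based cluster.min-free-disk threshold.
# Brick sizes (in TB) come from the list above; filesystem overhead is ignored.
BRICK_SIZES_TB = {"brick1": 9, "brick2": 9, "brick3": 37}


def reserved_tb(size_tb, min_free_percent):
    """Free space (TB) left on a brick when it reaches the threshold."""
    return size_tb * min_free_percent / 100


def reserved_per_brick(sizes_tb, min_free_percent):
    """Map each brick name to the free space (TB) left at the threshold."""
    return {name: reserved_tb(size, min_free_percent)
            for name, size in sizes_tb.items()}


if __name__ == "__main__":
    for name, free in reserved_per_brick(BRICK_SIZES_TB, 10).items():
        print(f"{name}: {free:.1f} TB free at the 10% threshold")
```

The same percentage leaves roughly four times more headroom on the 37T brick than on each 9T brick, which is why the small bricks fill up first.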

My question is, is there a way to set cluster.min-free-disk per brick instead of
setting it for the entire volume, or any other way to work around this problem?
Thank you in advance

Regards,
Deyan
Hello Deyan,
I have exactly the same problem and I have asked about it before - see links below.
http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/
http://gluster.org/pipermail/gluster-users/2011-May/007788.html
My understanding is that the patch referred to in Amar's reply in the May thread prevents a "migrate-data" rebalance operation failing by running out of space on smaller bricks, but that doesn't solve the problem we are having. Being able to set min-free-disk for each brick separately would be useful, as would being able to set this value as a number of bytes rather than a percentage. However, even if these features were present we would still have a problem when the amount of free space becomes less than min-free-disk, because this just results in a warning message in the logs and doesn't actually prevent more files from being written. In other words, min-free-disk is a soft limit rather than a hard limit. When a volume is more than 90% full there may still be hundreds of gigabytes of free space spread over the large bricks, but the small bricks may each only have a few gigabytes left or even less. Users do "df" and see lots of free space in the volume so they continue writing files. However, when GlusterFS chooses to write a file to a small brick, the write fails with "device full" errors if the file grows too large, which is often the case here with files typically several gigabytes in size for some applications.
I would really like to know if there is a way to make min-free-disk a hard limit. Ideally, GlusterFS would choose a brick on which to write a file based on how much free space it has left rather than choosing a brick at random (or however it is done now). That would solve the problem of non-uniform brick sizes without the need for a hard min-free-disk limit.
Amar's comment in the May thread about QA testing being done only on volumes with uniform brick sizes prompted me to start standardising on a uniform brick size for each volume in my cluster. My impression is that implementing the features needed for users with non-uniform brick sizes is not a priority for Gluster, and that users are all expected to use uniform brick sizes. I really think this fact should be stated clearly in the GlusterFS documentation, in the sections on creating volumes in the Administration Guide for example. That would stop other users from going down the path that I did initially, which has given me a real headache because I am now having to move tens of terabytes of data off bricks that are larger than the new standard size.
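
To make the idea of free-space-based placement concrete, here is a minimal sketch. It is a hypothetical policy for illustration only, not how GlusterFS actually places files; the function name, brick names and free-space figures are all invented.

```python
# Hypothetical placement policy: write each new file to the brick with the
# most free space. This is NOT GlusterFS's placement logic, only a sketch.
GB = 10**9


def pick_brick(free_bytes_by_brick, expected_file_bytes=0):
    """Return the brick with the most free space that can hold the file,
    or None if no brick has at least expected_file_bytes free."""
    candidates = {name: free
                  for name, free in free_bytes_by_brick.items()
                  if free >= expected_file_bytes}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Invented figures: two nearly full small bricks and one large brick.
    free = {"small1": 5 * GB, "small2": 3 * GB, "large": 400 * GB}
    print(pick_brick(free, expected_file_bytes=8 * GB))
```

With the invented figures above, an 8 GB file can only go to the large brick, whereas a random choice would often pick a small brick and fail part-way through the write.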
Regards
Dan.
Hello,
This is really bad news, because I already migrated my data and I just realized that I am screwed because Gluster just does not care about the brick sizes.
It is impossible to move to uniform brick sizes.
Currently we use 2TB HDDs, but the disks are growing and soon we will probably use 3TB HDDs or whatever other larger sizes appear on the market. So if we choose to use raid5 and some level of redundancy (for example 6 HDDs in raid5, no matter what their size is) this sooner or later will lead us to non-uniform bricks, which is a problem, and it is not correct to expect that we always can or want to provide uniform size bricks.
With this way of thinking, if we currently have 10T from 6x2T in raid5, at some point when there is a 10T on a single disk we will have to use no raid just because gluster can not handle non-uniform bricks.
Regards,
Deyan
I think Amar might have provided the answer in his posting to the thread yesterday, which has just appeared in my autospam folder.
http://gluster.org/pipermail/gluster-users/2011-August/008579.html
With size option, you can have a hardbound on min-free-disk
This means that you can set a hard limit on min-free-disk, and set a value in GB that is bigger than the biggest file that is ever likely to be written. This looks likely to solve our problem and make non-uniform brick sizes a practical proposition. I wish I had known about this back in May when I embarked on my cluster restructuring exercise; the issue was discussed in this thread in May as well: http://gluster.org/pipermail/gluster-users/2011-May/007794.html
Once I have moved all the data off the large bricks and standardised on a uniform brick size, it will be relatively easy to stick to this because I use LVM. I create logical volumes for new bricks when a volume needs extending. The only problem with this approach is what happens when the amount of free space left on a server is less than the size of the brick you want to create. The only option then would be to use new servers, potentially wasting several TB of free space on existing servers. The standard brick size for most of my volumes is 3TB, which allows me to use a mixture of small servers and large servers in a volume and limits the amount of free space that would be wasted if there wasn't quite enough free space on a server to create another brick. Another consequence of having 3TB bricks is that a single server typically has two or more bricks belonging to the same volume, although I do my best to distribute the volumes across different servers in order to spread the load. I am not aware of any problems associated with exporting multiple bricks from a single server and it has not caused me any problems so far that I am aware of.
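
The reasoning behind choosing a size larger than the biggest expected file can be sketched as follows. This only models the idea; it assumes one file is written at a time per brick and does not claim to reproduce how GlusterFS enforces the limit. The function names and the 20 GB / 10 GB figures are invented for illustration.

```python
# Model of a size-based hard limit: a brick accepts new files only while its
# free space is above min_free_bytes. Assumes one writer per brick at a time;
# this is an illustration, not GlusterFS's actual enforcement logic.
GB = 10**9


def accepts_new_files(free_bytes, min_free_bytes):
    """True if the brick is still above the hard min-free-disk limit."""
    return free_bytes > min_free_bytes


def worst_case_free_after_write(free_bytes, largest_file_bytes):
    """Free space left if the new file grows to the largest expected size."""
    return free_bytes - largest_file_bytes


if __name__ == "__main__":
    min_free = 20 * GB      # invented limit, larger than the largest file
    largest_file = 10 * GB  # invented largest expected file size
    for free in (25 * GB, 15 * GB):
        if accepts_new_files(free, min_free):
            left = worst_case_free_after_write(free, largest_file)
            print(f"{free // GB} GB free: accepted, at least {left // GB} GB left")
        else:
            print(f"{free // GB} GB free: skipped")
```

As long as the limit exceeds the largest file, any brick that passes the check still has room for that file to grow to full size, so the "device full" failure described earlier should not occur under this model.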
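
The trade-off with a fixed brick size can be shown with a little arithmetic. This is only a sketch: the 3TB default matches the standard size mentioned above, while the function name and the server capacities are invented.

```python
# How many standard-size bricks fit in a server's free space, and how much
# space is left over. Server capacities used below are invented examples.
def plan_bricks(server_free_tb, brick_size_tb=3):
    """Return (number of whole bricks, leftover TB) for one server."""
    bricks = int(server_free_tb // brick_size_tb)
    leftover_tb = server_free_tb - bricks * brick_size_tb
    return bricks, leftover_tb


if __name__ == "__main__":
    for free_tb in (10, 2, 6):
        bricks, leftover = plan_bricks(free_tb)
        print(f"{free_tb} TB free: {bricks} brick(s), {leftover} TB unused")
```

The leftover is always smaller than one brick, so a smaller standard brick size bounds the waste per server more tightly, at the cost of more bricks per server.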
-Dan.


===> Hi,

I just realized that I had actually missed Amar's answer. Thank you for bringing this to my attention. It really looks like a working solution; I will give it a try. I tried lowering the percentage of min free disk space, which looks like it is working so far, but hardcoding the min free space looks much better.


Thank you once again.

Regards,
Deyan

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
