On Wed, Sep 7, 2011 at 4:27 PM, Dan Bretherton
<d.a.brether...@reading.ac.uk> wrote:
On 17/08/11 16:19, Dan Bretherton wrote:
Dan Bretherton wrote:
On 15/08/11 20:00, gluster-users-requ...@gluster.org wrote:
Message: 1
Date: Sun, 14 Aug 2011 23:24:46 +0300
From: "Deyan Chepishev - SuperHosting.BG" <dchepis...@superhosting.bg>
Subject: [Gluster-users] cluster.min-free-disk separate for each brick
To: gluster-users@gluster.org
Message-ID: <4e482f0e.3030...@superhosting.bg>

Hello,

I have a Gluster setup with very different brick sizes:
brick1: 9T
brick2: 9T
brick3: 37T

With this configuration, if I set the parameter cluster.min-free-disk
to 10%, it applies to all bricks, which is quite inconvenient with
these brick sizes: 10% of a small brick is ~1T, but for the big brick
it is ~3.7T. What happens in the end is that if all bricks reach 90%
usage and I continue writing, the small ones eventually fill up to
100% while the big one still has plenty of free space.
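A quick arithmetic sketch of the situation described above. The brick sizes are Deyan's; the helper names are hypothetical, and the even split of new data across bricks is a simplifying assumption (hash-based placement spreads files by name, not by brick size).

```python
# Brick sizes from Deyan's message, in TB. Helper names are hypothetical.
BRICKS_TB = {"brick1": 9.0, "brick2": 9.0, "brick3": 37.0}
MIN_FREE_PCT = 10.0  # cluster.min-free-disk: 10%


def reserve_tb(size_tb: float, pct: float) -> float:
    """Free space (TB) that a percentage-based min-free-disk aims to keep on a brick."""
    return size_tb * pct / 100.0


def free_after_even_writes(bricks: dict, used_fraction: float, written_tb: float) -> dict:
    """Free TB per brick after `written_tb` more data is split evenly across bricks.

    Assumes every brick starts at the same used fraction and receives an equal
    share of new data regardless of its size. Free space is clamped at zero,
    which is the point where writes to that brick start failing.
    """
    share = written_tb / len(bricks)
    return {name: max(0.0, size * (1.0 - used_fraction) - share)
            for name, size in bricks.items()}


for name, size in BRICKS_TB.items():
    print(f"{name}: {size:.0f}T brick, 10% reserve = {reserve_tb(size, MIN_FREE_PCT):.1f}T")

# All bricks at 90% usage, then another 2.7T written and spread evenly.
after = free_after_even_writes(BRICKS_TB, 0.90, 2.7)
print({name: round(free, 2) for name, free in after.items()})
```

Under these assumptions the two 9T bricks reach zero free space after another 2.7T has been written, while brick3 still has about 2.8T free.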

My question is: is there a way to set cluster.min-free-disk per brick
instead of setting it for the entire volume, or is there any other way
to work around this problem?
Thank you in advance
Regards,
Deyan

Hello Deyan,

I have exactly the same problem, and I have asked about it before; see
the links below.

http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/
http://gluster.org/pipermail/gluster-users/2011-May/007788.html

My understanding is that the patch referred to in Amar's reply in the
May thread prevents a "migrate-data" rebalance operation from failing
by running out of space on smaller bricks, but that doesn't solve the
problem we are having. Being able to set min-free-disk for each brick
separately would be useful, as would being able to set this value as a
number of bytes rather than a percentage.
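A hypothetical sketch of what accepting either form of the setting could look like. `reserve_bytes` is an invented helper, not GlusterFS code, and treating "GB" as 1024^3 bytes is an assumption made only for this illustration.

```python
import re

# Hypothetical helper, not part of GlusterFS. Binary units are an assumption.
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
VALUE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(%|[KMGT]B)?\s*", re.IGNORECASE)


def reserve_bytes(value: str, brick_size_bytes: int) -> int:
    """Bytes a brick should keep free for a min-free-disk value like "10%" or "20GB"."""
    m = VALUE_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"unrecognised min-free-disk value: {value!r}")
    number = float(m.group(1))
    unit = (m.group(2) or "").upper()
    if unit == "%":
        return int(brick_size_bytes * number / 100)  # scales with brick size
    return int(number * UNITS[unit])  # same absolute headroom on every brick


TIB = 1024**4
for size_tib in (9, 37):
    pct = reserve_bytes("10%", size_tib * TIB) // 1024**3
    absolute = reserve_bytes("20GB", size_tib * TIB) // 1024**3
    print(f"{size_tib} TiB brick: 10% -> {pct} GiB, 20GB -> {absolute} GiB")
```

With "10%" the two brick sizes get very different reserves (about 921 GiB versus 3788 GiB), whereas "20GB" gives every brick the same headroom, which is the behaviour being asked for.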

However, even if these features were present, we would still have a
problem when the amount of free space becomes less than min-free-disk,
because this just results in a warning message in the logs and doesn't
actually prevent more files from being written. In other words,
min-free-disk is a soft limit rather than a hard limit. When a volume
is more than 90% full there may still be hundreds of gigabytes of free
space spread over the large bricks, but the small bricks may each have
only a few gigabytes left, or even less. Users run "df", see lots of
free space in the volume, and continue writing files. However, when
GlusterFS chooses to write a file to a small brick, the write fails
with "device full" errors if the file grows too large, which is often
the case here, since some applications typically write files several
gigabytes in size.

I would really like to know if there is a way to make min-free-disk a
hard limit. Ideally, GlusterFS would choose a brick on which to write a
file based on how much free space it has left, rather than choosing a
brick at random (or however it is done now). That would solve the
problem of non-uniform brick sizes without the need for a hard
min-free-disk limit.
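A minimal sketch contrasting the two placement ideas in the paragraph above. The brick names, free-space figures and the 8GB file are hypothetical, the crc32-modulo rule is only a stand-in for hash-based placement, and `most_free_brick` is one simple free-space-aware policy, not something GlusterFS is known to implement.

```python
import zlib

# Hypothetical free space per brick in GB: two nearly full small bricks and
# one large brick with plenty of room.
FREE_GB = {"small1": 5, "small2": 5, "large": 900}


def hashed_brick(filename: str, bricks: list) -> str:
    """Hypothetical stand-in for hash-based placement: ignores free space.

    crc32 modulo the brick count is for illustration only; GlusterFS uses its
    own hash and per-directory hash ranges.
    """
    return bricks[zlib.crc32(filename.encode()) % len(bricks)]


def most_free_brick(free: dict) -> str:
    """Hypothetical free-space-aware policy: pick the brick with the most room."""
    return max(free, key=free.get)


def fits(brick: str, size_gb: float, free: dict) -> bool:
    """Would a file of `size_gb` fit on `brick` without hitting "device full"?"""
    return size_gb <= free[brick]


bricks = sorted(FREE_GB)
for name in ["run01.nc", "run02.nc", "run03.nc"]:
    h = hashed_brick(name, bricks)
    m = most_free_brick(FREE_GB)
    print(f"{name} (8GB): hash -> {h} fits={fits(h, 8, FREE_GB)}, "
          f"most free -> {m} fits={fits(m, 8, FREE_GB)}")
```

Hash placement lands an 8GB file on a nearly full small brick about two times in three here, while the free-space-aware choice always picks the large brick.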

Amar's comment in the May thread about QA testing being done only on
volumes with uniform brick sizes prompted me to start standardising on
a uniform brick size for each volume in my cluster. My impression is
that implementing the features needed by users with non-uniform brick
sizes is not a priority for Gluster, and that users are all expected to
use uniform brick sizes. I really think this should be stated clearly
in the GlusterFS documentation, for example in the sections on creating
volumes in the Administration Guide. That would stop other users from
going down the path that I did initially, which has given me a real
headache because I am now having to move tens of terabytes of data off
bricks that are larger than the new standard size.
Regards
Dan.

Hello,

This is really bad news, because I have already migrated my data and
have just realized that I am screwed, because Gluster simply does not
take brick sizes into account.

It is impossible to move to uniform brick sizes. Currently we use 2TB
HDDs, but disks keep growing and soon we will probably use 3TB HDDs or
whatever larger sizes appear on the market. So if we choose to use
RAID5 with some level of redundancy (for example 6 HDDs in RAID5, no
matter what their size is), sooner or later this will lead to
non-uniform bricks. That is a problem, and it is not reasonable to
expect that we can always, or will always want to, provide
uniform-size bricks. Following this line of thinking, if we currently
have 10T from 6x2T in RAID5, at some point, when a single disk holds
10T, we will have to stop using RAID altogether just because Gluster
cannot handle non-uniform bricks.
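The arithmetic behind the 10T figure, as a small sketch: RAID5 gives up one disk's worth of capacity to parity, so usable space is (disks - 1) x disk size. The helper name is hypothetical, and filesystem overhead and TB/TiB differences are ignored.

```python
def raid5_usable_tb(disks: int, disk_tb: float) -> float:
    """Hypothetical helper: raw usable capacity of a RAID5 array.

    One disk's worth of capacity goes to parity. Ignores filesystem overhead
    and TB/TiB differences.
    """
    if disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disks - 1) * disk_tb


# The same 6-disk RAID5 layout built from successive disk generations.
for disk_tb in (2, 3, 4):
    print(f"6 x {disk_tb}T in RAID5 -> {raid5_usable_tb(6, disk_tb)}T brick")
```

Each disk generation yields a different brick size (10T, 15T, 20T), so a volume grown this way ends up with non-uniform bricks even though the RAID layout never changes.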
Regards,
Deyan

I think Amar might have provided the answer in his posting to the
thread yesterday, which has just appeared in my autospam folder.

http://gluster.org/pipermail/gluster-users/2011-August/008579.html

> With size option, you can have a hardbound on min-free-disk

This means that you can set a hard limit on min-free-disk, using a
value in GB that is bigger than the biggest file that is ever likely to
be written. This looks likely to solve our problem and make non-uniform
brick sizes a practical proposition. I wish I had known about this back
in May when I embarked on my cluster restructuring exercise; the issue
was discussed in this thread in May as well:
http://gluster.org/pipermail/gluster-users/2011-May/007794.html

Once I have moved all the data off the large bricks and standardised
on a uniform brick size, it will be relatively easy to stick to this,
because I use LVM: I create logical volumes for new bricks when a
volume needs extending. The only problem with this approach is what
happens when the amount of free space left on a server is less than
the size of the brick you want to create. The only option then would
be to use new servers, potentially wasting several TB of free space on
existing servers. The standard brick size for most of my volumes is
3TB, which allows me to use a mixture of small and large servers in a
volume and limits the amount of free space that would be wasted if
there wasn't quite enough free space on a server to create another
brick. Another consequence of having 3TB bricks is that a single
server typically has two or more bricks belonging to the same volume,
although I do my best to distribute the volumes across different
servers in order to spread the load. I am not aware of any problems
associated with exporting multiple bricks from a single server, and it
has not caused me any problems so far.
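The trade-off described above comes down to integer division: how many standard-size logical volumes fit in a server's free space, and what is left over. This is a sketch with hypothetical server names and free-space figures; `plan_bricks` is an invented helper, not an LVM tool.

```python
STANDARD_BRICK_TB = 3.0  # Dan's standard brick size for most volumes


def plan_bricks(free_tb: float, brick_tb: float = STANDARD_BRICK_TB) -> tuple:
    """Hypothetical helper: (standard-size LVs that fit, leftover TB) for one server.

    The leftover is always smaller than one brick, so the standard size caps the
    space that can be stranded on any single server.
    """
    count = int(free_tb // brick_tb)
    return count, free_tb - count * brick_tb


# Hypothetical free space (TB) in each server's volume group.
servers = {"serverA": 8.5, "serverB": 2.5, "serverC": 12.0}
for name, free in servers.items():
    count, left = plan_bricks(free)
    print(f"{name}: {count} x {STANDARD_BRICK_TB:.0f}T bricks, {left:.1f}T left over")
```

Because the leftover is always less than one brick, a smaller standard size bounds the stranded space per server more tightly, at the cost of more bricks per volume.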
-Dan.

Hello Deyan,

Have you tried giving min-free-disk a value in gigabytes, and if so,
does it prevent new files from being written to your bricks when they
are nearly full? I recently tried it myself and found that
min-free-disk had no effect at all. I deliberately filled my
test/backup volume and most of the bricks became 100% full. I set
min-free-disk to "20GB", as reported in "gluster volume ... info"
below.
cluster.min-free-disk: 20GB
Unless I am doing something wrong, it seems as though we cannot "have
a hardbound on min-free-disk" after all, and uniform brick size is
therefore an essential requirement. The documentation still doesn't
say so, at least not in the volume creation sections.
-Dan.
On 08/09/11 06:35, Raghavendra Bhat wrote:
> This is how it is supposed to work.
>
> Suppose a distribute volume is created with 2 bricks. The 1st brick
> has 25GB of free space and the 2nd brick has 35GB of free space. If
> one sets a minimum free disk of 30GB through volume set (gluster
> volume set <volname> min-free-disk 30GB), then whenever a file is
> created that hashes to the 1st brick (which has 25GB of free space),
> the actual file will be created on the 2nd brick and a linkfile
> pointing to it will be created on the 1st brick. A warning message
> indicating that the minimum free disk limit has been crossed, and
> that more nodes should be added, will be printed in the glusterfs
> log file. So any file which hashes to the 1st brick will be created
> on the 2nd brick.
>
> Once the free space of the 2nd brick also drops below 30GB, files
> will be created on their respective hashed bricks only. There will be
> a warning message in the log file about the 2nd brick also crossing
> the minimum free disk limit.
>
> Regards,
> Raghavendra Bhat
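A toy model of the placement rule Raghavendra describes, using his 25GB/35GB/30GB example. `Brick` and `create_file` are hypothetical names, and this is not GlusterFS code; in particular, choosing the emptiest brick when more than two are eligible is an assumption, since the description only covers two bricks.

```python
from dataclasses import dataclass, field


@dataclass
class Brick:
    """Hypothetical model of a brick: name, free space, and the files it holds."""
    name: str
    free_gb: float
    # filename -> "data" for a real file, or "linkto:<brick>" for a linkfile
    files: dict = field(default_factory=dict)


def create_file(hashed: Brick, bricks: list, name: str, size_gb: float,
                min_free_gb: float) -> Brick:
    """Model of the min-free-disk placement rule described above; not GlusterFS code.

    If the hashed brick is below min-free-disk and another brick is still above
    it, the data goes to that brick and a linkfile is left on the hashed brick.
    Once every brick is below the limit, files go to their hashed brick again.
    """
    if hashed.free_gb < min_free_gb:
        print(f"warning: {hashed.name} is below min-free-disk ({min_free_gb}GB)")
        others = [b for b in bricks if b is not hashed and b.free_gb >= min_free_gb]
        if others:
            # Assumption: with several candidates, the emptiest one is chosen.
            target = max(others, key=lambda b: b.free_gb)
            hashed.files[name] = f"linkto:{target.name}"
            target.files[name] = "data"
            target.free_gb -= size_gb
            return target
    hashed.files[name] = "data"
    hashed.free_gb -= size_gb
    return hashed


# Raghavendra's example: 25GB and 35GB free, min-free-disk set to 30GB.
b1, b2 = Brick("brick1", 25), Brick("brick2", 35)
bricks = [b1, b2]
for fname, size in [("a.dat", 2), ("b.dat", 4), ("c.dat", 1)]:
    placed = create_file(b1, bricks, fname, size, 30)  # all hash to brick1
    print(f"{fname} -> data on {placed.name}")
print(b1.files)
```

The first two files are redirected to brick2 with linkfiles left on brick1; after that brick2 is also below 30GB, so c.dat lands on its hashed brick. Nothing in this rule refuses a write once every brick is below the limit, which is why min-free-disk behaves as a soft limit.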