Re: [Gluster-users] GlusterFS performance questions

2011-03-15 Thread Ed W
On 14/03/2011 22:18, Alexander Todorov wrote:
> Hello folks,
> I'm looking for GlusterFS performance metrics. What I'm particularly
> interested in is:
>
> * Does adding more bricks to a volume make reads faster?
> * How does the replica count affect that?

Although no one seems to be really talking about performance in these
terms, I think the limiting factor is usually going to be network
latency.  In very approximate terms, each time you touch a file in
GlusterFS you need to ask every other brick for its opinion as to
whether you have the newest copy of the file or not.  Therefore your
file IO/sec is bounded by your network latency...

So I would presume that those who run InfiniBand network hardware, with
its few-microsecond latencies, see far better performance than those of
us on gigabit Ethernet with its barely sub-millisecond latency.
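
To put some entirely made-up but illustrative numbers on it: if every
open/stat costs at least one network round trip to the other bricks
before the answer can be trusted, then a single-threaded client tops
out at roughly

    gigabit Ethernet (~0.2 ms round trip):  1 / 0.0002 s  ~ 5,000 file ops/sec
    InfiniBand (~10 us round trip):         1 / 0.00001 s ~ 100,000 file ops/sec

no matter how fast the disks behind the bricks are.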

So I suspect you can predict rough performance when changing the
hardware by thinking about how the network constrains you, e.g. consider
your access pattern (small files vs large files, small reads vs large
reads), the number of bricks, etc.

Note that it doesn't seem popular to discuss performance in these terms,
but I think if you read through the old posts on the lists you will see
that it's really this network latency versus the required access pattern
that determines whether people feel Gluster is fast or slow.

To jump to a conclusion, it makes sense that large reads on large files
do much better than accessing lots of small files...  If you make the
files large enough then you start to test the disk performance rather
than the network.
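
A crude way to see the two regimes for yourself -- purely illustrative,
adjust the paths and sizes to taste:

    # one big sequential write: mostly bandwidth-bound
    dd if=/dev/zero of=/mnt/gluster/bigfile bs=1M count=1024

    # thousands of small files: mostly latency-bound
    tar xf linux-2.6.32.tar -C /mnt/gluster/

The gap between those two is usually the latency effect described above.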

Good luck

Ed W


[Gluster-users] GlusterFS performance questions

2011-03-14 Thread Alexander Todorov

Hello folks,
I'm looking for GlusterFS performance metrics. What I'm particularly
interested in is:

* Does adding more bricks to a volume make reads faster?
* How does the replica count affect that?

Thanks,
Alexander.


Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-30 Thread Craig Box
> OCFS2 is a shared-disk filesystem, and in EC2 neither ephemeral storage
> nor EBS can be mounted on more than one instance simultaneously.
> Therefore, you'd need something to provide a shared-disk abstraction
> within an AZ.  DRBD mode can do this, and I think it's even reentrant so
> that the devices created this way can themselves be used as components
> for the inter-AZ-replication devices, but active/active mode isn't
> recommended and I don't think you can connect more than two nodes this
> way.

What I am doing is using DRBD for the shared disk between AZs, which
(with OCFS2 on top) then gives me a standard POSIX file system that I
can share inside each AZ with GlusterFS.  A bit of a duct-tape job
perhaps, but it seems like it will work.  The proof will be in the
testing, for which I am just building instances now.
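
The DRBD piece of that looks roughly like the following -- only a
sketch, the hostnames, disks and addresses are invented and worth
checking against the DRBD 8.3 docs.  OCFS2 needs the device primary on
both nodes, hence allow-two-primaries, which in turn means synchronous
protocol C:

    resource r0 {
      protocol C;                   # dual-primary requires synchronous replication
      net { allow-two-primaries; }  # so OCFS2 can mount the device on both nodes
      on node-az1 {
        device    /dev/drbd0;
        disk      /dev/sdb;
        address   10.0.1.10:7788;
        meta-disk internal;
      }
      on node-az2 {
        device    /dev/drbd0;
        disk      /dev/sdb;
        address   10.0.2.10:7788;
        meta-disk internal;
      }
    }

with /dev/drbd0 then formatted as OCFS2 and mounted on both sides.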


Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-30 Thread Jeff Darcy
On 06/30/2010 10:22 AM, Craig Box wrote:
> OK, so this brings me to Plan B.  (Feel free to suggest a plan C if you can.)
> 
> I want to have six nodes, three in each availability zone, replicating a
> Mercurial repository.  Here's some art:
> 
> [gluster c/s] [gluster c/s] | [gluster c/s] [gluster c/s]
>                             |
>         [gluster s]         |         [gluster s]
>          [OCFS 2]           |          [OCFS 2]
>          [ DRBD ] -------------------- [ DRBD ]
> 
> DRBD does the cross-AZ replication, with a three-node GlusterFS
> cluster inside each AZ.  That way, any one machine going down should
> still mean all the rest of the nodes can access the files.
> 
> Sound believable?

OCFS2 is a shared-disk filesystem, and in EC2 neither ephemeral storage
nor EBS can be mounted on more than one instance simultaneously.
Therefore, you'd need something to provide a shared-disk abstraction
within an AZ.  DRBD mode can do this, and I think it's even reentrant so
that the devices created this way can themselves be used as components
for the inter-AZ-replication devices, but active/active mode isn't
recommended and I don't think you can connect more than two nodes this
way.  What's really needed, and I'm slightly surprised doesn't already
exist, is a DRBD proxy that can be connected as a destination by several
local DRBD sources, and then preserve request order even across devices
as it becomes a DRBD source and ships those requests to another proxy in
another AZ.  Linbit's proxy doesn't seem to be designed for that
particular purpose.  The considerations for dm-replicator are
essentially the same BTW.

An async/long-distance replication translator has certainly been a
frequent topic of discussion between me, the Gluster folks, and others.
I have plans to shoot for full N-way active/active replication, but
with that ambition comes complexity and we'll probably see simpler forms
(e.g. two-way active/passive) much earlier.


Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-30 Thread Craig Box
OK, so this brings me to Plan B.  (Feel free to suggest a plan C if you can.)

I want to have six nodes, three in each availability zone, replicating a
Mercurial repository.  Here's some art:

[gluster c/s] [gluster c/s] | [gluster c/s] [gluster c/s]
                            |
        [gluster s]         |         [gluster s]
         [OCFS 2]           |          [OCFS 2]
         [ DRBD ] -------------------- [ DRBD ]

DRBD does the cross-AZ replication, with a three-node GlusterFS
cluster inside each AZ.  That way, any one machine going down should
still mean all the rest of the nodes can access the files.
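
Concretely, I was picturing each [gluster s] node exporting the shared
OCFS2 mount with a 3.0-style volfile along these lines (a sketch only;
the path and the auth rule are placeholders):

    volume posix
      type storage/posix
      option directory /mnt/ocfs2/export   # the OCFS2-on-DRBD mount
    end-volume

    volume locks
      type features/locks
      subvolumes posix
    end-volume

    volume server
      type protocol/server
      option transport-type tcp
      option auth.addr.locks.allow *
      subvolumes locks
    end-volume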

Sound believable?

Craig


Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-29 Thread Craig Box
Thanks, that led me to your slightly longer archived posts on the
subject, which help shed light on the issue.  Quoting from one of them:

> The problem with WAN is that when the gluster client receives a request to
> read the file, it first checks with all the nodes in the cluster, to make
> sure there are no discrepancies.  Only after all nodes have answered will it
> read the local file (if it's replicated locally).

Unfortunately that does put us back at square one, where we have to
think of a way to keep the masters in each zone in sync.
Unison/rsync?  Any other suggestions?
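
By rsync I mean something as blunt as a cron'd one-way push of the
repository from the active zone to the other one -- hostnames and paths
here are invented, purely to illustrate:

    # every few minutes, mirror the Mercurial repo to the standby zone
    rsync -az --delete /srv/hg/project/ standby-az:/srv/hg/project/

though I realise rsync'ing a live Mercurial repository has its own
consistency caveats, so Unison or plain hg push/pull might be saner.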


Re: [Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-29 Thread Count Zero
My short (and probably disappointing) answer is that, after all my attempts,
weeks spent trying to research and improve the performance, and asking here
on the mailing lists, I have failed to make it work over WAN, and the
authoritative answers were that "WAN is in the works".

So for now, until WAN is officially supported, keep it working within the
same zone and use some other replication method to synchronize the two
zones.


[Gluster-users] GlusterFS performance questions for Amazon EC2 deployment

2010-06-29 Thread Craig Box
Hi all,

Spent the day reading the docs, blog posts, this mailing list, and
lurking on IRC, but still have a few questions to ask.

My goal is to implement a cross-availability-zone file system in
Amazon EC2, and ensure that even if one server goes down, or is
rebooted, all clients can continue reading from/writing to a
secondary server.

The primary purpose is to share some data files for running a web site
for an open source project - a Mercurial repository and some shared
data, such as wiki images - but the main code/images/CSS etc for the
site will be stored on each instance and managed by version control.

As we have 150GB ephemeral storage (aka instance store, as opposed to
EBS) free on each instance, I thought it might be good if we were to
use that as the POSIX backend for Gluster, and have a complete copy of
the Mercurial repository on each system, with each client using its
local brick as the read subvolume for speed.  That way, you don't need
to go to the network for reads, which ought to be far more common than
writes.
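
What I'm picturing on the client side is something like this (just a
sketch from my reading of the 3.0 volfile docs, trimmed to two bricks
for brevity -- the hosts are invented and I haven't actually tried the
read-subvolume option yet):

    volume local
      type protocol/client
      option transport-type tcp
      option remote-host 10.0.0.10       # this instance's own server
      option remote-subvolume brick
    end-volume

    volume remote1
      type protocol/client
      option transport-type tcp
      option remote-host 10.0.0.11       # one of the other servers
      option remote-subvolume brick
    end-volume

    volume mirror
      type cluster/replicate
      option read-subvolume local        # serve reads from the local copy
      subvolumes local remote1
    end-volume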

We want to have the files available to seven servers, four in one AZ
and three in another.

I think it best if we maximise client performance, rather than
replication speed; if one of our nodes is a few seconds behind, it's
not the end of the world, but if it consistently takes a few seconds
on every file write, that would be irritating.

Some questions which I hope someone can answer:

1. Somewhat obviously, when we turn on replication and introduce a
second server, write speed to the volume drops drastically.  If we use
client-side replication, we can have redundancy in servers.  Does this
mean that the GlusterFS client blocks, waiting for the write to complete
on every server?  If we changed to server-side replication, would this
background the replication overhead?

2. If we were to use server-side replication, should we use the
write-behind translator in the server stack?  (A rough sketch of the
kind of stack I mean follows after these questions.)

3. I was originally using 3.0.2 packaged with Ubuntu 10.04, and have
tried upgrading to 3.0.5rc7 (as suggested on this list) for better
performance with the quick-read translator, and other fixes.  However,
this actually seemed to make write performance *worse*!  Should this
be expected?

(Our write test is totally scientific *cough*: we cp -a a directory of
files onto the mounted volume.)

4. Should I expect a different performance pattern using the instance
storage, rather than an EBS volume?  I found this post helpful -
http://www.sirgroane.net/2010/03/tuning-glusterfs-for-apache-on-ec2/ -
but it talks more about reading files than writing them, and it writes
off some translators as not useful because of the way EBS works.

5. Is cluster/replicate even the right answer?  Could we do something
with cluster/distribute - is this, in effect, a RAID 10?  It doesn't
seem that replicate could possibly scale up to the number of nodes you
hear about other people using GlusterFS with.

6. Could we do something crafty where you read directly from the POSIX
volume but you do all your writes through GlusterFS?  I see it's
unsupported, but I guess that is just because you might get old data
by reading the disk, rather than the client.
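
For question 2, the kind of server-side stack I mean is roughly the
following (again just a sketch, hostnames and paths invented and
untested): each server replicates its local brick with a peer and puts
write-behind above that before exporting it.

    volume posix
      type storage/posix
      option directory /mnt/ephemeral/export
    end-volume

    volume peer
      type protocol/client
      option transport-type tcp
      option remote-host server2          # the other replica
      option remote-subvolume posix
    end-volume

    volume replicate
      type cluster/replicate
      subvolumes posix peer
    end-volume

    volume wb
      type performance/write-behind
      subvolumes replicate
    end-volume

    volume server
      type protocol/server
      option transport-type tcp
      option auth.addr.wb.allow *
      subvolumes wb
    end-volume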

Any advice that anyone can provide is welcome, and my thanks in advance!

Regards
Craig
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users