Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22

2010-01-05 Thread phil cryer
This is *very* helpful, thanks for taking the time, Larry!  Looking
forward to giving feedback once we have the cluster up.

P

On Thu, Dec 17, 2009 at 11:23 AM, Tejas N. Bhise te...@gluster.com wrote:
 Thanks, Larry, for the comprehensive information.

 Phil, I hope that answers a lot of your questions. Feel free to ask more, we 
 have a great community here.

 Regards,
 Tejas.

 - Original Message -
 From: Larry Bates larry.ba...@vitalesafe.com
 To: gluster-users@gluster.org, p...@cryer.us
 Sent: Thursday, December 17, 2009 9:47:30 PM GMT +05:30 Chennai, Kolkata, 
 Mumbai, New Delhi
 Subject: Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22

 Phil,

 I think the real question you need to ask has to do with why we are
 using GlusterFS at all and what happens when something fails.  Normally
 GlusterFS is used to provide scalability, redundancy/recovery, and
 performance.  For many applications performance will be the least of the
 worries so we concentrate on scalability and redundancy/recovery.
 Scalability can be achieved no matter which way you configure your
 servers.  Using distribute translator (DHT) you can unify all the
 servers into a single virtual storage space.  The problem comes when you
 look at what happens when you have a machine/drive failure and need the
 redundancy/recovery capabilities of GlusterFS.  By putting 36Tb of
 storage on a single server and exposing it as a single volume (using
 either hardware or software RAID), you will have to replicate that to a
 replacement server after a failure.  Replicating 36Tb will take a lot of
 time and CPU cycles.  If you keep things simple (JBOD) and use AFR to
 replicate drives between servers and use DHT to unify everything
 together, now you only have to move 1.5Tb/2Tb when a drive fails.  You
 will also note that you get to use 100% of your disk storage this way
 instead of wasting 1 drive per array with RAID5 or two drives with
 RAID6.  Normally with RAID5/6 it is also imperative that you have a hot
 spare per array, which means you waste an additional drive per array.
 To make RAID5/6 work with no single point of failure you have to do
 something like RAID50/60 across two controllers which gets expensive and
 much more difficult to manage and to grow.  Implementing GlusterFS using
 more modest hardware makes all those issues go away.  Just use
 GlusterFS to provide the RAID-like capabilities (via AFR and DHT).
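
 [For concreteness: a minimal client-side volume file along the lines Larry
 describes, in GlusterFS 2.x/3.0-era volfile syntax.  The hostnames
 (server1, server2) and brick names are hypothetical placeholders added for
 illustration, not part of the original mail.]

   # one protocol/client volume per remote JBOD brick
   volume server1-disk1
     type protocol/client
     option transport-type tcp
     option remote-host server1
     option remote-subvolume disk1
   end-volume

   volume server2-disk1
     type protocol/client
     option transport-type tcp
     option remote-host server2
     option remote-subvolume disk1
   end-volume

   # AFR: mirror the matching bricks across the two servers
   volume mirror1
     type cluster/replicate
     subvolumes server1-disk1 server2-disk1
   end-volume

   # DHT: pool the mirrored pairs into one namespace
   # (append mirror2, mirror3, ... as more drive pairs are defined)
   volume unified
     type cluster/distribute
     subvolumes mirror1
   end-volume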

 Personally I doubt that I would set up my storage the way you describe.
 I probably would (and have) set it up with more, smaller servers.
 Something like three times as many 2U servers with 8x2Tb drives each (or
 even 6 times as many 1U servers with 4x2Tb drives each) and forget the
 expensive RAID SATA controllers; they aren't necessary and are just a
 single point of failure that you can eliminate.  In addition you will
 enjoy significant performance improvements because you have:

 1) Many parallel paths to storage (36x1U or 18x2U vs 6x5U servers).
 Gigabit Ethernet is fast, but still will limit bandwidth to a single
 machine.
 2) Write performance on RAID5/6 is never going to be as fast as JBOD.
 3) You should have much more memory caching available (36x8Gb = 288Gb
 memory or 18x8Gb = 144Gb vs maybe 6x16Gb = 96Gb)
 4) Management of the storage is done in one place: GlusterFS.  No messy
 RAID controller setups to document/remember.
 5) You can expand in the future in a much more granular and controlled
 fashion.  Add 2 machines (1 for replication) and you get 8Tb (using 2Tb
 drives) of storage.  When you want to replace a machine, just set up a new
 one, fail the old one, and let GlusterFS build the new one for you (AFR
 will do the heavy lifting; see the sketch just after this list).  CPUs will
 get faster, hard drives will get faster and bigger in the future, so make
 it easy to upgrade.  A small number of BIG machines makes it a lot harder
 to do upgrades as new hardware becomes available.
 6) Machine failures (motherboard, power supply, etc.) will affect much
 less of your storage network.  Having a spare 1U machine around as a hot
 spare doesn't cost much (maybe $1200).  Having a spare 5U monster around
 does (probably close to $6000).
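
 [A sketch of what point 5 looks like in the volume file, continuing the
 hypothetical names from the earlier sketch; added for illustration, not
 part of the original mail.  Growing the pool is essentially a volfile edit
 plus a client remount:]

   # new mirrored pair built from the two new machines' bricks
   # (server3-disk1 and server4-disk1 are protocol/client volumes,
   # defined exactly as before)
   volume mirror2
     type cluster/replicate
     subvolumes server3-disk1 server4-disk1
   end-volume

   volume unified
     type cluster/distribute
     # previously: subvolumes mirror1
     subvolumes mirror1 mirror2
   end-volume

 New files then hash across the enlarged set; a rebalance/defrag pass is
 typically needed before existing directories make full use of the new pair.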

 IMHO 36 x 1U or 18 x 2U servers shouldn't cost any more (and maybe less)
 than the big boxes you are looking to buy.  They are commodity items.
 If you go the 1U route you don't need anything but a machine, with
 memory and 4 hard drives (all server motherboards come with at least 4
 SATA ports).  By using 2Tb drives, I think you would find that the cost
 would be actually less.  By NOT using hardware RAID you can also NOT use
 RAID-class hard drives which cost about $100 each more than non-RAID
 hard drives.  Just that change alone could save you 6 x 24 = 144 drives x
 $100 = $14,400!  JBOD just doesn't need RAID-class hard drives because you
 don't need the sophisticated firmware that the RAID-class hard drives
 provide.  You still will want quality hard drives, but failures will
 have such a low impact that it is much less of a problem.

Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22

2010-01-05 Thread Liam Slusser
Larry & All,

I would much rather rebuild a bad drive with a raid controller than
have to wait for Gluster to do it.  With a large number of files, doing
an ls -aglR can take weeks.  Also, you don't NEED enterprise drives with
a raid controller; I use desktop 1.5TB Seagate drives which are happy as a
clam on a 3ware SAS card under a SAS expander.

liam
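
[Sketch added for illustration: the recursive listing mentioned above works
because every lookup/stat from a client gives the replicate translator a
chance to self-heal that file.  An equivalent crawl in Python; /mnt/gluster
is a hypothetical mount point.]

  #!/usr/bin/env python
  # Walk a GlusterFS client mount and stat every entry -- the same effect
  # as a recursive ls; each lookup lets AFR check and heal that file.
  import os

  MOUNT = "/mnt/gluster"  # hypothetical mount point

  count = 0
  for root, dirs, files in os.walk(MOUNT):
      for name in dirs + files:
          try:
              os.lstat(os.path.join(root, name))
              count += 1
          except OSError:
              # entry disappeared mid-walk; ignore
              pass
  print("stat'ed %d entries" % count)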


On Thu, Dec 17, 2009 at 8:17 AM, Larry Bates larry.ba...@vitalesafe.com wrote:
 Phil,

 I think the real question you need to ask has to do with why we are using
 GlusterFS at all and what happens when something fails.  Normally GlusterFS
 is used to provide scalability, redundancy/recovery, and performance.  For
 many applications performance will be the least of the worries so we
 concentrate on scalability and redundancy/recovery.  Scalability can be
 achieved no matter which way you configure your servers.  Using distribute
 translator (DHT) you can unify all the servers into a single virtual storage
 space.  The problem comes when you look at what happens when you have a
 machine/drive failure and need the redundancy/recovery capabilities of
 GlusterFS.  By putting 36Tb of storage on a single server and exposing it as
 a single volume (using either hardware or software RAID), you will have to
 replicate that to a replacement server after a failure.  Replicating 36Tb
 will take a lot of time and CPU cycles.  If you keep things simple (JBOD)
 and use AFR to replicate drives between servers and use DHT to unify
 everything together, now you only have to move 1.5Tb/2Tb when a drive fails.
  You will also note that you get to use 100% of your disk storage this way
 instead of wasting 1 drive per array with RAID5 or two drives with RAID6.
  Normally with RAID5/6 it is also imperative that you have a hot spare per
 array, which means you waste an additional drive per array.  To make
 RAID5/6 work with no single point of failure you have to do something like
 RAID50/60 across two controllers which gets expensive and much more
 difficult to manage and to grow.  Implementing GlusterFS using more modest
 hardware makes all those issues go away.  Just use GlusterFS to provide
 the RAID-like capabilities (via AFR and DHT).
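
 [Again for concreteness: a sketch of the matching per-server export file
 (2.x/3.0-era syntax; paths and names are hypothetical, not from the
 original mail).  Each JBOD drive is mounted separately and exported as its
 own brick, so a failed drive only takes out that one brick.]

   # one storage/posix volume per mounted JBOD drive
   volume disk1
     type storage/posix
     option directory /data/disk1
   end-volume

   volume disk2
     type storage/posix
     option directory /data/disk2
   end-volume

   volume server
     type protocol/server
     option transport-type tcp
     # access control per brick; tighten beyond * in production
     option auth.addr.disk1.allow *
     option auth.addr.disk2.allow *
     subvolumes disk1 disk2
   end-volume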

 Personally I doubt that I would set up my storage the way you describe.  I
 probably would (and have) set it up with more, smaller servers.  Something
 like three times as many 2U servers with 8x2Tb drives each (or even 6 times
 as many 1U servers with 4x2Tb drives each) and forget the expensive RAID
 SATA controllers; they aren't necessary and are just a single point of
 failure that you can eliminate.  In addition you will enjoy significant
 performance improvements because you have:

 1) Many parallel paths to storage (36x1U or 18x2U vs 6x5U servers).  Gigabit
 Ethernet is fast, but still will limit bandwidth to a single machine.
 2) Write performance on RAID5/6 is never going to be as fast as JBOD.
 3) You should have much more memory caching available (36x8Gb = 288Gb memory
 or 18x8Gb = 144Gb vs maybe 6x16Gb = 96Gb)
 4) Management of the storage is done in one place: GlusterFS.  No messy RAID
 controller setups to document/remember.
 5) You can expand in the future in a much more granular and controlled
 fashion.  Add 2 machines (1 for replication) and you get 8Tb (using 2Tb
 drives) of storage.  When you want to replace a machine, just set up a new
 one, fail the old one, and let GlusterFS build the new one for you (AFR will
 do the heavy lifting).  CPUs will get faster, hard drives will get faster
 and bigger in the future, so make it easy to upgrade.  A small number of BIG
 machines makes it a lot harder to do upgrades as new hardware becomes
 available.
 6) Machine failures (motherboard, power supply, etc.) will affect much less
 of your storage network.  Having a spare 1U machine around as a hot spare
 doesn't cost much (maybe $1200).  Having a spare 5U monster around does
 (probably close to $6000).

 IMHO 36 x 1U or 18 x 2U servers shouldn't cost any more (and maybe less)
 than the big boxes you are looking to buy.  They are commodity items.  If
 you go the 1U route you don't need anything but a machine, with memory and 4
 hard drives (all server motherboards come with at least 4 SATA ports).  By
 using 2Tb drives, I think you would find that the cost would be actually
 less.  By NOT using hardware RAID you can also NOT use RAID-class hard
 drives which cost about $100 each more than non-RAID hard drives.  Just that
 change alone could save you 6 x 24 = 144 drives x $100 = $14,400!  JBOD just
 doesn't need RAID-class hard drives because you don't need the sophisticated
 firmware that the RAID-class hard drives provide.  You still will want
 quality hard drives, but failures will have such a low impact that it is
 much less of a problem.

 By using more smaller machines you also eliminate the need for redundant
 power supplies (which would be a requirement in your large boxes because it
 would be a single point of failure on a large 

Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22

2010-01-05 Thread Konstantin Sharlaimov
The author is exaggerating. We recover a 6 TB RAID-5 on desktop-class hardware
in less than 6 hours. Our RAID is controlled by LSR (Linux Software
RAID). Performance is not good while rebuilding a single node, but the
GlusterFS replicate/distribute translators help.


Arvids Godjuks wrote:

Consider this: a rebuild of a 1.5-2 TB HDD in a RAID5/6 array can easily
take up to a few days to complete. During that time the storage on that
node will not perform well. A week ago I read a very good article with
research on this area; the only thing is that it's in Russian, but it
mentions a few English sources too. Maybe Google Translate will help.
Here's the original link: http://habrahabr.ru/blogs/hardware/78311/
Here's the Google Translate version:
http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http%3A%2F%2Fhabrahabr.ru%2Fblogs%2Fhardware%2F78311%2F&sl=ru&tl=en
(looks quite neat, by the way)

2010/1/5 Liam Slusser lslus...@gmail.com:
  

Larry & All,

I would much rather rebuild a bad drive with a raid controller than
have to wait for Gluster to do it.  With a large number of files, doing
an ls -aglR can take weeks.  Also, you don't NEED enterprise drives with
a raid controller; I use desktop 1.5TB Seagate drives which are happy as a
clam on a 3ware SAS card under a SAS expander.

liam






Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22

2010-01-05 Thread Arvids Godjuks
2010/1/6 Liam Slusser lslus...@gmail.com:
 Arvids & Larry,

 Interesting read, Arvids. ...

Well, I think JBOD style makes losing a disk much less painful than
losing a whole RAID array. A single disk will be restored far faster
than a whole RAID array, especially if you can hot-swap your
disks (the hardware has to support this, of course).
Anyway, I do think that a correct combination of AFR and DHT will make
up for that disk loss. If only Gluster could do data relocation as a
node/disk goes offline.

Konstantin Sharlaimov:
RAID-6 isn't RAID-5. RAID-6 has more parity disks, which means
it can do parallel reads from more than one disk, and performance
doesn't degrade as much as with RAID-5. That's why I think you get
your restore done fast - your disk is able to receive data fast enough to
sustain writes at high speed: actually, 6 hours for 1.5TB is about 72MB/sec
average write speed - not many disks can keep up such speeds all the
time.


Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22

2010-01-05 Thread Liam Slusser
Yeah, I'm waiting for Gluster to come out with a 3.0.1 release before I
upgrade.  I'll do my best to compare 3.0.1 with OneFS's
performance/recovery/etc. once I upgrade.  I still have two Isilon
clusters in our lab which aren't in production anymore that I can play
around with.

And I've been waiting for btrfs for a while now; it can't come soon enough!

thanks,
liam

On Tue, Jan 5, 2010 at 7:48 PM, Harshavardhana har...@gluster.com wrote:
 Hi Liam,

 GlusterFS has done checksum-based self-heal since the 3.0 release; I would
 guess your experiences are from 2.0, which has the issue of doing a full-file
 self-heal, which takes a lot of time.  But I would suggest upgrading your
 cluster to the 3.0.1 release, which is due the first week of February.  With
 the 3.x releases' new self-heal you should see much shorter rebuild times.
 If it's possible, comparing the 3.0.1 rebuild times with OneFS from Isilon
 would help us improve it too.

 Thanks

 I would suggest waiting for btrfs.


Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22

2009-12-17 Thread Tejas N. Bhise
Thanks, Larry, for the comprehensive information.

Phil, I hope that answers a lot of your questions. Feel free to ask more, we 
have a great community here.

Regards,
Tejas.

- Original Message -
From: Larry Bates larry.ba...@vitalesafe.com
To: gluster-users@gluster.org, p...@cryer.us
Sent: Thursday, December 17, 2009 9:47:30 PM GMT +05:30 Chennai, Kolkata, 
Mumbai, New Delhi
Subject: Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22

Phil,

I think the real question you need to ask has to do with why we are 
using GlusterFS at all and what happens when something fails.  Normally 
GlusterFS is used to provide scalability, redundancy/recovery, and 
performance.  For many applications performance will be the least of the 
worries so we concentrate on scalability and redundancy/recovery.  
Scalability can be achieved no matter which way you configure your 
servers.  Using distribute translator (DHT) you can unify all the 
servers into a single virtual storage space.  The problem comes when you 
look at what happens when you have a machine/drive failure and need the 
redundancy/recovery capabilities of GlusterFS.  By putting 36Tb of 
storage on a single server and exposing it as a single volume (using 
either hardware or software RAID), you will have to replicate that to a 
replacement server after a failure.  Replicating 36Tb will take a lot of 
time and CPU cycles.  If you keep things simple (JBOD) and use AFR to 
replicate drives between servers and use DHT to unify everything 
together, now you only have to move 1.5Tb/2Tb when a drive fails.  You 
will also note that you get to use 100% of your disk storage this way 
instead of wasting 1 drive per array with RAID5 or two drives with 
RAID6.  Normally with RAID5/6 it is also imperative that you have a hot 
spare per array, which means you waste an additional drive per array.  
To make RAID5/6 work with no single point of failure you have to do 
something like RAID50/60 across two controllers which gets expensive and 
much more difficult to manage and to grow.  Implementing GlusterFS using 
more modest hardware makes all those issues go away.  Just use 
GlusterFS to provide the RAID-like capabilities (via AFR and DHT).

Personally I doubt that I would set up my storage the way you describe.  
I probably would (and have) set it up with more, smaller servers.  
Something like three times as many 2U servers with 8x2Tb drives each (or 
even 6 times as many 1U servers with 4x2Tb drives each) and forget the 
expensive RAID SATA controllers; they aren't necessary and are just a 
single point of failure that you can eliminate.  In addition you will 
enjoy significant performance improvements because you have:

1) Many parallel paths to storage (36x1U or 18x2U vs 6x5U servers).  
Gigabit Ethernet is fast, but still will limit bandwidth to a single 
machine.
2) Write performance on RAID5/6 is never going to be as fast as JBOD.
3) You should have much more memory caching available (36x8Gb = 288Gb 
memory or 18x8Gb = 144Gb vs maybe 6x16Gb = 96Gb)
4) Management of the storage is done in one place: GlusterFS.  No messy 
RAID controller setups to document/remember.
5) You can expand in the future in a much more granular and controlled 
fashion.  Add 2 machines (1 for replication) and you get 8Tb (using 2Tb 
drives) of storage.  When you want to replace a machine, just set up a new 
one, fail the old one, and let GlusterFS build the new one for you (AFR 
will do the heavy lifting).  CPUs will get faster, hard drives will get 
faster and bigger in the future, so make it easy to upgrade.  A small 
number of BIG machines makes it a lot harder to do upgrades as new 
hardware becomes available.
6) Machine failures (motherboard, power supply, etc.) will affect much 
less of your storage network.  Having a spare 1U machine around as a hot 
spare doesn't cost much (maybe $1200).  Having a spare 5U monster around 
does (probably close to $6000).

IMHO 36 x 1U or 18 x 2U servers shouldn't cost any more (and maybe less) 
than the big boxes you are looking to buy.  They are commodity items.  
If you go the 1U route you don't need anything but a machine, with 
memory and 4 hard drives (all server motherboards come with at least 4 
SATA ports).  By using 2Tb drives, I think you would find that the cost 
would be actually less.  By NOT using hardware RAID you can also NOT use 
RAID-class hard drives which cost about $100 each more than non-RAID 
hard drives.  Just that change alone could save you 6 x 24 = 144 drives x 
$100 = $14,400!  JBOD just doesn't need RAID-class hard drives because you 
don't need the sophisticated firmware that the RAID-class hard drives 
provide.  You still will want quality hard drives, but failures will 
have such a low impact that it is much less of a problem.

By using more smaller machines you also eliminate the need for redundant 
power supplies (which would be a requirement in your large boxes because 
it would be a single point