Re: [Gluster-users] Adding new volumes to DHT

2009-07-22 Thread Moritz Krinke
Sorry, in my test they don't get distributed equally: the new volume
gets more files (all files of the same size).

It would be interesting to know how exactly the distribution algorithm works! :-)
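For what it's worth, the directory-level distribution described in the docs can be sketched roughly like this. The hash function, the even range split, and the brick names are all simplified stand-ins, not GlusterFS's actual implementation:

```python
# Toy model of directory-level hash distribution: each directory keeps its
# own mapping of hash ranges to subvolumes, fixed when the directory is
# created. crc32 stands in for whatever hash GlusterFS really uses.
import zlib

HASH_SPACE = 2 ** 32

def make_layout(subvolumes):
    """Split the hash space evenly across the subvolumes present *now*."""
    step = HASH_SPACE // len(subvolumes)
    return [(i * step,
             HASH_SPACE if i == len(subvolumes) - 1 else (i + 1) * step,
             sv) for i, sv in enumerate(subvolumes)]

def place(filename, layout):
    """Pick the subvolume whose range contains the file name's hash."""
    h = zlib.crc32(filename.encode()) % HASH_SPACE
    return next(sv for start, end, sv in layout if start <= h < end)

old_dir = make_layout(["brick-1", "brick-2"])             # created before expansion
new_dir = make_layout(["brick-1", "brick-2", "brick-3"])  # created after

# Only new_dir ever schedules files onto brick-3; old_dir keeps its
# original two-brick layout.
for name in ("a.jpg", "b.jpg", "c.jpg"):
    print(name, "old:", place(name, old_dir), "new:", place(name, new_dir))
```

This also illustrates why files of identical size can still pile up unevenly: placement depends only on the name hash, not on file size or current disk usage.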

Moritz

On 22.07.2009 at 14:25, Moritz Krinke wrote:


Barry,

I just ran a test on that:
if you add another subvolume to the DHT volume and then add new files
to an existing directory, they are distributed across all
DHT subvolumes.

I assume that distribution to the old subvolumes will stop once the
free-space threshold is reached, but I did not test that.


Moritz


On 21.07.2009 at 18:57, Barry Jaspan wrote:

I have a question about this paragraph from the "Understanding DHT
Translator" documentation:


"Currently hash works based on directory level distribution. i.e, a  
given file's parent directory will have information of how the hash  
numbers are mapped to subvolumes. So, adding new node doesn't  
disturb any current setup as the files/dirs present already have  
its information preserved. Whatever new directory gets created,  
will start considering new volume for scheduling files."


The last sentence suggests that if I have a single directory on a
DHT volume and it is getting full, adding additional subvolumes to
the DHT volume will not help, because all the files in a directory
will only ever live on subvolumes that existed at the time the
directory was created.  Is that true?


Also, it sounds like an entry mapping each filename (hash number)  
to a subvolume is stored in the extended attributes of the parent  
directory for that file.  What is the practical limit for the  
number of files that can be stored in a single directory under this  
system?  It seems like eventually doing a lookup in the directory's  
attributes would itself become a very expensive operation.
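On the second question: as far as I can tell, the parent directory stores one hash range per subvolume, not one entry per file, so the layout's size and lookup cost scale with the subvolume count rather than the file count. A toy sketch of that scaling (the 16-bytes-per-range encoding is invented for illustration, not GlusterFS's actual xattr format):

```python
# The directory's layout holds one (start, end) hash range per subvolume,
# so it stays the same size no matter how many files the directory holds.
# The encoding below is hypothetical, purely to make the point measurable.
import struct

def encode_layout(ranges):
    """Pack (start, end) pairs as big-endian uint64 pairs (16 bytes each)."""
    return b"".join(struct.pack(">QQ", start, end) for start, end in ranges)

two_bricks = encode_layout([(0, 2**31), (2**31, 2**32)])
twenty_bricks = encode_layout([(i * 2**32 // 20, (i + 1) * 2**32 // 20)
                               for i in range(20)])

# Size depends only on the brick count, never on the file count:
print(len(two_bricks), "bytes vs", len(twenty_bricks), "bytes")
```

So the expensive part of a huge directory would be the ordinary directory operations on the backend filesystem, not the hash lookup itself.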


Thanks,
Barry

___
Gluster-users mailing list
Gluster-users@gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users





Moritz Krinke
web development

| fotocommunity GmbH
| rheinwerkallee 2
| 53227 Bonn

| tel: +49 228 227888-44
| fax: +49 228 227888-19
| mobile: +49 172 5695482
| mkri...@fotocommunity.net
| http://www.fotocommunity.net

| Managing directors: Andreas Constantin Meyer, Sven Jan Arndt
| HRB 14645 Bonn, Ust-IdNr.: DE814891097





[Gluster-users] Design Questions

2009-07-07 Thread Moritz Krinke

Hello,

While doing some research on how to build a system onto which we can
easily rsync backups from other systems, I came across GlusterFS.
After several hours of reading I'm quite impressed, and I'm looking
forward to implementing it ;)


Basically, I would like you to comment on the design I put together,
as there are lots of different ways to do things and some may be
preferred over others.


We need:
15 TB of storage, held at least in a RAID1-like fashion; RAID5/RAID6
would be preferable, but I think that is not possible with GlusterFS?!
While reading the docs I realised I could probably also use this
system for serving our images via HTTP, because of features like

- easy expansion with new storage / servers
- io-cache
- the Lighttpd plugin for direct FS access

This way we would gain not just backup storage for our pictures,
which are currently served by a mogilefs/varnish/lighttpd cluster, but
also a backup cluster that could serve files directly to our users.
(It is a community site with lots of pictures; file sizes vary, but
most files are 50 to 300 kilobytes, and we plan to store files of
~10 MB too.)


Great :-)

We have planned to use the following hardware:

5 servers, each with:
 - quad-core CPU
 - 16 GB RAM
 - 4 x 1.5 TB HDD, no RAID
 - dedicated GBit Ethernet switched network

GlusterFS Setup:

 The same config on all nodes, each with

 volume posix -> locks volume -> io-threads volume -> write-behind
volume -> io-cache volume with a cache size of 14 GB (so 2 GB is left
for the system)

 for each of the 4 drives / mountpoints,

 then config entries for all 20 bricks, using tcp as the transport
type,

 then creating cluster/replicate volumes, each spanning 2 disks on
different servers, and creating a cluster/nufa volume with the 10
replicate volumes as subvolumes.
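A sketch of one per-drive chain in GlusterFS 2.x volfile syntax (volume names, paths, and the per-brick cache split are placeholders; option names should be checked against the docs for your release):

```
# One of the four per-drive chains on a server; repeat for each drive.
volume posix1
  type storage/posix
  option directory /data/disk1
end-volume

volume locks1
  type features/locks
  subvolumes posix1
end-volume

volume iot1
  type performance/io-threads
  subvolumes locks1
end-volume

volume wb1
  type performance/write-behind
  subvolumes iot1
end-volume

volume brick1
  type performance/io-cache
  option cache-size 3GB   # e.g. 14 GB split across the 4 drives
  subvolumes wb1
end-volume

# Client side: pair bricks from different servers, then aggregate
# (repl2 .. repl10 would be defined analogously).
volume repl1
  type cluster/replicate
  subvolumes server1-brick1 server2-brick1
end-volume

volume nufa0
  type cluster/nufa
  option local-volume-name repl1
  subvolumes repl1 repl2
end-volume
```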



 As I understand it, this should provide me with the following:

 - Data redundancy: if one disk fails, I can replace the disk and
GlusterFS automatically replicates all the lost data back to the new
disk; the same applies if a whole server is lost/broken.
 - Distributed access: a read of a specific file will always go to the
same server/drive, regardless of which server it is requested from,
and will therefore be cached by the io-cache layer on the specific
node that has the file on disk. OK, a little network overhead, but
that is better than putting the cache on top of the distribute
volume, which would result in having the "same" cached content on all
servers.
 - A global cache of 70 GB with no duplicates (5 servers times 14 GB
of io-cache RAM per server).
 -> How exactly does the io-cache work? Can I specify a TTL per file
pattern, or specify which files should not be cached at all... or? I
cannot find any specific info on this.
 - I can put apache/lighttpd on all the servers, which then have
direct access to the storage; no need for extra webservers for
serving static & cacheable content.
 - Remote access: I can mount the FS from another location (another
DC), securely if I wish through some kind of VPN, and use it there
for backup purposes.
 - Expandable: I can simply bring 2 new servers online, each with
2/4/6/8 drives.
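On the io-cache question above, a hedged example of the tuning knobs in the 2.x io-cache translator (option names should be verified against your release; as far as I can tell there is no hard per-pattern exclusion, only relative priorities):

```
volume iocache
  type performance/io-cache
  option cache-size 14GB
  option cache-timeout 1                 # seconds before cached data is revalidated
  option priority *.jpg:3,*.png:3,*:1    # higher-priority patterns are evicted last
  subvolumes wb1
end-volume
```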


 If you have read and understood ( :-) ) all this, I would highly
appreciate it if you could answer my questions and/or share any input
you might have.


Thanks a lot,
Moritz

