Re: [Gluster-users] Adding new volumes to DHT
Sorry, in my test they don't get distributed equally: the new volume gets more files (all files have the same size). How exactly the distribution algorithm works would be interesting?! :-)

Moritz

On 22.07.2009 at 14:25, Moritz Krinke wrote:

> Barry, I just did a test about that. If you add another subvolume to the DHT volume and add new files to an existing directory, they will be distributed over all DHT subvolumes. I assume that distribution to the old subvolumes stops when the free-space threshold is reached, but I did not test that.
>
> Moritz
>
> On 21.07.2009 at 18:57, Barry Jaspan wrote:
>
>> I have a question about this paragraph from the "Understanding DHT Translator" documentation:
>>
>> "Currently hash works based on directory level distribution. i.e, a given file's parent directory will have information of how the hash numbers are mapped to subvolumes. So, adding new node doesn't disturb any current setup as the files/dirs present already have its information preserved. Whatever new directory gets created, will start considering new volume for scheduling files."
>>
>> The last sentence suggests that if I have a single directory on a DHT volume and it is getting full, adding additional subvolumes to the DHT volume will not help, because all the files in a directory will only ever live on subvolumes that existed at the time the directory was created. Is that true?
>>
>> Also, it sounds like an entry mapping each filename (hash number) to a subvolume is stored in the extended attributes of the parent directory for that file. What is the practical limit for the number of files that can be stored in a single directory under this system? It seems like eventually doing a lookup in the directory's attributes would itself become a very expensive operation.
>>
>> Thanks,
>> Barry
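For reference, here is a conceptual Python sketch of the directory-level layout the quoted documentation describes. It is illustrative only and not the actual GlusterFS code: the real translator keeps the per-directory hash ranges in a trusted.glusterfs.dht extended attribute on each brick and uses its own hash function (md5 below is just a stand-in). It shows why files in an old directory never land on a newly added subvolume until the directory's layout is recreated.

    import hashlib

    HASH_SPACE = 2 ** 32  # DHT works on a 32-bit hash space

    def make_layout(subvolumes):
        # Split the hash space into one contiguous range per subvolume
        # that exists at the time the directory is created.
        step = HASH_SPACE // len(subvolumes)
        layout, start = [], 0
        for i, sv in enumerate(subvolumes):
            end = HASH_SPACE - 1 if i == len(subvolumes) - 1 else start + step - 1
            layout.append((start, end, sv))
            start = end + 1
        return layout

    def pick_subvolume(layout, filename):
        # Hash the file *name* (not its contents) and find the owning range.
        h = int(hashlib.md5(filename.encode()).hexdigest(), 16) % HASH_SPACE
        for start, end, sv in layout:
            if start <= h <= end:
                return sv

    # A directory created while only 3 subvolumes existed keeps its 3-way
    # layout even after a 4th subvolume is added; only new directories
    # (or a recreated layout) include brick4 in their ranges.
    old_layout = make_layout(["brick1", "brick2", "brick3"])
    new_layout = make_layout(["brick1", "brick2", "brick3", "brick4"])
    print(pick_subvolume(old_layout, "photo-0001.jpg"))
    print(pick_subvolume(new_layout, "photo-0001.jpg"))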
Re: [Gluster-users] Adding new volumes to DHT
Barry, I just did a test about that. If you add another subvolume to the DHT volume and add new files to an existing directory, they will be distributed over all DHT subvolumes. I assume that distribution to the old subvolumes stops when the free-space threshold is reached, but I did not test that.

Moritz

On 21.07.2009 at 18:57, Barry Jaspan wrote:

> I have a question about this paragraph from the "Understanding DHT Translator" documentation:
>
> "Currently hash works based on directory level distribution. i.e, a given file's parent directory will have information of how the hash numbers are mapped to subvolumes. So, adding new node doesn't disturb any current setup as the files/dirs present already have its information preserved. Whatever new directory gets created, will start considering new volume for scheduling files."
>
> The last sentence suggests that if I have a single directory on a DHT volume and it is getting full, adding additional subvolumes to the DHT volume will not help, because all the files in a directory will only ever live on subvolumes that existed at the time the directory was created. Is that true?
>
> Also, it sounds like an entry mapping each filename (hash number) to a subvolume is stored in the extended attributes of the parent directory for that file. What is the practical limit for the number of files that can be stored in a single directory under this system? It seems like eventually doing a lookup in the directory's attributes would itself become a very expensive operation.
>
> Thanks,
> Barry
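For illustration, such a free-space threshold could be configured with something like the hypothetical volfile fragment below. This is an assumption, not confirmed for the release discussed here: check whether your cluster/distribute translator actually accepts a min-free-disk option (later GlusterFS releases expose it as cluster.min-free-disk); the brick names are placeholders.

    volume dht0
      type cluster/distribute
      # Assumption: stop scheduling new files onto a subvolume once it has
      # less than 10% free space; option name/availability varies by release.
      option min-free-disk 10%
      subvolumes brick1 brick2 brick3 brick4
    end-volume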
[Gluster-users] Design Questions
Hello,

while doing some research on how to build a system that lets us easily rsync backups from other systems onto it, I came across GlusterFS, and after several hours of reading I am quite impressed and looking forward to implementing it ;)

Basically I would like you to comment on the design I have put together, as there are lots of different ways to do things and some may be preferred over others.

We need 15 TB of storage, stored at least in a RAID1-like fashion; RAID5/RAID6 would be preferable, but I think that is not possible with GlusterFS?!

While reading the docs I realised I could probably also use this system for hosting our images via HTTP, because of features like:

- easy to expand with new storage / servers
- io-cache
- lighttpd plugin for direct FS access

This way we would gain not just a backup store for our pictures, which are currently served by a mogilefs/varnish/lighttpd cluster, but also a backup cluster that could serve files directly to our users (a community site with lots of pictures; file sizes vary, but most files are 50 to 300 kilobytes, and we plan on storing files of ~10 MB too). Great :-)

We have planned to use the following hardware, 5 servers, each with:

- quad-core CPU
- 16 GB RAM
- 4 x 1.5 TB HDD, no RAID
- dedicated GBit Ethernet switched network

GlusterFS setup: the same config on all nodes. For each of the 4 drives/mountpoints, a stack of volume posix -> volume locks -> volume with io-threads -> volume with write-behind -> volume with io-cache of 14 GB (so 2 GB is left for the system); then config entries for all 20 bricks, using tcp as the transport type; then cluster/replicate volumes of always 2 disks on different servers; and finally a cluster/nufa volume with the 10 replicate volumes as subvolumes (see the volfile sketch below).

As I understand it, this should provide me with the following:

- Data redundancy: if one disk fails, I can replace it and GlusterFS automatically replicates all the lost data back onto the new disk, and the same if a whole server is lost/broken.
- Distributed access: a read of a specific file will always go to the same server/drive, regardless of which server it is requested from, and will therefore be cached by the io-cache layer on the node that has the file on disk. OK, a little network overhead, but that is better than putting the cache on top of the distribute volume, which would result in having the "same" cached content on all servers.
- A global cache of 70 GB with no duplicates (5 servers times 14 GB io-cache RAM per server). How exactly does the io-cache work? Can I specify a TTL for a file pattern, or specify which files should not be cached at all, or...? I cannot find any specific info on this.
- I can put apache/lighttpd on all the servers, which then have direct access to the storage, so there is no need for extra webservers to serve static and cacheable content.
- Remote access: I can mount the filesystem from another location (another DC), securely if I wish through some kind of VPN, and use it there for backup purposes.
- Expandable: I can just put 2 new servers online, each with 2/4/6/8 drives.

If you have read and understood ( :-) ) this, I would highly appreciate it if you could answer my questions and/or comment with any input you might have.

Thanks a lot,
Moritz
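For illustration, a rough volfile sketch of the stack described above, assuming GlusterFS 2.x translator names. All volume, brick and host names as well as the cache size are placeholders, only one of the four drives and two of the ten replica pairs are shown, and the nufa option is an assumption rather than a verified setting.

    # --- server side, repeated per drive (disk1..disk4) on every server ---
    volume disk1-posix
      type storage/posix
      option directory /data/disk1
    end-volume

    volume disk1-locks
      type features/locks
      subvolumes disk1-posix
    end-volume

    volume disk1-iot
      type performance/io-threads
      subvolumes disk1-locks
    end-volume

    volume disk1-wb
      type performance/write-behind
      subvolumes disk1-iot
    end-volume

    volume disk1
      type performance/io-cache
      option cache-size 3584MB   # placeholder: per-server cache budget split over 4 drives
      subvolumes disk1-wb
    end-volume

    # ... repeat for disk2..disk4 and export them via protocol/server over tcp ...

    # --- client side: mirror each drive with the matching drive on another server ---
    volume repl1
      type cluster/replicate
      subvolumes server1-disk1 server2-disk1   # protocol/client volumes, one per remote brick
    end-volume

    volume repl2
      type cluster/replicate
      subvolumes server1-disk2 server2-disk2
    end-volume

    # ... repl3..repl10 across the remaining servers/drives ...

    volume nufa0
      type cluster/nufa
      # Assumption: nufa prefers the named local subvolume for newly created files.
      option local-volume-name repl1
      subvolumes repl1 repl2 repl3 repl4 repl5 repl6 repl7 repl8 repl9 repl10
    end-volume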