Re: [Gluster-users] Small files
Matan - We replicate to two nodes. But since a zfs send | zfs recv communicates one-way, I'd think you could do as many as you want. It just might take a little bit longer - although you should be able to run multiple at a time as long as you have enough bandwidth over the network. Ours are connected via a dedicated 10-gigabit network and see around 4-5gbit/sec on a large commit. How long the replication job takes depends on how much has changed between the two snapshots.

Even though the seek time with an SSD is quick, you'll still get far greater throughput in sequential read/writing vs small random accesses. You can test it yourself. Create a directory with 100 64MB files and another directory with 64,000 100K files. Now copy each from one place to another and see for yourself which is faster. Sequential reading always wins. And this is true with both Gluster and HDFS. In HDFS, small files exacerbate the problem because you need to contact the NameNode to get the block information and then contact the DataNode to get the block. Think of it like this: reading 1000 64KB files in HDFS means 1000 requests to the NameNode and 1000 requests to the DataNodes, while reading one 64MB file is one trip to the NameNode and one trip to the DataNode to get the same amount of data. You can read more about this issue here: http://blog.cloudera.com/blog/2009/02/the-small-files-problem/

On Thu, Jan 29, 2015 at 12:30 PM, Matan Safriel dev.ma...@gmail.com wrote: Hi Liam, Thanks for the comprehensive reply (!) How many nodes do you safely replicate to with ZFS? I don't think seek time is much of a concern with SSD by the way, so it does seem that glusterfs is much better for the small files scenario than HDFS, which as you say is very different in key aspects, and I couldn't quite follow why rebalancing is slow, or slower than in the case of HDFS actually, unless you just meant that HDFS works at a large block level and no more. Perhaps you'd care to comment ;) Matan

On Thu, Jan 29, 2015 at 9:15 PM, Liam Slusser lslus...@gmail.com wrote: Matan - I'll do my best to take a shot at answering this... They're completely different technologies. HDFS is not posix compliant and is not a mountable filesystem, while Gluster is. In HDFS land, every file, directory and block is represented as an object in the namenode's memory, each of which occupies about 150 bytes. So 10 million files would eat up about 3 gigs of memory. Furthermore, HDFS was designed for streaming large files - the default blocksize in HDFS is 64MB. Gluster doesn't have a central namenode, so having millions of files doesn't put a tax on it in the same way. But, again, small files cause lots of small seeks to handle the replication tasks/checks and generally aren't very efficient. So don't expect blazing performance... Doing rebalancing and rebuilding of Gluster bricks can be extremely painful, since Gluster isn't a block level filesystem - it will have to read each file one at a time. If you want to use HDFS and don't need a mountable filesystem, have a look at HBase. We tackled the small files problem by using a different technology. I have an image store of about 120 million+ small-file images, I needed a mountable filesystem which was posix compliant, and I ended up doing a ZFS setup - using the built-in replication to create a few identical copies on different servers for both load balancing and reliability. So we update one server and then have a few read-only copies serving the data. Changes get replicated, at a block level, every few minutes.
thanks, liam

On Thu, Jan 29, 2015 at 4:29 AM, Matan Safriel dev.ma...@gmail.com wrote: Hi, Is glusterfs much better than hdfs for the many small files scenario? Thanks, Matan
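Liam's copy test above is easy to reproduce; here is a minimal sketch (the /scratch paths are invented, and both trees total the same 6.4GB):

  # 100 x 64MB = 6.4GB of big files
  mkdir -p /scratch/big
  for i in $(seq 1 100); do dd if=/dev/zero of=/scratch/big/f$i bs=1M count=64 2>/dev/null; done

  # 64,000 x 100KB = 6.4GB of small files
  mkdir -p /scratch/small
  for i in $(seq 1 64000); do dd if=/dev/zero of=/scratch/small/f$i bs=100K count=1 2>/dev/null; done

  # drop the page cache between runs for a fair comparison (Linux)
  echo 3 > /proc/sys/vm/drop_caches
  time cp -a /scratch/big /scratch/big.copy
  echo 3 > /proc/sys/vm/drop_caches
  time cp -a /scratch/small /scratch/small.copy

The byte counts match, so any difference in the timings comes down to sequential vs random access, which is the point being made.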
Re: [Gluster-users] Writing is slow when there are 10 million files.
I had about 100 million files in Gluster and it was unbelievably, painfully slow. We had to ditch it for other technology.

On Mon, Apr 14, 2014 at 11:24 PM, Franco Broi franco.b...@iongeo.com wrote: I seriously doubt this is the right filesystem for you; we have problems listing directories with a few hundred files, never mind millions.

On Tue, 2014-04-15 at 10:45 +0900, Terada Michitaka wrote: Dear All, I have a problem with slow writing when there are 10 million files. (Top level directories are 2,500.) I configured a GlusterFS distributed cluster (3 nodes). Each node's spec is below.
CPU: Xeon E5-2620 (2.00GHz 6 Core)
HDD: SATA 7200rpm 4TB*12 (RAID 6)
NW: 10GbE
GlusterFS: glusterfs 3.4.2 built on Jan 3 2014 12:38:06
This cluster (volume) is mounted on CentOS via the FUSE client. This volume is the storage for our application, and I want to store 300 million to 5 billion files. I performed a writing test, writing 32KByte files × 10 million to this volume, and encountered problems. (1) Writing is very slow, and it slows down further as the number of files increases. On a single node (no clustering), the write speed is 40MByte/sec at random, but the write speed is 3.6MByte/sec on the cluster. (2) The ls command is very slow: about 20 seconds. Directory creation takes about 10 seconds at best. Questions: 1) Is it possible to store 5 billion files in GlusterFS? Has anyone succeeded in storing a billion files in GlusterFS? 2) Could you give me a link to a tuning guide or other tuning information? Thanks. -- Michitaka Terada
Re: [Gluster-users] Writing is slow when there are 10 million files.
We consolidated hardware into a single large ZFS server with a redundant hot slave. thanks, liam

On Mon, Apr 14, 2014 at 11:33 PM, Jeffrey 'jf' Lim jfs.wo...@gmail.com wrote: On Tue, Apr 15, 2014 at 2:30 PM, Liam Slusser lslus...@gmail.com wrote: I had about 100 million files in Gluster and it was unbelievably, painfully slow. We had to ditch it for other technology. And what is (or was) that other technology? -jf -- He who settles on the idea of the intelligent man as a static entity only shows himself to be a fool. Mensan / Full-Stack Technical Polymath / System Administrator 12 years over the entire web stack: Performance, Sysadmin, Ruby and Frontend
Re: [Gluster-users] Writing is slow when there are 10 million files.
Our application also stores the path of the file in a database. Accessing a file directly is normally pretty speedy. However, getting the files into the database required searching parts of the filesystem, which was really slow. We also had users working on the filesystem fixing things, which was all unix shell ls/cp/mv etc - and again, really slow. And the biggest problem I had was that if one of the nodes went down for a reboot/patching/whatever, resyncing the filesystems took weeks because of the huge number of files. thanks, liam

On Tue, Apr 15, 2014 at 3:15 AM, Terada Michitaka terra...@gmail.com wrote: To Liam: "I had about 100 million files in Gluster and it was unbelievably painfully slow. We had to ditch it for other technology." Did the slowdown occur on writing files, listing files, or both? In our application, the path of the data is managed in a database. ls being slow does not affect my application, but the file-write slowdown is critical. To All: I uploaded statistics from the writing test (32kbyte x 10 million, 6 bricks): http://gss.iijgio.com/gluster/gfs-profile_d03r2.txt On line 15, the average-latency value is about 30 ms. I cannot judge whether this is normal (ordinary?) performance or not. Is it slow? Thanks, -- Michitaka Terada

2014-04-15 16:05 GMT+09:00 Franco Broi franco.b...@iongeo.com: My bug report is here: https://bugzilla.redhat.com/show_bug.cgi?id=1067256

On Mon, 2014-04-14 at 23:51 -0700, Joe Julian wrote: If you experience pain using any filesystem, you should see your doctor. If you're not actually experiencing pain, perhaps you should avoid hyperbole and instead talk about what version you tried, what your tests were, how you tried to fix it, and what the results were. If you're using a current version with a kernel that has readdirplus support for fuse, it shouldn't be that bad. If it is, file a bug report - especially if you have the skills to help diagnose the problem.

On April 14, 2014 11:30:26 PM PDT, Liam Slusser lslus...@gmail.com wrote: I had about 100 million files in Gluster and it was unbelievably, painfully slow. We had to ditch it for other technology.

On Mon, Apr 14, 2014 at 11:24 PM, Franco Broi franco.b...@iongeo.com wrote: I seriously doubt this is the right filesystem for you; we have problems listing directories with a few hundred files, never mind millions.

On Tue, 2014-04-15 at 10:45 +0900, Terada Michitaka wrote: Dear All, I have a problem with slow writing when there are 10 million files. (Top level directories are 2,500.) I configured a GlusterFS distributed cluster (3 nodes). Each node's spec is below.
CPU: Xeon E5-2620 (2.00GHz 6 Core)
HDD: SATA 7200rpm 4TB*12 (RAID 6)
NW: 10GbE
GlusterFS: glusterfs 3.4.2 built on Jan 3 2014 12:38:06
This cluster (volume) is mounted on CentOS via the FUSE client. This volume is the storage for our application, and I want to store 300 million to 5 billion files. I performed a writing test, writing 32KByte files × 10 million to this volume, and encountered problems. (1) Writing is very slow, and it slows down further as the number of files increases. On a single node (no clustering), the write speed is 40MByte/sec at random, but the write speed is 3.6MByte/sec on the cluster. (2) The ls command is very slow: about 20 seconds. Directory creation takes about 10 seconds at best. Questions: 1) Is it possible to store 5 billion files in GlusterFS? Has anyone succeeded in storing a billion files in GlusterFS? 2) Could you give me a link to a tuning guide or other tuning information? Thanks. -- Michitaka Terada
Re: [Gluster-users] Switch recommendations
Just to put in my two cents. I have 12 Dell 6248s connected via 10g to a core 10g Brocade switch and haven't had any problems. They work very well, are super reliable, and are easy to manage. I do recommend using the latest firmware off of Dell's website though!! My only complaint is they do not have dual power supplies, though you can get a 1U Dell power thingy that can act as a second PS for 3 or 4 of them. But it takes up another U of space; I'd much rather have an option to put in a second PS. I don't do anything crazy with them, just basic snmp stats, vlan groups, and some trunk/port-channel groups, and they do all that very well. They're great top-of-rack switches - which is what we use them for. For the price they are hard to beat. liam

On Fri, Jan 27, 2012 at 5:04 AM, Dan Bretherton d.a.brether...@reading.ac.uk wrote: Dear All, I need to buy a bigger GigE switch for my GlusterFS cluster and I am trying to decide whether or not a much more expensive one would be justified. I have limited experience with networking, so I don't know if it would be appropriate to spend £500, £1500 or £3500 for a 48-port switch. Those rough costs are based on a comparison of 3 Dell PowerConnect switches: the 5548 (a bigger version of what we have now), the 6248 and the 7048. The servers in the cluster are nothing special - mostly Supermicro with SATA drives and 1GigE network adapters. I can only justify spending more than ~£500 if I can be sure that users would notice the difference. Some of the users' applications do lots of small reads and writes, and they do run much more slowly if all the servers are not connected to the same switch, as is the case now while I don't have a big enough switch. Any advice or comments would be much appreciated. Regards Dan.
Re: [Gluster-users] Recommendations for busy static web server replacement
64k stripe, and yes, one big 22+2 raid6 array. Liam

On Feb 7, 2012 11:41 PM, Brian Candler b.cand...@pobox.com wrote: On Tue, Feb 07, 2012 at 05:11:01PM -0800, Liam Slusser wrote: We run a similar setup here. I use gluster 2.x to stripe/replicate 8 30tb xfs bricks (24x1.5tb raid6) into a single 120tb array. What stripe size are you using on the RAID6 array? Do you put all 24 drives into a single RAID6 group (22+2), or two groups of 12+2, or something else? Regards, Brian.
[Gluster-users] A faster way to to replicate?
All - Recently one of our bricks lost a hard drive; during the rebuild (3ware 9690 controller) we lost another drive, and then had a few ECC errors on a third. This was on a ~30tb 24-drive RAID6 array. I was able to force the controller to rebuild with an ignore_ECC flag, which completed successfully, and the XFS partition appears to be fine. In the 3ware device logs I see a dozen alerts about bad sectors/ECC errors. The partition is at 97% full, so there's a pretty good chance we have some data corruption. But not to worry, Gluster to the rescue, right??

We currently have two copies of our data and gluster handles the replication between them - let's call them the A and B clusters. Our A cluster is the one having issues. We've been planning on adding a third C cluster for extra reliability and mostly for the added performance. So, since I have a known-good copy of the troubled brick on our B cluster, I started a gluster sync of our B cluster to the new C cluster. And OMG it's so slow. I've been running an ls -alR for the last week and it's only done 3.8% (replicated ~9.9 million files) of our total space, with an estimated finish date another 223 days out - that's the end of August!

So my question is, how can I get this done quicker? Can I rsync one brick to another brick directly? I know that will not copy the extended attributes correctly, and I believe it will mess up gluster, right? Anybody have some great ideas? I'm running gluster 2.0.9 with 64bit CentOS 5.7/6.2. Each A/B/C cluster is 4 x 30tb xfs raid6 bricks, for a total of ~120tb (84tb in use). liam
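If one did try the direct brick-to-brick copy, a cautious sketch might look like this (hostnames and brick paths invented). Note that plain rsync -aH deliberately does not copy the trusted.* xattrs gluster keeps on each file, and whether gluster 2.x will accept a brick populated this way is exactly the open question in this thread:

  # copy file data, hardlinks, and ownership, but not gluster's xattrs
  rsync -aHv --numeric-ids /export/brick1/ serverC:/export/brick1/
  # then walk the mount so gluster can rebuild its own metadata
  ls -alR /mnt/gluster > /dev/null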
Re: [Gluster-users] ZFS + Linux + Glusterfs for a production ready 100+ TB NAS on cloud
I've used ZFS in lots of different roles, and I've found that out of the box ZFS performs decently, but to get really great performance out of the (zfs) filesystem you really need to tune it for the application. ZFS has tons and tons of somewhat hidden features (edit /etc/system and reboot type stuff), and if set correctly it has outstanding performance. liam

On Thu, Sep 29, 2011 at 10:58 AM, Joe Landman land...@scalableinformatics.com wrote: This said, please understand that there is a (significant) performance cost to all those nice features in ZFS. And there is a reason why it's not generally considered a high performance file system. So if you start building with it, you shouldn't necessarily think that the whole is going to be faster than the sum of the parts. It might be worse. This is a caution from someone who has tested/shipped many different file systems in the past, ZFS included, on Solaris and other machines. There is a very significant performance penalty one pays for using some of these features. You have to decide if this penalty is worth it.
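To make the "/etc/system and reboot" remark concrete, this is the flavor of tuning involved; the tunables are real ZFS knobs of that era, but the values and dataset names here are purely illustrative:

  # /etc/system (Solaris) - cap the ARC at 4GB; takes effect after a reboot
  set zfs:zfs_arc_max = 0x100000000

  # per-dataset tuning from the shell, no reboot needed
  zfs set atime=off tank/data                  # skip access-time writes
  zfs set recordsize=8K tank/db                # match the application's I/O size
  zfs set primarycache=metadata tank/scratch   # keep only metadata in the ARC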
Re: [Gluster-users] ZFS + Linux + Glusterfs for a production ready 100+ TB NAS on cloud
I have a very large, 500tb, Gluster cluster on CentOS Linux, but I use the XFS filesystem in a production role. Each xfs filesystem (brick) is around 32tb in size. No problems; it all runs very well. ls
Re: [Gluster-users] ZFS + Linux + Glusterfs for a production ready 100+ TB NAS on cloud
I've also heard it can be slower; however, I've never done any performance tests on the same hardware with ext3/4 vs XFS, since my partitions are so big that ext3/4 is just not an option. With that said, I've been pleased with the performance I get and am a happy XFS user. ls

On Sep 24, 2011 12:31 PM, Craig Carl cr...@gestas.net wrote: XFS is a valid alternative to ZFS on Linux. If I remember correctly, any operation that requires modifying a lot of xattrs can be slower than ext*; have you noticed anything like that? You might see slower rebalances or self healing? Craig. Sent from a mobile device, please excuse my tpyos.

On Sep 24, 2011, at 22:14, Liam Slusser lslus...@gmail.com wrote: I have a very large, 500tb, Gluster cluster on CentOS Linux, but I use the XFS filesystem in a production role. Each xfs filesystem (brick) is around 32tb in size. No problems; it all runs very well. ls
Re: [Gluster-users] gluster usable as open source solution
Glad I could help, Uwe. We do use hardware RAID inside the servers. The reason is that gluster isn't block level, so doing a remirror/resync with 80+ million files takes weeks - so if we lost a hard drive the rebuild time would be unacceptable. With hardware raid a rebuild takes only a day or two.

We access our clusters mostly via the native fuse gluster client, because performance via the NFS client is somewhat slow. However, we do have a few clients that connect via NFS so we can mount read-only, or that don't require a lot of performance. liam

On Thu, Aug 4, 2011 at 11:13 AM, Uwe Kastens kiste...@googlemail.com wrote: Hello Liam, Thank you for sharing the information. This is very kind and helpful. Indeed there are some questions: - Are you working with hardware raid inside the server? - How are you accessing the storage? NFS/gluster native? Kind Regards Uwe

2011/8/4 Liam Slusser lslus...@gmail.com: I run two Gluster clusters in a very production role using open source Gluster, all supported in-house mostly by myself. We have a 4-node 240tb-after-raid (576tb raw) cluster supporting a farm of audio transcoders. This one was built not so much for speed (it's not very speedy) but to be reliable and cheap. Over the last two years we've had a few small issues but nothing major. Very reliable. All built on commodity hardware (Supermicro chassis, Seagate 7.2k desktop hard drives). I also run a smaller 6-node 120tb (432tb raw) cluster as storage for a pool of public facing apache webservers. This smaller cluster serves content to feed our CDN providers, which feed all our users. We can saturate a gigabit line (with 2-3meg http objects) without issues. (Same Supermicro chassis and Seagate 7.2k desktop hard drives.) This cluster has never gone down in the two years it has been running. Our two homebuilt Gluster clusters replaced nearly a quarter of a million dollars in Isilon hardware for less than the cost of the Isilon annual support contract, while doubling the space at the same time. It has saved our company hundreds of thousands of dollars and has been hugely successful. You're welcome to email me offline if you would like more information. liam

On Thu, Aug 4, 2011 at 3:38 AM, Uwe Kastens kiste...@googlemail.com wrote: Hi, I looked at gluster over the past year. It looks nice, but the commercial option is not so interesting, since it is not possible to evaluate a storage solution within 30 days. More than with any other storage platform, it's a matter of trust whether the scaling works. So my questions to this mailing list are: - Anybody using the open source edition in a bigger production environment? How is the experience over a longer time? - Since gluster seems only to offer support with the enterprise version, is anybody out there supporting the open source edition? Regards Uwe
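For reference, the two access methods Liam mentions look something like this with the 3.x-style mount syntax (server and volume names invented; gluster 2.x instead pointed the client at a volfile with glusterfs -f client.vol):

  # native FUSE client - replication-aware, generally faster
  mount -t glusterfs server1:/myvol /mnt/gluster

  # NFS client - handy for read-only mounts and low-demand clients
  mount -t nfs -o vers=3,ro server1:/myvol /mnt/gluster-ro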
Re: [Gluster-users] HW raid or not
I'm in the HW raid camp, mostly because gluster is not block level, so with large quantities of files replication can take days or weeks. In my case a rebuild/resync can take weeks because of how many files/directories I have in my cluster. With hardware RAID I can just replace the disk, and a rebuild happens automatically and very quickly. liam

On Mon, Aug 8, 2011 at 4:12 AM, Gabriel-Adrian Samfira samfiragabr...@gmail.com wrote: We use raw disks with our setup. Gluster takes care of the replication part, so RAID would be useless for us. Performance wise, you are better off just adding a new brick and letting gluster do the rest. Best regards, Gabriel

On Mon, Aug 8, 2011 at 9:54 AM, Uwe Kastens kiste...@googlemail.com wrote: Hi, I know that there is no general answer to this question :) Is it better to use HW Raid or LVM as a gluster backend, or raw disks? Regards Uwe
Re: [Gluster-users] gluster usable as open source solution
I run two Gluster clusters in a very production role using open source Gluster, all supported in-house mostly by myself. We have a 4-node 240tb-after-raid (576tb raw) cluster supporting a farm of audio transcoders. This one was built not so much for speed (it's not very speedy) but to be reliable and cheap. Over the last two years we've had a few small issues but nothing major. Very reliable. All built on commodity hardware (Supermicro chassis, Seagate 7.2k desktop hard drives).

I also run a smaller 6-node 120tb (432tb raw) cluster as storage for a pool of public facing apache webservers. This smaller cluster serves content to feed our CDN providers, which feed all our users. We can saturate a gigabit line (with 2-3meg http objects) without issues. (Same Supermicro chassis and Seagate 7.2k desktop hard drives.) This cluster has never gone down in the two years it has been running.

Our two homebuilt Gluster clusters replaced nearly a quarter of a million dollars in Isilon hardware for less than the cost of the Isilon annual support contract, while doubling the space at the same time. It has saved our company hundreds of thousands of dollars and has been hugely successful. You're welcome to email me offline if you would like more information. liam

On Thu, Aug 4, 2011 at 3:38 AM, Uwe Kastens kiste...@googlemail.com wrote: Hi, I looked at gluster over the past year. It looks nice, but the commercial option is not so interesting, since it is not possible to evaluate a storage solution within 30 days. More than with any other storage platform, it's a matter of trust whether the scaling works. So my questions to this mailing list are: - Anybody using the open source edition in a bigger production environment? How is the experience over a longer time? - Since gluster seems only to offer support with the enterprise version, is anybody out there supporting the open source edition? Regards Uwe
Re: [Gluster-users] hardware raid controller
That's not a hardware raid controller; the raid is done in software via the raid driver. You can probably find a Linux driver, but it's a really, really crappy raid card. I'd recommend getting something else, like a 3ware/LSI etc. liam

On Fri, Jul 15, 2011 at 1:55 AM, Derk Roesink derkroes...@viditech.nl wrote: Hello! I'm trying to install my first Gluster Storage Platform server. It has a Jetway JNF99FL-525-LF motherboard with an internal raid controller (based on an Intel ICH9R chipset) which has 4x 1tb drives for data that I would like to run in a RAID5 configuration. It seems Gluster doesn't support the raid controller, because I still see the 4 disks as 'servers' in the WebUI. Any ideas?! Kind Regards, Derk
Re: [Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?
Gluster is threaded and will take advantage of multiple-CPU hardware and memory; however, having a fast disk subsystem is far more important. Having a lot of memory with huge bricks isn't very necessary IMO, because even with 32g of ram your cache hit ratio across huge 30+tb bricks is so insanely small that it doesn't make any real-world difference.

I have a 240tb cluster over 16 bricks (4 physical servers each exporting 4 30tb bricks) and another 120tb cluster over 8 bricks (4 physical servers each exporting 2 30tb bricks). Hardware wise both my clusters are basically the same: Supermicro SC846E1-R900B 4u 24-drive chassis, dual 2.2ghz quadcore xeons, 8g ram, a 3ware 9690sa-4i4e SAS raid controller, and 24 x Seagate 1.5tb SATA 7200rpm hard drives in each chassis. Each brick is a raid6 over all 24 drives per chassis. I daisy chain the chassis together via SAS cables. So on my larger cluster I daisy chain 3 more 24-drive chassis off the back of the head node; on the smaller cluster I only daisy chain one chassis off the back.

Lots of people prefer not to do raid and have gluster handle the file replication (replicate) between bricks. My problem with that is that with a huge number of files (I have nearly 100 million files on my larger cluster), a rebuild (ls -alR) takes 2-3 weeks. And since those Seagate drives are crap (I lose maybe 1-3 drives a month!) I would constantly be rebuilding, almost all the time. Using hardware raid makes life much easier for me: instead of having 384 bricks I have 16. When I lose a drive I just hot swap it and let the 3ware controller rebuild the raid6 array. The rebuild time on the 3ware depends on the workload, but it's anywhere from 2-5 days normally. One time I lost a drive in the middle of a rebuild (so one failed and one in a rebuild state) and was able to hotswap the newly failed drive, and it correctly rebuilt the array with two failed drives without any problems or downtime on the cluster. Win!

So I'm a big fan of hardware raid, especially the 3ware controllers. They handle the slow non-enterprise Seagate drives very well. I've tried LSI, Dell Perc 5e/6e, and Supermicro (LSI) controllers, and they all had issues with drive timeouts. A few recommendations when using the 3ware controllers: disable smartd in Linux (it pisses off the 3ware controller, and the controller keeps an eye on the SMART data on each disk anyway), set the block readahead in Linux to 16384 (/sbin/blockdev --setra 16384 /dev/sdX), upgrade the firmware on the 3ware controller to the newest version from 3ware, use the newest 3ware drivers rather than the driver bundled with whatever Linux distro you use, spend the $100 and make sure you get the optional battery backup module for the controller, and use nagios to check your raid status! Oh, and if you use desktop commodity hard drives, make sure you have a bunch of spares on hand. :)

Even with hardware raid I still use gluster's replication to provide redundancy, so I can do patches and system maintenance without downtime for my clients. I mirror bricks between head nodes and then use distribute to glue all the replicated bricks together.

I have two Dell 1950 1u public facing webservers (Linux/Apache) using the gluster fuse mount, connected via a private backend network to my smaller cluster. My average file request size is around 3megs (10-15 requests per second), and I've been able to push 800mbit/sec of http traffic from those two clients. It might have been higher, but my firewall only has gigabit ethernet, which was basically saturated at that point. I only use a 128meg gluster client cache because I'm feeding my CDN, so the requests are very random and I very rarely see two requests for the same file. That's pretty awesome random read performance if you ask me, considering the hardware. I start getting uncomfortable with any more than 600mbit/sec of traffic, as the service read times off the bricks on the gluster servers start getting quite high. Those 1.5tb Seagate hard drives are cheap, $80 a drive, but they're not very fast at random reads. Hope that helps! liam

On Wed, Apr 27, 2011 at 1:39 AM, Martin Schenker martin.schen...@profitbricks.com wrote: Hi all! I'm new to the Gluster system and tried to find answers to some simple questions (and couldn't find the information with Google etc.) - Does Gluster spread its cpu load across a multicore environment? So does it make sense to have 50 core units as Gluster servers? CPU loads seem to go up quite high during file system repairs, so spreading/multithreading should help? What kind of CPUs are working well? How much does memory help the performance? - Are there any recommendations for commodity hardware? We're thinking of 36 slot 4U servers; what kind of controllers DO work well for IO speed? Any real life experiences? Does it dramatically improve the performance to increase the number of controllers per disk? The aim is for a ~80-120T file system with 2-3
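Collected into one place, the host-side tweaks recommended above would look roughly like this on a CentOS 5 era box (device names are examples):

  # keep smartd away from disks behind the 3ware controller
  service smartd stop
  chkconfig smartd off

  # set block readahead to 16384 on each array device, per the advice above
  /sbin/blockdev --setra 16384 /dev/sdb
  /sbin/blockdev --getra /dev/sdb   # verify

  # persist across reboots, e.g. via rc.local
  echo '/sbin/blockdev --setra 16384 /dev/sdb' >> /etc/rc.local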
Re: [Gluster-users] Server side or client side AFR
The problem with server side AFR is that you lose your redundancy in the event that one of your servers goes down, because the client is only connecting to one of the two servers. So if your client is connected to the server that went down, you're SOL. liam

On Fri, Apr 22, 2011 at 4:05 PM, Nobodys Home n1sh...@yahoo.com wrote: Hello all, I prefer server side AFR for my topology and I think it makes overall sense for most configurations. However, after a bit of testing, I don't think the replicated distributed volume described in the 3.1 documentation is server side. It seems like I'm in a bit of a pickle: 1. Can I only expand volumes on the fly if I use the gluster cli to initially define my volumes? 2. Most of the examples regarding server side AFR do not show how to add bricks to a predefined server.vol configuration on the fly. If there is a way to do this, can I get help with it? Best Regards, Nobodys Home
Re: [Gluster-users] Backup Strategy
NetBackup is great and can probably back up directly from a GlusterFS client mount; however, the license and software cost for a few clients and one server/media server is nearly $50k. Not exactly cheap. I'd look into Amanda backup if I were on a budget and wanted to back up to tape. Another option is to just rsync your gluster cluster to a Sun Solaris server with a ZFS partition. Then you can do nightly zfs snapshots of your data (snapshots only save what changes, so they use very little space). liam

On Wed, Mar 9, 2011 at 11:40 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks! Have you heard of NetBackup? Our co. already has a license for it. I think it can be used.

On Wed, Mar 9, 2011 at 11:11 AM, Sabuj Pattanayek sab...@gmail.com wrote: I read the docs. But here you go: http://lmgtfy.com/?q=backuppc+howto

On Wed, Mar 9, 2011 at 1:05 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks! Is there a short blog or set of steps that I can look at? The documentation looks overwhelming at first glance :)

On Wed, Mar 9, 2011 at 10:53 AM, Sabuj Pattanayek sab...@gmail.com wrote: For the amount of features that you get with backuppc, it's worth the fairly painless setup. Btw, we've found that it's better/faster to use tar via backuppc (it supports rsync as well) to do the backups rather than rsync in backuppc. Rsync can be really slow if you have thousands/millions of files.

On Wed, Mar 9, 2011 at 12:50 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is there a problem with using just rsync vs backuppc? I need to read about backuppc and how easy it is to set up.
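The rsync-to-ZFS idea in shell form - a minimal sketch, assuming a pool named tank and a nightly cron job on the Solaris box (hostnames, dataset names, and schedule are invented):

  # pull the gluster tree, then snapshot it; the snapshot stores only the deltas
  rsync -aH --delete glusterclient:/mnt/gluster/ /tank/backup/
  zfs snapshot tank/backup@$(date +%Y-%m-%d)

  # list or prune old snapshots as needed
  zfs list -t snapshot
  zfs destroy tank/backup@2011-02-01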
Re: [Gluster-users] Gluster 3.1.1 64-bit only?
You can - it's just not supported or very well tested. liam

On Mon, Dec 13, 2010 at 10:07 AM, Matt Keating matt.keating.li...@gmail.com wrote: Will it not run at all if I compile on a 32bit system?
Re: [Gluster-users] glusterfs client waiting on SYN_SENT to connect...
Just wanted to update you all. Turns out the problem is my Juniper firewall - sort of. I've created a service in our Juniper that describes Gluster and allows the tcp session to never timeout. The problem comes when a server crashes and the TCP connection isn't cleaned up. It looks like the gluster client always starts from the same outbound (source) TCP port, and in our firewall that source/dest port combination is already in use (it never times out, right), so the firewall isn't allowing the session to be created again - it's blocked.

So right now if I do a netstat -pan:

tcp 0 1 10.10.10.101:996  10.20.10.102:6996 SYN_SENT    23491/glusterfs
tcp 0 1 10.10.10.101:997  10.20.10.102:6996 SYN_SENT    23491/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT    23491/glusterfs
tcp 0 0 10.10.10.101:1001 10.20.10.102:6996 ESTABLISHED 23491/glusterfs
tcp 0 0 10.10.10.101:999  10.20.10.101:6996 ESTABLISHED 23491/glusterfs
tcp 0 1 10.10.10.101:998  10.20.10.101:6996 SYN_SENT    23491/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT    23491/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT    23491/glusterfs

Now if I kill the gluster process and restart it again... notice the source ports don't change:

tcp 0 1 10.10.10.101:996  10.20.10.102:6996 SYN_SENT    23687/glusterfs
tcp 0 1 10.10.10.101:997  10.20.10.102:6996 SYN_SENT    23687/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT    23687/glusterfs
tcp 0 0 10.10.10.101:1001 10.20.10.102:6996 ESTABLISHED 23687/glusterfs
tcp 0 0 10.10.10.101:999  10.20.10.101:6996 ESTABLISHED 23687/glusterfs
tcp 0 1 10.10.10.101:998  10.20.10.101:6996 SYN_SENT    23687/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT    23687/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT    23687/glusterfs

Now if I kill and restart a few times... I can get lucky and get a different source port... but you can see I'm still missing a few bricks:

tcp 0 0 10.10.10.101:994  10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:995  10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:998  10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT    23745/glusterfs
tcp 0 0 10.10.10.101:997  10.20.10.101:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:996  10.20.10.101:6996 ESTABLISHED 23745/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT    23745/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT    23745/glusterfs

Now telnet always works, because it always picks a random source port:

$ telnet 10.20.10.102 6996
Trying 10.20.10.102...
Connected to glusterserver (10.20.10.102).
Escape character is '^]'.

$ netstat -pan|grep telne
tcp 0 0 10.10.10.101:58757 10.20.10.102:6996 ESTABLISHED 23622/telnet

Why does gluster not use a more random source port?? I'm going to have to dig through the Juniper docs to see if I can manually close an active session (let's hope), which should fix my immediate problem, but it doesn't really fix the long term problem. Thoughts? thanks, liam

On Fri, Dec 3, 2010 at 6:51 PM, Liam Slusser lslus...@gmail.com wrote: Ah, the two different IPs are because I was changing my IPs for this mailing list and I guess I forgot that one. :) Will try adding a static route. Also going to snoop traffic and see if the gluster client is actually getting to the server or being blocked by the firewall. I'll letcha all know what I find. Thanks for the ideas.
Liam

On Dec 3, 2010 6:32 PM, mki-gluste...@mozone.net wrote: On Fri, Dec 03, 2010 at 04:25:18PM -0800, Liam Slusser wrote:

[r...@client~]# netstat -pan|grep glus
tcp 0 1 10.8.10.107:1000 10.8.11.102:6996 SYN_SENT 3385/glusterfs

from the gluster client log: However, the port is obviously open...

[r...@client~]# telnet 10.8.11.102 6996
Trying 10.2.56.102...
Connected to glusterserverb (10.8.11.102).
Escape character is '^]'.
^]
telnet> close
Connection closed.

Looking further... why is your telnet trying 10.2.56.102 when you clearly specified 10.8.11.102? Also, what happens if you do a specific route for the 10.8.11.0/24 block thru the appropriate gw, without relying on the default gw to route for you
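The specific route suggested here would look something like this on a CentOS 5 client (the 10.8.10.1 gateway address is a made-up example):

  # route the gluster server block via a specific gateway
  route add -net 10.8.11.0 netmask 255.255.255.0 gw 10.8.10.1

  # or persistently, in /etc/sysconfig/network-scripts/route-eth0:
  # 10.8.11.0/24 via 10.8.10.1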
[Gluster-users] glusterfs client waiting on SYN_SENT to connect...
Hey all, I've run into a weird problem. I have a few client boxes that occasionally crash due to a non-gluster related problem. But once a box comes back up, I cannot get the Gluster client to reconnect to the bricks. CentOS 5 64bit and Gluster 2.0.9.

df shows:
df: `/mnt/mymount': Transport endpoint is not connected

[r...@client~]# netstat -pan|grep glus
tcp 0 1 10.8.10.107:1000 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1001 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:998  10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:996  10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1003 10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1002 10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:997  10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:999  10.8.11.101:6996 SYN_SENT 3385/glusterfs

from the gluster client log:
+--+
[2010-12-03 15:48:28] W [glusterfsd.c:526:_log_if_option_is_invalid] readahead: option 'page-size' is not recognized
[2010-12-03 15:48:28] N [glusterfsd.c:1306:main] glusterfs: Successfully started
[2010-12-03 15:48:29] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 2: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:48:30] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 3: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:48:31] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 4: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:48:31] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 5: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:48:32] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 6: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2b: connection to failed (Connection timed out)
[2010-12-03 15:59:46] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 7: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:59:47] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 8: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:59:54] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 9: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 10: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 11: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 12: ERR = -1 (Transport endpoint is not connected)
[2010-12-03 15:59:56] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 13: ERR = -1 (Transport endpoint is not connected)

However, the port is obviously open...

[r...@client~]# telnet 10.8.11.102 6996
Trying 10.2.56.102...
Connected to glusterserverb (10.8.11.102).
Escape character is '^]'.
^]
telnet> close
Connection closed.

The gluster server log doesn't see ANY connection attempts from the client; however, it DOES see my telnet tcp attempts. I'm using IP addresses in all my configuration files - no names. I do have a Juniper firewall between the two servers that is doing stateful firewalling, and I've set it up so the connections never timeout - and I've never had a problem once it finally connects. And I can create a new connection with telnet but not with the client... Anybody seen anything like this before? Ideas? thanks, liam
Re: [Gluster-users] glusterfs client waiting on SYN_SENT to connect...
I thought the exact same thing... but like I said, I can telnet to the host/port without any issue. And there are no other issues on the network that would indicate anything not working correctly. And all the other clients on the same network/switch are working fine. It's only when a client crashes... liam

On Fri, Dec 3, 2010 at 4:34 PM, m...@mozone.net wrote: I've run into a weird problem. I have a few client boxes that occasionally crash due to a non-gluster related problem. But once the box comes back up I cannot get the Gluster client to reconnect to the bricks. This almost seems like a networking/firewall issue... Do you have any trunks setup between the switch that the client and/or server are on and the router? Perhaps one of those trunk legs is down, causing random packets to get blackholed? Mohan
Re: [Gluster-users] glusterfs client waiting on SYN_SENT to connect...
Telnet never fails. The Gluster client consistently fails, however. The server is using bonded NICs, but as far as I can tell they're configured correctly; both links are up and passing traffic.

On Fri, Dec 3, 2010 at 6:15 PM, mki-gluste...@mozone.net wrote: On Fri, Dec 03, 2010 at 06:03:19PM -0800, Liam Slusser wrote: This almost seems like a networking/firewall issue... Do you have any trunks setup between the switch that the client and/or server are on and the router? Perhaps one of those trunk legs is down causing random packets to get blackholed? I thought the exact same thing... but like I said, I can telnet to the host/port without any issue. And there are no other issues on the network that would indicate anything not working correctly. And all the other clients on the same network/switch are working fine. It's only when a client crashes... Consistently? If random telnets fail then that would explain your random SYN_SENT state stuck sockets. Is the client or server using bonded nics? Mohan
Re: [Gluster-users] upgrade 2.0.9 - 3.1.0 ?
Daniel, I just asked this question earlier this month; see http://gluster.org/pipermail/gluster-users/2010-November/005801.html for that thread. It was also recommended that I wait for 3.1.1, which I believe was released today. thanks, liam

On Mon, Nov 29, 2010 at 9:37 AM, Daniel Maher dma+glus...@witbe.net wrote: Hello all, I have a relatively straightforward 4-node gluster setup (2 clients, 2 servers, client-side replication) running version 2.0.9 across the board. We are considering upgrading to 3.1.0. The documentation indicates that it would be as simple as:
- shut gluster down
- uninstall the previous packages (we package install everything)
- install the new packages
- generate the new server and client configs
- start everything up
Beyond what is indicated above, is there any particular upgrade path or specific concerns we will need to address? Thank you. -- Daniel Maher dma+gluster AT witbe DOT net
Re: [Gluster-users] gluster with xfs
Our smaller cluster, 60tb, stores media data and acts as our CDN feed system. It's a pretty simple setup: the front end is two Dell 1950 servers running Apache, mounting gluster via the fuse client. We use bonded gigabit ethernet on the back side to two Supermicro 4u 24-bay servers. Each server has another Supermicro 4u 24-bay chassis hanging off the back, connected via SAS. Both servers are mirrors of each other. Drives are desktop Seagate 1.5tb drives connected to a 3ware 9690 SAS card. We make one huge 24-drive raid6 volume (~30tb) as a brick and use gluster to glue it all together.

Performance is decent - we've pushed nearly 800mbit of web traffic with it. Our Juniper firewall only has gigabit anyway, so I don't know how much more I could push if I went to 10g. One weird thing I've noticed is that 500mbit of web traffic is about double that on the backend, which is why we use bonded ethernet there. Another trick we do is have our two frontend webservers only mount one server each - so webserver A only mounts gluster server A. We found that the overhead of gluster constantly verifying the files were in sync added ~20%. All the clients that actually write the data of course mount both servers, so the files are mirrored correctly. Email me privately if you want more detail. Liam

On Nov 15, 2010 6:57 AM, Rudi Ahlers r...@softdux.com wrote:
Re: [Gluster-users] gluster with xfs
We run two somewhat large gluster clusters in production on xfs with great success. I had to go with xfs, as ext4 doesn't support large enough file systems. Make sure you mount your xfs partitions with 64bit inode support, and use only 64bit OSes. I'm still running 2.0.9; however, the performance is pretty good. We use ours to store media for our website, and with our smaller two-server, four-brick 60tb cluster I can easily push 800mbit of http traffic with an average object size of 2-3megs. Not bad for a bunch of slow sata disks! Liam

On Nov 15, 2010 2:53 AM, David Lloyd david.ll...@v-consultants.co.uk wrote: Hello, We're starting to set up a 4 node gluster system. I'm currently trying to decide on the low-level options, including what filesystem to use. For various reasons I would be more comfortable with XFS over ext4, but I read in the 'Introduction to Gluster' that 'XFS (can be slow)'. I haven't found any other details about this, and wondered if anyone has more information or experience of using gluster with XFS. Or if anything has changed with 3.1. We don't want it to be slow, and I'm happy enough using ext4 if necessary, but just wanted to see what others thought first. Thanks David -- David Lloyd V Consultants www.v-consultants.co.uk
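As a concrete example of the inode64 advice, the mount might look like this (device name and brick path are invented; noatime is added as a common companion option, not something Liam specified):

  mount -o inode64 /dev/sdb1 /export/brick1

  # or the equivalent /etc/fstab entry:
  /dev/sdb1  /export/brick1  xfs  inode64,noatime  0 0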
[Gluster-users] upgrading from 2.0.9 to 3.1, any gotchas?
Hey Gluster Users, It's been awhile since I've posted here. I'm looking to upgrade our 150tb 10-brick cluster from 2.0.9 to 3.1. Are there any gotchas I should be aware of? Has anybody run into any problems? Any suggestions or hints would be most helpful. I'm hoping the new Gluster will be a bit more forgiving on split-brain issues, and an increase in performance is always welcome. thanks, liam
Re: [Gluster-users] iscsi with gluster
I have a gfs2 cluster and have found the performance to be outstanding. It's great with small files. It's hard to say how it compares to my gluster cluster, since I designed them to do different tasks. But since the storage is all shared at the block level, it does have many advantages. Liam

On Apr 22, 2010, at 1:29 AM, milo...@gmail.com wrote: On Thu, Apr 22, 2010 at 8:38 AM, Liam Slusser lslus...@gmail.com wrote: You COULD run gluster on top of an iscsi mounted volume... but why would you want to? If you already have an iscsi SAN, why not use gfs2 or something like that? You need full cluster infrastructure for that - Gluster is a much simpler solution. GFS2 is also _very_ slow; although I never ran a test to compare it with Gluster, my feeling is that Gluster is much faster.
Re: [Gluster-users] How to re-sync
Assuming you used raid1 (replicate), you DO bring up the new machine and start gluster. On one of your gluster mounts you run an ls -alR, and it will resync the new node. The gluster clients are smart enough to get the files from the surviving node. liam

On Sat, Mar 6, 2010 at 11:48 PM, Chad ccolu...@hotmail.com wrote: Ok, so assume you have N glusterfsd servers (say 2, as it does not really matter). Now one of the servers dies. You repair the machine and bring it back up. I think 2 things: 1. You should not start glusterfsd on boot (you need to sync the HD first). 2. When it is up, how do you re-sync it? Do you rsync the underlying mount points? If it is a busy gluster cluster, it will be getting new files all the time. So how do you sync and bring it back up safely so that clients don't connect to an incomplete server? ^C
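A fuller variant of the same trick, which the gluster documentation of that era suggested for touching every file and triggering self-heal without the output overhead of ls (the mount point is an example):

  # walk the whole client mount and stat everything; each stat forces a self-heal check
  find /mnt/gluster -noleaf -print0 | xargs --null stat > /dev/null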
[Gluster-users] gluster rebuild time
All, I've been asked to share some rebuild times on my large gluster cluster. I recently added more storage (bricks) and did a full ls -alR on the whole system. I estimate we have around 50 million files and directories.

Gluster Server Hardware:
- 2 x Supermicro 4u chassis with 24 1.5tb SATA drives, and another 24 1.5tb SATA drives in an external drive array via SAS (a total of 96 drives all together)
- 8 core 2.5ghz xeon, 8gig ram
- 3ware raid controllers, 24 drives per raid6 array, 4 arrays total, 2 arrays per server
- CentOS 5.3 64bit
- XFS with the inode64 mount option
- Gluster 2.0.9
- Bonded gigabit ethernet

Clients:
- 20 or so Dell 1950 clients, a mixture of RedHat ES4 and CentOS 5
- plus 20 Windows XP clients via Samba (these are VMs that have to run Windows jobs)
- all clients on gigabit ethernet

I must say that the load on our gluster servers is normally very high; the load average on each box is anywhere from 7-10 at peak (although with decent service times) - so I'm sure the rebuild time would have been quicker on a more idle system. The system is at its highest load while writing a large amount of data at the peak of the day, so I try to schedule jobs around our peak times. Anyhow... I started the job sometime January 16th and it JUST finished... 18 days later.

real    27229m56.894s
user    13m19.833s
sys     56m51.277s

Finish date was Wed Feb 3 23:33:12 PST 2010.

Now, I know some people have mentioned that Gluster is happier with many bricks instead of the larger raid arrays I use; however, either way I'd be stuck doing an ls -aglR, which takes forever. So I'd rather add a huge amount of space at once and keep the system setup similar - and let my 3ware controllers deal with drive failures instead of having to do an ls -aglR each time I lose a drive. Replacing a drive with the 3ware controller takes 7 to 8 days in a 24-drive raid6 array, but that's better than 18 days for Gluster to do an ls -aglR. By comparison, our old 14-node Isilon 6000 cluster (6tb per node) did a node rebuild/resync in about a day or two - there's a big difference between block-level and filesystem-level replication!

We're still running Gluster 2.0.9, but I am looking to upgrade to 3.0 once a few more releases are out, and am hoping that the new checksum-based checks will speed up this whole process. Once I have some numbers on 3.0 I'll be sure to share. thanks, liam
Re: [Gluster-users] booster with apache permission denied
[client-protocol.c:5733:client_setvolume_cbk] brick2a: Connected to 192.168.12.35:6996, attached to remote volume 'brick2a'.
[2010-01-11 14:16:02] D [libglusterfsclient.c:1713:libgf_vmp_map_ghandle] libglusterfsclient: New Entry: /pub
[2010-01-11 14:16:02] D [libglusterfsclient.c:1421:libgf_init_vmpentry] libglusterfsclient: New VMP entry: /pub
[2010-01-11 14:16:02] D [libglusterfsclient.c:1724:libgf_vmp_map_ghandle] libglusterfsclient: Empty list
[2010-01-11 14:16:02] D [booster.c:1190:booster_init] booster: booster is inited
[2010-01-11 14:16:02] D [libglusterfsclient.c:5318:glusterfs_chmod] libglusterfsclient: path /home/httpd/apps/httpd-2.2.14/logs/cgisock.29127
[2010-01-11 14:16:02] D [libglusterfsclient.c:1541:_libgf_vmp_search_entry] libglusterfsclient: VMP Search: path /home/httpd/apps/httpd-2.2.14/logs/cgisock.29127, type: LongestPrefix
[2010-01-11 14:16:02] D [libglusterfsclient.c:1631:libgf_vmp_search_entry] libglusterfsclient: VMP Entry not found: path: /home/httpd/apps/httpd-2.2.14/logs/cgisock.29127
[2010-01-11 14:16:02] D [libglusterfsclient.c:5443:glusterfs_chown] libglusterfsclient: path /home/httpd/apps/httpd-2.2.14/logs/cgisock.29127
[2010-01-11 14:16:02] D [libglusterfsclient.c:1541:_libgf_vmp_search_entry] libglusterfsclient: VMP Search: path /home/httpd/apps/httpd-2.2.14/logs/cgisock.29127, type: LongestPrefix
[2010-01-11 14:16:02] D [libglusterfsclient.c:1631:libgf_vmp_search_entry] libglusterfsclient: VMP Entry not found: path: /home/httpd/apps/httpd-2.2.14/logs/cgisock.29127
[2010-01-11 14:16:02] D [libglusterfsclient.c:1713:libgf_vmp_map_ghandle] libglusterfsclient: New Entry: /pub
[2010-01-11 14:16:02] D [libglusterfsclient.c:1421:libgf_init_vmpentry] libglusterfsclient: New VMP entry: /pub
[2010-01-11 14:16:02] D [libglusterfsclient.c:1724:libgf_vmp_map_ghandle] libglusterfsclient: Empty list
[2010-01-11 14:16:02] D [booster.c:1190:booster_init] booster: booster is inited
[2010-01-11 14:16:02] D [libglusterfsclient.c:1713:libgf_vmp_map_ghandle] libglusterfsclient: New Entry: /pub
[2010-01-11 14:16:02] D [libglusterfsclient.c:1421:libgf_init_vmpentry] libglusterfsclient: New VMP entry: /pub
[2010-01-11 14:16:02] D [libglusterfsclient.c:1724:libgf_vmp_map_ghandle] libglusterfsclient: Empty list
[2010-01-11 14:16:02] D [booster.c:1190:booster_init] booster: booster is inited
[2010-01-11 14:16:12] D [libglusterfsclient.c:4866:glusterfs_stat] libglusterfsclient: path /pub/data/tnsc/test/test.mp3
[2010-01-11 14:16:12] D [libglusterfsclient.c:1541:_libgf_vmp_search_entry] libglusterfsclient: VMP Search: path /pub/data/tnsc/test/test.mp3, type: LongestPrefix
[2010-01-11 14:16:12] D [libglusterfsclient.c:1628:libgf_vmp_search_entry] libglusterfsclient: VMP Entry found: path :/pub/data/tnsc/test/test.mp3 vmp: /pub/
[2010-01-11 14:16:12] D [libglusterfsclient.c:4788:__glusterfs_stat] libglusterfsclient: path /data/tnsc/test/test.mp3, op: 2
[2010-01-11 14:16:12] D [libglusterfsclient.c:869:libgf_resolve_path_light] libglusterfsclient: Path: /data/tnsc/test/test.mp3, Resolved Path: /data/tnsc/test/test.mp3
[2010-01-11 14:16:12] D [libglusterfsclient-dentry.c:268:__do_path_resolve] libglusterfsclient-dentry: resolved path(/data/tnsc/test/test.mp3) till 1(/). sending lookup for remaining path
[2010-01-11 14:16:12] D [libglusterfsclient.c:4725:libgf_client_stat] libglusterfsclient: path /data/tnsc/test/test.mp3, status 0, errno 0
[2010-01-11 14:16:12] D [libglusterfsclient.c:3001:glusterfs_open] libglusterfsclient: path /pub/data/tnsc/test/test.mp3
[2010-01-11 14:16:12] D [libglusterfsclient.c:1541:_libgf_vmp_search_entry] libglusterfsclient: VMP Search: path /pub/data/tnsc/test/test.mp3, type: LongestPrefix
[2010-01-11 14:16:12] D [libglusterfsclient.c:1628:libgf_vmp_search_entry] libglusterfsclient: VMP Entry found: path :/pub/data/tnsc/test/test.mp3 vmp: /pub/
[2010-01-11 14:16:12] D [libglusterfsclient.c:869:libgf_resolve_path_light] libglusterfsclient: Path: /data/tnsc/test/test.mp3, Resolved Path: /data/tnsc/test/test.mp3
[2010-01-11 14:16:12] D [libglusterfsclient-dentry.c:389:libgf_client_path_lookup] libglusterfsclient: resolved path(/data/tnsc/test/test.mp3) to 1118653312/1118655564
[2010-01-11 14:16:12] D [libglusterfsclient.c:2752:libgf_client_open] libglusterfsclient: open: path /data/tnsc/test/test.mp3, status: 0, errno 117

On Mon, Jan 11, 2010 at 1:23 PM, Raghavendra G raghavendra...@gmail.com wrote: Hi Liam, Can you send glusterfs server logs? regards,

On Sat, Jan 9, 2010 at 1:46 AM, Liam Slusser lslus...@gmail.com wrote: I believe I posted this here before but never got any replies. I'm in the middle of upgrading to Gluster 2.0.9 and would like to move away from having to use fuse to serve files out of apache, so I'm working again on getting booster working correctly. Everything appears to load and work fine, but I always get permission denied, 403, in my apache logs. Works fine under fuse. I'm running Apache under the user nobody, which does have read access
Re: [Gluster-users] booster with apache permission denied
I was able to install lighttpd 1.4.25 and it appears to work just fine with glusterfs-booster.so. So I think its an issue with Apache 2.2.14 (the newest version available). I suppose i can try an older version of Apache and see if i have better luck (say 2.0)... liam On Mon, Jan 11, 2010 at 2:20 PM, Liam Slusser lslus...@gmail.com wrote: Logs are below. I also noticed this while trying to debug this issue...Notice the md5sum do not match up below? On the fuse mounted system: [r...@server test]# ls -al test.mp3 -rw-r--r-- 1 user group 3692251 Aug 27 2007 test.mp3 [r...@server test]# md5sum test.mp3 d480d794882c814ae1a2426b79cf8b3e test.mp3 Using glusterfs-boost.so: [r...@server tmp]# LD_PRELOAD=/home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/glusterfs-booster.so ls -al /pub/data/tnsc/test/test.mp3 ls: /pub/data/tnsc/test/test.mp3: Invalid argument -rw-r--r-- 1 tcode tcode 3692251 Aug 27 2007 /pub/data/tnsc/test/test.mp3 [r...@server tmp]# LD_PRELOAD=/home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/glusterfs-booster.so cp /pub/data/tnsc/test/test.mp3 /tmp/test.mp3 [r...@server tmp]# md5sum /tmp/test.mp3 9bff3bb90b6897fc19b6b4658b83f3f8 /tmp/test.mp3 [r...@server tmp]# ls -al /tmp/test.mp3 -rw-r--r-- 1 root root 3690496 Jan 11 14:10 /tmp/test.mp3 Here are the gluster logs from a clean apache start and one request to test.mp3 with wget: [2010-01-11 14:16:02] D [xlator.c:634:xlator_set_type] xlator: dlsym(notify) on /home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/2.0.9/xlator/performance/io-threads.so: undefined symbol: notify -- neglecting [2010-01-11 14:16:02] D [xlator.c:634:xlator_set_type] xlator: dlsym(notify) on /home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so: undefined symbol: notify -- neglecting [2010-01-11 14:16:02] D [xlator.c:634:xlator_set_type] xlator: dlsym(notify) on /home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/2.0.9/xlator/performance/io-cache.so: undefined symbol: notify -- neglecting [2010-01-11 14:16:02] D [client-protocol.c:6130:init] brick1a: defaulting frame-timeout to 30mins [2010-01-11 14:16:02] D [client-protocol.c:6141:init] brick1a: defaulting ping-timeout to 10 [2010-01-11 14:16:02] D [transport.c:141:transport_load] transport: attempt to load file /home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/2.0.9/transport/socket.so [2010-01-11 14:16:02] D [transport.c:141:transport_load] transport: attempt to load file /home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/2.0.9/transport/socket.so [2010-01-11 14:16:02] D [client-protocol.c:6130:init] brick2a: defaulting frame-timeout to 30mins [2010-01-11 14:16:02] D [client-protocol.c:6141:init] brick2a: defaulting ping-timeout to 10 [2010-01-11 14:16:02] D [transport.c:141:transport_load] transport: attempt to load file /home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/2.0.9/transport/socket.so [2010-01-11 14:16:02] D [transport.c:141:transport_load] transport: attempt to load file /home/gluster/apps/glusterfs-2.0.9/lib/glusterfs/2.0.9/transport/socket.so [2010-01-11 14:16:02] D [io-threads.c:2280:init] iothreads: io-threads: Autoscaling: off, min_threads: 32, max_threads: 32 [2010-01-11 14:16:02] D [read-ahead.c:824:init] readahead: Using conf-page_count = 16 [2010-01-11 14:16:02] D [client-protocol.c:6472:notify] brick1a: got GF_EVENT_PARENT_UP, attempting connect on transport [2010-01-11 14:16:02] D [client-protocol.c:6472:notify] brick1a: got GF_EVENT_PARENT_UP, attempting connect on transport [2010-01-11 14:16:02] D [client-protocol.c:6472:notify] brick2a: got 
GF_EVENT_PARENT_UP, attempting connect on transport [2010-01-11 14:16:02] D [client-protocol.c:6472:notify] brick2a: got GF_EVENT_PARENT_UP, attempting connect on transport [2010-01-11 14:16:02] D [client-protocol.c:6472:notify] brick1a: got GF_EVENT_PARENT_UP, attempting connect on transport [2010-01-11 14:16:02] D [client-protocol.c:6472:notify] brick1a: got GF_EVENT_PARENT_UP, attempting connect on transport [2010-01-11 14:16:02] D [client-protocol.c:6472:notify] brick2a: got GF_EVENT_PARENT_UP, attempting connect on transport [2010-01-11 14:16:02] D [client-protocol.c:6472:notify] brick2a: got GF_EVENT_PARENT_UP, attempting connect on transport [2010-01-11 14:16:02] D [client-protocol.c:6486:notify] brick2a: got GF_EVENT_CHILD_UP [2010-01-11 14:16:02] D [client-protocol.c:6486:notify] brick1a: got GF_EVENT_CHILD_UP [2010-01-11 14:16:02] D [client-protocol.c:6486:notify] brick1a: got GF_EVENT_CHILD_UP [2010-01-11 14:16:02] D [client-protocol.c:6486:notify] brick2a: got GF_EVENT_CHILD_UP [2010-01-11 14:16:02] N [client-protocol.c:5733:client_setvolume_cbk] brick1a: Connected to 192.168.12.30:6996, attached to remote volume 'brick1a'. [2010-01-11 14:16:02] N [afr.c:2194:notify] replicate: Subvolume 'brick1a' came back up; going online. [2010-01-11 14:16:02] N [client-protocol.c:5733:client_setvolume_cbk] brick1a: Connected
Re: [Gluster-users] booster with apache permission denied
Oh, sorry, here are the glusterfsd server logs - this is all it logged from the Apache startup and wget. server1: [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1014 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1015 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1010 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1011 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1006 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1007 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1002 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1005 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1001 [2010-01-11 22:09:39] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1004 [2010-01-11 22:09:49] E [posix.c:270:posix_lookup] server1: lstat on /data/tnsc/test/test.mp3/.htaccess failed: Not a directory server2: [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1017 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:1016 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:993 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:992 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:987 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:984 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:983 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:981 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:982 [2010-01-11 22:14:12] N [server-protocol.c:7056:mop_setvolume] server: accepted client from 192.168.12.72:980 [2010-01-11 22:14:14] E [posix.c:270:posix_lookup] server2: lstat on /data/tnsc/test/test.mp3/.htaccess failed: Not a directory thanks, liam On Mon, Jan 11, 2010 at 9:42 PM, Raghavendra G raghaven...@gluster.com wrote: Hi, Can you send the glusterfs server logs? The logs you've sent are of booster (which is glusterfs client). Looking at the configuration, there is a protocol/client in configuration and hence you need a glusterfs server running. We'll work on issue of md5sums being different. regards, On Tue, Jan 12, 2010 at 2:20 AM, Liam Slusser lslus...@gmail.com wrote: Logs are below. I also noticed this while trying to debug this issue...Notice the md5sum do not match up below? 
Re: [Gluster-users] Bonded Gigabit
Shoot me an email if you would like to see how I configure my Cisco switches. Let us know how the testing works out! Liam

On Jan 6, 2010, at 3:36 AM, Adrian Revill adrian.rev...@shazamteam.com wrote: Hi Liam, Yes, that is a good point, I will have to check for that, as I will be moving from 3com to Cisco 5500g. So far I only have 2 elderly test servers; using netperf I have measured 1600Mbits/s, which seems to be CPU limited. I will look at the double brick idea as it sounds like a good workaround.

Liam Slusser wrote: I use balance mode0 on my gluster servers - but it doesn't exactly work as you would expect it to. We run Cisco 4948g switches (48 port gigabit) and our gluster servers have two gigabit links bonded together using mode0. Balance mode0 does a great job of balancing outbound traffic; however, the Ciscos always route each single INBOUND tcp connection down a single trunk. So the only way to really gain an advantage is to use multiple tcp connections between the many hosts - or, in the case of gluster, multiple bricks per server striped together. liam

On Tue, Jan 5, 2010 at 7:20 AM, Adrian Revill adrian.rev...@shazamteam.com wrote: Hi, I am looking at which is the best bonding mode for gigabit links for the servers. I have a choice of using 802.3ad (mode4) or bonding-rr (mode0). I would prefer to use mode4, but this will only give a single TCP connection 1Gbit of bandwidth, where mode0 will give multi-Gbit of bandwidth to a single TCP connection. My question is: if I have 4 mirrored servers, when AFR replicates data between servers, does it run multiple TCP connections concurrently to copy the data to all 4 servers at once, or does it do each server in turn?
Re: [Gluster-users] Bonded Gigabit
I use balance mode0 on my gluster servers - but it doesn't exactly work as you would expect it to. We run Cisco 4948g switches (48 port gigabit) and our gluster servers have two gigabit links bonded together using mode0. Balance mode0 does a great job of balancing outbound traffic; however, the Ciscos always route each single INBOUND tcp connection down a single trunk. So the only way to really gain an advantage is to use multiple tcp connections between the many hosts - or, in the case of gluster, multiple bricks per server striped together. liam

On Tue, Jan 5, 2010 at 7:20 AM, Adrian Revill adrian.rev...@shazamteam.com wrote: Hi, I am looking at which is the best bonding mode for gigabit links for the servers. I have a choice of using 802.3ad (mode4) or bonding-rr (mode0). I would prefer to use mode4, but this will only give a single TCP connection 1Gbit of bandwidth, where mode0 will give multi-Gbit of bandwidth to a single TCP connection. My question is: if I have 4 mirrored servers, when AFR replicates data between servers, does it run multiple TCP connections concurrently to copy the data to all 4 servers at once, or does it do each server in turn?
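For anyone setting this up on a CentOS/RHEL-style box, a minimal mode0 bonding sketch - the interface names and addresses here are assumptions, not from this thread:

# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=0 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

For 802.3ad instead, it would be mode=4 on the bond plus a matching LACP port-channel on the switch side.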
Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22
Larry, All - I would much rather rebuild a bad drive with a raid controller than have to wait for Gluster to do it. With a large number of files, doing a ls -aglR can take weeks. Also, you don't NEED enterprise drives with a raid controller; I use desktop 1.5tb Seagate drives, which are happy as a clam on a 3ware SAS card under a SAS expander. liam

On Thu, Dec 17, 2009 at 8:17 AM, Larry Bates larry.ba...@vitalesafe.com wrote: Phil, I think the real question you need to ask has to do with why we are using GlusterFS at all and what happens when something fails. Normally GlusterFS is used to provide scalability, redundancy/recovery, and performance. For many applications performance will be the least of the worries, so we concentrate on scalability and redundancy/recovery. Scalability can be achieved no matter which way you configure your servers. Using the distribute translator (DHT) you can unify all the servers into a single virtual storage space. The problem comes when you look at what happens when you have machine/drive failures and need the redundancy/recovery capabilities of GlusterFS. By putting 36Tb of storage on a single server and exposing it as a single volume (using either hardware or software RAID), you will have to replicate all of that to a replacement server after a failure. Replicating 36Tb will take a lot of time and CPU cycles. If you keep things simple (JBOD) and use AFR to replicate drives between servers and DHT to unify everything together, you only have to move 1.5Tb/2Tb when a drive fails. You will also note that you get to use 100% of your disk storage this way, instead of wasting one drive per array with RAID5 or two drives with RAID6. Normally with RAID5/6 it is also imperative that you have a hot spare per array, which means you waste an additional drive per array. To make RAID5/6 work with no single point of failure you have to do something like RAID50/60 across two controllers, which gets expensive and much more difficult to manage and to grow. Implementing GlusterFS on more modest hardware makes all those issues go away. Just use GlusterFS to provide the RAID-like capabilities (via AFR and DHT). Personally, I doubt that I would set up my storage the way you describe. I probably would (and have) set it up with more, smaller servers. Something like three times as many 2U servers with 8x2Tb drives each (or even 6 times as many 1U servers with 4x2Tb drives each), and forget the expensive RAID SATA controllers; they aren't necessary and are just a single point of failure that you can eliminate. In addition you will enjoy significant performance improvements because you have:

1) Many parallel paths to storage (36x1U or 18x2U vs 6x5U servers). Gigabit Ethernet is fast, but will still limit bandwidth to a single machine.
2) Write performance on RAID5/6 is never going to be as fast as JBOD.
3) You should have much more memory caching available (36x8Gb = 288Gb of memory, or 18x8Gb = 144Gb, vs maybe 6x16Gb = 96Gb).
4) Management of the storage is done in one place: GlusterFS. No messy RAID controller setups to document/remember.
5) You can expand in the future in a much more granular and controlled fashion. Add 2 machines (1 for replication) and you get 8Tb (using 2Tb drives) of storage. When you want to replace a machine, just set up the new one, fail the old one, and let GlusterFS build the new one for you (AFR will do the heavy lifting). CPUs will get faster, hard drives will get faster and bigger in the future, so make it easy to upgrade. A small number of BIG machines makes it a lot harder to do upgrades as new hardware becomes available.
6) Machine failures (motherboard, power supply, etc.) will affect much less of your storage network. Having a spare 1U machine around as a hot spare doesn't cost much (maybe $1200). Having a spare 5U monster around does (probably close to $6000).

IMHO 36 x 1U or 18 x 2U servers shouldn't cost any more (and maybe less) than the big boxes you are looking to buy. They are commodity items. If you go the 1U route you don't need anything but a machine with memory and 4 hard drives (all server motherboards come with at least 4 SATA ports). By using 2Tb drives, I think you would find that the cost would actually be less. By NOT using hardware RAID you can also NOT use RAID-class hard drives, which cost about $100 each more than non-RAID hard drives. Just that change alone could save you 6 x 24 = 144 x $100 = $14,400! JBOD just doesn't need RAID-class hard drives because you don't need the sophisticated firmware that the RAID-class hard drives provide. You will still want quality hard drives, but failures will have such a low impact that it is much less of a problem. By using more smaller machines you also eliminate the need for redundant power supplies (which would be a requirement in your large boxes because it would be a single point of failure on a large
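To make the AFR-plus-DHT layout Larry describes concrete, here is a minimal client-side vol file sketch in the 2.x syntax used elsewhere on this list; the brick and host names are assumptions:

volume server1-d1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume brick-d1
end-volume

volume server2-d1
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option remote-subvolume brick-d1
end-volume

# mirror the same drive slot across the two servers
volume mirror-d1
  type cluster/replicate
  subvolumes server1-d1 server2-d1
end-volume

# ...repeat the three blocks above per drive, then unify the mirrors
volume unify
  type cluster/distribute
  subvolumes mirror-d1 mirror-d2 mirror-d3 mirror-d4
end-volume

With this shape, a failed drive only requires self-healing one mirror pair rather than re-replicating an entire server's array.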
Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22
Yeah, I'm waiting for Gluster to come out with a 3.0.1 release before I upgrade. I'll do my best to compare 3.0.1 with OneFS's performance/recovery/etc once I upgrade. I still have two Isilon clusters which aren't in production anymore in our lab that I can play around with. And I've been waiting for btrfs for a while now; it can't come soon enough! thanks, liam

On Tue, Jan 5, 2010 at 7:48 PM, Harshavardhana har...@gluster.com wrote: Hi Liam, GlusterFS has done checksum-based self-heal since the 3.0 release; I believe your experiences are from 2.0, which has the issue of doing a full-file self-heal, which takes a lot of time. I would suggest upgrading your cluster to the 3.0.1 release, which is due the first week of February. With the new self-heal in the 3.x releases you should get much lower rebuild times. If it's possible, comparing the 3.0.1 rebuild times with OneFS from Isilon would help us improve it too. Thanks. I would suggest waiting for btrfs.
[Gluster-users] adding another brick howto
I just wanted to make sure I have the procedure correct for adding new bricks to my gluster raid10-style configuration. The current configuration is two servers, each exporting 3 large drive arrays hanging off the back of the server. In the gluster client config file I mirror each partition on server 1 with the corresponding partition on server 2, then use cluster/distribute to stripe it all together. Think raid10. I'm running low on disk space and want to add two more drive arrays (one on each server, same size as the original arrays), which should be easy as far as the gluster configuration file goes. My questions are:

a) Once I add the new arrays (bricks if you will), do I need to run a ls -aglR on a gluster client so the new arrays get the directory tree created?
b) What would happen if I don't do step a - would it just create new files and the directory tree when a create-file operation happens?
c) Is gluster smart enough to know that the original arrays are running low on space and to write all new files to the new arrays?
d) Anything else I should be aware of?

I'm running Centos Linux 5.4 64bit with XFS inode64 file systems and Gluster 2.0.6 (although I'm upgrading to 2.0.9 sometime this week depending on how much free time I get). thanks, liam
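For what it's worth, the usual trick on this list for question a) is a metadata-only crawl from a client once the new bricks are in the config (mount point assumed):

# stat everything once; the lookup self-heal creates the missing directories on the new bricks
ls -aglR /mnt/gluster > /dev/null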
Re: [Gluster-users] volume sizes
We have a very similar setup. We have a 6 x 24 bay gluster cluster with 36TB per node. We use 3ware raid cards with raid6 across all 24 drives, making ~32TB usable per node. We have our gluster cluster set up like raid10: 3 nodes striped together and then mirrored to the other 3 nodes. Performance is very good and so is the reliability, which was more important to us than performance. I thought about breaking it into smaller pieces, but that gets complicated very quickly, so I went with the simpler-is-better setup.

We also grow about 1tb a week of data, so I have to add 1-2 nodes a year, which is a huge pain in the butt since gluster doesn't make it very easy to do (i.e. building the directory structure on each new node). Doing a ls -agl on the root of our cluster takes well over a week - we have around 50+ million files in there.

The only downside is the rebuild time whenever we lose a drive. The 3ware controller with such a large array takes about a week to rebuild from any one drive failure. Of course, with raid6, we can lose two drives without any data loss. Luckily we've never lost two or more drives within the same week. However, if we DID for whatever reason lose the whole array, we can always pull the data off the other mirror node. I very closely watch the SMART output of each drive and proactively replace any drive which starts to show any signs of failing or read/write errors.

I have a smaller cluster of 4 x 24 bay with 36TB per node. This array pushes well over 500mbit of traffic almost 24/7 with almost zero issues. I've been very happy with how well it performs. I do notice that during an array rebuild after a failed drive the IOwait time on the server is a bit higher, but overall it does very well.

If you would like more information on my setup or what hardware/software I run, please feel free to contact me privately. thanks, liam

On Tue, Dec 29, 2009 at 1:54 PM, Anthony Goddard agodd...@mbl.edu wrote: First post! We're looking at setting up 6x 24 bay storage servers (36TB of JBOD storage per node) and running glusterFS over this cluster. We have RAID cards on these boxes and are trying to decide what the best size of each volume should be; for example, if we present the OS's (and gluster) with six 36TB volumes, I imagine rebuilding one node would take a long time, and there may be other performance implications of this. On the other hand, if we present gluster / the OS's with 6x 6TB volumes on each node, we might have more trouble managing a larger number of volumes. My gut tells me a lot of small (if you can call 6TB small) volumes will be lower risk and offer faster rebuilds from a failure, though I don't know what the pros and cons of these two approaches might be. Any advice would be much appreciated! Cheers, Anthony
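For anyone wanting to do the same proactive SMART monitoring behind a 3ware controller, a sketch - the controller device name and port number are assumptions for your hardware:

# query SMART health for the disk on 3ware port 0; repeat per port
smartctl -a -d 3ware,0 /dev/twa0

# or let smartd watch it, with an /etc/smartd.conf line along these lines:
/dev/twa0 -d 3ware,0 -a -m root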
Re: [Gluster-users] What about maximum number of Folder of GlusterFS?
That's ~32,000 folders per DIRECTORY - not in total. With that said, I have over 50 million files and directories on my cluster on XFS. Gluster doesn't really care - it's the underlying filesystem you need to worry about. liam

On Tue, Dec 29, 2009 at 5:39 PM, lesonus leso...@gmail.com wrote: I know that EXT3 has a max number of folders of about 32,000, and EXT4 ~64,000 folders. I want to know the max number of folders in GlusterFS? Thanks in advance!
Re: [Gluster-users] big IO problem with kvm iamges while ionice doesnt work on shared storage
Might want to try upgrading to the newest version - the next few versions are much more stable. Liam

On Dec 19, 2009, at 6:00 AM, Ran smtp.tes...@gmail.com wrote: Hi all, We ran into a big problem while using gluster (2.0.6) to serve kvm images and other stuff like mail storage in distributed mode with only 1 server (we will add more in the future). The problem is basically that when, say, 3 KVM VPSs are running IO-intensive applications - it doesn't happen all the time, but when it happens it uses all of the gluster server's IO, and other servers that need access to the mail storage, for example, freeze. Normally on a local disk the solution is ionice, but as I understand it that only works on local block devices and not on network mounts. Does anyone have any idea how to overcome this, e.g. limit IO for clients or client applications like kvm images? This is a real IO problem which makes the whole storage crawl. Many thanks,
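Since ionice only acts on the local block layer, one thing that might be worth trying is to deprioritize the server-side glusterfsd processes instead; a sketch, assuming the server's disks use the CFQ scheduler (the only one that honors ionice priorities):

# drop every glusterfsd process to best-effort class, lowest priority
for pid in $(pidof glusterfsd); do ionice -c2 -n7 -p $pid; done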
[Gluster-users] booster and apache 2.2.14 permission errors
I'm having a strange booster+apache issue. I am unable to get apache to download any of the files through booster; I get a 403 (Forbidden) on any file. If I enable directory indexes I can get directory listings, but still a 403 on any file. I can view/list files just fine using LD_PRELOAD=...glusterfs-booster.so with ls or cat /pub/data/path/to/myfile, so it's just Apache that I'm having issues with. If I mount the file system (to /pub) with fuse and start httpd without booster, it works fine, so I'm pretty sure I have all the permissions set correctly. Ideas? thanks, liam

# wget -S http://x.x.x.x/data/test/test.mp3
--2009-11-05 18:35:07-- http://x.x.x.x/data/test/test.mp3
Connecting to x.x.x.x:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 403 Forbidden
Date: Fri, 06 Nov 2009 02:35:07 GMT
Server: Apache/2.2.14 (Unix)
Content-Length: 228
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
2009-11-05 18:35:07 ERROR 403: Forbidden.

# wget -S http://x.x.x.x/data/test/
--2009-11-05 18:36:13-- http://x.x.x.x/data/test/
Connecting to x.x.x.x:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Fri, 06 Nov 2009 02:36:13 GMT
Server: Apache/2.2.14 (Unix)
Content-Length: 919
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html;charset=ISO-8859-1
Length: 919 [text/html]
Saving to: `index.html'
100%[=] 919 --.-K/s in 0s
2009-11-05 18:36:13 (87.6 MB/s) - `index.html' saved [919/919]

(inside the index.html will be an apache pretty output of the files in /data/test)

my booster-pub.log output:
[2009-11-05 18:41:50] D [libglusterfsclient.c:2908:glusterfs_open] libglusterfsclient: path /pub/data/test/test.mp3
[2009-11-05 18:41:50] D [libglusterfsclient.c:1517:_libgf_vmp_search_entry] libglusterfsclient: VMP Search: path /pub/data/test/test.mp3, type: LongestPrefix
[2009-11-05 18:41:50] D [libglusterfsclient.c:1604:libgf_vmp_search_entry] libglusterfsclient: VMP Entry found: path :/pub/data/test/test.mp3 vmp: /pub/
[2009-11-05 18:41:50] D [libglusterfsclient.c:851:libgf_resolve_path_light] libglusterfsclient: Path: /data/test/test.mp3, Resolved Path: /data/test/test.mp3
[2009-11-05 18:41:50] D [libglusterfsclient-dentry.c:389:libgf_client_path_lookup] libglusterfsclient: resolved path(/data/test/test.mp3) to 1118653312/1118655564
[2009-11-05 18:41:50] D [libglusterfsclient.c:2659:libgf_client_open] libglusterfsclient: open: path /data/test/test.mp3, status: 0, errno 117

My httpd.conf is very simple:

Alias /data/ /pub/data
<Directory /pub/data/>
    Options All
    AllowOverride All
    Order allow,deny
    Allow from all
</Directory>

booster.fstab:
/home/gluster/apps/glusterfs-2.0.7/etc/glusterfs/glusterfs.vol-pub.booster /pub/ glusterfs subvolume=cache,logfile=/home/gluster/apps/glusterfs-2.0.7/var/log/glusterfs/booster-pub.log,loglevel=DEBUG,attr_timeout=0

glusterfs.vol-pub.booster: /home/gluster/apps/glusterfs-2.

volume brick1a
  type protocol/client
  option transport-type tcp
  option remote-host x.x.x.x
  option remote-subvolume brick1a
end-volume

volume brick2a
  type protocol/client
  option transport-type tcp
  option remote-host x.x.x.x
  option remote-subvolume brick2a
end-volume

volume replicate
  type cluster/replicate
  subvolumes brick1a brick2a
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 32
  subvolumes replicate
end-volume

volume readahead
  type performance/read-ahead
  option page-count 16 # cache per file = (page-count x page-size)
  option force-atime-update off
  subvolumes iothreads
end-volume

volume cache
  type performance/io-cache
  option cache-size 512MB
  subvolumes readahead
end-volume
Re: [Gluster-users] Problmes with XFS and Gluster 2.0.6
Have you checked that you have free inodes on your XFS partitions?

xfs_db -r -c sb -c p /dev/sda1 | egrep 'ifree|icount'

If you're running low, you'll have to mount your partition with the inode64 option. Note that it requires a 64bit box, and all your gluster clients will also need to be 64bit for everything to work. There is a thread here from a few months back about inode64 and gluster - dig through the archives, there's lots of good info in it - but the short version is that it works fine as long as everything is 64bit. liam

On Fri, Sep 18, 2009 at 5:44 PM, Nathan Stratton nat...@robotics.net wrote: Anyone else running into problems with XFS and Gluster? Things run fine for a while, but then I get things like: ls: reading directory .: Input/output error. I initially did not think it was a Gluster issue because I saw the errors on the raw XFS exported partition. However, when I checked I found that the problem happened on all 4 nodes. I just don't know how 4 XFS partitions on 4 different boxes could all become corrupted at one time. Whatever happens, it is bad wrong because xfs can't even fix it: http://share.robotics.net/xfs-crash.txt Nathan Stratton CTO, BlinkMind, Inc. nathan at robotics.net nathan at blinkmind.com http://www.robotics.net http://www.blinkmind.com
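If the counters do show you are out of inodes, a minimal sketch of the fix - the device and mount point are assumptions, and on kernels of this vintage inode64 can't be added via remount, so the brick has to be unmounted first:

umount /data1
mount -t xfs -o inode64 /dev/sda1 /data1

# and make it stick in /etc/fstab:
/dev/sda1  /data1  xfs  defaults,inode64  0 0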
Re: [Gluster-users] Problems with folders not being created on newly added disks
You should really upgrade to gluster 2.0.6; there have been many bug fixes. ls

On Sep 9, 2009, at 4:36 AM, Roland Rabben rol...@jotta.no wrote: Hi, I am using GlusterFS 2.0.2 on Ubuntu 9.04 64 bit. I have 4 data-nodes and 3 clients. See my vol files at the end of this email. After adding more disks to my data-nodes for more capacity and reconfiguring GlusterFS to include those drives, I am experiencing problems. I am getting No such file or directory if I try to copy a new file into an existing directory. However, if I copy a new file into a new directory, everything works fine. It also seems that if I create the folder structure from the old data-nodes on the new disks, everything works fine. So my questions are:

1. Am I doing something wrong in the upgrade process?
2. Do I need to manually create the existing folders on the new hard drives?
3. Self heal does not fix this. Shouldn't it?
4. Is there a tool that will create the folder structure on the new disks for me?

Client vol file example:
=

# DN-000
volume dn-000-01
  type protocol/client
  option transport-type tcp
  option remote-host dn-000
  option remote-subvolume brick-01
end-volume

volume dn-000-02
  type protocol/client
  option transport-type tcp
  option remote-host dn-000
  option remote-subvolume brick-02
end-volume

volume dn-000-03
  type protocol/client
  option transport-type tcp
  option remote-host dn-000
  option remote-subvolume brick-03
end-volume

volume dn-000-04
  type protocol/client
  option transport-type tcp
  option remote-host dn-000
  option remote-subvolume brick-04
end-volume

volume dn-000-ns
  type protocol/client
  option transport-type tcp
  option remote-host dn-000
  option remote-subvolume brick-ns
end-volume

# DN-001
volume dn-001-01
  type protocol/client
  option transport-type tcp
  option remote-host dn-001
  option remote-subvolume brick-01
end-volume

volume dn-001-02
  type protocol/client
  option transport-type tcp
  option remote-host dn-001
  option remote-subvolume brick-02
end-volume

volume dn-001-03
  type protocol/client
  option transport-type tcp
  option remote-host dn-001
  option remote-subvolume brick-03
end-volume

volume dn-001-04
  type protocol/client
  option transport-type tcp
  option remote-host dn-001
  option remote-subvolume brick-04
end-volume

volume dn-001-ns
  type protocol/client
  option transport-type tcp
  option remote-host dn-001
  option remote-subvolume brick-ns
end-volume

# DN-002
volume dn-002-01
  type protocol/client
  option transport-type tcp
  option remote-host dn-002
  option remote-subvolume brick-01
end-volume

volume dn-002-02
  type protocol/client
  option transport-type tcp
  option remote-host dn-002
  option remote-subvolume brick-02
end-volume

volume dn-002-03
  type protocol/client
  option transport-type tcp
  option remote-host dn-002
  option remote-subvolume brick-03
end-volume

volume dn-002-04
  type protocol/client
  option transport-type tcp
  option remote-host dn-002
  option remote-subvolume brick-04
end-volume

# DN-003
volume dn-003-01
  type protocol/client
  option transport-type tcp
  option remote-host dn-003
  option remote-subvolume brick-01
end-volume

volume dn-003-02
  type protocol/client
  option transport-type tcp
  option remote-host dn-003
  option remote-subvolume brick-02
end-volume

volume dn-003-03
  type protocol/client
  option transport-type tcp
  option remote-host dn-003
  option remote-subvolume brick-03
end-volume

volume dn-003-04
  type protocol/client
  option transport-type tcp
  option remote-host dn-003
  option remote-subvolume brick-04
end-volume

# Replicate data between the servers
# Use pairs, but switch the order to distribute read load
volume repl-000-001-01
  type cluster/replicate
  subvolumes dn-000-01 dn-001-01
end-volume

volume repl-000-001-02
  type cluster/replicate
  subvolumes dn-001-02 dn-000-02
end-volume

volume repl-000-001-03
  type cluster/replicate
  subvolumes dn-000-03 dn-001-03
end-volume

volume repl-000-001-04
  type cluster/replicate
  subvolumes dn-001-04 dn-000-04
end-volume

volume repl-002-003-01
  type cluster/replicate
  subvolumes dn-002-01 dn-003-01
end-volume

volume repl-002-003-02
  type cluster/replicate
  subvolumes dn-003-02 dn-002-02
end-volume

volume repl-002-003-03
  type cluster/replicate
  subvolumes dn-002-03 dn-003-03
end-volume

volume repl-002-003-04
  type cluster/replicate
  subvolumes dn-003-04 dn-002-04
end-volume

# Also replicate the namespace
volume repl-ns
  type
Re: [Gluster-users] double traffic usage since upgrade?
Any other thoughts on why I'm seeing double the inbound traffic? We've had a large increase in site traffic over the last few weeks, and my outbound traffic has increased to almost 400mbit/sec, which has translated to 800mbit of backend gluster traffic. I'm basically at the limit of gigabit ethernet unless I do bonding. Ideas on how to fix this? thanks, liam

On Mon, Aug 17, 2009 at 3:28 PM, Liam Slusser lslus...@gmail.com wrote: On Mon, Aug 17, 2009 at 7:42 AM, Mark Mielke m...@mark.mielke.cc wrote: On 08/17/2009 08:06 AM, Shehjar Tikoo wrote: For a start, we've aimed at getting apache and unfs3 to work with booster. The functional support for both in booster is complete in the 2.0.6 release. For a list of system calls supported by booster, please see: http://www.gluster.org/docs/index.php/BoosterConfiguration There can be applications which need un-boosted syscalls also to be usable over GlusterFS. For such a scenario we have two ways booster can be used. Both approaches are described at the page linked above, but in short, you're right in thinking that when the un-supported syscalls also need to go over FUSE, we are, as you said, leaking or redirecting calls over the FUSE mount point. Hi Shehjar: That's fine, I think, as long as it is recognized that trapping system call open() as booster is implemented today probably does not trap fopen() on Linux. If apache and unfs3 always call open() directly, and you are trapping this, then your purpose is being served. I was kind of hoping you had found a way around --disable-hidden-plt, so I could steal the idea from you. Too bad. :-) Cheers, mark -- Mark Mielke m...@mielke.cc Just an FYI - I am not using booster at all on our feed boxes; this is just straight fuse and the glusterfs process [on the box where we're seeing the traffic doubling]. liam
Re: [Gluster-users] glusterfsd initscript default sequence
The init script is also wrong if you use a non-default install path. It always points to /usr/sbin/glusterfsd and not your --prefix-specified path. liam

On Sun, Sep 6, 2009 at 9:18 PM, Jeff Evans je...@tricab.com wrote:

In the case that the node is both a server and a client, as I wish to use it (3-node cluster, where each is both a client and server in a cluster/replicate configuration), I found that using /etc/fstab to mount, with the default glusterfsd initscript at S90, causes the mount to be attempted before glusterfsd is up.

My scenario exactly.

In a test I just ran where I restarted all three nodes at the same time, for the server that came up first it seems the client decided nothing was up.

Yes, and this causes anything that depends upon the glusterfs mount to wait at startup for the FS to become available. I too think S90 is off, although I'm not sure where it should go, or how to make it start glusterfsd before it gets to /etc/fstab mounting?

I think the only way to ensure glusterfsd comes up before fstab mounting (mount -a) is by using the noauto option and then mounting it later in rc.local, or whenever you are ready. In my case I want glusterfs available ASAP, and using S50 was adequate as this is before anything like smb/nfs/httpd starts looking for the mount. Thanks, Jeff.
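A minimal sketch of the noauto approach Jeff describes - the vol file path and mount point are assumptions:

# /etc/fstab: keep the glusterfs mount out of the boot-time mount -a
/etc/glusterfs/client.vol  /mnt/gluster  glusterfs  noauto  0 0

# /etc/rc.local: mount once glusterfsd is guaranteed to be up
mount /mnt/gluster

To move the init script itself earlier instead (SysV conventions assumed), edit the chkconfig priority header in /etc/init.d/glusterfsd from 90 to 50, then re-register it:

chkconfig --del glusterfsd && chkconfig --add glusterfsd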
Re: [Gluster-users] new user help
answers inline... On Sun, Aug 23, 2009 at 10:50 AM, Mag Gam magaw...@gmail.com wrote:

I am trying to set up gluster 2.0 and I am a new user.

Welcome.

Basically, I've been trying to follow the tutorials but I am having no luck. My setup is 2 servers and 10 clients for now. A couple of questions: On the client, do I have to have the FUSE module?

If you want to mount the filesystem, you need to use FUSE. You can also NFS-mount it if you set up an unfs server on another client and share the gluster mount.

Do I have to run all this as root?

Yes, to mount the filesystem.

How can I check what clients are mounted on the server side?

There isn't really an easy way to do it; however, I use... (in linux)

netstat -pan | grep EST | grep gluster_listen_port

How can I check whether I can talk to the server via gluster tools (not ping :-)?

Try telnetting to the gluster listen port - if it answers, it's up. liam
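Concretely, with the default glusterfsd listen port from the 2.x releases (6996, as seen in the logs elsewhere in this archive; your vol files may set a different port, and the server name here is an assumption):

# on the server: established client connections to glusterfsd
netstat -pan | grep EST | grep 6996

# from a client: is the server answering?
telnet server1 6996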
Re: [Gluster-users] Interesting experiment
On Tue, Aug 18, 2009 at 3:05 AM, Hiren Joshi j...@moonfruit.com wrote: Hi, Ok, the basic setup is 6 bricks per server, 2 servers. Mirror the six bricks and DHT them. I'm running three tests: dd 1G of zeros to the gluster mount, dd 1000 100k files, and dd 1000 1M files.

With 3M write-behind I get:
0m35.460s for 1G file
0m52.427s for 100k files
1m37.209s for 1M files

Then I added a 400M external journal to all the bricks, the twist being the journals were made on a ram drive. Running the same tests:
0m33.614s for 1G file
0m52.851s for 100k files
1m31.693s for 1M files

So why is it that adding an external journal (in ram!) seems to make no difference at all?

I would imagine that most of your bottleneck is the network and not the disks. Modern raid disk storage systems are much quicker than gigabit ethernet. liam
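One quick way to confirm where the ceiling is - iperf is assumed to be installed, and the brick path is an assumption:

# raw network throughput between client and server
iperf -s              # on the server
iperf -c server1      # on the client

# raw local write throughput on a brick, bypassing the page cache
dd if=/dev/zero of=/bricks/brick1/ddtest bs=1M count=1024 oflag=direct

If the iperf number is close to the dd-over-gluster number, the network is the bottleneck and the journal placement won't matter.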
Re: [Gluster-users] double traffic usage since upgrade?
On Mon, Aug 17, 2009 at 7:42 AM, Mark Mielke m...@mark.mielke.cc wrote: On 08/17/2009 08:06 AM, Shehjar Tikoo wrote: For a start, we've aimed at getting apache and unfs3 to work with booster. The functional support for both in booster is complete in the 2.0.6 release. For a list of system calls supported by booster, please see: http://www.gluster.org/docs/index.php/BoosterConfiguration There can be applications which need un-boosted syscalls also to be usable over GlusterFS. For such a scenario we have two ways booster can be used. Both approaches are described at the page linked above, but in short, you're right in thinking that when the un-supported syscalls also need to go over FUSE, we are, as you said, leaking or redirecting calls over the FUSE mount point. Hi Shehjar: That's fine, I think, as long as it is recognized that trapping system call open() as booster is implemented today probably does not trap fopen() on Linux. If apache and unfs3 always call open() directly, and you are trapping this, then your purpose is being served. I was kind of hoping you had found a way around --disable-hidden-plt, so I could steal the idea from you. Too bad. :-) Cheers, mark -- Mark Mielke m...@mielke.cc Just an FYI - I am not using booster at all on our feed boxes; this is just straight fuse and the glusterfs process [on the box where we're seeing the traffic doubling]. liam
[Gluster-users] double traffic usage since upgrade?
I've been running 2.0.3 with two backend bricks and a frontend client of mod_gluster/apache 2.2.11+worker for a few weeks now without much issue. Last night I upgraded to 2.0.6, only to find out that mod_gluster has been removed and the recommendation is to use the booster library - which is fine, but I didn't have time to test it last night, so I just mounted the whole filesystem with a fuse mount and figured I'd test the booster config later and then swap. I did try running the 2.0.3 mod_gluster module with the 2.0.6 bricks, but apache kept segfaulting (every 10 seconds) and would then spawn another process which would reconnect and keep going. I figured it was dropping a client request every few seconds, which is why I went with the fuse mount until I could test the booster library.

Well, before, with mod_gluster, we would be pushing around 200mbit of web traffic and it would evenly distribute that 200mbit between our two bricks - so server1 would be pushing 100mbit and server2 would be pushing another 100mbit. Inbound from the backend bricks and outbound from apache were basically identical. Except, of course, if one of the backend glusterd processes died for whatever reason, the other remaining brick would take the whole load and its traffic would double, as you would expect. Perfect, all was happy.

Now, using gluster 2.0.6 and fuse, both server bricks are pushing the full 200mbit of traffic - so I basically have 400mbit of incoming traffic from the gluster bricks but the same 200mbit of web traffic. I can deal, but I only have a shared gigabit link between my client server and the backend bricks, and I'm already eating up basically 50% of that pipe. It is also putting a much larger load on both bricks, since I have basically doubled the disk IO time and traffic. Is this a feature? Bug? thanks, liam
Re: [Gluster-users] Performance
XFS has been around since 1994 - originally written by SGI, it is one of the oldest journaling filesystems. It has been in the linux source tree since 2.4 and is very stable. It supports a max volume size of 16 exabytes, where ext3/4 runs out at 8tb, I believe. I've never had one of my xfs filesystems need recovering, and I use it on a bunch of larger arrays that are too large for ext3. Just make sure you're using 64bit linux and mount the filesystem with the inode64 option so you don't run out of inodes. liam

On Thu, Aug 13, 2009 at 1:31 AM, Hiren Joshi j...@moonfruit.com wrote: What are the advantages of XFS over ext3 (which I'm currently using)? My fear with XFS when selecting a filesystem was that it's not as active or as well supported as ext3, and if things go wrong, how easy would it be to recover? I have 6 x 1TB disks in a hardware raid 6 with battery backup and UPS; it's now just the performance I need to get sorted...

From: Liam Slusser [mailto:lslus...@gmail.com]
Sent: 12 August 2009 20:35
To: Mark Mielke
Cc: Hiren Joshi; gluster-users@gluster.org
Subject: Re: [Gluster-users] Performance

I had a similar situation. My larger gluster cluster has two nodes, but each node has 72 1.5tb hard drives. I ended up creating three 30TB 24-drive raid6 arrays, formatted with xfs and 64bit inodes, and then exporting three bricks with gluster. I would recommend using a hardware raid controller with battery backup, UPS power, and a journaled filesystem, and I think you'll be fine. I'm exporting the three bricks on each of my two nodes; the clients use replication to replicate each of the three bricks on each server and then distribute to tie it all together. liam

On Wed, Aug 12, 2009 at 10:51 AM, Mark Mielke m...@mark.mielke.cc wrote: On 08/12/2009 01:24 PM, Hiren Joshi wrote: 36 partitions on each server - the word partition is ambiguous. Are they 36 separate drives, or multiple partitions on the same drive? If multiple partitions on the same drive, this would be a bad idea, as it would require the disk head to move back and forth between the partitions, significantly increasing the latency and therefore significantly reducing the performance. If each partition is on its own drive, you still won't see a benefit unless you have many clients concurrently changing many different files. In your above case, it's touching a single file in sequence, and having a cluster is costing you rather than benefitting you. We went with 36 partitions (on a single raid 6 drive) in case we got file system corruption; it would take less time to fsck a 100G partition than a 3.6TB one. Would a 3.6TB single disk be better? Putting 3.6 TB on a single disk sounds like a lot of eggs in one basket. :-) If you are worried about fsck, I would definitely do as the other poster suggested and use a journalled file system. This nearly eliminates the fsck time for most situations, whether using 100G partitions or 3.6T partitions. In fact, there are very few reasons not to use a journalled file system these days. As for how to deal with data on this partition - the file system is going to have a better chance of placing files close to each other than setting up 36 partitions and having Gluster scatter the files across all of them based on a hash. Personally, I would choose 4 x 1 Tbyte drives over 1 x 3.6 Tbyte drive, as this nearly quadruples my bandwidth and, for highly concurrent loads, nearly divides by four the average latency to access files.
But, if you already have the 3.6 Tbyte drive, I think the only performance-friendly use would be to partition it based upon access requirements, rather than a hash (random). That is, files that are accessed frequently should be clustered together at the front of the disk, files accessed less frequently could be in the middle, and files accessed infrequently could be at the end. This would be a three-partition disk. Gluster does not have a file system that does this automatically (that I can tell), so it would probably require a software solution on your end. For example, I believe dovecot (IMAP server) allows an alternative storage location to be defined, so that infrequently read files can be moved to another disk, and it knows to check the primary storage first and fall back to the alternative storage after. If you can't break up your storage by access patterns, then I think a 3.6 Tbyte file system might still be the next best option - it's still better than 36 partitions. But make sure you have a good file system on it that scales well to this size. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [Gluster-users] Performance
On Wed, Aug 12, 2009 at 10:24 AM, Hiren Joshi j...@moonfruit.com wrote: We went with 36 partitions (on a single raid 6 drive) in case we got file system corruption; it would take less time to fsck a 100G partition than a 3.6TB one. Would a 3.6TB single disk be better?

Have you looked at using XFS for a filesystem? It's a journaling filesystem and should require almost no rebuild/check after a crash. liam
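For anyone wanting to try it, a minimal sketch - the device and mount point are assumptions:

# one-time format, then mount with 64-bit inodes and no atime updates
mkfs.xfs /dev/sdb1
mount -t xfs -o inode64,noatime /dev/sdb1 /export/brick1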
Re: [Gluster-users] Fuse problem
Looks like fuse isn't loaded. Have you installed fuse? The debug log below has the hint: try modprobe fuse as root. liam

On Tue, Aug 11, 2009 at 7:27 AM, Hiren Joshi j...@moonfruit.com wrote: Hello all, I'm running a 64bit Centos5 setup and am trying to mount a gluster filesystem (which is exported out of the same box).

glusterfs --debug --volfile=/root/gluster/webspace2.vol /home/webspace_glust/

Gives me: snip
[2009-08-11 16:26:37] D [client-protocol.c:5963:init] glust1b_36: defaulting ping-timeout to 10
[2009-08-11 16:26:37] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.0.4/transport/socket.so
[2009-08-11 16:26:37] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib64/glusterfs/2.0.4/transport/socket.so
fuse: device not found, try 'modprobe fuse' first
[2009-08-11 16:26:37] D [fuse-bridge.c:2740:init] glusterfs-fuse: fuse_mount() failed with error No such device on mount point /home/webspace_glust/
[2009-08-11 16:26:37] E [xlator.c:736:xlator_init_rec] xlator: Initialization of volume 'fuse' failed, review your volfile again
[2009-08-11 16:26:37] E [glusterfsd.c:513:_xlator_graph_init] glusterfs: initializing translator failed
[2009-08-11 16:26:37] E [glusterfsd.c:1217:main] glusterfs: translator initialization failed. exiting

Any thoughts?
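To load the module now and have it come back after a reboot, a sketch using the CentOS/RHEL convention (the rc.modules approach is an assumption about your distro setup):

modprobe fuse
# make it persistent across reboots
echo "modprobe fuse" >> /etc/rc.modules
chmod +x /etc/rc.modules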
Re: [Gluster-users] how to kill the glusterfsd process
If all else fails, kill -9? ls

On Mon, Aug 3, 2009 at 7:03 AM, Wei Dong wdong@gmail.com wrote: Hi All, I'm trying to restart the glusterfsd service on my storage nodes and find that I'm simply unable to kill the process on some of the nodes. Any suggestion? Thanks a lot, - Wei Dong
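A sketch of the usual escalation (pidof is assumed to be available; note that a process stuck in uninterruptible IO - state D in ps - will ignore even SIGKILL until the IO completes):

# ask nicely first
kill $(pidof glusterfsd)
sleep 5
# then force anything still running
pidof glusterfsd && kill -9 $(pidof glusterfsd)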
Re: [Gluster-users] Gluster 2.0.3 + Apache on CentOS5 performance issue
You might want to wait until 2.0.5, as there are a ton of bug fixes to booster in that release. Either way, please let us know how it goes. ls

On Jul 30, 2009, at 12:39 AM, Somsak Sriprayoonsakul soms...@gmail.com wrote: Thank you very much for your reply. At the time we used 2.0.3, and yes, we used stock Apache from CentOS. I will try 2.0.4 very soon to see if it works. As for Booster, it does not seem to be working correctly for me. Booster complains with a lot of errors on a plain 'ls' command (but gives the correct output). Also, with booster, the Apache process refuses to start. I will try 2.0.4 to see if it improves. If not, I will attach the error log next time.

2009/7/30 Raghavendra G raghaven...@gluster.com Hi Somsak, Sorry for the delayed reply. Below you've mentioned that you have problems with apache and booster. Going forward, Apache over booster will be the preferred approach. Can you tell us what version of glusterfs you are using? And as I understand it, you are using apache 2.2, am I correct? regards,

- Original Message - From: Liam Slusser lslus...@gmail.com To: Somsak Sriprayoonsakul soms...@gmail.com Cc: gluster-users@gluster.org Sent: Saturday, July 25, 2009 3:46:14 AM GMT +04:00 Abu Dhabi / Muscat Subject: Re: [Gluster-users] Gluster 2.0.3 + Apache on CentOS5 performance issue

I haven't tried an apples-to-apples comparison of Apache+mod_gluster vs Apache+fuse+gluster; however, I do run both setups. I load tested both setups to verify they could handle 4x our normal daily load and left it at that. I didn't actually compare the two (although that might be cool to do someday). I really like the idea of Apache+mod_gluster as I don't have to deal with fuse and mounting the filesystem. It always scares me having a public facing webserver with your whole backend fileshare mounted locally. It's very slick for serving content such as media files. We serve audio content to our CDN with a pair of Apache/mod_gluster servers - pushing 200-300mbit on average daily - and everything works very well. We run an apache+fuse+gluster setup because we need to run some mod_perl before serving the actual content. However, performance is still very good. We do around 50-100 requests (all jpeg images) per second off of a fuse mount and everything works great. We also have a java tomcat+fuse+gluster service which does image manipulation on the fly off of a gluster mount. We have two backend gluster servers using replication which serve all this content. If you would like more information on our setup I'd be happy to share offline. Just email me privately. thanks, liam

On Fri, Jul 24, 2009 at 8:08 AM, Somsak Sriprayoonsakul soms...@gmail.com wrote: Oh thank you, I thought no one would reply to me :) Have you tried Apache + Fuse over GlusterFS? How is the performance? Also, has anyone on this mailing list tried Apache with booster? I tried it but Apache refuses to start (it just hangs and freezes).

2009/7/23 Liam Slusser lslus...@gmail.com We use mod_gluster and Apache 2.2 with good results. We also ran into the same issue as you, in that we ran out of memory past 150 threads even on an 8gig machine. We got around this by compiling Apache using mpm-worker (threads) instead of prefork - it uses 1/4 as much ram with the same number of connections (150-200) and everything has been running smoothly. I cannot see any performance difference except it using way less memory. liam

On Sun, Jul 12, 2009 at 5:11 AM, Somsak Sriprayoonsakul soms...@gmail.com wrote: Hello, We have been evaluating the choice for the new platform for a webboard system. The webboard is PHP scripts that generate/modify an HTML page when users post or add comments to the page; the resulting topic is actually stored as an HTML file, with all related files (files attached to the topic, etc.) stored in their own directory for each topic. In general, the web site mostly serves a lot of small static files using Apache while using PHP for other dynamic content. This system has worked very well in the past, but with the increasing page view rate, it is very likely that we will need some kind of cluster file system as a backend very soon. We have set up a test system using Grinder as the stress test tool. The test system is 11 machines of Intel Dual Core x86_64 CentOS5 with stock Apache (prefork, since the goal is to use this with PHP), linked together with Gigabit Ethernet. We are trying to compare the performance of a single NFS server in sync mode against 4 Gluster nodes (distribute of 2 replicated nodes) through Fuse. However, the transactions per second (TPS) results are not good.

NFS (single server, sync mode)
- 100 client threads - Peak TPS = 1716.67, Avg. TPS = 1066, mean response time = 61.63 ms
- 200 threads - Peak TPS = 2790, Avg. TPS = 1716, mean rt = 87.33 ms
- 400 threads - Peak TPS = 3810, Avg
Re: [Gluster-users] Gluster 2.0.3 + Apache on CentOS5 performance issue
It's not released yet, but it is in QA. You can download it here: http://ftp.gluster.com/pub/gluster/glusterfs/qa-releases/glusterfs-2.0.5.tar.gz or grab the newest git, which has all the changes in it. liam

On Thu, Jul 30, 2009 at 8:45 PM, Somsak Sriprayoonsakul soms...@gmail.com wrote:

Could you let me know when this will be (approximately)? I can wait until 2.0.5 and test it out again.

2009/7/30 Liam Slusser lslus...@gmail.com:

You might want to wait until 2.0.5, as there are a ton of bug fixes to booster in that release. Either way, please let us know how it goes. ls

[Remainder of the quoted thread trimmed; the earlier messages appear in full under their own subject headers below.]
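A minimal sketch of fetching and building the 2.0.5 QA tarball mentioned above. The build steps are the standard autotools sequence the 2.0.x tarballs shipped with; the --prefix value is an assumption, not from the thread:

  wget http://ftp.gluster.com/pub/gluster/glusterfs/qa-releases/glusterfs-2.0.5.tar.gz
  tar xzf glusterfs-2.0.5.tar.gz
  cd glusterfs-2.0.5
  ./configure --prefix=/usr/local   # prefix is an assumption
  make && sudo make install
  glusterfs --version               # confirm the QA build is the one on PATH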
Re: [Gluster-users] any configuration guidelines?
On Wed, Jul 29, 2009 at 1:22 PM, Nathan Stratton nat...@robotics.net wrote: On Tue, 28 Jul 2009, Wei Dong wrote:

Hi All, We've been using GlusterFS 2.0.1 on our lab cluster to host a large number of small images for distributed processing with Hadoop, and it has been working fine without human intervention for a couple of months. Thanks for the wonderful project -- it's the only freely available cluster filesystem that fits our needs. What keeps bothering me is the extremely high flexibility of GlusterFS: there are simply so many ways to achieve the same goal that I don't know which is best. So I'm writing to ask if there are general configuration guidelines for improving both data safety and performance.

Totally understand; I am facing many of the same issues. I am not sure whether I should be doing replicate/distribute in the frontend client config or in the backend server configs. -Nathan

The preferred way is to do it in the client, not on the backend servers. There is some documentation about it somewhere -- I'll see if I can dig it up. ls
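A minimal sketch of the client-side approach Liam recommends, in 2.0.x volfile syntax (hostnames and volume names are hypothetical). The replicate translator lives in the client graph, so the client itself writes to both bricks:

  volume server1
    type protocol/client
    option transport-type tcp/client
    option remote-host server1.example.com
    option remote-subvolume brick
  end-volume

  volume server2
    type protocol/client
    option transport-type tcp/client
    option remote-host server2.example.com
    option remote-subvolume brick
  end-volume

  # replication happens here, on the client side
  volume mirror
    type cluster/replicate
    subvolumes server1 server2
  end-volume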
Re: [Gluster-users] Gluster 2.0.3 + Apache on CentOS5 performance issue
I haven't tried an apples-to-apples comparison of Apache+mod_gluster vs Apache+fuse+gluster; however, I do run both setups. I load tested both to verify each could handle 4x our normal daily load, and left it at that. I didn't actually compare the two (although that might be cool to do someday).

I really like the idea of Apache+mod_gluster, as I don't have to deal with fuse and mounting the filesystem. It always scares me having a public-facing webserver with your whole backend fileshare mounted locally. It's very slick for serving content such as media files. We serve audio content to our CDN with a pair of Apache/mod_gluster servers, pushing 200-300 mbit on average daily, and everything works very well.

We run an apache+fuse+gluster setup because we need to run some mod_perl before serving the actual content. Performance is still very good, though: we do around 50-100 requests (all jpeg images) per second off a fuse mount and everything works great. We also have a java tomcat+fuse+gluster service which does image manipulation on the fly off a gluster mount. Two backend gluster servers using replication serve all this content. If you would like more information on our setup, I'd be happy to share offline -- just email me privately. thanks, liam

On Fri, Jul 24, 2009 at 8:08 AM, Somsak Sriprayoonsakul soms...@gmail.com wrote:

Oh, thank you -- I thought no one would reply :) Have you tried Apache + Fuse over GlusterFS? How is the performance? Also, has anyone on this mailing list tried Apache with booster? I tried it, but Apache refuses to start (it just hangs and freezes).

[Remainder of the quoted thread trimmed; the two earlier messages appear in full under their own subject header below.]
Re: [Gluster-users] Gluster 2.0.3 + Apache on CentOS5 performance issue
We use mod_gluster and Apache 2.2 with good results. We also ran into the same issue as you, where we ran out of memory past 150 threads even on an 8-gig machine. We got around this by compiling Apache with mpm-worker (threads) instead of prefork: it uses about 1/4 as much RAM with the same number of connections (150-200), and everything has been running smoothly. I cannot see any performance difference except that it uses far less memory. liam

On Sun, Jul 12, 2009 at 5:11 AM, Somsak Sriprayoonsakul soms...@gmail.com wrote:

Hello, We have been evaluating choices for a new platform for a webboard system. The webboard is a set of PHP scripts that generate/modify an HTML page when a user posts or adds a comment to a topic; each resulting topic is stored as an HTML file, with all related files (attachments, etc.) stored in that topic's own directory. In general, the website mostly serves a lot of small static files through Apache while using PHP for the dynamic content. The system has worked very well in the past, but with the increasing page view rate it is very likely we will need some kind of cluster filesystem as the backend very soon.

We set up a test system using Grinder as the stress test tool: 11 Intel Dual Core x86_64 CentOS5 machines with stock Apache (prefork, since the goal is to use this with PHP), linked together with Gigabit Ethernet. We compared a single NFS server in sync mode against 4 Gluster nodes (distribute over 2 replicated pairs) mounted through Fuse. However, the transactions per second (TPS) results were not good.

NFS (single server, sync mode):
- 100 client threads: peak TPS = 1716.67, avg TPS = 1066, mean response time = 61.63 ms
- 200 threads: peak TPS = 2790, avg TPS = 1716, mean rt = 87.33 ms
- 400 threads: peak TPS = 3810, avg TPS = 1800, mean rt = 165 ms
- 600 threads: peak TPS = 4506.67, avg TPS = 1676.67, mean rt = 287.33 ms

4-node Gluster (distribute over 2 replicated pairs):
- 100 threads: peak TPS = 1293.33, avg TPS = 430, mean rt = 207.33 ms
- 200 threads: peak TPS = 974.67, avg TPS = 245.33, mean rt = 672.67 ms
- 300 threads: peak TPS = 861.33, avg TPS = 210, mean rt = 931.33 ms
(no 400-600 thread runs, since we ran out of client machines, sorry)

glusterfsd is configured with 32 io-threads per brick. The client is configured with an io-cache / write-behind / read-ahead / distribute / replicate stack; the io-cache cache-size is 256MB. I used the patched Fuse downloaded from the Gluster website (built through DKMS). As the results show, Gluster performs worse as the number of clients increases. One observation is that the glusterfs client process sits at about 100% CPU during all the tests, while glusterfsd uses only 70-80% CPU; note that the systems are dual core.

I also tried mod_glusterfs (no fuse at all) to serve the static files and ran another Grinder test. The result was about the same: 1000+ peak TPS with 200-400 avg TPS. One problem in this test was that each Apache prefork process used about twice as much memory, so we had to lower the number of httpd processes by about half. I tried disabling EnableMMAP and it didn't help much. Adjusting read-ahead and write-behind according to the GlusterOptimization page didn't help much either.

My question is: there seems to be a bottleneck in this setup, but how can I track it down? Note that I didn't do any optimization other than what is described above. Are there any best-practice configurations for using Apache to serve a bunch of small static files like this?

Regards, Somsak
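For reference, the translator stack Somsak describes would look something like this at the top of a 2.0.x client volfile. Volume names are hypothetical; only the translator ordering and the 256MB cache-size come from the message:

  # "dist" is the cluster/distribute volume over the two cluster/replicate
  # pairs, each pair built from two protocol/client volumes (as in the thread)
  volume ra
    type performance/read-ahead
    subvolumes dist
  end-volume

  volume wb
    type performance/write-behind
    subvolumes ra
  end-volume

  volume cache
    type performance/io-cache
    option cache-size 256MB
    subvolumes wb
  end-volume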
Re: [Gluster-users] GlusterFS Preformance
You have to remember that when you are writing via NFS you're writing to one node, whereas your Gluster setup below is copying the same data to two nodes, so you're doubling the bandwidth. Don't expect NFS-like write performance with multiple storage bricks. Read performance, however, should be quite good. liam

On Wed, Jul 8, 2009 at 5:22 AM, Hiren Joshi j...@moonfruit.com wrote:

Hi, I'm currently evaluating gluster with the intention of replacing our current setup, and I have a few questions. At the moment we have a large SAN which is split into 10 partitions and served out via NFS. For gluster, I was thinking of 12 nodes to make up about 6TB (mirrored, so that's 1TB per node). What sort of filesystem should I be using on the nodes (currently ext3) to give the best performance and recoverability?

Also, I set up a test with a simple mirrored pair, with a client that looks like:

  volume glust3
    type protocol/client
    option transport-type tcp/client
    option remote-host glust3
    option remote-port 6996
    option remote-subvolume brick
  end-volume

  volume glust4
    type protocol/client
    option transport-type tcp/client
    option remote-host glust4
    option remote-port 6996
    option remote-subvolume brick
  end-volume

  volume mirror1
    type cluster/replicate
    subvolumes glust3 glust4
  end-volume

  volume writebehind
    type performance/write-behind
    option window-size 1MB
    subvolumes mirror1
  end-volume

  volume cache
    type performance/io-cache
    option cache-size 512MB
    subvolumes writebehind
  end-volume

I ran a basic test by writing 1G to an NFS server and to this gluster pair:

  [r...@glust1 ~]# time dd if=/dev/zero of=/mnt/glust2_nfs/nfs_test bs=65536 count=15625
  15625+0 records in
  15625+0 records out
  1024000000 bytes (1.0 GB) copied, 1718.16 seconds, 596 kB/s
  real 28m38.278s
  user 0m0.010s
  sys  0m0.650s

  [r...@glust1 ~]# time dd if=/dev/zero of=/mnt/glust/glust_test bs=65536 count=15625
  15625+0 records in
  15625+0 records out
  1024000000 bytes (1.0 GB) copied, 3572.31 seconds, 287 kB/s
  real 59m32.745s
  user 0m0.010s
  sys  0m0.010s

With it taking almost twice as long, can I expect this sort of performance degradation on 'real' servers? Also, what sort of setup would you recommend for us? Can anyone help? Thanks, Josh.
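The roughly 2x slowdown in Josh's numbers is exactly what client-side replication predicts. Working it through:

  data written by dd:      15625 x 64 KiB = 1.0 GB
  data leaving the client: 1.0 GB x 2 replicas = 2.0 GB
  NFS run:     1.0 GB / 1718.16 s = ~596 kB/s
  Gluster run: 1.0 GB / 3572.31 s = ~287 kB/s (almost exactly half)

Since the replicate translator sits in the client volfile, the client pushes every byte to both bricks itself, so the same payload takes about twice as long over the same link.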
Re: [Gluster-users] Need a quick answer on Distributed Replicated Storage questions
Jonathan, You can export a Gluster mount from a client via an NFS server, but the performance is pretty poor. As far as I know there is no way to export it as an iSCSI target. Your best option is to use a single/dual Linux/Solaris iSCSI server to bootstrap all your systems in XenServer, and then use Gluster and fuse to mount your /data drive once each system is up and running. liam

On Mon, Jun 15, 2009 at 5:15 PM, Jonathan Bayles jbay...@readytechs.com wrote:

Hi all, I am trying to keep my company from having to buy a SAN to back our virtualization platform (XenServer). Right now we have a light workload and 4 Dell 2950s (6 disks, 1 controller each) to leverage on the storage side. I like what I see of Distributed Replicated Storage, where you essentially create a RAID 10 of bricks; that would work very well for me. The question is: how do I serve this storage paradigm to a front end that expects an NFS share or an iSCSI target? Does Gluster let me access the entire cluster from a single IP? Or is it something I could run on a CentOS cluster (luci and ricci), using the cluster suite to present the glustered filesystem as an NFS share?

Let me back up and state my needs/assumptions:
* A storage cluster with capacity equal to at least 1 node (assuming all nodes are the same).
* I need to be able to lose/take down any one brick in the cluster at any time without loss of data.
* I need more than the throughput of a single server, if not in overall speed then in width.
* I need to be able to add more bricks and expect increased storage capacity and throughput.
* I need to present the storage as a single entity, as an NFS share or an iSCSI target.

If there are any existing models out there, please point me to them; I don't mind doing the work, I just don't want to re-invent the wheel. Thanks in advance for your time and effort -- I know what it's like to have to answer newbie questions!
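A rough sketch of the NFS re-export Liam describes (paths, hostnames, and the subnet are hypothetical). Note that exporting a FUSE mount through the kernel NFS server generally needs an explicit fsid, since FUSE filesystems lack a stable device number, and as Liam says, performance is poor:

  # on a gateway machine with the Gluster volume fuse-mounted at /mnt/gluster
  # /etc/exports -- fsid= is required for FUSE-backed exports
  /mnt/gluster  192.168.0.0/24(rw,sync,fsid=10,no_subtree_check)

  exportfs -ra

  # on the NFS client (e.g. a XenServer host)
  mount -t nfs nfs-gateway:/mnt/gluster /data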
Re: [Gluster-users] Timestamp on replicated files and dirs
Stephan, You can get the newest Gluster snapshot by using git; you can download git itself here: http://git-scm.com/download Once you have git, do:

  git clone git://git.sv.gnu.org/gluster.git glusterfs

liam

On Mon, Jun 8, 2009 at 3:20 AM, Stephan von Krawczynski sk...@ithnet.com wrote:

Hello Liam, I have no idea where to download the git release (I am really looking for a tgz source archive to download). Nevertheless, I found something called glusterfs-2.0.2 in the qa-releases dir and tried that. It sets the file timestamps correctly, but not the directory timestamps. Is there some place where one can download a daily or weekly source snapshot? Regards, Stephan

On Sat, 6 Jun 2009 10:33:12 -0700 Liam Slusser lslus...@gmail.com wrote: This has already been fixed in the newest git release, so grab that version if you need it today. I believe the fix is included in version 2.0.2. Liam
Re: [Gluster-users] raid5 or raid6 level cluster
Currently no, but it's on the roadmap for a future release. ls

On May 25, 2009, at 1:57 AM, Vahriç Muhtaryan vah...@doruk.net.tr wrote:

Hello, Is there any way to create a raid5 or raid6 level GlusterFS installation? From the docs I understood that I can do a raid1-style GlusterFS installation, or raid0 (striping data across all servers), or a raid10-based solution; but raid10 is not cost effective because it needs too many servers. Do you have a plan to let one or two servers hold parity for the whole GlusterFS system? Regards, Vahric
[Gluster-users] Fwd: [Gluster-devel] rc8
I can't post to the devel list, so I'll post here: I'm still seeing a memory leak in rc8. In my two-node server cluster, server1's memory footprint keeps growing, as does its load average, while write performance decreases. server2 (with an identical configuration file) does not have this issue. I had this same problem with rc1, rc4, rc7, a git build from last week, and now rc8. 1.3.12 works fine, however. liam

-- Forwarded message --
From: Gordan Bobic gor...@bobich.net
Date: Mon, Apr 20, 2009 at 2:01 PM
Subject: Re: [Gluster-devel] rc8
To: gluster-de...@nongnu.org

Gordan Bobic wrote: The first-access-failing bug still seems to be present. But other than that, it seems distinctly better than rc4. :) Good work! :) And that massive memory leak is gone, too! The process hasn't grown by a KB after a kernel compile! :D s/Good work/Awesome work/ :) Gordan
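A simple way to watch for the kind of server-side memory growth described above; this is a generic monitoring loop (process name and interval are assumptions), not something from the thread:

  # log the resident and virtual size of glusterfsd once a minute
  while true; do
    echo -n "$(date +%T) "
    ps -o rss=,vsz= -C glusterfsd
    sleep 60
  done

Running this on both servers for a few hours makes the asymmetric growth Liam describes easy to confirm.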