Re: [Gluster-users] ZFS + Linux + Glusterfs for a production ready 100+ TB NAS on cloud
On Sun, Sep 25, 2011 at 5:51 AM, Joe Landman wrote:
> On 09/25/2011 03:56 AM, Di Pe wrote:
>> So far the discussion has been focusing on XFS vs ZFS. I admit that I
>> am a fan of ZFS and I have only used XFS for performance reasons on
>> mysql servers where it did well. When I read something like this
>> http://oss.sgi.com/archives/xfs/2011-08/msg00320.html that makes me
>> not want to use XFS for big data. You can assume that this is a real
>
> This is a corner case bug, and one we are hoping we can get more data to
> the XFS team for. They asked for specific information that we couldn't
> provide (as we had to fix the problem). Note: other file systems which
> allow for sparse files *may* have similar issues. We haven't tried yet.

Fair enough, but one of the things LLNL pointed out was that you have to run fsck in the first place (i.e. standard file systems are not self-healing).

> The issues with ZFS on Linux have to do with legal hazards. Neither
> Oracle, nor those who claim ZFS violates their patents, would be happy
> to see license violations, or further deployment of ZFS on Linux. I know
> the national labs in the US are happily doing the integration from
> source. But I don't think Oracle and the patent holders would sit idly
> by while others do this. So you'd need to use a ZFS based system such as
> Solaris 11 Express to be able to use it without hassle. BSD and Illumos
> may work without issue as well, and should be somewhat better on the
> legal front than Linux + ZFS. I am obviously not a lawyer, and you
> should consult one before you proceed down this route.
>
>> recent bug because Joe is a smart guy who knows exactly what he is
>> doing. Joe and the Gluster guys are vendors who can work around these
>> issues and provide support. If XFS is the choice, maybe you should
>> hire them for this gig.
>>
>> ZFS typically does not have these FS repair issues in the first place.
>> The motivation of Lawrence Livermore for porting ZFS to Linux was
>> quite clear:
>>
>> http://zfsonlinux.org/docs/SC10_BoF_ZFS_on_Linux_for_Lustre.pdf
>>
>> OK, they have 50PB and we are talking about much smaller deployments.
>> However, some of the limitations they report I can confirm. Also,
>> recovering from a drive failure with this whole LVM/Linux RAID stuff
>> is unpredictable. Hot swapping does not always work, and if you
>> prioritize the re-sync of data to the new drive you can strangle the
>> entire box (by default the priority of the re-sync process is low on
>> Linux). If you are a Linux expert you can handle this kind of stuff
>> (or hire someone), but if you ever want to hand this setup to a
>> storage administrator, you had better give them something that they
>> can use with confidence (maybe less of an issue in the cloud).
>> Compare this to ZFS: re-silvering works with a very predictable
>> result and timing. There is a ton of info out there on this topic. I
>> think that gluster users may be getting around many of the Linux RAID
>> issues by simply taking the entire node down (which is OK in mirrored
>> node settings) or by using hardware RAID controllers (which are often
>> not available in the cloud).
>
> There are definite advantages to better technology. But the issue in
> this case is the legal baggage that goes along with them.
>
> BTRFS may, eventually, be a better choice. The national labs can do this
> with something of an immunity to prosecution for license violation, by
> claiming the work is part of a research project, and won't actively be
> used in a way that would harm Oracle's interests. And it would be ...
> bad ... for Oracle (and others) to sue the government over a relatively
> trivial violation.

I am trying to make sense of what people are discussing regarding the ZFS licensing issue. Did you hear anything from anyone at Oracle that would indicate that they don't like ZFS on Linux?

If I think it through, I can't see why this would make any sense. The ZFS on Linux community is extremely small and will probably always be, and the main reason, besides data size, is that the GPL doesn't like the CDDL, not vice versa, so distros shy away from it. The LLNL people have found a way around the GPLv2 issue by implementing it as a driver.

Why doesn't Oracle sue Nexenta? Those guys have deployed 330PB of their storage and would be a worthy target. The only company that seems to have issues with ZFS in general is NetApp, and I'm sure they don't care whether it's installed on Solaris or on Linux. NetApp, interestingly, sued CoRaid, a disk shelf vendor that was using Nexenta as its OS, but they did not sue Nexenta itself. NetApp knew that their case was very weak. If they had sued Nexenta, Nexenta would have fought back, because the very existence of the company would have been at risk. NetApp feared that Nexenta might have won, which would have confirmed the legitimacy of ZFS. CoRaid, on the other hand, was not dependent on their ZFS solution for their business to be able to continue. They were
Re: [Gluster-users] Is gluster suitable and production ready for email/webservers?
On 09/26/2011 03:04 PM, Emmanuel Noobadmin wrote:
>> As you can guess, rsync is not so good with lots of small files, at
>> least not THAT many small files, so with a 10Gigabit ethernet
>> connection, on the small files we got about 10-30 megabytes per second.
>
> 10~30MB/s is more than OK for me. However, you're on 10G while my
> client has a budget I need to work within, so bonded 1G with VLAN is
> probably the best I can do. Any idea/data on how much of an impact that
> might make?

I forgot to mention that our 10 gigabit was also a shared VLAN. We have a dedicated external IP and a "virtual" internal one on a single 10GbE ethernet interface. However, I don't know how much of an impact it would make with just a 1 Gbit VLAN. I have only just begun using glusterfs, and this is my first server using 10GbE ethernet, so it might be that there is still some performance gain available through tuning.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Community Contest Update
Hi gang,

An updated leaderboard, as of Friday at 5pm:

* Joe Julian 84
* Semiosis 46
* Jeff Darcy 39
* Greg Swift 19
* Steve MacGregor 12

We're coming down the stretch, with the final points tally this Friday at 5pm PDT. Look for more updates this week - http://www.gluster.org/contest/

Thanks!
John Mark

From: John Mark Walker
Sent: Thursday, September 15, 2011 11:06 AM
To: gluster-users@gluster.org
Subject: Community Contest Update

As a reminder, we have 15 days left in our first community contest. Here's how the leader board stacks up, as of 5pm PDT yesterday:

Joe Julian 42
Jeff Darcy 16
patrick tully 7
Semiosis 6
Greg Swift 5

…with a long tail of many, many others. This is for all activity that has taken place since September 1, 2011. Look for an updated leader board every week at http://www.gluster.org/contest/
Re: [Gluster-users] Is gluster suitable and production ready for email/webservers?
On 9/26/11, Robert Krig wrote:
> I guess the question to ask here is, do you need a lot of read/write
> performance for your application, or is redundancy and synchronisation
> more important?

All would be nice, but of course I know that in the real world there has to be some compromise. For the client's setup, I don't think performance is the #1 factor, but at the very least the system has to be able to sustain 8MB/s of transfers (going by their 10Mbps~20Mbps connection, and x2 due to the replication required) on bonded 1G ethernet.

Just as important is the latency, which was the key problem pointed out in the rackerhacker blog; 3~4 seconds of latency is bad. I'd rather have 0.5 second latency with 5MB/s than 5 seconds of lag with 50MB/s performance.

More important still are data integrity and redundancy, the former especially, since redundant corrupted data is useless. Which is why the bug about the corruption of dynamically generated/edited files is a concern.

> As you can guess, rsync is not so good with lots of small files, at
> least not THAT many small files, so with a 10Gigabit ethernet
> connection, on the small files we got about 10-30 megabytes per second.

10~30MB/s is more than OK for me. However, you're on 10G while my client has a budget I need to work within, so bonded 1G with VLAN is probably the best I can do. Any idea/data on how much of an impact that might make?

> Of course, regardless of what other people might have experienced, your
> best bet is to test it with your own equipment. There are so many
> variables between differing distros, kernels, optimisations, and
> hardware, it's hard to guarantee any kind of minimum performance.

Unfortunately, I need to make a good estimate of the best file system to go with in order to plan and go to them with a budget for the hardware before any testing can be done.
While I could try to put together a test network with VMs on our spare hardware, there are just too many bottlenecks and variables introduced that make such tests useless, except as proof of concept that the setup is sane and would work.
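For what it's worth, the bonded-1G question above can at least be sanity-checked on paper. The sketch below is my own back-of-envelope, not anything from this thread: the 80% efficiency figure is a rough allowance for TCP/protocol overhead, and the halving assumes a replica-2 setup where the client writes to both replicas over the same links.

```python
# Back-of-envelope write throughput for a replica-2 gluster volume over
# bonded GigE. Assumptions (mine, not from the thread): ~80% of raw link
# rate is usable after protocol overhead, and client-side replication
# means every write crosses the network once per replica.

def effective_write_mb_s(links, link_gbps=1.0, efficiency=0.8, replicas=2):
    raw_mb_s = links * link_gbps * 1000 / 8   # Gbps -> MB/s
    return raw_mb_s * efficiency / replicas   # each write sent `replicas` times

print(effective_write_mb_s(2))   # 2x1GbE bond, replica 2 -> 100.0 MB/s
```

Even under these pessimistic assumptions, a 2x1GbE bond leaves a ceiling around 100 MB/s for replicated writes, comfortably above the 8MB/s requirement; latency on many small files, not raw bandwidth, is more likely to be the constraint.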
[Gluster-users] Gluster geo-replication problems
Hello,

I am trying again to establish geo-replication between a volume called images and a local directory named /glimage (nfs-mounted). I get status = faulty. Looking in /var/log/glusterfs/geo-replication/images/file%3A%2F%2F%2Fglimage.log, I see:

OSError: [Errno 107] Transport endpoint is not connected

but then "gluster peer status" shows all peers connected. Please help! Information and logs follow.

Thanks, Jojo

---

[root@creator ~]# gluster peer status
Number of Peers: 6

Hostname: stor-003
Uuid: 3300f1a9-9252-4d39-a8dd-6ef6de66e4c3
State: Peer in Cluster (Connected)

Hostname: stor-001
Uuid: a7406cf1-c598-424e-85ab-5758016999a1
State: Peer in Cluster (Connected)

Hostname: stor-008
Uuid: 0f57c4a5-9f01-475b-b295-ebd6f63e855d
State: Peer in Cluster (Disconnected)

Hostname: stor-007
Uuid: bd966425-576c-4cba-be5c-b16eb00d10f1
State: Peer in Cluster (Disconnected)

Hostname: stor-002
Uuid: f38afa35-0c73-4c08-926f-a39953f48180
State: Peer in Cluster (Connected)

Hostname: stor-004
Uuid: 13b28d31-9eed-4052-9e45-c3baf83ce01e
State: Peer in Cluster (Connected)

[root@creator ~]# gluster volume info

Volume Name: images
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: stor-001:/glusterfs
Brick2: stor-002:/glusterfs
Brick3: stor-003:/glusterfs
Brick4: stor-004:/glusterfs
Options Reconfigured:
geo-replication.indexing: on

[root@creator ~]# gluster volume geo-replication images /glimage start
Starting geo-replication session between images & /glimage has been successful

[root@creator ~]# gluster volume geo-replication status
MASTER    SLAVE            STATUS
---------------------------------
images    file:///glimage  faulty

[root@creator ~]# rsync
rsync version 3.0.7 protocol version 30
Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, iconv, no symtimes

[root@creator ~]# cat /var/log/glusterfs/geo-replication-slaves/df4e1ece-61eb-47e5-8420-2d0f081ad0fe\:file%3A%2F%2F%2Fglimage.log
[2011-09-25 17:35:59.639471] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:35:59.640634] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:36:01.677525] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:36:12.373981] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:36:12.376377] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:36:13.672760] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:36:24.898442] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:36:24.900129] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:36:26.194766] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:36:37.420826] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:36:37.421900] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:36:38.717241] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:36:49.939140] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:36:49.940651] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:36:51.241149] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:37:02.464519] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:37:02.466437] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:37:03.760050] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:37:14.985683] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:37:14.987176] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:37:16.281832] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:37:27.505346] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:37:27.506943] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:37:28.802320] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:37:40.31245] I [gsyncd(slave):286:main_i] : syncing: file:///glimage
[2011-09-25 17:37:40.32705] I [resource(slave):200:service_loop] FILE: slave listening
...
[2011-09-25 17:39:57.801064] I [resource(slave):200:service_loop] FILE: slave listening
[2011-09-25 17:39:59.96476] I [repce(slave):61:service_loop] RepceServer: terminating on reaching EOF.
[2011-09-25 17:39:59.643095] I [resource(slave):206:service_loop] FILE: connection inactive for 120
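Not an answer, but a checklist of the commands I would start with for a "faulty" session like the one above. Command syntax here is from the 3.2-era geo-replication docs; verify the exact forms against your installed release before relying on them.

```shell
# Diagnostic sketch for a geo-replication session stuck at "faulty"
# (gluster command syntax from the 3.2-era docs; check your version).

# 1. Two peers in the output above show Disconnected -- rule that out first:
gluster peer status

# 2. Inspect the session's effective settings (log file paths, gsyncd location):
gluster volume geo-replication images file:///glimage config

# 3. The slave is an NFS mount -- confirm it exists and is writable:
test -d /glimage && touch /glimage/.gsync-probe && rm /glimage/.gsync-probe

# 4. Watch the master-side log while the session retries:
tail -f /var/log/glusterfs/geo-replication/images/file%3A%2F%2F%2Fglimage.log
```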
[Gluster-users] GlusterFS and Infiniband
Hi!

I saw your email to the Gluster-users mailing list about instability of GlusterFS on your 20-node cluster and was wondering if you managed to resolve your problem.

http://gluster.org/pipermail/gluster-users/2011-January/006332.html

--
Regards
Abraham

TODAY the Pond! TOMORROW the World! -- Frogs (1972)

___
Abraham vd Merwe - Frogfoot Networks (Pty) Ltd
Suite 20-102D, Building 20, The Waverley Business Park
Kotzee Road, Mowbray, Cape Town, South Africa, 7770
Phone: +27 21 448 7225
Cell: +27 82 565 4451
Http: http://www.frogfoot.com/
Email: a...@frogfoot.com
[Gluster-users] GLUSTERFS + ZFS ON LINUX
Hello,

Maybe this question has been addressed elsewhere, but I would like the opinions and experience of other users. There may be some misconceptions that I am carrying, so please be kind enough to point them out. Any help, advice and suggestions will be very highly appreciated.

My goal is to get a greater than 100 TB gluster NAS up on the cloud. Each server will hold around 2x8TB disks. The export volume size (client disk mount size) would be greater than 20 TB.

This is how I am planning to set it all up: 16 servers, each with 2x8=16 TB of space. The glusterfs volume will be replicated and distributed (raid-10). I would like to go with ZFS on Linux for the disks. The client machines will use the glusterfs client for mounting the volumes. ext4 is limited to 16 TB due to its userspace tools (e2fsprogs).

Would this be considered a production-ready setup? The data housed on this cluster is critical, and hence I need to be very sure before I go ahead with this kind of a setup. Or would using ZFS with Gluster make more sense on FreeBSD or Illumos (where ZFS is native)?

Thanks a lot
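As a quick sanity check on the layout described above (a sketch using the numbers from the post; it assumes a replica count of 2 and ignores ZFS and filesystem overhead, which will reduce usable space somewhat):

```python
# Rough usable-capacity check for the proposed 16-server layout.
servers = 16
tb_per_server = 2 * 8              # 2x8TB disks per server
raw_tb = servers * tb_per_server   # 256 TB raw
replicas = 2                       # distributed-replicate ("raid-10" style)
usable_tb = raw_tb // replicas     # before filesystem/ZFS overhead

print(raw_tb, usable_tb)           # 256 128
```

So the proposed layout lands around 128 TB usable, clearing the 100 TB goal with some headroom for overhead and growth.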
Re: [Gluster-users] Is gluster suitable and production ready for email/webservers?
On 09/26/2011 07:34 AM, Emmanuel Noobadmin wrote:
> I've been leaning towards actually deploying gluster in one of my
> projects for a while and finally a probable candidate project came up.
>
> However, researching into the specific use case, it seems that gluster
> isn't really suitable for load profiles that deal with lots of
> concurrent small files, e.g.
>
> http://www.techforce.com.br/news/linux_blog/glusterfs_tuning_small_files
> http://rackerhacker.com/2010/12/02/keep-web-servers-in-sync-with-drbd-and-ocfs2/
> http://bugs.gluster.com/show_bug.cgi?id=2869
> http://gluster.org/pipermail/gluster-users/2011-June/007970.html
>
> The first two are rather old, so maybe the situation has changed. But
> the bug report and mailing list issue in June ring alarm bells.
>
> Is gluster really unsuited for this kind of workload, or have things
> improved since then?

I guess the question to ask here is, do you need a lot of read/write performance for your application, or are redundancy and synchronisation more important?

In my own tests I used rsync to transfer 14TB of data to our two new glusterfs storage nodes. The data was composed of about 500GB of small jpegs, and the rest was video files. As you can guess, rsync is not so good with lots of small files, at least not THAT many small files, so with a 10 Gigabit ethernet connection we got about 10-30 megabytes per second on the small files. Once we got to the big files, we managed about 100-150 megabytes per second. Definitely not the maximum the system was capable of, but then again, these weren't ideal testing conditions.

A simple dd if=/dev/zero | pv | dd of=/storage/testfile.dmp on a locally mounted glusterfs mount resulted in about 200-250 megabytes/s. Of course, an iperf between the two nodes resulted in a maximum network speed of around 5 gigabits/s.
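The quick tests described above can be repeated on your own setup roughly like this. A sketch only: `/tmp/gluster-bench.dmp` and `other-node` are placeholders; point TARGET at a file on your glusterfs mount and `other-node` at a peer already running `iperf -s`.

```shell
# Sequential-write and raw-network sanity checks, as described above.
# TARGET defaults to /tmp only so the sketch runs anywhere; for a real
# benchmark it must live on the glusterfs mount.
TARGET=${TARGET:-/tmp/gluster-bench.dmp}

# Raw network ceiling between two nodes (start `iperf -s` on the peer first):
# iperf -c other-node

# Sequential write through the mount: 256 MiB of zeroes, fsync'd so the
# reported rate reflects data actually committed, not just page cache.
dd if=/dev/zero of="$TARGET" bs=1M count=256 conv=fsync 2>&1 | tail -1
rm -f "$TARGET"
```

As Robert notes below, numbers from someone else's hardware transfer poorly; running exactly these two probes on the real nodes is the cheapest way to get figures you can plan against.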
Of course, regardless of what other people might have experienced, your best bet is to test it with your own equipment. There are so many variables between differing distros, kernels, optimisations, and hardware that it's hard to guarantee any kind of minimum performance.