Re: [Gluster-users] ZFS + Linux + Glusterfs for a production ready 100+ TB NAS on cloud

2011-09-26 Thread Di Pe
On Sun, Sep 25, 2011 at 5:51 AM, Joe Landman
 wrote:
> On 09/25/2011 03:56 AM, Di Pe wrote:
>
>> So far the discussion has been focusing on XFS vs ZFS. I admit that I
>> am a fan of ZFS and I have only used XFS for performance reasons on
>> mysql servers where it did well. When I read something like this
>> http://oss.sgi.com/archives/xfs/2011-08/msg00320.html that makes me
>> not want to use XFS for big data. You can assume that this is a real
>
> This is a corner case bug, and one we are hoping we can get more data to the
> XFS team for.  They asked for specific information that we couldn't provide
> (as we had to fix the problem).  Note: other file systems which allow for
> sparse files *may* have similar issues.  We haven't tried yet.

Fair enough, but one of the things LLNL pointed out was that you have
to run fsck in the first place (i.e. standard file systems are not
self-healing).

>
> The issues with ZFS on Linux have to do with legal hazards.  Neither Oracle,
> nor those who claim ZFS violates their patents, would be happy to see
> license violations, or further deployment of ZFS on Linux.  I know the
> national labs in the US are happily doing the integration from source.  But
> I don't think Oracle and the patent holders would sit idly by while others
> do this.  So you'd need to use a ZFS based system such as Solaris 11 express
> to be able to use it without hassle.  BSD and Illumos may work without issue
> as well, and should be somewhat better on the legal front than Linux + ZFS.
>  I am obviously not a lawyer, and you should consult one before you proceed
> down this route.
>
>> recent bug because Joe is a smart guy who knows exactly what he is
>> doing. Joe and the Gluster guys are vendors who can work around these
>> issues and provide support. If XFS is the choice, may be you should
>> hire them for this gig.
>>
>> ZFS typically does not have these FS repair issues in the first place.
>> The motivation of Lawrence Livermore for porting ZFS to Linux was
>> quite clear:
>>
>> http://zfsonlinux.org/docs/SC10_BoF_ZFS_on_Linux_for_Lustre.pdf
>>
>> OK, they have 50PB and we are talking about much smaller deployments.
>> However some of the limitations they report I can confirm. Also,
>> recovering from a drive failure with this whole LVM/Linux Raid stuff
>> is unpredictable. Hot swapping does not always work and if you
>> prioritize the re-sync of data to the new drive you can strangle the
>> entire box (by default the priority of the re-sync process is low on
>> linux). If you are a Linux expert you can handle this kind of stuff
>> (or hire someone) but if you ever want to give this setup to a Storage
>> Administrator you better give them something that they can use with
>> confidence (may be less of an issue in the cloud).
>> Compare this to ZFS: re-silvering works with a very predictable
>> result and timing. There is a ton of info out there on this topic.  I
>> think that gluster users may be getting around many of the linux raid
>> issues by simply taking the entire node down (which is ok in mirrored
>> node settings) or by using hardware raid controllers (which are often
>> not available in the cloud).
>
> There are definite advantages to better technology.  But the issue in this
> case is the legal baggage that goes along with them.
>
> BTRFS may, eventually, be a better choice.  The national labs can do this
> with something of an immunity to prosecution for license violation, by
> claiming the work is part of a research project, and won't actively be used
> in a way that would harm Oracle's interests.  And it would be ... bad ...
> for Oracle (and others) to sue the government over a relatively trivial
> violation.
>
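As an aside to the resync-priority point quoted above: the md rebuild rate is
throttled by two sysctl knobs, and raising the minimum is what forces md to
rebuild faster under competing I/O. A minimal sketch, with example values only:

# defaults are typically speed_limit_min=1000 and speed_limit_max=200000 (KB/s)
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
# temporarily give the rebuild more headroom, at the cost of foreground I/O
echo 50000 > /proc/sys/dev/raid/speed_limit_min
cat /proc/mdstat    # watch resync progress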

I am trying to make sense of what people discuss regarding the ZFS
licensing issue. Did you hear anything from anyone at Oracle that
would indicate that they don't like ZFS on Linux? If I think it through
I can't see why this would make any sense. The ZFS on Linux
community is extremely small and will probably always be, and the main
reason besides data size is that the GPL doesn't like the CDDL, not
vice versa, so distros shy away from it.
The LLNL people have found a way around the GPLv2 issue by implementing
it as a driver.
Why doesn't Oracle sue Nexenta? Those guys have deployed 330PB of
their storage and would be a worthy target.
The only company that seems to have issues with ZFS in general is
NetApp, and I'm sure they don't care whether it's installed on
Solaris or on Linux. Interestingly, NetApp sued CoRaid, a disk shelf
vendor that was using Nexenta as its OS, but they did not sue Nexenta
itself. NetApp knew that their case was very weak. If they had sued
Nexenta, Nexenta would have fought back because the very existence of
the company would have been at risk. NetApp feared that Nexenta might
have won, which would have confirmed the legitimacy of ZFS. CoRaid, on
the other hand, was not dependent on their ZFS solution for their
business to be able to continue. They were

Re: [Gluster-users] Is gluster suitable and production ready for email/webservers?

2011-09-26 Thread Robert Krig

On 09/26/2011 03:04 PM, Emmanuel Noobadmin wrote:
>> As you can guess, rsync is not so good with lots of small files, at
>> least not THAT many small files, so with a 10Gigabit ethernet
>> connection, on the small files we got about 10-30 megabytes per second.
> 10~30MB/s is more than OK for me. However, you're on 10G while my
> client has a budget I need to work within, so bonded 1G with VLAN is
> probably the best I can do. Any idea/data on how much of an impact that
> might make?

I forgot to mention that our 10 gigabit link was also on a shared VLAN. We
have a dedicated external IP and a "virtual" internal one on a single 10GbE
Ethernet interface. However, I don't know how much of an impact it would make
with just a 1 gbit VLAN. I have only just begun using GlusterFS, and this is
my first server using 10GbE Ethernet, so it may be that there is still some
performance gain available through tuning.
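For what it's worth, a bonded 1G pair with a tagged VLAN on top can be brought
up for testing roughly like this (interface names, VLAN ID and addresses are
placeholders, and 802.3ad needs a matching switch configuration):

# load bonding in LACP mode and enslave two 1G NICs
modprobe bonding mode=802.3ad miimon=100
ip link set bond0 up
ifenslave bond0 eth0 eth1
# add a tagged VLAN interface on top of the bond and address it
vconfig add bond0 100
ip addr add 10.0.100.10/24 dev bond0.100
ip link set bond0.100 up

Keep in mind that a single TCP connection still only uses one 1G link, so the
bond mostly helps when several clients or bricks talk at once.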

 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Community Contest Update

2011-09-26 Thread John Mark Walker
Hi gang,

An updated leaderboard, as of Friday at 5pm:


  *   Joe Julian 84
  *   Semiosis 46
  *   Jeff Darcy 39
  *   Greg Swift 19
  *   Steve MacGregor 12

We're coming down the stretch, with the final points tally this Friday at 5pm 
PDT.

Look for more updates this week - http://www.gluster.org/contest/

Thanks!
John Mark



From: John Mark Walker
Sent: Thursday, September 15, 2011 11:06 AM
To: gluster-users@gluster.org
Subject: Community Contest Update

As a reminder, we have 15 days left in our first community contest. Here's how 
the leader board stacks up, as of 5pm PDT yesterday:

Joe Julian  42
Jeff Darcy  16
patrick tully   7
Semiosis6
Greg Swift  5


…with a long tail of many, many others. This is for all activity that has taken 
place since September 1, 2011.


Look for an updated leader board every week at http://www.gluster.org/contest/



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Is gluster suitable and production ready for email/webservers?

2011-09-26 Thread Emmanuel Noobadmin
On 9/26/11, Robert Krig  wrote:
> I guess the question to ask here is: do you need a lot of read/write
> performance for your application, or are redundancy and synchronisation
> more important?

All would be nice, but of course I know that in the real world there has to
be some compromise. For the client's setup, I don't think performance is the
#1 factor, but at the very least the system has to be able to sustain 8MB/s
of transfers (going by their 10Mbps~20Mbps connection, times two due to the
replication required) on bonded 1G Ethernet.

Just as important is the latency, which was the key problem pointed out in
the rackerhacker blog; 3~4 seconds of latency is bad. I'd rather have 0.5
seconds of latency at 5MB/s than 5 seconds of lag at 50MB/s.
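A rough way to put a number on that before committing to hardware is to time
a burst of small-file operations on a mounted test volume; a sketch, with the
mount point as a placeholder:

# write 1000 tiny files and time the whole run
time ( for i in $(seq 1 1000); do echo test > /mnt/gluster/lat-test.$i; done )
# stat them back, which exercises the lookup path
time ( for i in $(seq 1 1000); do stat /mnt/gluster/lat-test.$i > /dev/null; done )
rm -f /mnt/gluster/lat-test.*

Dividing the elapsed time by 1000 gives a per-file latency figure that is
easier to compare across setups than raw MB/s.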

More important still are data integrity and redundancy, the former
especially, since redundant corrupted data is useless. That is why the bug
about corruption of dynamically generated/edited files is a concern.


> As you can guess, rsync is not so good with lots of small files, at
> least not THAT many small files, so with a 10Gigabit ethernet
> connection, on the small files we got about 10-30 megabytes per second.

10~30MB/s is more than OK for me. However, you're on 10G while my client has
a budget I need to work within, so bonded 1G with VLAN is probably the best I
can do. Any idea/data on how much of an impact that might make?

> Of course, regardless of what other people might have experienced, your
> best bet is to test it with your own equipment. There are so many variables
> between differing distros, kernels, optimisations, and hardware that it's
> hard to guarantee any kind of minimum performance.

Unfortunately, I need to make a good estimate of the best file system to go
with in order to plan, and to go to them with a hardware budget, before any
testing can be done. While I could try to put together a test network with
VMs on our spare hardware, there are just too many bottlenecks and variables
introduced for such tests to be useful as anything more than proof of concept
that the setup is sane and would work.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Gluster geo-replication problems

2011-09-26 Thread Jojo Colina
Hello,

I am trying again to establish geo-replication between a volume called
images and a local directory named /glimage (NFS-mounted).

I get status = faulty. Looking in
/var/log/glusterfs/geo-replication/images/file%3A%2F%2F%2Fglimage.log, I
see: OSError: [Errno 107] Transport endpoint is not connected

but then "gluster peer status" shows all peers connected. Please help!
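For reference, the per-session settings and log locations can be queried from
the master with the geo-replication config subcommand; a minimal sketch for
this session (output paths will differ per install):

# show all settings for the images -> file:///glimage session
gluster volume geo-replication images file:///glimage config
# just the master-side log file for this session
gluster volume geo-replication images file:///glimage config log-file

The master-side log usually says more about why the session goes faulty than
the slave log quoted below.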

Information and logs follow.

Thanks,

Jojo



---

[root@creator ~]# gluster peer status
Number of Peers: 6

Hostname: stor-003
Uuid: 3300f1a9-9252-4d39-a8dd-6ef6de66e4c3
State: Peer in Cluster (Connected)

Hostname: stor-001
Uuid: a7406cf1-c598-424e-85ab-5758016999a1
State: Peer in Cluster (Connected)

Hostname: stor-008
Uuid: 0f57c4a5-9f01-475b-b295-ebd6f63e855d
State: Peer in Cluster (Disconnected)

Hostname: stor-007
Uuid: bd966425-576c-4cba-be5c-b16eb00d10f1
State: Peer in Cluster (Disconnected)

Hostname: stor-002
Uuid: f38afa35-0c73-4c08-926f-a39953f48180
State: Peer in Cluster (Connected)

Hostname: stor-004
Uuid: 13b28d31-9eed-4052-9e45-c3baf83ce01e
State: Peer in Cluster (Connected)




[root@creator ~]# gluster volume info

Volume Name: images
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: stor-001:/glusterfs
Brick2: stor-002:/glusterfs
Brick3: stor-003:/glusterfs
Brick4: stor-004:/glusterfs
Options Reconfigured:
geo-replication.indexing: on

[root@creator ~]# gluster volume geo-replication images /glimage start
Starting geo-replication session between images & /glimage has been
successful

[root@creator ~]# gluster volume geo-replication status
MASTER               SLAVE                  STATUS
--------------------------------------------------------
images               file:///glimage        faulty

[root@creator ~]# rsync
rsync  version 3.0.7  protocol version 30
Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs, iconv, no symtimes


[root@creator ~]# cat /var/log/glusterfs/geo-replication-slaves/df4e1ece-61eb-47e5-8420-2d0f081ad0fe\:file%3A%2F%2F%2Fglimage.log
[2011-09-25 17:35:59.639471] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:35:59.640634] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:36:01.677525] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:36:12.373981] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:36:12.376377] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:36:13.672760] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:36:24.898442] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:36:24.900129] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:36:26.194766] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:36:37.420826] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:36:37.421900] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:36:38.717241] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:36:49.939140] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:36:49.940651] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:36:51.241149] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:37:02.464519] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:37:02.466437] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:37:03.760050] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:37:14.985683] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:37:14.987176] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:37:16.281832] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:37:27.505346] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:37:27.506943] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:37:28.802320] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:37:40.31245] I [gsyncd(slave):286:main_i] : syncing:
file:///glimage
[2011-09-25 17:37:40.32705] I [resource(slave):200:service_loop] FILE:
slave listening
...
[2011-09-25 17:39:57.801064] I [resource(slave):200:service_loop] FILE:
slave listening
[2011-09-25 17:39:59.96476] I [repce(slave):61:service_loop] RepceServer:
terminating on reaching EOF.
[2011-09-25 17:39:59.643095] I [resource(slave):206:service_loop] FILE:
connection inactive for 120 

[Gluster-users] GlusterFS and Infiniband

2011-09-26 Thread Abraham van der Merwe
Hi!

I saw your email to the Gluster-users mailing list about instability of
GlusterFS on your 20-node cluster and was wondering if you managed to
resolve the problem.

http://gluster.org/pipermail/gluster-users/2011-January/006332.html

-- 

Regards
 Abraham

TODAY the Pond!
TOMORROW the World!
-- Frogs (1972)

___
 Abraham vd Merwe - Frogfoot Networks (Pty) Ltd
 Suite 20-102D, Building 20, The Waverley Business Park
 Kotzee Road, Mowbray, Cape Town, South Africa, 7770
 Phone: +27 21 448 7225 Cell: +27 82 565 4451
 Http: http://www.frogfoot.com/ Email: a...@frogfoot.com

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] GLUSTERFS + ZFS ON LINUX

2011-09-26 Thread RDP
Hello,
  Maybe this question has been addressed elsewhere, but I would like the
opinion and experience of other users.

There could be some misconceptions that I might be carrying, so please be
kind enough to point them out. Any help, advice and suggestions will be
very much appreciated.

My goal is to get a greater-than-100 TB Gluster NAS up in the cloud. Each
server will hold around 2x8TB disks. The exported volume size (client disk
mount size) would be greater than 20 TB.

This is how I am planning to set it all up: 16 servers, each with 2x8=16 TB
of space. The GlusterFS volume will be replicated and distributed (RAID-10
style). I would like to go with ZFS on Linux for the disks. The client
machines will use the GlusterFS client for mounting the volumes.
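A rough sketch of what that layout could look like on one node and at the
volume level -- the pool name, device names and server01..server16 hostnames
are all placeholders, not a tested recipe:

# per server: stripe the two 8 TB disks into one pool (redundancy comes from
# gluster replication here, not from ZFS) and create a brick dataset
zpool create tank /dev/sdb /dev/sdc
zfs create tank/brick

# once, from any peer: distributed-replicated volume across the 16 servers;
# with "replica 2" consecutive bricks pair up, so server01/server02 mirror
# each other, server03/server04 form the next pair, and so on
gluster volume create bigvol replica 2 transport tcp \
    $(for i in $(seq -w 1 16); do printf 'server%s:/tank/brick ' "$i"; done)
gluster volume start bigvol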

ext4 is limited to 16 TB here due to the userspace tools (e2fsprogs).

Would this be considered a production-ready setup? The data housed on this
cluster is critical, and hence I need to be very sure before I go ahead with
this kind of setup.

Or would using ZFS with Gluster make more sense on FreeBSD or Illumos (where
ZFS is native)?

Thanks a lot
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Is gluster suitable and production ready for email/webservers?

2011-09-26 Thread Robert Krig

On 09/26/2011 07:34 AM, Emmanuel Noobadmin wrote:
> I've been leaning towards actually deploying gluster in one of my
> projects for a while and finally a probable candidate project came up.
>
> However, researching into the specific use case, it seems that gluster
> isn't really suitable for load profiles that deal with lots of
> concurrent small files. e.g.
>
> http://www.techforce.com.br/news/linux_blog/glusterfs_tuning_small_files
> http://rackerhacker.com/2010/12/02/keep-web-servers-in-sync-with-drbd-and-ocfs2/
> http://bugs.gluster.com/show_bug.cgi?id=2869
> http://gluster.org/pipermail/gluster-users/2011-June/007970.html
>
> The first two are rather old so maybe the situation has changed. But
> the bug report and mailing list issue in June ring alarm bells.
>
> Is gluster really unsuited for this kind of workload or have things
> improved since then?
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
I guess the question to ask here is: do you need a lot of read/write
performance for your application, or are redundancy and synchronisation
more important?

In my own tests I used rsync to transfer 14TB of data to our two new
GlusterFS storage nodes.
The data was composed of about 500GB of small JPEGs; the rest was video
files.
As you can guess, rsync is not so good with lots of small files, at least
not THAT many small files, so with a 10 Gigabit Ethernet connection we got
about 10-30 megabytes per second on the small files.
Once we got to the big files, we managed about 100-150 megabytes per second.
Definitely not the maximum the system was capable of, but then again, these
weren't ideal testing conditions.
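As a side note, when the copy goes over ssh, rsync's delta algorithm can be
switched off on a fast LAN, and --inplace avoids the temp-file-plus-rename
dance that is expensive on FUSE mounts; a hedged sketch, host and paths being
placeholders:

# -a: archive mode, -W: transfer whole files (skip delta computation),
# --inplace: write directly to the destination file instead of a temp file
rsync -aW --inplace --progress /data/images/ storagenode:/storage/images/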

A simple dd if=/dev/zero | pv | dd of=/storage/testfile.dmp on a locally
mounted GlusterFS mount resulted in about 200-250 megabytes/s. For
comparison, an iperf between the two nodes showed a maximum network speed of
around 5 gigabits/s.
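Spelled out with explicit sizes, that kind of throughput check might look
like this (file path, size and peer hostname are placeholders):

# sequential write onto the mounted volume, 10 GiB in 1 MiB blocks
dd if=/dev/zero bs=1M count=10240 | pv | dd of=/storage/testfile.dmp bs=1M
# raw TCP throughput between the two nodes
iperf -s                        # on node A
iperf -c nodeA -t 30 -P 4       # on node B: 4 parallel streams for 30 s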


Of course, regardless of what other people might have experienced, your
best bet is to test it with your own equipment. There are so many variables
between differing distros, kernels, optimisations, and hardware that it's
hard to guarantee any kind of minimum performance.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users