Re: [Gluster-users] Strange - Missing hostname-trigger_ip-1 resources

2017-02-03 Thread ML Wong
Thanks so much for your prompt response, Soumya.
That clears up one of my questions. I am trying to figure out why the
NFS service did not fail over and pick up the NFS clients the last time
one of our cluster nodes failed.

In corosync.log I could see that a notify was sent to the cluster about the
failed node, and the election and the IP failover process all seemed to
finish within about a minute. However, after the IP failed over to the
destination node, I ran "showmount -e localhost" and the command hung, even
though ganesha-nfsd was still running on that host. In your expert opinion,
if I understand the process correctly, and given that I keep all the default
timeout/interval settings for nfs-mon and nfs-grace, the entire IP failover
and NFS service failover should complete within 2 minutes. Am I correct?
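
(A quick way to check the configured intervals and where the VIP currently
sits, assuming the standard pcs tooling; the log path below is the usual
CentOS 7 location, so treat this as a sketch:)

  # show the nfs-mon / nfs-grace clones and the VIP resources, with their
  # configured monitor intervals and timeouts
  pcs resource show --full

  # overall cluster view: which node currently holds each VIP
  pcs status

  # after a failover, confirm NFS answers on the node that took over the VIP
  showmount -e localhost

  # trace the grace / failover events in the cluster log
  grep -iE 'nfs-grace|nfs-mon|ganesha' /var/log/cluster/corosync.log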

Your help is again appreciated.

On Thu, Feb 2, 2017 at 11:42 PM, Soumya Koduri  wrote:

> Hi,
>
> On 02/03/2017 07:52 AM, ML Wong wrote:
>
>> Hello All,
>> Any pointers will be very-much appreciated. Thanks in advance!
>>
>> Environment:
>> Running CentOS 7.2.511
>> Gluster: 3.7.16, with nfs-ganesha on 2.3.0.1 from centos-gluster37 repo
>> sha1: cab5df4064e3a31d1d92786d91bd41d91517fba8  ganesha-ha.sh
>>
>> We have used this setup in 3 different Gluster/nfs-ganesha
>> environments. The cluster gets set up when we run 'gluster nfs-ganesha
>> enable', and we can serve NFS without issues. I see all the
>> resources get created, but not the *hostname*-trigger_ip-1 resources. Is
>> that normal?
>>
>>
> Yes, it is normal. With change [1], new resource agent attributes have been
> introduced in place of *-trigger_ip-1 to monitor, move the VIP, and put the
> cluster in grace. More details are in that change's commit message.
>
> Thanks,
> Soumya
>
> [1] https://github.com/gluster/glusterfs/commit/e8121c4afb3680f5
> 32b450872b5a3ffcb3766a97
>
>> Without *hostname*-trigger_ip-1, according to ganesha-ha.sh, wouldn't that
>> affect NFS going into grace and the transition of the NFS service
>> to the other member nodes when a node fails? Please correct me
>> if I misunderstood.
>>
>> I tried issuing both 'gluster nfs-ganesha enable' and 'bash -x
>> /usr/libexec/ganesha/ganesha-ha.sh --setup'. In both scenarios, I still
>> don't see the *hostname*-trigger_ip-1 resources get created.
>>
>> below is my ganesha-ha.conf
>> HA_NAME="ganesha-ha-01"
>> HA_VOL_SERVER="vm-fusion1"
>> HA_CLUSTER_NODES="vm-fusion1,vm-fusion3"
>> VIP_vm-fusion1="192.168.30.211"
>> VIP_vm-fusion3="192.168.30.213"
>>
>>
>>
>>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Quick performance check?

2017-02-03 Thread Momonth
I tested the worst-case scenario on purpose. By increasing the number
of threads I was able to get more throughput, but it didn't scale
linearly.

Thanks for the links =)

On Fri, Feb 3, 2017 at 3:40 PM, Gambit15  wrote:
> On 3 February 2017 at 11:09, Momonth  wrote:
>>
>> Hi,
>>
>> I ran some benchmarking on SSD enabled servers, 10Gb connected, see
>> the file attached.
>>
>> I'm still looking at GlusterFS as a persistent storage for containers,
>> and it's clear it's not going to compete with local file system
>> performance.
>
>
> Well, that's kind of a given: with the standard rep 3, you're doing a sort of
> RAID 5 across the network. However, depending on your use case & setup, you
> can get performance boosts akin to RAID 10 setups, multiplied by the number
> of nodes/bricks in the cluster.
>
> http://blog.gluster.org/category/performance/
> https://s3.amazonaws.com/aws001/guided_trek/Performance_in_a_Gluster_Systemv6F.pdf
>
> I couldn't find the particular doc, but I've seen some ludicrous throughputs
> from configs using multiple nodes running SSDs in RAID 10 and peering over
> InfiniBand.
>
> D
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick performance check?

2017-02-03 Thread Gambit15
On 3 February 2017 at 11:09, Momonth  wrote:

> Hi,
>
> I ran some benchmarking on SSD enabled servers, 10Gb connected, see
> the file attached.
>
> I'm still looking at GlusterFS as a persistent storage for containers,
> and it's clear it's not going to compete with local file system
> performance.
>

Well, that's kind of a given: with the standard rep 3, you're doing a sort
of RAID 5 across the network. However, depending on your use case & setup,
you *can* get performance boosts akin to RAID 10 setups, multiplied by the
number of nodes/bricks in the cluster.
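
Purely as an illustration of that kind of layout (hostnames and brick paths
below are made up), a 2x2 distributed-replicate volume would look like:

  # four bricks in two replica pairs; files are distributed across the pairs,
  # so aggregate throughput scales with the number of pairs
  gluster volume create fastvol replica 2 \
      node1:/bricks/b1 node2:/bricks/b1 \
      node3:/bricks/b1 node4:/bricks/b1
  gluster volume start fastvol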

http://blog.gluster.org/category/performance/
https://s3.amazonaws.com/aws001/guided_trek/Performance_in_a_Gluster_Systemv6F.pdf

I couldn't find the particular doc, but I've seen some ludicrous
throughputs from configs using multiple nodes running SSDs in RAID 10 and
peering over InfiniBand.

D
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Quick performance check?

2017-02-03 Thread Momonth
Hi,

I ran some benchmarking on SSD enabled servers, 10Gb connected, see
the file attached.

I'm still looking at GlusterFS as a persistent storage for containers,
and it's clear it's not going to compete with local file system
performance.

Cheers,
Vladimir

On Fri, Feb 3, 2017 at 12:28 PM, Alex Sudakar  wrote:
> Hi.  I'm looking for a clustered filesystem for a very simple
> scenario.  I've set up Gluster but my tests have shown quite a
> performance penalty when compared to using a local XFS filesystem.
> This no doubt reflects the reality of moving to a proper distributed
> filesystem, but I'd like to quickly check that I haven't missed
> something obvious that might improve performance.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Quick performance check?

2017-02-03 Thread Cedric Lemarchand

> On 3 Feb 2017, at 13:48, Gambit15  wrote:
> 
> Hi Alex,
>  I don't use Gluster for storing large amounts of small files, however from
> what I've read, that does appear to be its big Achilles heel.

I am not an expert, but I agree: due to its distributed nature, the induced
per-file access latency plays a big role when you have to deal with lots of
small files. It does seem there are some tuning options available, though; a
good place to start could be:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html
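
As a rough sketch only (the volume name is a placeholder and the values are
illustrative, not recommendations), the kind of knobs that guide and the rest
of this thread talk about are set per volume with 'gluster volume set', e.g.:

  # avoid extra lookups for entries that don't exist on a brick
  gluster volume set myvol cluster.lookup-optimize on
  # cache metadata / xattrs on the client for up to 60 seconds
  gluster volume set myvol performance.md-cache-timeout 60
  gluster volume set myvol performance.stat-prefetch on
  # let the server invalidate cached entries when another client changes them
  gluster volume set myvol features.cache-invalidation on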

> Personally, if you're not looking to scale out to a lot more servers, I'd go 
> with Ceph or DRBD. Gluster's best features are in its scalability.

AFAIK Ceph needs at least 3 monitors (i.e. a quorum) to be fully “highly
available”, so the entry ticket is pretty high and, from my point of view,
overkill for such needs, unless you plan to scale out too. DRBD seems a
more reasonable approach.

Cheers 

> Also, it's worth pointing out that in any setup, you've got to be careful 
> with 2 node configurations as they're highly vulnerable to split-brain 
> scenarios.
> 
> Given the relatively small size of your data, caching tweaks & an arbiter may 
> well save you here, however I don't use enough of its caching features to be 
> able to give advice on it.
> 
> D
> 
> On 3 February 2017 at 08:28, Alex Sudakar wrote:
> Hi.  I'm looking for a clustered filesystem for a very simple
> scenario.  I've set up Gluster but my tests have shown quite a
> performance penalty when compared to using a local XFS filesystem.
> This no doubt reflects the reality of moving to a proper distributed
> filesystem, but I'd like to quickly check that I haven't missed
> something obvious that might improve performance.
> 
> I plan to have two Amazon AWS EC2 instances (virtual machines) both
> accessing the same filesystem for read/writes.  Access will be almost
> entirely reads, with the occasional modification, deletion or creation
> of files.  Ideally I wanted all those reads going straight to the
> local XFS filesystem and just the writes incurring a distributed
> performance penalty.  :-)
> 
> So I've set up two VMs with Centos 7.2 and Gluster 3.8.8, each machine
> running as a combined Gluster server and client.  One brick on each
> machine, one volume in a 1 x 2 replica configuration.
> 
> Everything works, it's just the performance penalty which is a surprise.  :-)
> 
> My test directory has 9,066 files and directories; 7,987 actual files.
> Total size is 63MB data, 85MB allocated; an average size of 8KB data
> per file.  The brick's files have a total of 117MB allocated, with the
> extra 32MB working out pretty much to be exactly the sum of the extra
> 4KB extents that would have been allocated for the XFS attributes per
> file - the VMs were installed with the default 256 byte inode size for
> the local filesystem, and from what I've read Gluster will force the
> filesystem to allocate an extent for its attributes.  'xfs_bmap' on a
> few files shows this is the case.
> 
> A simple 'cat' of every file when laid out in 'native' directories on
> the XFS filesystem takes about 3 seconds.  A cat of all the files in
> the brick's directory on the same filesystem takes about 6.4 seconds,
> which I figure is due to the extra I/O for the inode metadata extents
> (although not quite certain; the additional extents added about 40%
> extra to the disk block allocation, so I'm unsure as to why the time
> increase was 100%).
> 
> Doing the same test through the glusterfs mount takes about 25
> seconds; roughly four times longer than reading those same files
> directly from the brick itself.
> 
> It took 30 seconds until I applied the 'md-cache' settings (for those
> variables that still exist in 3.8.8) mentioned in this very helpful
> article:
> 
>   http://blog.gluster.org/category/performance/ 
> 
> 
> So use of the md-cache in a 'cold run' shaved off 5 seconds - due to
> common directory LOOKUP operations being cached I guess.
> 
> Output of a 'volume info' is as follows:
> 
> Volume Name: g1
> Type: Replicate
> Volume ID: bac6cd70-ca0d-4173-9122-644051444fe5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: serverA:/data/brick1
> Brick2: serverC:/data/brick1
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> cluster.self-heal-daemon: enable
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.md-cache-timeout: 60
> network.inode-lru-limit: 9
> 
> The article suggests a value of 600 for
> features.cache-invalidation-timeout but my Gluster version only
> permits a maximum value of 60.
> 
> Network speed between the two VMs is about 120 MBytes/sec - the two
> VMs inhabit the same Amazon Virtu

Re: [Gluster-users] Quick performance check?

2017-02-03 Thread Gambit15
Hi Alex,
 I don't use Gluster for storing large amounts of small files; however, from
what I've read, that does appear to be its big Achilles heel.
Personally, if you're not looking to scale out to a lot more servers, I'd
go with Ceph or DRBD. Gluster's best features are in its scalability.
Also, it's worth pointing out that in any setup, you've got to be careful
with 2 node configurations as they're highly vulnerable to split-brain
scenarios.

Given the relatively small size of your data, caching tweaks & an arbiter
may well save you here, however I don't use enough of its caching features
to be able to give advice on it.
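
If it helps, on a reasonably recent release a two-node replica volume can be
given an arbiter brick roughly like this (host and path are placeholders; the
arbiter stores only metadata, so it guards against split-brain without a full
third copy of the data):

  gluster volume add-brick myvol replica 3 arbiter 1 node3:/bricks/arbiter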

D

On 3 February 2017 at 08:28, Alex Sudakar  wrote:

> Hi.  I'm looking for a clustered filesystem for a very simple
> scenario.  I've set up Gluster but my tests have shown quite a
> performance penalty when compared to using a local XFS filesystem.
> This no doubt reflects the reality of moving to a proper distributed
> filesystem, but I'd like to quickly check that I haven't missed
> something obvious that might improve performance.
>
> I plan to have two Amazon AWS EC2 instances (virtual machines) both
> accessing the same filesystem for read/writes.  Access will be almost
> entirely reads, with the occasional modification, deletion or creation
> of files.  Ideally I wanted all those reads going straight to the
> local XFS filesystem and just the writes incurring a distributed
> performance penalty.  :-)
>
> So I've set up two VMs with Centos 7.2 and Gluster 3.8.8, each machine
> running as a combined Gluster server and client.  One brick on each
> machine, one volume in a 1 x 2 replica configuration.
>
> Everything works, it's just the performance penalty which is a surprise.
> :-)
>
> My test directory has 9,066 files and directories; 7,987 actual files.
> Total size is 63MB data, 85MB allocated; an average size of 8KB data
> per file.  The brick's files have a total of 117MB allocated, with the
> extra 32MB working out pretty much to be exactly the sum of the extra
> 4KB extents that would have been allocated for the XFS attributes per
> file - the VMs were installed with the default 256 byte inode size for
> the local filesystem, and from what I've read Gluster will force the
> filesystem to allocate an extent for its attributes.  'xfs_bmap' on a
> few files shows this is the case.
>
> A simple 'cat' of every file when laid out in 'native' directories on
> the XFS filesystem takes about 3 seconds.  A cat of all the files in
> the brick's directory on the same filesystem takes about 6.4 seconds,
> which I figure is due to the extra I/O for the inode metadata extents
> (although not quite certain; the additional extents added about 40%
> extra to the disk block allocation, so I'm unsure as to why the time
> increase was 100%).
>
> Doing the same test through the glusterfs mount takes about 25
> seconds; roughly four times longer than reading those same files
> directly from the brick itself.
>
> It took 30 seconds until I applied the 'md-cache' settings (for those
> variables that still exist in 3.8.8) mentioned in this very helpful
> article:
>
>   http://blog.gluster.org/category/performance/
>
> So use of the md-cache in a 'cold run' shaved off 5 seconds - due to
> common directory LOOKUP operations being cached I guess.
>
> Output of a 'volume info' is as follows:
>
> Volume Name: g1
> Type: Replicate
> Volume ID: bac6cd70-ca0d-4173-9122-644051444fe5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: serverA:/data/brick1
> Brick2: serverC:/data/brick1
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> cluster.self-heal-daemon: enable
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.md-cache-timeout: 60
> network.inode-lru-limit: 9
>
> The article suggests a value of 600 for
> features.cache-invalidation-timeout but my Gluster version only
> permits a maximum value of 60.
>
> Network speed between the two VMs is about 120 MBytes/sec - the two
> VMs inhabit the same Amazon Virtual Private Cloud - so I don't think
> bandwidth is a factor.
>
> The 400% slowdown is no doubt the penalty incurred in moving to a
> proper distributed filesystem.  That article and other web pages I've
> read all say that each open of a file results in synchronous LOOKUP
> operations on all the replicas, so I'm guessing it just takes that
> much time for everything to happen before a file can be opened.
> Gluster profiling shows that there are 11,198 LOOKUP operations on the
> test cat of the 7,987 files.
>
> As a Gluster newbie I'd appreciate some quick advice if possible -
>
> 1.  Is this sort of performance hit - on directories of small files -
> typical for such a simple Gluster configuration?
>
> 2.  Is there anything I can do to speed things up?  :-)
>
> 3.  Repeating the 'cat' test immediately after the first

[Gluster-users] Quick performance check?

2017-02-03 Thread Alex Sudakar
Hi.  I'm looking for a clustered filesystem for a very simple
scenario.  I've set up Gluster but my tests have shown quite a
performance penalty when compared to using a local XFS filesystem.
This no doubt reflects the reality of moving to a proper distributed
filesystem, but I'd like to quickly check that I haven't missed
something obvious that might improve performance.

I plan to have two Amazon AWS EC2 instances (virtual machines) both
accessing the same filesystem for read/writes.  Access will be almost
entirely reads, with the occasional modification, deletion or creation
of files.  Ideally I wanted all those reads going straight to the
local XFS filesystem and just the writes incurring a distributed
performance penalty.  :-)

So I've set up two VMs with CentOS 7.2 and Gluster 3.8.8, each machine
running as a combined Gluster server and client.  One brick on each
machine, one volume in a 1 x 2 replica configuration.

Everything works, it's just the performance penalty which is a surprise.  :-)

My test directory has 9,066 files and directories; 7,987 actual files.
Total size is 63MB data, 85MB allocated; an average size of 8KB data
per file.  The brick's files have a total of 117MB allocated, with the
extra 32MB working out pretty much to be exactly the sum of the extra
4KB extents that would have been allocated for the XFS attributes per
file - the VMs were installed with the default 256 byte inode size for
the local filesystem, and from what I've read Gluster will force the
filesystem to allocate an extent for its attributes.  'xfs_bmap' on a
few files shows this is the case.
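
For anyone checking the same thing, this is roughly how the inode size can be
verified and how bricks are often formatted to keep Gluster's xattrs inside
the inode (the mount point and device below are made up):

  # "isize=..." in the output shows the inode size of the brick filesystem
  xfs_info /data

  # when (re)creating a brick, 512-byte inodes leave room for the xattrs,
  # avoiding the extra 4KB attribute extent per file
  mkfs.xfs -i size=512 /dev/xvdb1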

A simple 'cat' of every file when laid out in 'native' directories on
the XFS filesystem takes about 3 seconds.  A cat of all the files in
the brick's directory on the same filesystem takes about 6.4 seconds,
which I figure is due to the extra I/O for the inode metadata extents
(although not quite certain; the additional extents added about 40%
extra to the disk block allocation, so I'm unsure as to why the time
increase was 100%).

Doing the same test through the glusterfs mount takes about 25
seconds; roughly four times longer than reading those same files
directly from the brick itself.
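
(The exact command used for the test isn't shown here; a minimal equivalent,
run first against the brick path and then against the glusterfs mount point,
would be something like the following, with paths as placeholders:)

  # drop the page cache so every run starts cold (needs root)
  sync; echo 3 > /proc/sys/vm/drop_caches

  # read every regular file once and time it
  time find /mnt/g1/testdir -type f -exec cat {} + > /dev/null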

It took 30 seconds until I applied the 'md-cache' settings (for those
variables that still exist in 3.8.8) mentioned in this very helpful
article:

  http://blog.gluster.org/category/performance/

So use of the md-cache in a 'cold run' shaved off 5 seconds - due to
common directory LOOKUP operations being cached I guess.

Output of a 'volume info' is as follows:

Volume Name: g1
Type: Replicate
Volume ID: bac6cd70-ca0d-4173-9122-644051444fe5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: serverA:/data/brick1
Brick2: serverC:/data/brick1
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.self-heal-daemon: enable
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.md-cache-timeout: 60
network.inode-lru-limit: 9

The article suggests a value of 600 for
features.cache-invalidation-timeout but my Gluster version only
permits a maximum value of 60.
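
(For reference, options like the ones listed above are applied per volume with
'gluster volume set'; for example:)

  gluster volume set g1 features.cache-invalidation on
  gluster volume set g1 performance.stat-prefetch on
  gluster volume set g1 performance.md-cache-timeout 60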

Network speed between the two VMs is about 120 MBytes/sec - the two
VMs inhabit the same Amazon Virtual Private Cloud - so I don't think
bandwidth is a factor.

The 400% slowdown is no doubt the penalty incurred in moving to a
proper distributed filesystem.  That article and other web pages I've
read all say that each open of a file results in synchronous LOOKUP
operations on all the replicas, so I'm guessing it just takes that
much time for everything to happen before a file can be opened.
Gluster profiling shows that there are 11,198 LOOKUP operations on the
test cat of the 7,987 files.
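
(Assuming the standard profiling interface, those numbers come from something
like:)

  gluster volume profile g1 start
  # ... run the cat test ...
  gluster volume profile g1 info     # per-brick FOP counts, including LOOKUP
  gluster volume profile g1 stop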

As a Gluster newbie I'd appreciate some quick advice if possible -

1.  Is this sort of performance hit - on directories of small files -
typical for such a simple Gluster configuration?

2.  Is there anything I can do to speed things up?  :-)

3.  Repeating the 'cat' test immediately after the first test run saw
the time dive from 25 seconds down to 4 seconds.  Before I'd set those
md-cache variables it had taken 17 seconds, due, I assume, to the
actual file data being cached in the Linux buffer cache.  So those
md-cache settings really did make a change - taking off another 13
seconds - once everything was cached.

Flushing/invalidating the Linux memory cache made the next test go
back to the 25 seconds.  So it seems to me that the md-cache must hold
its contents in the Linux memory buffers cache ... which surprised me,
because I thought a user-space system like Gluster would have the
cache within the daemons or maybe a shared memory segment, nothing
that would be affected by clearing the Linux buffer cache.  I was
expecting a run after invalidating the linux cache would take
something between 4 seconds and 25 seconds, with the md-cache still

[Gluster-users] initial pool setup -- bidirectional probe required?

2017-02-03 Thread Joseph Lorenzini
All:

According to the docs, when you initially set up a Gluster storage pool,
the first two servers need to probe each other. After that, however, you
add additional servers by probing from a node that's already in the
pool.

However, when I follow the directions with Gluster 3.8, the behavior
doesn't seem to match up during the initial setup of two nodes. I probe
from server 1 to server 2, but I do not probe from server 2 to server 1. My
expectation was that either the pool or the peer commands would indicate
that server 2 does not "trust" server 1, but in fact server 2 simply
reports that it is successfully connected in a pool and that server 1 is a
trusted peer.

In addition, if I do a probe from server 2 to server 1, it does not just
say probe success. Instead it says, "probe successful host already in peer
list".

So here are my questions: is this "probe each server from the other" dance
actually required for the initial setup? And if it is, is there a way to
tell, through a command or by looking at a log, whether that has occurred or
not?
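
(For what it's worth, a minimal sketch of the sequence and of the commands
that show the result; hostnames are placeholders:)

  # from server1 only:
  gluster peer probe server2

  # then, on each node, inspect the pool:
  gluster peer status    # the other peers and their connection state
  gluster pool list      # same view, but including the local node

  # an optional reverse probe from server2 mainly lets server1 be known by
  # hostname instead of IP; with an existing peer it just reports that the
  # host is already in the peer list
  gluster peer probe server1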

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Storage%20Pools/

Thanks,
Joe
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users