Re: [Gluster-users] Finding performance bottlenecks

2018-05-07 Thread Ben Turner
- Original Message -
> From: "Tony Hoyle" 
> To: "Gluster Users" 
> Sent: Tuesday, May 1, 2018 5:38:38 AM
> Subject: Re: [Gluster-users] Finding performance bottlenecks
> 
> On 01/05/2018 02:27, Thing wrote:
> > Hi,
> > 
> > So is the KVM or Vmware as the host(s)?  I basically have the same setup
> > ie 3 x 1TB "raid1" nodes and VMs, but 1gb networking.  I do notice with
> > vmware using NFS disk was pretty slow (40% of a single disk) but this
> > was over 1gb networking which was clearly saturating.  Hence I am moving
> > to KVM to use glusterfs hoping for better performance and bonding, it
> > will be interesting to see which host type runs faster.
> 
> 1gb will always be the bottleneck in that situation - that's going to
> max out at the speed of a single disk or lower.  You need at minimum to
> bond interfaces and preferably go to 10gb to do that.
> 
> Our NFS actually ends up faster than local disk because the read speed
> of the raid is faster than the read speed of the local disk.
> 
> > Which operating system is gluster on?
> 
> Debian Linux.  Supermicro motherboards, 24 core i7 with 128GB of RAM on
> the VM hosts.
> 
> > Did you do iperf between all nodes?
> 
> Yes, around 9.7Gb/s
> 
> It doesn't appear to be raw read speed but iowait.  Under nfs load with
> multiple VMs I get an iowait of around 0.3%.  Under gluster, never less
> than 10% and glusterfsd is often the top of the CPU usage.  This causes
> a load average of ~12 compared to 3 over NFS, and absolutely kills VMs
> esp. Windows ones - one machine I set booting and it was still booting
> 30 minutes later!

Are you properly aligned?  This sounds like the xattr reads / writes used by 
gluster may be eating your IOPS; this is exacerbated when the storage is misaligned. 
I suggest getting on the latest version of oVirt (I have seen this help) and 
evaluating your storage stack.

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/formatting_and_mounting_bricks

pvcreate --dataalignment <full stripe>    (full stripe = RAID stripe unit * # of data disks)
vgcreate --physicalextentsize <full stripe>
lvcreate like normal
mkfs.xfs -f -i size=512 -n size=8192 -d su=<RAID stripe unit>,sw=<# of data disks> DEVICE
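
As a worked example (the RAID geometry and device names here are assumptions for 
illustration, not from this thread): a 12-disk RAID 6 with a 128KB stripe unit has 
10 data disks, so the full stripe is 1280KB:

pvcreate --dataalignment 1280k /dev/sdb
vgcreate --physicalextentsize 1280k rhs_vg /dev/sdb
lvcreate -l 100%FREE -n rhs_lv rhs_vg
mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 /dev/rhs_vg/rhs_lv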

And mount with:

/dev/rhs_vg/rhs_lv  /mountpoint  xfs  rw,inode64,noatime,nouuid  1 2

I normally use the rhgs-random-io tuned profile and apply the gluster virt group 
settings (gluster volume set <volname> group virt).

HTH

-b

> 
> Tony
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Finding performance bottlenecks

2018-05-07 Thread Ben Turner
- Original Message -
> From: "Darrell Budic" 
> To: "Vincent Royer" , t...@hoyle.me.uk
> Cc: gluster-users@gluster.org
> Sent: Thursday, May 3, 2018 5:24:53 PM
> Subject: Re: [Gluster-users] Finding performance bottlenecks
> 
> Tony’s performance sounds significantly sub par from my experience. I did
> some testing with gluster 3.12 and Ovirt 3.9, on my running production
> cluster when I enabled the glfsapi, even my pre numbers are significantly
> better than what Tony is reporting:
> 
> ———
> Before using gfapi:
> 
> ]# dd if=/dev/urandom of=test.file bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 90.1843 s, 11.9 MB/s
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test.file of=/dev/null
> 2097152+0 records in
> 2097152+0 records out
> 1073741824 bytes (1.1 GB) copied, 3.94715 s, 272 MB/s

This is nowhere near what I would expect.  With VMs I am able to saturate a 
10G interface if I run enough IO from enough VMs and use LVM striping (8 files 
/ PVs) inside the VMs.  That's ~1200 MB/s of aggregate throughput, and each VM 
will do 200-300+ MB/sec writes and 300-400+ MB/sec reads.

I have seen this issue before, though: once it was resolved by an upgrade of 
oVirt, another time I fixed the alignment of the RAID / LVM / XFS stack.  There 
is one instance I haven't figured out yet :/  I want to rebuild it on a fresh HW 
stack.  Make sure you have everything aligned in the storage stack, writeback 
cache on the RAID controller, jumbo frames, the gluster virt group set, and a 
random-IO tuned profile.  If you want to tinker with LVM striping inside the VM, 
I have had success with that as well.

Also note:

Using urandom will significantly lower perf; it is dependent on how fast your 
CPU can create random data.  Try /dev/zero, or fio / IOzone / smallfile - 
https://github.com/bengland2/smallfile - which will eliminate the CPU as a bottleneck.
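
A minimal fio sketch along those lines (the mount point, size, and job count are 
placeholders, not from this thread):

# Sequential writes that don't depend on the CPU generating random data
fio --name=seqwrite --directory=/mnt/glusterfs/fio --rw=write --bs=1M \
    --size=4G --numjobs=4 --direct=1 --group_reporting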

Also remember that VMs are a heavy random-IO workload; you need IOPS on your disks 
to see good perf.  And since gluster doesn't have a metadata server, that metadata 
lives in xattrs on the files themselves.  This is a bit of a double-edged sword: 
those xattr operations take IOPS as well, and if the backend is not properly aligned 
this can double or triple the IOPS overhead of the small reads and writes gluster 
uses in place of a metadata server.
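
For example, you can see those xattrs directly on a brick (the brick path and file 
name below are hypothetical):

# Run this against the brick path, not the client mount
getfattr -d -m . -e hex /bricks/brick1/vol/images/vm-disk-01.img
# typical keys: trusted.gfid, trusted.afr.<vol>-client-N, trusted.glusterfs.*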

HTH

-b

> 
> # hdparm -tT /dev/vda
> 
> /dev/vda:
> Timing cached reads: 17322 MB in 2.00 seconds = 8673.49 MB/sec
> Timing buffered disk reads: 996 MB in 3.00 seconds = 331.97 MB/sec
> 
> # bonnie++ -d . -s 8G -n 0 -m pre-glapi -f -b -u root
> 
> Version 1.97 --Sequential Output-- --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> pre-glapi 8G 196245 30 105331 15 962775 49 1638 34
> Latency 1578ms 1383ms 201ms 301ms
> 
> Version 1.97 --Sequential Output-- --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> pre-glapi 8G 155937 27 102899 14 1030285 54 1763 45
> Latency 694ms 1333ms 114ms 229ms
> 
> (note, sequential reads seem to have been influenced by caching somewhere…)
> 
> After switching to gfapi:
> 
> # dd if=/dev/urandom of=test.file bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 80.8317 s, 13.3 MB/s
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test.file of=/dev/null
> 2097152+0 records in
> 2097152+0 records out
> 1073741824 bytes (1.1 GB) copied, 3.3473 s, 321 MB/s
> 
> # hdparm -tT /dev/vda
> 
> /dev/vda:
> Timing cached reads: 17112 MB in 2.00 seconds = 8568.86 MB/sec
> Timing buffered disk reads: 1406 MB in 3.01 seconds = 467.70 MB/sec
> 
> #bonnie++ -d . -s 8G -n 0 -m glapi -f -b -u root
> 
> Version 1.97 --Sequential Output-- --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> glapi 8G 359100 59 185289 24 489575 31 2079 67
> Latency 160ms 355ms 36041us 185ms
> 
> Version 1.97 --Sequential Output-- --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> glapi 8G 341307 57 180546 24 472572 35 2655 61
> Latency 153ms 394ms 101ms 116ms
> 
> So excellent improvement in write throughput, but the significant improvement
> in latency is what was most noticed by users. Anecdotal reports of 2x+
> performance improvements, with one remarking that it’s like having dedicated
> disks :)
> 
> This system is on my production cluster, so it’s not getting exclusive disk
> access, but this VM is not doing anything else itself. The cluster is 3 xeon
> E5-2609 v3 @ 

Re: [Gluster-users] arbiter node on client?

2018-05-07 Thread Ben Turner
One thing to remember with arbiters is that they need IOPS more than they need 
capacity.  With a VM use case this is less impactful, but workloads with lots of 
small files can become heavily bottlenecked at the arbiter.  Arbiters only store 
metadata, not data, but that metadata requires lots of small reads and writes.  I have 
seen many instances where the arbiter had considerably fewer IOPS than the 
other bricks and it led to perf issues.  With VMs you don't have thousands of 
files, so it's probably not a big deal, but in more general-purpose workloads it's 
important to remember this.
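
You can see this on disk: on the arbiter brick the files are zero-length and only 
carry xattrs (the paths below are hypothetical):

# On the arbiter node - no data blocks, but the metadata still costs IOPS
ls -l /bricks/arbiter/vol/vm-disk-01.img
getfattr -d -m . -e hex /bricks/arbiter/vol/vm-disk-01.img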

HTH!

-b

- Original Message -
> From: "Dave Sherohman" 
> To: gluster-users@gluster.org
> Sent: Monday, May 7, 2018 7:21:49 AM
> Subject: Re: [Gluster-users] arbiter node on client?
> 
> On Sun, May 06, 2018 at 11:15:32AM +, Gandalf Corvotempesta wrote:
> > is possible to add an arbiter node on the client?
> 
> I've been running in that configuration for a couple months now with no
> problems.  I have 6 data + 3 arbiter bricks hosting VM disk images and
> all three of my arbiter bricks are on one of the kvm hosts.
> 
> > Can I use multiple arbiter for the same volume ? In example, one arbiter on
> > each client.
> 
> I'm pretty sure that you can only have one arbiter per subvolume, and
> I'm not even sure what the point of multiple arbiters over the same data
> would be.
> 
> In my case, I have three subvolumes (three replica pairs), which means I
> need three arbiters and those could be spread across multiple nodes, of
> course, but I don't think saying "I want 12 arbiters instead of 3!"
> would be supported.
> 
> --
> Dave Sherohman
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Fwd: gluster performance

2018-02-16 Thread Ben Turner
I am forwarding this for Ryan, @Ryan - did you join the gluster users mailing 
list yet?  That may be why you are having issues sending messages.

- Forwarded Message -
From: "Ryan Wilkinson" 
To: btur...@redhat.com
Sent: Wednesday, February 14, 2018 4:46:10 PM
Subject: gluster performance


I have a 3 host gluster replicated cluster that is providing storage for our 
RHEV environment.  We've been having issues with inconsistent performance from 
the VMs depending on which Hypervisor they are running on.  I've confirmed 
throughput to be ~9Gb/s to each of the storage hosts from the hypervisors.  I'm 
getting ~300MB/s disk read speed when our test VM is on the slow hypervisors 
and over 500 on the faster ones.  The performance doesn't seem to be affected 
much by the CPU or memory in the hypervisors.  I have tried a couple of 
really old boxes and got over 500 MB/s.  The common thread seems to be that the 
poorly performing hosts all have Dell's iDRAC 7 Enterprise.  I have one 
hypervisor that has iDRAC 7 Express and it performs well.  We've compared 
system packages and versions til we're blue in the face and have been 
struggling with this for a couple months but that seems to be the only common 
denominator.  I've tried on one of those Idrac 7 hosts to disable the nic, 
virtual drive, etc, etc. but no change in performance.
Ryan Wilkinson, Sales Engineer
2312 West 700 South #2 | Springville, UT 84663
Cell: 801-358-2816 | AZ Tel: 480-535-6686 x300 | UT Tel: 
385-325-0010 | Fax: 801-326-6051
r...@centriserve.net

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware

2017-12-20 Thread Ben Turner
Here is the process for resolving split brain on replica 2:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html

It should be pretty much the same for replica 3, you change the xattrs with 
something like:

# setfattr -n trusted.afr.vol-client-0 -v 0x0001 
/gfs/brick-b/a

When I try to decide which copy to use I normally run things like:

# stat /path/to/file

Check out the access and change times of the file on the back end bricks.  I 
normally pick the copy with the latest access / change times.  I'll also check:

# md5sum /path/to/file

Compare the hashes of the file on both bricks to see if the data actually 
differs.  If the data is the same it makes choosing the proper replica easier.
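
A quick sketch for gathering that from both data bricks (the host names are 
hypothetical; the brick path is from the example above):

for h in server1 server2; do
    echo "== $h =="
    ssh $h "stat /gfs/brick-b/a; md5sum /gfs/brick-b/a"
done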

Any idea how you got into this situation?  Did you have a loss of network 
connectivity?  I see you are using server-side quorum; maybe check the logs for 
any loss of quorum.  I wonder if there was a loss of quorum and some 
sort of race condition was hit:

http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls

"Unlike in client-quorum where the volume becomes read-only when quorum is 
lost, loss of server-quorum in a particular node makes glusterd kill the brick 
processes on that node (for the participating volumes) making even reads 
impossible."

I wonder if the killing of brick processes could have led to some sort of race 
condition where writes were serviced on one brick / the arbiter and not the 
other?
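
A quick way to check for quorum events is to grep the logs on each server (default 
log locations shown; adjust for your install):

grep -i quorum /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log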

If you can find a reproducer for this please open a BZ with it.  I have been 
seeing something similar (I think) but I haven't been able to run the issue down 
yet.

-b

- Original Message -
> From: "Henrik Juul Pedersen" 
> To: gluster-users@gluster.org
> Cc: "Henrik Juul Pedersen" 
> Sent: Wednesday, December 20, 2017 1:26:37 PM
> Subject: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain.
> gluster cli seems unaware
> 
> Hi,
> 
> I have the following volume:
> 
> Volume Name: virt_images
> Type: Replicate
> Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
> Status: Started
> Snapshot Count: 2
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: virt3:/data/virt_images/brick
> Brick2: virt2:/data/virt_images/brick
> Brick3: printserver:/data/virt_images/brick (arbiter)
> Options Reconfigured:
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> features.barrier: disable
> features.scrub: Active
> features.bitrot: on
> nfs.rpc-auth-allow: on
> server.allow-insecure: on
> user.cifs: off
> features.shard: off
> cluster.shd-wait-qlength: 1
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: enable
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> nfs.disable: on
> transport.address-family: inet
> server.outstanding-rpc-limit: 512
> 
> After a server reboot (brick 1) a single file has become unavailable:
> # touch fedora27.qcow2
> touch: setting times of 'fedora27.qcow2': Input/output error
> 
> Looking at the split brain status from the client side cli:
> # getfattr -n replica.split-brain-status fedora27.qcow2
> # file: fedora27.qcow2
> replica.split-brain-status="The file is not under data or metadata
> split-brain"
> 
> However, in the client side log, a split brain is mentioned:
> [2017-12-20 18:05:23.570762] E [MSGID: 108008]
> [afr-transaction.c:2629:afr_write_txn_refresh_done]
> 0-virt_images-replicate-0: Failing SETATTR on gfid
> 7a36937d-52fc-4b55-a932-99e2328f02ba: split-brain observed.
> [Input/output error]
> [2017-12-20 18:05:23.576046] W [MSGID: 108027]
> [afr-common.c:2733:afr_discover_done] 0-virt_images-replicate-0: no
> read subvols for /fedora27.qcow2
> [2017-12-20 18:05:23.578149] W [fuse-bridge.c:1153:fuse_setattr_cbk]
> 0-glusterfs-fuse: 182: SETATTR() /fedora27.qcow2 => -1 (Input/output
> error)
> 
> = Server side
> 
> No mention of a possible split brain:
> # gluster volume heal virt_images info split-brain
> Brick virt3:/data/virt_images/brick
> Status: Connected
> Number of entries in split-brain: 0
> 
> Brick virt2:/data/virt_images/brick
> Status: Connected
> Number of entries in split-brain: 0
> 
> Brick printserver:/data/virt_images/brick
> Status: Connected
> Number of entries in split-brain: 0
> 
> The info command shows the file:
> ]# gluster volume heal virt_images info
> Brick virt3:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
> 
> Brick virt2:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
> 
> Brick printserver:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
> 
> 
> The heal and 

Re: [Gluster-users] How to make sure self-heal backlog is empty ?

2017-12-20 Thread Ben Turner
You can try kicking off a client side heal by running:

ls -laR /your-gluster-mount/*

Sometimes, when I see just the GFID instead of the file name, I have found that 
if I stat the file the name shows up in heal info.
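
If heal info only lists a GFID, here is a sketch for mapping it back to a path on the 
brick (the brick path and GFID below are made up; for regular files the .glusterfs 
entry is a hard link to the real file):

BRICK=/export/brick/myvol
GFID=d4e21a4b-5d44-4bcf-b9a6-3d3b1e2f0a11
find $BRICK -samefile $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID ! -path '*/.glusterfs/*'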

Before running that make sure that you don't have any split brain files:

gluster v heal your-vol info split-brain

If you do have split brain files follow:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html

HTH!

-b

- Original Message -
> From: "Hoggins!" 
> To: "gluster-users" 
> Sent: Tuesday, December 19, 2017 1:26:08 PM
> Subject: [Gluster-users] How to make sure self-heal backlog is empty ?
> 
> Hello list,
> 
> I'm not sure what to look for here, not sure if what I'm seeing is the
> actual "backlog" (that we need to make sure is empty while performing a
> rolling upgrade before going to the next node), how can I tell, while
> reading this, if it's okay to reboot / upgrade my next node in the pool ?
> Here is what I do for checking :
> 
> for i in `gluster volume list`; do gluster volume heal $i info; done
> 
> And here is what I get :
> 
> Brick ngluster-1.network.hoggins.fr:/export/brick/clem
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-2.network.hoggins.fr:/export/brick/clem
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-3.network.hoggins.fr:/export/brick/clem
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-1.network.hoggins.fr:/export/brick/mailer
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-2.network.hoggins.fr:/export/brick/mailer
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-3.network.hoggins.fr:/export/brick/mailer
> 
> Status: Connected
> Number of entries: 1
> 
> Brick ngluster-1.network.hoggins.fr:/export/brick/rom
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-2.network.hoggins.fr:/export/brick/rom
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-3.network.hoggins.fr:/export/brick/rom
> 
> Status: Connected
> Number of entries: 1
> 
> Brick ngluster-1.network.hoggins.fr:/export/brick/thedude
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-2.network.hoggins.fr:/export/brick/thedude
> 
> Status: Connected
> Number of entries: 1
> 
> Brick ngluster-3.network.hoggins.fr:/export/brick/thedude
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-1.network.hoggins.fr:/export/brick/web
> Status: Connected
> Number of entries: 0
> 
> Brick ngluster-2.network.hoggins.fr:/export/brick/web
> 
> 
> 
> Status: Connected
> Number of entries: 3
> 
> Brick ngluster-3.network.hoggins.fr:/export/brick/web
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Status: Connected
> Number of entries: 11
> 
> 
> Should I be worrying with this never ending ?
> 
>     Thank you,
> 
>         Hoggins!
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Testing sharding on tiered volume

2017-12-17 Thread Ben Turner
- Original Message -
> From: "Viktor Nosov" 
> To: gluster-users@gluster.org
> Cc: vno...@stonefly.com
> Sent: Friday, December 8, 2017 5:45:25 PM
> Subject: [Gluster-users] Testing sharding on tiered volume
> 
> Hi,
> 
> I'm looking to use sharding on tiered volume. This is very attractive
> feature that could benefit tiered volume to let it handle larger files
> without hitting the "out of (hot)space problem".
> I decided to set  test configuration on GlusterFS 3.12.3  when tiered volume
> has 2TB cold and 1GB hot segments. Shard size is set to be 16MB.
> For testing 100GB files are used. It seems writes and reads are going well.
> But I hit problem trying to delete files from the volume. One of GlusterFS
> processes hit segmentation fault.
> The problem is reproducible each time.  It was submitted to Red Hat Bugzilla
> bug list and has ID 1521119.
> You can find details at the attachments to the bug.
> 
> I'm wondering are there other users who are interested to apply sharding to
> tiered volumes and are experienced similar problems?
>  How  this problem can be resolved or could it be avoided?

This isn't a config I have tried before.  From the BZ it mentions:

-The VOL is shared out over SMB to a windows client
-You have a 1GB hot tier, 2099GB cold tier
-You have features.shard-block-size: 16MB and cluster.tier-demote-frequency: 150

What are you using for the hot tier that has only 1GB - some sort of RAM disk, 
battery-backed flash, or something similar?

With that small of a hot tier you may run into some strange performance 
characteristics.  AFAIK the current tiering implementation uses rebalance to 
move files between tiers when the tier demote frequency times out.  You may end up 
spending a lot of time waiting for your hot files to rebalance to the cold tier 
since the hot tier is out of space, and you will probably also have other files 
being written directly to the cold tier while the hot tier is full, further using 
up your IOPS.
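
If you keep experimenting, it may be worth watching the promote / demote counters and 
the relevant options while you write and delete (the volume name is a placeholder):

gluster volume tier tiervol status
gluster volume get tiervol cluster.tier-demote-frequency
gluster volume get tiervol features.shard-block-size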

I don't know how tiering would treat sharded files - would it only promote the 
shards of the file that are in use, or would it try to put the whole file / all 
the shards on the hot tier?

If you get a free minute, update me on what you are trying to do - happy to help 
however I can.

-b


> 
> Best regards,
> 
>  Viktor Nosov
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] How large the Arbiter node?

2017-12-17 Thread Ben Turner
- Original Message -
> From: "Martin Toth" 
> To: "Nux!" 
> Cc: "gluster-users" , "Gluster Devel" 
> 
> Sent: Monday, December 11, 2017 11:58:39 AM
> Subject: Re: [Gluster-users] How large the Arbiter node?
> 
> Hi,
> 
> there is good suggestion here :
> http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#arbiter-bricks-sizing
> Since the arbiter brick does not store file data, its disk usage will be
> considerably less than the other bricks of the replica. The sizing of the
> brick will depend on how many files you plan to store in the volume. A good
> estimate will be 4kb times the number of files in the replica.

You can see the explanation from the DEV here:

http://lists.gluster.org/pipermail/gluster-users/2016-March/025732.html

The 4k number was derived by adding 100 xattrs to a file:

[root@ravi4 brick]# touch file

[root@ravi4 brick]# ls -l file
-rw-r--r-- 1 root root 0 Mar  8 12:54 file

[root@ravi4 brick]# du file
0       file

[root@ravi4 brick]# for i in {1..100}
> do
> setfattr -n user.value$i -v value$i file
> done

[root@ravi4 brick]# ll -l file
-rw-r--r-- 1 root root 0 Mar  8 12:54 file

[root@ravi4 brick]# du -h file
4.0K    file

The 4k number may be a little conservative but in all of the clusters I help 
architect I follow that rule.
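
As a rough sizing sketch using that rule (the file count is just an example, not from 
this thread):

# ~10 million files * 4KB of arbiter metadata each = ~40GB
# -> a 50GB arbiter brick on fast (high-IOPS) media leaves comfortable headroom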

HTH!

-b


> 
> BR,
> 
> Martin
> 
> 
> 
> 
> On 11 Dec 2017, at 17:43, Nux! < n...@li.nux.ro > wrote:
> 
> Hi,
> 
> I see gluster now recommends the use of an arbiter brick in "replica 2"
> situations.
> How large should this brick be? I understand only metadata is to be stored.
> Let's say total storage usage will be 5TB of mixed size files. How large
> should such a brick be?
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Production Volume will not start

2017-12-17 Thread Ben Turner
- Original Message -
> From: "Matt Waymack" 
> To: "gluster-users" 
> Sent: Friday, December 15, 2017 2:15:48 PM
> Subject: [Gluster-users] Production Volume will not start
> 
> 
> 
> Hi all,
> 
> 
> 
> I have an issue where our volume will not start from any node. When
> attempting to start the volume it will eventually return:
> 
> Error: Request timed out
> 
> 
> 
> For some time after that, the volume is locked and we either have to wait or
> restart Gluster services. In the gluserd.log, it shows the following:
> 
> 
> 
> [2017-12-15 18:00:12.423478] I [glusterd-utils.c:5926:glusterd_brick_start]
> 0-management: starting a fresh brick process for brick /exp/b1/gv0
> 
> [2017-12-15 18:03:12.673885] I
> [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In
> gd_mgmt_v3_unlock_timer_cbk
> 
> [2017-12-15 18:06:34.304868] I [MSGID: 106499]
> [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume gv0
> 
> [2017-12-15 18:06:34.306603] E [MSGID: 106301]
> [glusterd-syncop.c:1353:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Status' failed on localhost : Volume gv0 is not started
> 
> [2017-12-15 18:11:39.412700] I [glusterd-utils.c:5926:glusterd_brick_start]
> 0-management: starting a fresh brick process for brick /exp/b2/gv0
> 
> [2017-12-15 18:11:42.405966] I [MSGID: 106143]
> [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b2/gv0 on
> port 49153
> 
> [2017-12-15 18:11:42.406415] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> 
> [2017-12-15 18:11:42.406669] I [glusterd-utils.c:5926:glusterd_brick_start]
> 0-management: starting a fresh brick process for brick /exp/b3/gv0
> 
> [2017-12-15 18:14:39.737192] I
> [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In
> gd_mgmt_v3_unlock_timer_cbk
> 
> [2017-12-15 18:35:20.856849] I [MSGID: 106143]
> [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b1/gv0 on
> port 49152
> 
> [2017-12-15 18:35:20.857508] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> 
> [2017-12-15 18:35:20.858277] I [glusterd-utils.c:5926:glusterd_brick_start]
> 0-management: starting a fresh brick process for brick /exp/b4/gv0
> 
> [2017-12-15 18:46:07.953995] I [MSGID: 106143]
> [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b3/gv0 on
> port 49154
> 
> [2017-12-15 18:46:07.954432] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> 
> [2017-12-15 18:46:07.971355] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-snapd: setting frame-timeout to 600
> 
> [2017-12-15 18:46:07.989392] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-nfs: setting frame-timeout to 600
> 
> [2017-12-15 18:46:07.989543] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
> stopped
> 
> [2017-12-15 18:46:07.989562] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is
> stopped
> 
> [2017-12-15 18:46:07.989575] I [MSGID: 106600]
> [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so
> xlator is not installed
> 
> [2017-12-15 18:46:07.989601] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-glustershd: setting frame-timeout to 600
> 
> [2017-12-15 18:46:08.003011] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd
> already stopped
> 
> [2017-12-15 18:46:08.003039] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service
> is stopped
> 
> [2017-12-15 18:46:08.003079] I [MSGID: 106567]
> [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting
> glustershd service
> 
> [2017-12-15 18:46:09.005173] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-quotad: setting frame-timeout to 600
> 
> [2017-12-15 18:46:09.005569] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-bitd: setting frame-timeout to 600
> 
> [2017-12-15 18:46:09.005673] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already
> stopped
> 
> [2017-12-15 18:46:09.005689] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is
> stopped
> 
> [2017-12-15 18:46:09.005712] I [rpc-clnt.c:1044:rpc_clnt_connection_init]
> 0-scrub: setting frame-timeout to 600
> 
> [2017-12-15 18:46:09.005892] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
> stopped
> 
> [2017-12-15 18:46:09.005912] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is
> stopped
> 
> [2017-12-15 18:46:09.026559] I [socket.c:3672:socket_submit_reply]
> 0-socket.management: not connected (priv->connected = -1)
> 
> [2017-12-15 18:46:09.026568] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message 

Re: [Gluster-users] What is the difference between FORGET and UNLINK fops

2017-11-13 Thread Ben Turner
Here are some of the ones we collected:

https://github.com/bengland2/gluster-profile-analysis

It doesn't have some of the ones you are looking for, but if you track down 
definitions for them I would like to add them to what we have in the profile 
analysis tool's README.
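
For reference, the scripts there work on output from gluster's built-in profiler, 
roughly like this (the volume name is a placeholder):

gluster volume profile myvol start
# ... run your workload ...
gluster volume profile myvol info
gluster volume profile myvol stop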

HTH!

-b

- Original Message -
> From: "Jeevan Patnaik" 
> To: gluster-users@gluster.org
> Sent: Monday, November 13, 2017 11:22:26 AM
> Subject: Re: [Gluster-users] What is the difference between FORGET and
> UNLINK fops
> 
> Filtering the brick logs in TRACE mode with rpcsvc.c does show the FOPS.
> 
> From this, I've realized that LOOKUP is actually dns lookup. This actually
> differs from NFS lookup operation. Please correct me if I'm wrong.
> 
> Regards,
> Jeevan.
> 
> On Nov 13, 2017 9:40 PM, "Jeevan Patnaik" < g1patn...@gmail.com > wrote:
> 
> 
> 
> Hi,
> 
> Can I get a brief description of all the FOPS in gluster or the location of
> the source code file so that I will try to get an understanding myself?
> 
> Few FOPS I'm not clear like FORGET, UNLINK, FLUSH, LOOKUP
> 
> 
> Or is there a way I can tunnel through the FOPS that that are happening in
> the background for each operation? I have tried this to find from a brick
> logfile in TRACE mode, but there are way too many calls, but are in form of
> some system calls but not FOPS.
> 
> Regards,
> Jeevan.
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfs segmentation fault in rdma mode

2017-11-04 Thread Ben Turner
This looks like there could be some problem requesting / leaking memory, but 
without looking at the core it's tough to tell for sure.  Note:

/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7f95bc54e618]

Can you open up a bugzilla and get us the core file to review?

-b

- Original Message -
> From: "自由人" <21291...@qq.com>
> To: "gluster-users" 
> Sent: Saturday, November 4, 2017 5:27:50 AM
> Subject: [Gluster-users] glusterfs segmentation fault in rdma mode
> 
> 
> 
> Hi, All,
> 
> 
> 
> 
> I used Infiniband to connect all GlusterFS nodes and the clients. Previously
> I run IP over IB and everything was OK. Now I used rdma transport mode
> instead. And then I ran the traffic. After I while, the glusterfs process
> exited because of segmentation fault.
> 
> 
> 
> 
> Here were the messages when I saw segmentation fault:
> 
> pending frames:
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(1) op(WRITE)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> patchset: git:// git.gluster.org/glusterfs.git
> 
> signal received: 11
> 
> time of crash:
> 
> 2017-11-01 11:11:23
> 
> configuration details:
> 
> argp 1
> 
> backtrace 1
> 
> dlfcn 1
> 
> libpthread 1
> 
> llistxattr 1
> 
> setfsid 1
> 
> spinlock 1
> 
> epoll.h 1
> 
> xattr.h 1
> 
> st_atim.tv_nsec 1
> 
> package-string: glusterfs 3.11.0
> 
> /usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7f95bc54e618]
> 
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f95bc557834]
> 
> /lib64/libc.so.6(+0x32510)[0x7f95bace2510]
> 
> The client OS was CentOS 7.3. The server OS was CentOS 6.5. The GlusterFS
> version was 3.11.0 both in clients and servers. The Infiniband card was
> Mellanox. The Mellanox IB driver version was v4.1-1.0.2 (27 Jun 2017) both
> in clients and servers.
> 
> 
> Is rdma code stable for GlusterFS? Need I upgrade the IB driver or apply a
> patch?
> 
> Thanks!
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] BoF - Gluster for VM store use case

2017-10-31 Thread Ben Turner
- Original Message -
> From: "Sahina Bose" 
> To: gluster-users@gluster.org
> Cc: "Gluster Devel" 
> Sent: Tuesday, October 31, 2017 11:46:57 AM
> Subject: [Gluster-users] BoF - Gluster for VM store use case
> 
> During Gluster Summit, we discussed gluster volumes as storage for VM images
> - feedback on the usecase and upcoming features that may benefit this
> usecase.
> 
> Some of the points discussed
> 
> * Need to ensure there are no issues when expanding a gluster volume when
> sharding is turned on.
> * Throttling feature for self-heal, rebalance process could be useful for
> this usecase
> * Erasure coded volumes with sharding - seen as a good fit for VM disk
> storage

I am working on this with a customer; we have been able to do 400-500 MB/sec 
writes!  Normally things max out at ~150-250.  The trick is to use multiple 
files, build the LVM stack on top of them, and use native LVM striping.  We have 
found that 4-6 files seems to give the best perf on our setup.  I don't think we 
are using sharding on the EC volumes, just multiple files and LVM striping.  
Sharding may make the LVM striping unnecessary, but I bet dollars to doughnuts 
you won't see this level of perf :)  I am working on a blog post on RHHI and 
RHEV + RHS performance where, in some cases, I am able to get 2x+ the performance 
out of VMs / VM storage.  I'd be happy to share my data / findings.
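
For the LVM striping piece, a rough sketch of what we do inside a guest (device names, 
stripe count, and stripe size are assumptions for illustration):

# Inside the VM: 4 virtual disks, each backed by its own file on the gluster volume
pvcreate /dev/vdb /dev/vdc /dev/vdd /dev/vde
vgcreate data_vg /dev/vdb /dev/vdc /dev/vdd /dev/vde
lvcreate -i 4 -I 256k -l 100%FREE -n data_lv data_vg    # -i = stripes, -I = stripe size
mkfs.xfs /dev/data_vg/data_lv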

> * Performance related
> ** accessing qemu images using gfapi driver does not perform as well as fuse
> access. Need to understand why.

+1 I have some ideas here that I have come up with in my research.  Happy to 
share these as well.

> ** Using zfs with cache or lvmcache for xfs filesystem is seen to improve
> performance

I have done some interesting stuff with customers here too - nothing with VMs 
IIRC, it was more for backing up bricks without geo-rep (it was too slow for them).

-b

> 
> If you have any further inputs on this topic, please add to thread.
> 
> thanks!
> sahina
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] small files performance

2017-10-15 Thread Ben Turner
I get well over 2k IOPS on my old 12-disk RAID 6 HW in the lab (4 nodes, 2x2 
volume):

https://access.redhat.com/sites/default/files/attachments/rhgs3_1_perfbrief_portalv1.pdf

That data is from 3.1; things have improved a lot since then (I think closer to 
3.2k IOPS on the same HW?).  I have a total of 48 disks though (20 data, 8 
parity, 20 redundancy) - I'm not sure what you have.  I can extract a kernel in 
between 4 minutes and 1 min 30 secs depending on tunables and whether I use the 
multi-threaded tar tools developed by Ben England.  If you don't have access to the 
RH paywall you will just have to trust me, since the perf brief requires a 
subscription.  The key to getting smallfile perf out of gluster is to use multiple 
threads and multiple clients.
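
A hedged example of that kind of run with smallfile (host names, counts, and the mount 
point are placeholders; check the project README for the exact flags in your version):

python smallfile_cli.py --top /gluster-mount/smf --threads 8 --files 10000 \
    --file-size 64 --operation create --host-set client1,client2,client3,client4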

What is your back end like?

-b

- Original Message -
> From: "Gandalf Corvotempesta" 
> To: "Szymon Miotk" , "gluster-users" 
> 
> Sent: Friday, October 13, 2017 3:56:14 AM
> Subject: Re: [Gluster-users] small files performance
> 
> Where did you read 2k IOPS?
> 
> Each disk is able to do about 75iops as I'm using SATA disk, getting even
> closer to 2000 it's impossible
> 
> Il 13 ott 2017 9:42 AM, "Szymon Miotk" < szymon.mi...@gmail.com > ha scritto:
> 
> 
> Depends what you need.
> 2K iops for small file writes is not a bad result.
> In my case I had a system that was just poorly written and it was
> using 300-1000 iops for constant operations and was choking on
> cleanup.
> 
> 
> On Thu, Oct 12, 2017 at 6:23 PM, Gandalf Corvotempesta
> < gandalf.corvotempe...@gmail.com > wrote:
> > So, even with latest version, gluster is still unusable with small files ?
> > 
> > 2017-10-12 10:51 GMT+02:00 Szymon Miotk < szymon.mi...@gmail.com >:
> >> I've analyzed small files performance few months ago, because I had
> >> huge performance problems with small files writes on Gluster.
> >> The read performance has been improved in many ways in recent releases
> >> (md-cache, parallel-readdir, hot-tier).
> >> But write performance is more or less the same and you cannot go above
> >> 10K smallfiles create - even with SSD or Optane drives.
> >> Even ramdisk is not helping much here, because the bottleneck is not
> >> in the storage performance.
> >> Key problems I've noticed:
> >> - LOOKUPs are expensive, because there is separate query for every
> >> depth level of destination directory (md-cache helps here a bit,
> >> unless you are creating lot of directories). So the deeper the
> >> directory structure, the worse.
> >> - for every file created, Gluster creates another file in .glusterfs
> >> directory, doubling the required IO and network latency. What's worse,
> >> XFS, the recommended filesystem, doesn't like flat directory sturcture
> >> with thousands files in each directory. But that's exactly how Gluster
> >> stores its metadata in .glusterfs, so the performance decreases by
> >> 40-50% after 10M files.
> >> - complete directory structure is created on each of the bricks. So
> >> every mkdir results in io on every brick you have in the volume.
> >> - hot-tier may be great for improving reads, but for small files
> >> writes it actually kills performance even more.
> >> - FUSE driver requires context switch between userspace and kernel
> >> each time you create a file, so with small files the context switches
> >> are also taking their toll
> >> 
> >> The best results I got were:
> >> - create big file on Gluster, mount it as XFS over loopback interface
> >> - 13.5K smallfile writes. Drawback - you can use it only on one
> >> server, as XFS will crash when two servers will write to it.
> >> - use libgfapi - 20K smallfile writes performance. Drawback - no nice
> >> POSIX filesystem, huge CPU usage on Gluster server.
> >> 
> >> I was testing with 1KB files, so really small.
> >> 
> >> Best regards,
> >> Szymon Miotk
> >> 
> >> On Fri, Oct 6, 2017 at 4:43 PM, Gandalf Corvotempesta
> >> < gandalf.corvotempe...@gmail.com > wrote:
> >>> Any update about this?
> >>> I've seen some works about optimizing performance for small files, is
> >>> now gluster "usable" for storing, in example, Maildirs or git sources
> >>> ?
> >>> 
> >>> at least in 3.7 (or 3.8, I don't remember exactly), extracting kernel
> >>> sources took about 4-5 minutes.
> >>> ___
> >>> Gluster-users mailing list
> >>> Gluster-users@gluster.org
> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] sparse files on EC volume

2017-09-27 Thread Ben Turner
Have you done any testing with replica 2/3?  IIRC my replica 2/3 tests 
outperformed EC on smallfile workloads; it may be worth looking into if you can't 
get EC up to where you need it to be.

-b

- Original Message -
> From: "Dmitri Chebotarov" <4dim...@gmail.com>
> Cc: "gluster-users" 
> Sent: Tuesday, September 26, 2017 9:57:55 AM
> Subject: Re: [Gluster-users] sparse files on EC volume
> 
> Hi Xavi
> 
> At this time I'm using 'plain' bricks with XFS. I'll be moving to LVM cached
> bricks.
> There is no RAID for data bricks, but I'll be using hardware RAID10 for SSD
> cache disks (I can use 'writeback' cache in this case).
> 
> 'small file performance' is the main reason I'm looking at different options,
> i.e. using formated sparse files.
> I spent considerable amount of time tuning 10GB/kernel/gluster to reduce
> latency - the small file performance improved ~50% but it's still no good
> enough, especially when I need to use Gluster for /home folders.
> 
> I understand limitations and single point of failure in case with sparse
> files. I'm considering different options to provide HA (pacemaker/corosync,
> keepalived or using VMs - RHEV - to deliver storage).
> 
> Thank you for your reply.
> 
> 
> On Tue, Sep 26, 2017 at 3:55 AM, Xavi Hernandez < jaher...@redhat.com >
> wrote:
> 
> 
> Hi Dmitri,
> 
> On 22/09/17 17:07, Dmitri Chebotarov wrote:
> 
> 
> 
> Hello
> 
> I'm running some tests to compare performance between Gluster FUSE mount and
> formated sparse files (located on the same Gluster FUSE mount).
> 
> The Gluster volume is EC (same for both tests).
> 
> I'm seeing HUGE difference and trying to figure out why.
> 
> Could you explain what hardware configuration are you using ?
> 
> Do you have a plain disk for each brick formatted in XFS, or do you have some
> RAID configuration ?
> 
> 
> 
> 
> Here is an example:
> 
> GlusterFUSE mount:
> 
> # cd /mnt/glusterfs
> # rm -f testfile1 ; dd if=/dev/zero of=testfile1 bs=1G count=1
> 1+0 records in
> 1+0 records out
> 1073741824 bytes (1.1 GB) copied, 9.74757 s, *110 MB/s*
> 
> Sparse file (located on GlusterFUSE mount):
> 
> # truncate -l 100GB /mnt/glusterfs/xfs-100G.img
> # mkfs.xfs /mnt/glusterfs/xfs-100G.img
> # mount -o loop /mnt/glusterfs/xfs-100G.img /mnt/xfs-100G
> # cd /mnt/xfs-100G
> # rm -f testfile1 ; dd if=/dev/zero of=testfile1 bs=1G count=1
> 1+0 records in
> 1+0 records out
> 1073741824 bytes (1.1 GB) copied, 1.20576 s, *891 MB/s*
> 
> The same goes for working with small files (i.e. code file, make, etc) with
> the same data located on FUSE mount vs formated sparse file on the same FUSE
> mount.
> 
> What would explain such difference?
> 
> First of all, doing tests with relatively small files tends to be misleading
> because of caching capacity of the operating system (to minimize that, you
> can add 'conv=fsync' option to dd). You should do tests with file sizes
> bigger than the amount of physical memory on servers. This way you minimize
> cache effects and see the real sustained performance.
> 
> A second important point to note is that gluster is a distributed file system
> that can be accessed simultaneously by more than one client. This means that
> consistency must be assured in all cases, which makes things go to bricks
> sooner than local filesystems normally do.
> 
> In your case, all data saved to the fuse volume will most probably be present
> on bricks once the dd command completes. On the other side, the test through
> the formatted sparse file, most probably, is keeping most of the data in the
> cache of the client machine.
> 
> Note that using the formatted sparse file makes it possible a better use of
> local cache, improving (relatively) small file access, but on the other
> side, this filesystem can only be used from a single client (single mount).
> If this client fails for some reason, you will loose access to your data.
> 
> 
> 
> 
> How does Gluster work with sparse files in general? I may move some of the
> data on gluster volumes to formated sparse files..
> 
> Gluster works fine with sparse files. However you should consider the
> previous points before choosing the formatted sparse files option. I guess
> that the sustained throughput will be very similar for bigger files.
> 
> Regards,
> 
> Xavi
> 
> 
> 
> 
> Thank you.
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Confusing lstat() performance

2017-09-20 Thread Ben Turner
- Original Message -
> From: "Niklas Hambüchen" 
> To: "Sam McLeod" 
> Cc: gluster-users@gluster.org
> Sent: Sunday, September 17, 2017 7:34:15 PM
> Subject: Re: [Gluster-users] Confusing lstat() performance
> 
> I found the reason now, at least for this set of lstat()s I was looking at.
> 
> bup first does all getdents(), obtaining all file names in the
> directory, and then stat()s them.

+1 I was thinking of a case just like this.

-b

> 
> Apparently this destroys some of gluster's caching, making stat()s ~100x
> slower.
> 
> What caching could this be, and how could I convince gluster to serve
> these stat()s as fast as if a getdents() had been done just before them?
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Confusing lstat() performance

2017-09-19 Thread Ben Turner
- Original Message -
> From: "Niklas Hambüchen" <m...@nh2.me>
> To: "Ben Turner" <btur...@redhat.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Monday, September 18, 2017 11:27:33 AM
> Subject: Re: [Gluster-users] Confusing lstat() performance
> 
> On 18/09/17 17:23, Ben Turner wrote:
> > Do you want tuned or untuned?  If tuned I'd like to try one of my tunings
> > for metadata, but I will use yours if you want.
> 
> (Re-CC'd list)
> 
> I would be interested in both, if possible: To confirm that it's not
> only my machines that exhibit this behaviour given my settings, and to
> see what can be achieved with your tuned settings.

I was just going to get the details from you and post results back to the list; I 
just didn't want to spam people while we worked out the details.

Here is untuned + quota enabled:

strace -f -w -c rsync -a --dry-run ./ /tmp/
strace: Process 21797 attached
strace: Process 21827 attached
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 60.347.5255674849  1552 3 select
 28.633.571200 821  4351   getdents
 10.851.352934   72310 lstat
  0.050.006722   9   733   write
  0.050.005798   7   817   read
  0.040.0050012501 2   openat
  0.010.0017221722 1   execve
  0.010.001627  2663   munmap
  0.000.000552   961   mmap
  0.000.000228 114 2   clone
  0.000.000224  1219   close
  0.000.000123   718   fcntl
  0.000.90  10 9 2 open
  0.000.70   710   mprotect
  0.000.66   710 5 wait4
  0.000.63   512   rt_sigaction
  0.000.60  15 4   mremap
  0.000.37   6 6   brk
  0.000.36   5 7   fstat
  0.000.33  11 3   socketpair
  0.000.31  10 3 1 stat
  0.000.29  15 2   dup2
  0.000.11  11 1   kill
  0.000.10   5 2 2 rt_sigreturn
  0.000.08   8 1   getcwd
  0.000.07   7 1 1 access
  0.000.07   7 1   chdir
  0.000.06   6 1   rt_sigprocmask
  0.000.05   5 1   umask
  0.000.05   5 1   geteuid
  0.000.05   5 1   arch_prctl
-- --- --- - - 
100.00   12.472277207698100014 total

And BUP:

strace: Process 103678 detached
strace: Process 103679 detached
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 80.56 1627.233677   26201 62106   read
  9.02  182.162398   6  30924134   lseek
  7.46  150.672844 225670684 2 lstat
  1.21   24.461651 300 81647 10513 open
  0.83   16.696162 190 87869  1019 close
  0.254.993062 318 15702   llistxattr
  0.193.874752 165 23548   msync
  0.183.630711  23159461   write
  0.061.284745   623   getcwd
  0.040.880300 112  7850   rename
  0.040.800025   6134108   fstat
  0.040.725158  15 47288   munmap
  0.030.549348  14 37922   select
  0.020.361083   8 47593   mmap
  0.020.349493  22 15699  7850 unlink
  0.010.275029   6 47116   fcntl
  0.010.253821  32  7895  7879 ioctl
  0.010.243284  31  7851  7851 getxattr
  0.010.215476  27  7851  7851 lgetxattr
  0.010.136050  15  8832  8707 stat
  0.000.091960   6 15701   dup
  0.000.0249558318 3   execve
  0.000.004210   8   524   brk
  0.000.001756   8   210   mprotect
  0.000.001450 725 2   setsid
  0.000.001162   6   205   rt_sigaction
  0.000.000670 335 2   clone
  0.000.000200  1712   getdents
  0.000.70  12 6   openat
  0.000.54

Re: [Gluster-users] Confusing lstat() performance

2017-09-19 Thread Ben Turner
- Original Message -
> From: "Niklas Hambüchen" <m...@nh2.me>
> To: "Ben Turner" <btur...@redhat.com>
> Cc: gluster-users@gluster.org
> Sent: Sunday, September 17, 2017 9:49:10 PM
> Subject: Re: [Gluster-users] Confusing lstat() performance
> 
> Hi Ben,
> 
> do you know if the smallfile benchmark also does interleaved getdents()
> and lstat, which is what I found as being the key difference that
> creates the performance gap (further down this thread)?

I am not sure, you can have a look at it:

https://github.com/bengland2/smallfile


> 
> Also, wouldn't `--threads 8` change the performance numbers by factor 8
> versus the plain `ls` and `rsync` that I did?

Maybe not 8x, but it will definitely improve things.  I just recycled what was in my 
history buffer; I only wanted to illustrate that even though you see the stat 
calls in the strace, application behavior can have a big impact on performance.

> 
> Would you mind running those commands directly/plainly on your cluster
> to confirm or refute my numbers?

I wouldn't mind, but I don't have your dataset.  That's why I wanted to bring in 
a perf test tool so we could compare things apples to apples.  What about 
running on the data that smallfile creates and comparing that?

-b


> 
> Thanks!
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Confusing lstat() performance

2017-09-19 Thread Ben Turner
I attached my strace output for you to look at:

Smallfile stat:
files/sec = 2270.307299
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 84.48  272.3244123351 81274  1141 stat
 10.20   32.880871   81997   401   read
  2.788.957013   9735992   write
  2.447.854060  43633718   select
  0.030.085606  67  1277   984 open
  0.030.082586   13764 6   getdents
  0.020.048819488210   unlink
  0.010.046833   46833 1   mkdir
  0.010.027127   27127 1   rmdir
  0.000.012896   12896 1   fsync
  0.000.003129   6   531   fstat
  0.000.002766  6642 2 wait4

Smallfile ls -l:
files/sec = 22852.145303
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 60.26   18.5581473849  4822   getdents
 26.968.302826  46126818   select
  4.921.5151951868   811   openat
  4.831.486988  18 81294  1161 stat
  2.230.686146745892   write
  0.260.080318803210   unlink
  0.170.050832  40  1277   984 open
  0.100.030263   30263 1   rmdir
  0.080.023776   23776 1   fsync
  0.050.016408  41   401   read
  0.050.016061   16061 1   mkdir
  0.040.011154  10  1108   close
  0.010.003229   6   531   fstat
  0.010.002840  6842 2 wait4

Look at the difference between my two stat calls:

 84.48  272.3244123351 81274  1141 stat   <--- stat

  4.831.486988  18 81294  1161 stat   <--- ls -l

Maybe your two applications are behaving differently, like smallfile stat and 
smallfile ls-l are?

-b

- Original Message -
> From: "Ben Turner" <btur...@redhat.com>
> To: "Niklas Hambüchen" <m...@nh2.me>
> Cc: gluster-users@gluster.org
> Sent: Sunday, September 17, 2017 8:54:01 PM
> Subject: Re: [Gluster-users] Confusing lstat() performance
> 
> I did a quick test on one of my lab clusters with no tuning except for quota
> being enabled:
> 
> [root@dell-per730-03 ~]# gluster v info
>  
> Volume Name: vmstore
> Type: Replicate
> Volume ID: 0d2e4c49-334b-47c9-8e72-86a4c040a7bd
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.50.1:/rhgs/brick1/vmstore
> Brick2: 192.168.50.2:/rhgs/brick1/vmstore
> Brick3: 192.168.50.3:/rhgs/ssd/vmstore (arbiter)
> Options Reconfigured:
> features.quota-deem-statfs: on
> nfs.disable: on
> features.inode-quota: on
> features.quota: on
> 
> And I ran the smallfile benchmark, created 80k 64KB files.  After that I
> clear cache everywhere and ran a smallfile stat test
> 
> [root@dell-per730-06-priv ~]# python /smallfile/smallfile_cli.py --files
> 1 --file-size 64 --threads 8 --top /gluster-mount/s-file/ --operation
> stat
>  version : 3.1
>hosts in test : None
>top test directory(s) : ['/gluster-mount/s-file']
>operation : stat
> files/thread : 1
>  threads : 8
>record size (KB, 0 = maximum) : 0
>   file size (KB) : 64
>   file size distribution : fixed
>files per dir : 100
> dirs per dir : 10
>   threads share directories? : N
>  filename prefix :
>  filename suffix :
>  hash file number into dir.? : N
>  fsync after modify? : N
>   pause between files (microsec) : 0
> finish all requests? : Y
>   stonewall? : Y
>  measure response times? : N
> verify read? : Y
> verbose? : False
>   log to stderr? : False
>ext.attr.size : 0
>   ext.attr.count : 0
> host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 00,elapsed =
> 33.513184,files = 9400,records = 0,status = ok
> host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 01,elapsed =
> 33.322282,files = 

Re: [Gluster-users] Confusing lstat() performance

2017-09-19 Thread Ben Turner
I did a quick test on one of my lab clusters with no tuning except for quota 
being enabled:

[root@dell-per730-03 ~]# gluster v info
 
Volume Name: vmstore
Type: Replicate
Volume ID: 0d2e4c49-334b-47c9-8e72-86a4c040a7bd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.50.1:/rhgs/brick1/vmstore
Brick2: 192.168.50.2:/rhgs/brick1/vmstore
Brick3: 192.168.50.3:/rhgs/ssd/vmstore (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
nfs.disable: on
features.inode-quota: on
features.quota: on

And I ran the smallfile benchmark, created 80k 64KB files.  After that I clear 
cache everywhere and ran a smallfile stat test

[root@dell-per730-06-priv ~]# python /smallfile/smallfile_cli.py --files 1 
--file-size 64 --threads 8 --top /gluster-mount/s-file/ --operation stat
 version : 3.1
   hosts in test : None
   top test directory(s) : ['/gluster-mount/s-file']
   operation : stat
files/thread : 1
 threads : 8
   record size (KB, 0 = maximum) : 0
  file size (KB) : 64
  file size distribution : fixed
   files per dir : 100
dirs per dir : 10
  threads share directories? : N
 filename prefix : 
 filename suffix : 
 hash file number into dir.? : N
 fsync after modify? : N
  pause between files (microsec) : 0
finish all requests? : Y
  stonewall? : Y
 measure response times? : N
verify read? : Y
verbose? : False
  log to stderr? : False
   ext.attr.size : 0
  ext.attr.count : 0
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 00,elapsed = 
33.513184,files = 9400,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 01,elapsed = 
33.322282,files = 9700,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 02,elapsed = 
33.233768,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 03,elapsed = 
33.145645,files = 1,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 04,elapsed = 
33.974151,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 05,elapsed = 
33.220816,files = 9300,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 06,elapsed = 
33.304850,files = 9900,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 07,elapsed = 
33.482701,files = 9600,records = 0,status = ok
total threads = 8
total files = 77100
total IOPS = 0
 96.38% of requested files processed, minimum is  90.00
elapsed time =33.974
files/sec = 2269.372389

So I was only able to stat 2269 files / sec.  Given this I think statting 1 
million files (at least on my config) should take about 440 seconds.  I don't 
know what kind of HW you are using, but 50 seconds to stat 1 million files seems 
faster than what I would think gluster can do.
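
As a rough sanity check on that estimate (back-of-the-envelope, using the rate 
measured above):

1,000,000 files / 2,269 files per second ~= 441 seconds, or a bit over 7 minutes.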

Using ls -l is a different story:

[root@dell-per730-06-priv ~]# python /smallfile/smallfile_cli.py --files 1 
--file-size 64 --threads 8 --top /gluster-mount/s-file/ --operation ls-l
 version : 3.1
   hosts in test : None
   top test directory(s) : ['/gluster-mount/s-file']
   operation : ls-l
files/thread : 1
 threads : 8
   record size (KB, 0 = maximum) : 0
  file size (KB) : 64
  file size distribution : fixed
   files per dir : 100
dirs per dir : 10
  threads share directories? : N
 filename prefix : 
 filename suffix : 
 hash file number into dir.? : N
 fsync after modify? : N
  pause between files (microsec) : 0
finish all requests? : Y
  stonewall? : Y
 measure response times? : N
verify read? : Y
verbose? : False
  log to stderr? : False
   ext.attr.size : 0
  ext.attr.count : 0
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 00,elapsed = 
2.867676,files = 9500,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 01,elapsed = 

Re: [Gluster-users] Slow performance of gluster volume

2017-09-11 Thread Ben Turner
- Original Message -
> From: "Abi Askushi" <rightkickt...@gmail.com>
> To: "Ben Turner" <btur...@redhat.com>
> Cc: "Krutika Dhananjay" <kdhan...@redhat.com>, "gluster-user" 
> <gluster-users@gluster.org>
> Sent: Monday, September 11, 2017 1:40:42 AM
> Subject: Re: [Gluster-users] Slow performance of gluster volume
> 
> Did not upgrade yet gluster. I am still  using 3.8.12. Only the mentioned
> changes did provide the performance boost.
> 
> From which version to which version did you see such performance boost? I
> will try to upgrade and check difference also.

Unfortunately I didn't record the package versions, I also may have done the 
same thing as you :)

-b 

> 
> On Sep 11, 2017 2:45 AM, "Ben Turner" <btur...@redhat.com> wrote:
> 
> Great to hear!
> 
> - Original Message -
> > From: "Abi Askushi" <rightkickt...@gmail.com>
> > To: "Krutika Dhananjay" <kdhan...@redhat.com>
> > Cc: "gluster-user" <gluster-users@gluster.org>
> > Sent: Friday, September 8, 2017 7:01:00 PM
> > Subject: Re: [Gluster-users] Slow performance of gluster volume
> >
> > Following changes resolved the perf issue:
> >
> > Added the option
> > /etc/glusterfs/glusterd.vol :
> > option rpc-auth-allow-insecure on
> 
> Was it this setting or was it the gluster upgrade? Do you know for sure?
> It may be helpful to others to know for sure (I'm interested too :).
> 
> -b
> 
> >
> > restarted glusterd
> >
> > Then set the volume option:
> > gluster volume set vms server.allow-insecure on
> >
> > I am reaching now the max network bandwidth and performance of VMs is
> quite
> > good.
> >
> > Did not upgrade the glusterd.
> >
> > As a next try I am thinking to upgrade gluster to 3.12 + test libgfapi
> > integration of qemu by upgrading to ovirt 4.1.5 and check vm perf.
> >
> >
> > On Sep 6, 2017 1:20 PM, "Abi Askushi" < rightkickt...@gmail.com > wrote:
> >
> >
> >
> > I tried to follow step from
> > https://wiki.centos.org/SpecialInterestGroup/Storage to install latest
> > gluster on the first node.
> > It installed 3.10 and not 3.11. I am not sure how to install 3.11 without
> > compiling it.
> > Then when tried to start the gluster on the node the bricks were reported
> > down (the other 2 nodes still have 3.8). Not sure why. The logs were
> showing
> > the below (even after rebooting the server):
> >
> > [2017-09-06 10:56:09.023777] E [rpcsvc.c:557:rpcsvc_check_and_reply_error]
> > 0-rpcsvc: rpc actor failed to complete successfully
> > [2017-09-06 10:56:09.024122] E [server-helpers.c:395:server_alloc_frame]
> > (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7f2d0ec20905]
> > -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0x3006b)
> > [0x7f2cfa4bf06b]
> > -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0xdb34)
> > [0x7f2cfa49cb34] ) 0-server: invalid argument: client [Invalid argument]
> >
> > Do I need to upgrade all nodes before I attempt to start the gluster
> > services?
> > I reverted the first node back to 3.8 at the moment and all restored.
> > Also tests with eager lock disabled did not make any difference.
> >
> >
> >
> >
> > On Wed, Sep 6, 2017 at 11:15 AM, Krutika Dhananjay < kdhan...@redhat.com >
> > wrote:
> >
> >
> >
> > Do you see any improvement with 3.11.1 as that has a patch that improves
> perf
> > for this kind of a workload
> >
> > Also, could you disable eager-lock and check if that helps? I see that max
> > time is being spent in acquiring locks.
> >
> > -Krutika
> >
> > On Wed, Sep 6, 2017 at 1:38 PM, Abi Askushi < rightkickt...@gmail.com >
> > wrote:
> >
> >
> >
> > Hi Krutika,
> >
> > Is it anything in the profile indicating what is causing this bottleneck?
> In
> > case i can collect any other info let me know.
> >
> > Thanx
> >
> > On Sep 5, 2017 13:27, "Abi Askushi" < rightkickt...@gmail.com > wrote:
> >
> >
> >
> > Hi Krutika,
> >
> > Attached the profile stats. I enabled profiling then ran some dd tests.
> Also
> > 3 Windows VMs are running on top this volume but did not do any stress
> > testing on the VMs. I have left the profiling enabled in case more time is
> > needed for useful stats.
> >
> > Thanx
> >
> > On Tue, Sep 5, 2017 at 12

Re: [Gluster-users] Slow performance of gluster volume

2017-09-10 Thread Ben Turner
Great to hear!

- Original Message -
> From: "Abi Askushi" 
> To: "Krutika Dhananjay" 
> Cc: "gluster-user" 
> Sent: Friday, September 8, 2017 7:01:00 PM
> Subject: Re: [Gluster-users] Slow performance of gluster volume
> 
> Following changes resolved the perf issue:
> 
> Added the option
> /etc/glusterfs/glusterd.vol :
> option rpc-auth-allow-insecure on

Was it this setting or was it the gluster upgrade? Do you know for sure?  It 
may be helpful to others to know for sure (I'm interested too :).

-b

> 
> restarted glusterd
> 
> Then set the volume option:
> gluster volume set vms server.allow-insecure on
> 
> I am reaching now the max network bandwidth and performance of VMs is quite
> good.
> 
> Did not upgrade the glusterd.
> 
> As a next try I am thinking to upgrade gluster to 3.12 + test libgfapi
> integration of qemu by upgrading to ovirt 4.1.5 and check vm perf.
> 
> 
> On Sep 6, 2017 1:20 PM, "Abi Askushi" < rightkickt...@gmail.com > wrote:
> 
> 
> 
> I tried to follow step from
> https://wiki.centos.org/SpecialInterestGroup/Storage to install latest
> gluster on the first node.
> It installed 3.10 and not 3.11. I am not sure how to install 3.11 without
> compiling it.
> Then when tried to start the gluster on the node the bricks were reported
> down (the other 2 nodes still have 3.8). Not sure why. The logs were showing
> the below (even after rebooting the server):
> 
> [2017-09-06 10:56:09.023777] E [rpcsvc.c:557:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
> [2017-09-06 10:56:09.024122] E [server-helpers.c:395:server_alloc_frame]
> (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7f2d0ec20905]
> -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0x3006b)
> [0x7f2cfa4bf06b]
> -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0xdb34)
> [0x7f2cfa49cb34] ) 0-server: invalid argument: client [Invalid argument]
> 
> Do I need to upgrade all nodes before I attempt to start the gluster
> services?
> I reverted the first node back to 3.8 at the moment and all restored.
> Also tests with eager lock disabled did not make any difference.
> 
> 
> 
> 
> On Wed, Sep 6, 2017 at 11:15 AM, Krutika Dhananjay < kdhan...@redhat.com >
> wrote:
> 
> 
> 
> Do you see any improvement with 3.11.1 as that has a patch that improves perf
> for this kind of a workload
> 
> Also, could you disable eager-lock and check if that helps? I see that max
> time is being spent in acquiring locks.
> 
> -Krutika
> 
> On Wed, Sep 6, 2017 at 1:38 PM, Abi Askushi < rightkickt...@gmail.com >
> wrote:
> 
> 
> 
> Hi Krutika,
> 
> Is it anything in the profile indicating what is causing this bottleneck? In
> case i can collect any other info let me know.
> 
> Thanx
> 
> On Sep 5, 2017 13:27, "Abi Askushi" < rightkickt...@gmail.com > wrote:
> 
> 
> 
> Hi Krutika,
> 
> Attached the profile stats. I enabled profiling then ran some dd tests. Also
> 3 Windows VMs are running on top this volume but did not do any stress
> testing on the VMs. I have left the profiling enabled in case more time is
> needed for useful stats.
> 
> Thanx
> 
> On Tue, Sep 5, 2017 at 12:48 PM, Krutika Dhananjay < kdhan...@redhat.com >
> wrote:
> 
> 
> 
> OK my understanding is that with preallocated disks the performance with and
> without shard will be the same.
> 
> In any case, please attach the volume profile[1], so we can see what else is
> slowing things down.
> 
> -Krutika
> 
> [1] -
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
> 
> On Tue, Sep 5, 2017 at 2:32 PM, Abi Askushi < rightkickt...@gmail.com >
> wrote:
> 
> 
> 
> Hi Krutika,
> 
> I already have a preallocated disk on VM.
> Now I am checking performance with dd on the hypervisors which have the
> gluster volume configured.
> 
> I tried also several values of shard-block-size and I keep getting the same
> low values on write performance.
> Enabling client-io-threads also did not have any effect.
> 
> The version of gluster I am using is glusterfs 3.8.12 built on May 11 2017
> 18:46:20.
> The setup is a set of 3 Centos 7.3 servers and ovirt 4.1, using gluster as
> storage.
> 
> Below are the current settings:
> 
> 
> Volume Name: vms
> Type: Replicate
> Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster0:/gluster/vms/brick
> Brick2: gluster1:/gluster/vms/brick
> Brick3: gluster2:/gluster/vms/brick (arbiter)
> Options Reconfigured:
> server.event-threads: 4
> client.event-threads: 4
> performance.client-io-threads: on
> features.shard-block-size: 512MB
> cluster.granular-entry-heal: enable
> performance.strict-o-direct: on
> network.ping-timeout: 30
> storage.owner-gid: 36
> storage.owner-uid: 36
> user.cifs: off
> features.shard: on
> cluster.shd-wait-qlength: 1

Re: [Gluster-users] Slow performance of gluster volume

2017-09-10 Thread Ben Turner
- Original Message -
> From: "Abi Askushi" 
> To: "Krutika Dhananjay" 
> Cc: "gluster-user" 
> Sent: Tuesday, September 5, 2017 5:02:46 AM
> Subject: Re: [Gluster-users] Slow performance of gluster volume
> 
> Hi Krutika,
> 
> I already have a preallocated disk on VM.
> Now I am checking performance with dd on the hypervisors which have the
> gluster volume configured.
> 
> I tried also several values of shard-block-size and I keep getting the same
> low values on write performance.
> Enabling client-io-threads also did not have any effect.
> 
> The version of gluster I am using is glusterfs 3.8.12 built on May 11 2017
> 18:46:20.
> The setup is a set of 3 Centos 7.3 servers and ovirt 4.1, using gluster as
> storage.
> 
> Below are the current settings:
> 
> 
> Volume Name: vms
> Type: Replicate
> Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster0:/gluster/vms/brick
> Brick2: gluster1:/gluster/vms/brick
> Brick3: gluster2:/gluster/vms/brick (arbiter)
> Options Reconfigured:
> server.event-threads: 4
> client.event-threads: 4
> performance.client-io-threads: on
> features.shard-block-size: 512MB
> cluster.granular-entry-heal: enable
> performance.strict-o-direct: on
> network.ping-timeout: 30
> storage.owner-gid: 36
> storage.owner-uid: 36
> user.cifs: off
> features.shard: on
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: off
> performance.low-prio-threads: 32
> performance.stat-prefetch: on
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: on
> 
> 
> I observed that when testing with dd if=/dev/zero of=testfile bs=1G count=1 I
> get 65MB/s on the vms gluster volume (and the network traffic between the
> servers reaches ~ 500Mbps), while when testing with dd if=/dev/zero
> of=testfile bs=1G count=1 oflag=direct I get a consistent 10MB/s and the
> network traffic hardly reaching 100Mbps.

I have a replica 3 volume where I was seeing ~65 MB / sec from my VMs; after 
upgrading to a newer version I now get closer to 150-180 MB / sec writes.  
Since you are using an arbiter I would expect faster writes for you.  What 
gluster version are you running, and what OS?

-b


> 
> Any other things one can do?
> 
> On Tue, Sep 5, 2017 at 5:57 AM, Krutika Dhananjay < kdhan...@redhat.com >
> wrote:
> 
> 
> 
> I'm assuming you are using this volume to store vm images, because I see
> shard in the options list.
> 
> Speaking from shard translator's POV, one thing you can do to improve
> performance is to use preallocated images.
> This will at least eliminate the need for shard to perform multiple steps as
> part of the writes - such as creating the shard and then writing to it and
> then updating the aggregated file size - all of which require one network
> call each, which further get blown up once they reach AFR (replicate) into
> many more network calls.
> 
> Second, I'm assuming you're using the default shard block size of 4MB (you
> can confirm this using `gluster volume get <volname> shard-block-size`). In our
> tests, we've found that larger shard sizes perform better. So maybe change
> the shard-block-size to 64MB (`gluster volume set <volname> shard-block-size
> 64MB`).
> 
> Third, keep stat-prefetch enabled. We've found that qemu sends quite a lot of
> [f]stats which can be served from the (md)cache to improve performance. So
> enable that.
> 
> Also, could you also enable client-io-threads and see if that improves
> performance?
> 
> Which version of gluster are you using BTW?
> 
> -Krutika
> 
> 
> On Tue, Sep 5, 2017 at 4:32 AM, Abi Askushi < rightkickt...@gmail.com >
> wrote:
> 
> 
> 
> Hi all,
> 
> I have a gluster volume used to host several VMs (managed through oVirt).
> The volume is a replica 3 with arbiter and the 3 servers use 1 Gbit network
> for the storage.
> 
> When testing with dd (dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct)
> out of the volume (e.g. writing at /root/) the performance of the dd is
> reported to be ~ 700MB/s, which is quite decent. When testing the dd on the
> gluster volume I get ~ 43 MB/s which way lower from the previous. When
> testing with dd the gluster volume, the network traffic was not exceeding
> 450 Mbps on the network interface. I would expect to reach near 900 Mbps
> considering that there is 1 Gbit of bandwidth available. This results having
> VMs with very slow performance (especially on their write operations).
> 
> The full details of the volume are below. Any advise on what can be tweaked
> will be highly appreciated.
> 
> 

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-03 Thread Ben Turner
- Original Message -
> From: "Serkan Çoban" <cobanser...@gmail.com>
> To: "Ben Turner" <btur...@redhat.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Sunday, September 3, 2017 2:55:06 PM
> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> 
> i usually change event threads to 4. But those logs are from a default
> installation.

Yep, me too.  I did a lot of the qualification for multi-threaded epoll and that 
is what I found best saturates my back end (12-disk RAID 6 spinners) without 
wasting threads.  Be careful tuning this up too high if you have a lot of bricks 
per server; you could run into contention with all of those threads 
fighting for CPU time.
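
For reference, this is roughly how I check and adjust those (sketch only, the 
volume name is a placeholder):

# show the current values (IIRC 2 is the default)
gluster volume get <volname> server.event-threads
gluster volume get <volname> client.event-threads

# bump both to 4, the value Serkan mentioned
gluster volume set <volname> server.event-threads 4
gluster volume set <volname> client.event-threads 4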

On the hooks stuff on my system I have:

-rwxr-xr-x. 1 root root 1459 Jun  1 06:35 S29CTDB-teardown.sh
-rwxr-xr-x. 1 root root 1736 Jun  1 06:35 S30samba-stop.sh

Do you have SMB installed on these systems?  IIRC the scripts are only run if 
the service is chkconfigged on; if you don't have SMB installed and chkconfigged 
on, I don't think these are the problem.
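
If you want to double check, something along these lines should show whether the 
hook scripts and the services they act on are even in play (the hooks path 
assumes the default packaging layout, adjust if yours differs):

# list the hook scripts that run around volume stop
ls -l /var/lib/glusterd/hooks/1/stop/pre/

# see whether SMB / CTDB are actually enabled
systemctl is-enabled smb ctdb          # EL7
chkconfig --list | egrep 'smb|ctdb'    # EL6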

-b

> 
> On Sun, Sep 3, 2017 at 9:52 PM, Ben Turner <btur...@redhat.com> wrote:
> > - Original Message -
> >> From: "Ben Turner" <btur...@redhat.com>
> >> To: "Serkan Çoban" <cobanser...@gmail.com>
> >> Cc: "Gluster Users" <gluster-users@gluster.org>
> >> Sent: Sunday, September 3, 2017 2:30:31 PM
> >> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> >>
> >> - Original Message -
> >> > From: "Milind Changire" <mchan...@redhat.com>
> >> > To: "Serkan Çoban" <cobanser...@gmail.com>
> >> > Cc: "Gluster Users" <gluster-users@gluster.org>
> >> > Sent: Saturday, September 2, 2017 11:44:40 PM
> >> > Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> >> >
> >> > No worries Serkan,
> >> > You can continue to use your 40 node clusters.
> >> >
> >> > The backtrace has resolved the function names and it should be
> >> > sufficient
> >> > to
> >> > debug the issue.
> >> > Thanks for letting us know.
> >> >
> >> > We'll post on this thread again to notify you about the findings.
> >>
> >> One of the things I find interesting is seeing:
> >>
> >>  #1  0x7f928450099b in hooks_worker () from
> >>
> >> The "hooks" scripts are usually shell scripts that get run when volumes
> >> are
> >> started / stopped / etc.  It may be worth looking into what hooks scripts
> >> are getting run at shutdown and think about how one of them could hang up
> >> the system.  This may be a red herring but I don't see much else going on
> >> in
> >> the stack trace that I looked at.  The thread with the deepest stack is
> >> the
> >> hooks worker one, all of the other look to be in some sort of wait / sleep
> >> /
> >> listen state.
> >
> > Sorry the hooks call doesn't have the deepest stack, I didn't see the other
> > thread below it.
> >
> > In the logs I see:
> >
> > [2017-08-22 10:53:39.267860] I [socket.c:2426:socket_event_handler]
> > 0-transport: EPOLLERR - disconnecting now
> >
> > You mentioned changing event threads?  Event threads controls the number of
> > epoll listener threads; what did you change it to?  IIRC 2 is the default
> > value.  This may be some sort of race condition?  Just my $0.02.
> >
> > -b
> >
> >>
> >> -b
> >>
> >> >
> >> >
> >> >
> >> > On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban < cobanser...@gmail.com >
> >> > wrote:
> >> >
> >> >
> >> > Hi Milind,
> >> >
> >> > Anything new about the issue? Can you able to find the problem,
> >> > anything else you need?
> >> > I will continue with two clusters each 40 servers, so I will not be
> >> > able to provide any further info for 80 servers.
> >> >
> >> > On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban < cobanser...@gmail.com >
> >> > wrote:
> >> > > Hi,
> >> > > You can find pstack sampes here:
> >> > > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
> >> > >
> >> > > Here is the first one:
> >> > > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> 

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-03 Thread Ben Turner
- Original Message -
> From: "Ben Turner" <btur...@redhat.com>
> To: "Serkan Çoban" <cobanser...@gmail.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Sunday, September 3, 2017 2:30:31 PM
> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> 
> - Original Message -
> > From: "Milind Changire" <mchan...@redhat.com>
> > To: "Serkan Çoban" <cobanser...@gmail.com>
> > Cc: "Gluster Users" <gluster-users@gluster.org>
> > Sent: Saturday, September 2, 2017 11:44:40 PM
> > Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> > 
> > No worries Serkan,
> > You can continue to use your 40 node clusters.
> > 
> > The backtrace has resolved the function names and it should be sufficient
> > to
> > debug the issue.
> > Thanks for letting us know.
> > 
> > We'll post on this thread again to notify you about the findings.
> 
> One of the things I find interesting is seeing:
> 
>  #1  0x7f928450099b in hooks_worker () from
> 
> The "hooks" scripts are usually shell scripts that get run when volumes are
> started / stopped / etc.  It may be worth looking into what hooks scripts
> are getting run at shutdown and think about how one of them could hang up
> the system.  This may be a red herring but I don't see much else going on in
> the stack trace that I looked at.  The thread with the deepest stack is the
> hooks worker one, all of the other look to be in some sort of wait / sleep /
> listen state.

Sorry, the hooks call doesn't have the deepest stack; I didn't see the other 
thread below it.

In the logs I see:

[2017-08-22 10:53:39.267860] I [socket.c:2426:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now

You mentioned changing event threads?  Event threads controls the number of 
epoll listener threads; what did you change it to?  IIRC 2 is the default 
value.  This may be some sort of race condition?  Just my $0.02.

-b

> 
> -b
> 
> > 
> > 
> > 
> > On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban < cobanser...@gmail.com >
> > wrote:
> > 
> > 
> > Hi Milind,
> > 
> > Anything new about the issue? Can you able to find the problem,
> > anything else you need?
> > I will continue with two clusters each 40 servers, so I will not be
> > able to provide any further info for 80 servers.
> > 
> > On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban < cobanser...@gmail.com >
> > wrote:
> > > Hi,
> > > You can find pstack sampes here:
> > > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
> > > 
> > > Here is the first one:
> > > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> > > #0 0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> > > #1 0x00310fe37d57 in gf_timer_proc () from
> > > /usr/lib64/libglusterfs.so.0
> > > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > > Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
> > > #0 0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> > > #1 0x0040643b in glusterfs_sigwaiter ()
> > > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > > Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
> > > #0 0x003d998acc4d in nanosleep () from /lib64/libc.so.6
> > > #1 0x003d998acac0 in sleep () from /lib64/libc.so.6
> > > #2 0x00310fe528fb in pool_sweeper () from
> > > /usr/lib64/libglusterfs.so.0
> > > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > > Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
> > > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > > /lib64/libpthread.so.0
> > > #1 0x00310fe64afc in syncenv_task () from
> > > /usr/lib64/libglusterfs.so.0
> > > #2 0x00310fe729f0 in syncenv_processor () from
> > > /usr/lib64/libglusterfs.so.0
> > > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > > Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
> > > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > > /lib64/libpthread.so.0
> > > #1 0x00310fe64afc in syncenv_task () from
> > > /

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-03 Thread Ben Turner
- Original Message -
> From: "Milind Changire" 
> To: "Serkan Çoban" 
> Cc: "Gluster Users" 
> Sent: Saturday, September 2, 2017 11:44:40 PM
> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> 
> No worries Serkan,
> You can continue to use your 40 node clusters.
> 
> The backtrace has resolved the function names and it should be sufficient to
> debug the issue.
> Thanks for letting us know.
> 
> We'll post on this thread again to notify you about the findings.

One of the things I find interesting is seeing:

 #1  0x7f928450099b in hooks_worker () from

The "hooks" scripts are usually shell scripts that get run when volumes are 
started / stopped / etc.  It may be worth looking into what hooks scripts are 
getting run at shutdown and thinking about how one of them could hang up the 
system.  This may be a red herring, but I don't see much else going on in the 
stack trace that I looked at.  The thread with the deepest stack is the hooks 
worker one; all of the others look to be in some sort of wait / sleep / listen 
state.

-b

> 
> 
> 
> On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban < cobanser...@gmail.com > wrote:
> 
> 
> Hi Milind,
> 
> Anything new about the issue? Can you able to find the problem,
> anything else you need?
> I will continue with two clusters each 40 servers, so I will not be
> able to provide any further info for 80 servers.
> 
> On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban < cobanser...@gmail.com >
> wrote:
> > Hi,
> > You can find pstack sampes here:
> > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
> > 
> > Here is the first one:
> > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> > #0 0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> > #1 0x00310fe37d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
> > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
> > #0 0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> > #1 0x0040643b in glusterfs_sigwaiter ()
> > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
> > #0 0x003d998acc4d in nanosleep () from /lib64/libc.so.6
> > #1 0x003d998acac0 in sleep () from /lib64/libc.so.6
> > #2 0x00310fe528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
> > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1 0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> > #2 0x00310fe729f0 in syncenv_processor () from
> > /usr/lib64/libglusterfs.so.0
> > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
> > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1 0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> > #2 0x00310fe729f0 in syncenv_processor () from
> > /usr/lib64/libglusterfs.so.0
> > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 3 (Thread 0x7f9282ecc700 (LWP 78915)):
> > #0 0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1 0x7f928450099b in hooks_worker () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 2 (Thread 0x7f92824cb700 (LWP 78916)):
> > #0 0x003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
> > #1 0x00310fe2244a in dict_lookup_common () from
> > /usr/lib64/libglusterfs.so.0
> > #2 0x00310fe2433d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
> > #3 0x00310fe245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> > #4 0x00310fe2524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> > #5 0x7f928453a8c4 in gd_add_brick_snap_details_to_dict () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #6 0x7f928447b0df in glusterd_add_volume_to_dict () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #7 0x7f928447b47c in glusterd_add_volumes_to_export_dict () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #8 0x7f9284491edf in glusterd_rpc_friend_add () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #9 0x7f92844528f7 in glusterd_ac_friend_add () 

Re: [Gluster-users] GFID attir is missing after adding large amounts of data

2017-08-31 Thread Ben Turner
I re-added gluster-users to get some more eyes on this.

- Original Message -
> From: "Christoph Schäbel" <christoph.schae...@dc-square.de>
> To: "Ben Turner" <btur...@redhat.com>
> Sent: Wednesday, August 30, 2017 8:18:31 AM
> Subject: Re: [Gluster-users] GFID attir is missing after adding large amounts 
> of  data
> 
> Hello Ben,
> 
> thank you for offering your help.
> 
> Here are outputs from all the gluster commands I could think of.
> Note that we had to remove the terrabytes of data to keep the system
> operational, because it is a live system.
> 
> # gluster volume status
> 
> Status of volume: gv0
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick 10.191.206.15:/mnt/brick1/gv0 49154 0  Y   2675
> Brick 10.191.198.15:/mnt/brick1/gv0 49154 0  Y   2679
> Self-heal Daemon on localhost   N/A   N/AY
> 12309
> Self-heal Daemon on 10.191.206.15   N/A   N/AY   2670
> 
> Task Status of Volume gv0
> --
> There are no active volume tasks

OK so your bricks are all online, you have two nodes with 1 brick per node.

> 
> # gluster volume info
> 
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5e47d0b8-b348-45bb-9a2a-800f301df95b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.191.206.15:/mnt/brick1/gv0
> Brick2: 10.191.198.15:/mnt/brick1/gv0
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on

You are using a replicate volume with 2 copies of your data; it looks like you 
are using the defaults, as I don't see any tuning.

> 
> # gluster peer status
> 
> Number of Peers: 1
> 
> Hostname: 10.191.206.15
> Uuid: 030a879d-da93-4a48-8c69-1c552d3399d2
> State: Peer in Cluster (Connected)
> 
> 
> # gluster —version
> 
> glusterfs 3.8.11 built on Apr 11 2017 09:50:39
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU General
> Public License.

You are running Gluster 3.8 which is the latest upstream release marked stable.

> 
> # df -h
> 
> Filesystem   Size  Used Avail Use% Mounted on
> /dev/mapper/vg00-root 75G  5.7G   69G   8% /
> devtmpfs 1.9G 0  1.9G   0% /dev
> tmpfs1.9G 0  1.9G   0% /dev/shm
> tmpfs1.9G   17M  1.9G   1% /run
> tmpfs1.9G 0  1.9G   0% /sys/fs/cgroup
> /dev/sda1477M  151M  297M  34% /boot
> /dev/mapper/vg10-brick1  8.0T  700M  8.0T   1% /mnt/brick1
> localhost:/gv0   8.0T  768M  8.0T   1% /mnt/glusterfs_client
> tmpfs380M 0  380M   0% /run/user/0
>

Your brick is:

 /dev/mapper/vg10-brick1  8.0T  700M  8.0T   1% /mnt/brick1

The block device is 8TB.  Can you tell me more about your brick?  Is it a 
single disk or a RAID?  If it's a RAID, can you tell me about the disks?  I am 
interested in (a few commands that may help collect this are sketched after 
the list):

-Size of disks
-RAID type
-Stripe size
-RAID controller
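
Something like the following is usually enough to collect that (sketch only, the 
md device name is just an example; for hardware RAID the controller CLI - 
megacli, ssacli, etc. - reports the stripe size):

lsblk -o NAME,SIZE,TYPE,ROTA     # disk sizes and whether they are spinners
cat /proc/mdstat                 # software RAID layout, if any
mdadm --detail /dev/md0          # RAID level and chunk size for an MD device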

I also see:

 localhost:/gv0   8.0T  768M  8.0T   1% /mnt/glusterfs_client

So you are mounting your volume on the local node; is this the mount you 
are writing data to?
 
> 
> 
> The setup of the servers is done via shell script on CentOS 7 containing the
> following commands:
> 
> yum install -y centos-release-gluster
> yum install -y glusterfs-server
> 
> mkdir /mnt/brick1
> ssm create -s 999G -n brick1 --fstype xfs -p vg10 /dev/sdb /mnt/brick1

I haven't used system-storage-manager before.  Do you know if it takes care of 
properly tuning your storage stack (if you have a RAID, that is)?  If you don't 
have a RAID it's probably not that big of a deal; if you do have a RAID we should 
make sure everything is aware of your stripe size and tuned appropriately.
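
If you do check alignment, something like this shows what XFS and LVM think they 
are sitting on (a sketch, using the brick path from your df output):

xfs_info /mnt/brick1    # sunit/swidth should reflect the RAID stripe
pvs -o +pe_start        # where LVM starts laying data out on the PV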

> 
> echo "/dev/mapper/vg10-brick1   /mnt/brick1 xfs defaults1   2" >>
> /etc/fstab
> mount -a && mount
> mkdir /mnt/brick1/gv0
> 
> gluster peer probe OTHER_SERVER_IP
> 
> gluster pool list
> gluster volume create gv0 replica 2 OWN_SERVER_IP:/mnt/brick1/gv0
> OTHER_SERVER_IP:/mnt/brick1/gv0
> gluster volume start gv0
> gluster volume info gv0
> gluster volume set gv0 network.ping-timeout "10"
> gluster volume info gv0
> 
> # mount as client for archiving cronjo

Re: [Gluster-users] GFID attir is missing after adding large amounts of data

2017-08-28 Thread Ben Turner
Also include gluster v status, I want to check the status of your bricks and 
SHD processes.

-b

- Original Message -
> From: "Ben Turner" <btur...@redhat.com>
> To: "Christoph Schäbel" <christoph.schae...@dc-square.de>
> Cc: gluster-users@gluster.org
> Sent: Tuesday, August 29, 2017 12:35:05 AM
> Subject: Re: [Gluster-users] GFID attir is missing after adding large amounts 
> of  data
> 
> This is strange, a couple of questions:
> 
> 1.  What volume type is this?  What tuning have you done?  gluster v info
> output would be helpful here.
> 
> 2.  How big are your bricks?
> 
> 3.  Can you write me a quick reproducer so I can try this in the lab?  Is it
> just a single multi TB file you are untarring or many?  If you give me the
> steps to repro, and I hit it, we can get a bug open.
> 
> 4.  Other than this are you seeing any other problems?  What if you untar a
> smaller file(s)?  Can you read and write to the volume with say DD without
> any problems?
> 
> It sounds like you have some other issues affecting things here, there is no
> reason why you shouldn't be able to untar and write multiple TBs of data to
> gluster.  Go ahead and answer those questions and I'll see what I can do to
> help you out.
> 
> -b
> 
> - Original Message -
> > From: "Christoph Schäbel" <christoph.schae...@dc-square.de>
> > To: gluster-users@gluster.org
> > Sent: Monday, August 28, 2017 3:55:31 AM
> > Subject: [Gluster-users] GFID attir is missing after adding large amounts
> > of  data
> > 
> > Hi Cluster Community,
> > 
> > we are seeing some problems when adding multiple terrabytes of data to a 2
> > node replicated GlusterFS installation.
> > 
> > The version is 3.8.11 on CentOS 7.
> > The machines are connected via 10Gbit LAN and are running 24/7. The OS is
> > virtualized on VMWare.
> > 
> > After a restart of node-1 we see that the log files are growing to multiple
> > Gigabytes a day.
> > 
> > Also there seem to be problems with the replication.
> > The setup worked fine until sometime after we added the additional data
> > (around 3 TB in size) to node-1. We added the data to a mountpoint via the
> > client, not directly to the brick.
> > What we did is add tar files via a client-mount and then untar them while
> > in
> > the client-mount folder.
> > The brick (/mnt/brick1/gv0) is using the XFS filesystem.
> > 
> > When checking the file attributes of one of the files mentioned in the
> > brick
> > logs, i can see that the gfid attribute is missing on node-1. On node-2 the
> > file does not even exist.
> > 
> > getfattr -m . -d -e hex
> > mnt/brick1/gv0/.glusterfs/40/59/40598e46-9868-4d7c-b494-7b978e67370a/type=type1/part-r-2-4846e211-c81d-4c08-bb5e-f22fa5a4b404.gz.parquet
> > 
> > # file:
> > mnt/brick1/gv0/.glusterfs/40/59/40598e46-9868-4d7c-b494-7b978e67370a/type=type1/part-r-2-4846e211-c81d-4c08-bb5e-f22fa5a4b404.gz.parquet
> > security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> > 
> > We repeated this scenario a second time with a fresh setup and got the same
> > results.
> > 
> > Does anyone know what we are doing wrong ?
> > 
> > Is there maybe a problem with glusterfs and tar ?
> > 
> > 
> > Log excerpts:
> > 
> > 
> > glustershd.log
> > 
> > [2017-07-26 15:31:36.290908] I [MSGID: 108026]
> > [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> > performing entry selfheal on fe5c42ac-5fda-47d4-8221-484c8d826c06
> > [2017-07-26 15:31:36.294289] W [MSGID: 114031]
> > [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> > operation failed. Path: (null) (----) [No
> > data available]
> > [2017-07-26 15:31:36.298287] I [MSGID: 108026]
> > [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> > performing entry selfheal on e31ae2ca-a3d2-4a27-a6ce-9aae24608141
> > [2017-07-26 15:31:36.300695] W [MSGID: 114031]
> > [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> > operation failed. Path: (null) (----) [No
> > data available]
> > [2017-07-26 15:31:36.303626] I [MSGID: 108026]
> > [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> > performing entry selfheal on 2cc9dafe-64d3-454a-a647-20deddfaebfe
> > [2017-07-26 15:31:36.305763] W [MSGID: 114031]
> > [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> > operation

Re: [Gluster-users] GFID attir is missing after adding large amounts of data

2017-08-28 Thread Ben Turner
This is strange, a couple of questions:

1.  What volume type is this?  What tuning have you done?  gluster v info 
output would be helpful here.

2.  How big are your bricks?

3.  Can you write me a quick reproducer so I can try this in the lab?  Is it 
just a single multi TB file you are untarring or many?  If you give me the 
steps to repro, and I hit it, we can get a bug open.  

4.  Other than this, are you seeing any other problems?  What if you untar 
smaller file(s)?  Can you read and write to the volume with, say, dd without any 
problems?

It sounds like you have some other issues affecting things here; there is no 
reason why you shouldn't be able to untar and write multiple TBs of data to 
gluster.  Go ahead and answer those questions and I'll see what I can do to 
help you out.
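
For the dd check in question 4, something as simple as this against the client 
mount from your df output is usually enough (sizes are just examples):

# write a 1GB file through the client mount
dd if=/dev/zero of=/mnt/glusterfs_client/ddtest bs=1M count=1024 oflag=direct

# drop caches, then read it back
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/glusterfs_client/ddtest of=/dev/null bs=1M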

-b

- Original Message -
> From: "Christoph Schäbel" 
> To: gluster-users@gluster.org
> Sent: Monday, August 28, 2017 3:55:31 AM
> Subject: [Gluster-users] GFID attir is missing after adding large amounts of  
> data
> 
> Hi Cluster Community,
> 
> we are seeing some problems when adding multiple terrabytes of data to a 2
> node replicated GlusterFS installation.
> 
> The version is 3.8.11 on CentOS 7.
> The machines are connected via 10Gbit LAN and are running 24/7. The OS is
> virtualized on VMWare.
> 
> After a restart of node-1 we see that the log files are growing to multiple
> Gigabytes a day.
> 
> Also there seem to be problems with the replication.
> The setup worked fine until sometime after we added the additional data
> (around 3 TB in size) to node-1. We added the data to a mountpoint via the
> client, not directly to the brick.
> What we did is add tar files via a client-mount and then untar them while in
> the client-mount folder.
> The brick (/mnt/brick1/gv0) is using the XFS filesystem.
> 
> When checking the file attributes of one of the files mentioned in the brick
> logs, i can see that the gfid attribute is missing on node-1. On node-2 the
> file does not even exist.
> 
> getfattr -m . -d -e hex
> mnt/brick1/gv0/.glusterfs/40/59/40598e46-9868-4d7c-b494-7b978e67370a/type=type1/part-r-2-4846e211-c81d-4c08-bb5e-f22fa5a4b404.gz.parquet
> 
> # file:
> mnt/brick1/gv0/.glusterfs/40/59/40598e46-9868-4d7c-b494-7b978e67370a/type=type1/part-r-2-4846e211-c81d-4c08-bb5e-f22fa5a4b404.gz.parquet
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> 
> We repeated this scenario a second time with a fresh setup and got the same
> results.
> 
> Does anyone know what we are doing wrong ?
> 
> Is there maybe a problem with glusterfs and tar ?
> 
> 
> Log excerpts:
> 
> 
> glustershd.log
> 
> [2017-07-26 15:31:36.290908] I [MSGID: 108026]
> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> performing entry selfheal on fe5c42ac-5fda-47d4-8221-484c8d826c06
> [2017-07-26 15:31:36.294289] W [MSGID: 114031]
> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> operation failed. Path: (null) (----) [No
> data available]
> [2017-07-26 15:31:36.298287] I [MSGID: 108026]
> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> performing entry selfheal on e31ae2ca-a3d2-4a27-a6ce-9aae24608141
> [2017-07-26 15:31:36.300695] W [MSGID: 114031]
> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> operation failed. Path: (null) (----) [No
> data available]
> [2017-07-26 15:31:36.303626] I [MSGID: 108026]
> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> performing entry selfheal on 2cc9dafe-64d3-454a-a647-20deddfaebfe
> [2017-07-26 15:31:36.305763] W [MSGID: 114031]
> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> operation failed. Path: (null) (----) [No
> data available]
> [2017-07-26 15:31:36.308639] I [MSGID: 108026]
> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> performing entry selfheal on cbabf9ed-41be-4d08-9cdb-5734557ddbea
> [2017-07-26 15:31:36.310819] W [MSGID: 114031]
> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> operation failed. Path: (null) (----) [No
> data available]
> [2017-07-26 15:31:36.315057] I [MSGID: 108026]
> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> performing entry selfheal on 8a3c1c16-8edf-40f0-b2ea-8e70c39e1a69
> [2017-07-26 15:31:36.317196] W [MSGID: 114031]
> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> operation failed. Path: (null) (----) [No
> data available]
> 
> 
> 
> bricks/mnt-brick1-gv0.log
> 
> 2017-07-26 15:31:36.287831] E [MSGID: 115050]
> [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153546: LOOKUP
> /part-r-1-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> 

Re: [Gluster-users] self-heal not working

2017-08-28 Thread Ben Turner
- Original Message -
> From: "mabi" <m...@protonmail.ch>
> To: "Ravishankar N" <ravishan...@redhat.com>
> Cc: "Ben Turner" <btur...@redhat.com>, "Gluster Users" 
> <gluster-users@gluster.org>
> Sent: Monday, August 28, 2017 3:59:06 AM
> Subject: Re: [Gluster-users] self-heal not working
> 
> Excuse me for my naive questions but how do I reset the afr.dirty xattr on
> the file to be healed? and do I need to do that through a FUSE mount? or
> simply on every bricks directly?

Don't worry, it's not naive.  I didn't put the command in before I asked because 
I couldn't remember it :)  Well, and I wanted to be sure.  I am going over my 
mails; if I don't see it in a later post I'll add it.
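
In case it is not in a later post, the reset usually looks something like this 
(a sketch only - the paths are placeholders, it is run against the file on each 
brick rather than through the mount, and trusted.afr.dirty is 12 bytes of 
counters, hence the 24 hex zeros):

# clear the dirty marker on the brick copy of the file
setfattr -n trusted.afr.dirty -v 0x000000000000000000000000 <brick>/<path/to/file>

# then kick off a heal / re-run heal info
gluster volume heal <volname>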

-b


> 
> >  Original Message 
> > Subject: Re: [Gluster-users] self-heal not working
> > Local Time: August 28, 2017 5:58 AM
> > UTC Time: August 28, 2017 3:58 AM
> > From: ravishan...@redhat.com
> > To: Ben Turner <btur...@redhat.com>, mabi <m...@protonmail.ch>
> > Gluster Users <gluster-users@gluster.org>
> >
> > On 08/28/2017 01:57 AM, Ben Turner wrote:
> >> - Original Message -
> >>> From: "mabi" <m...@protonmail.ch>
> >>> To: "Ravishankar N" <ravishan...@redhat.com>
> >>> Cc: "Ben Turner" <btur...@redhat.com>, "Gluster Users"
> >>> <gluster-users@gluster.org>
> >>> Sent: Sunday, August 27, 2017 3:15:33 PM
> >>> Subject: Re: [Gluster-users] self-heal not working
> >>>
> >>> Thanks Ravi for your analysis. So as far as I understand nothing to worry
> >>> about but my question now would be: how do I get rid of this file from
> >>> the
> >>> heal info?
> >> Correct me if I am wrong but clearing this is just a matter of resetting
> >> the afr.dirty xattr? @Ravi - Is this correct?
> >
> > Yes resetting the xattr and launching index heal or running heal-info
> > command should serve as a workaround.
> > -Ravi
> >
> >>
> >> -b
> >>
> >>>>  Original Message 
> >>>> Subject: Re: [Gluster-users] self-heal not working
> >>>> Local Time: August 27, 2017 3:45 PM
> >>>> UTC Time: August 27, 2017 1:45 PM
> >>>> From: ravishan...@redhat.com
> >>>> To: mabi <m...@protonmail.ch>
> >>>> Ben Turner <btur...@redhat.com>, Gluster Users
> >>>> <gluster-users@gluster.org>
> >>>>
> >>>> Yes, the shds did pick up the file for healing (I saw messages like "
> >>>> got
> >>>> entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea") but no error afterwards.
> >>>>
> >>>> Anyway I reproduced it by manually setting the afr.dirty bit for a zero
> >>>> byte file on all 3 bricks. Since there are no afr pending xattrs
> >>>> indicating good/bad copies and all files are zero bytes, the data
> >>>> self-heal algorithm just picks the file with the latest ctime as source.
> >>>> In your case that was the arbiter brick. In the code, there is a check
> >>>> to
> >>>> prevent data heals if arbiter is the source. So heal was not happening
> >>>> and
> >>>> the entries were not removed from heal-info output.
> >>>>
> >>>> Perhaps we should add a check in the code to just remove the entries
> >>>> from
> >>>> heal-info if size is zero bytes in all bricks.
> >>>>
> >>>> -Ravi
> >>>>
> >>>> On 08/25/2017 06:33 PM, mabi wrote:
> >>>>
> >>>>> Hi Ravi,
> >>>>>
> >>>>> Did you get a chance to have a look at the log files I have attached in
> >>>>> my
> >>>>> last mail?
> >>>>>
> >>>>> Best,
> >>>>> Mabi
> >>>>>
> >>>>>>  Original Message 
> >>>>>> Subject: Re: [Gluster-users] self-heal not working
> >>>>>> Local Time: August 24, 2017 12:08 PM
> >>>>>> UTC Time: August 24, 2017 10:08 AM
> >>>>>> From: m...@protonmail.ch
> >>>>>> To: Ravishankar N
> >>>>>> [<ravishan...@redhat.com>](mailto:ravishan...@redhat.com)
> >>>>>> Ben Turner [<btur...@redhat.com>](mailto:btur

Re: [Gluster-users] self-heal not working

2017-08-27 Thread Ben Turner
- Original Message -
> From: "mabi" <m...@protonmail.ch>
> To: "Ravishankar N" <ravishan...@redhat.com>
> Cc: "Ben Turner" <btur...@redhat.com>, "Gluster Users" 
> <gluster-users@gluster.org>
> Sent: Sunday, August 27, 2017 3:15:33 PM
> Subject: Re: [Gluster-users] self-heal not working
> 
> Thanks Ravi for your analysis. So as far as I understand nothing to worry
> about but my question now would be: how do I get rid of this file from the
> heal info?

Correct me if I am wrong but clearing this is just a matter of resetting the 
afr.dirty xattr?  @Ravi - Is this correct?

-b

> 
> >  Original Message 
> > Subject: Re: [Gluster-users] self-heal not working
> > Local Time: August 27, 2017 3:45 PM
> > UTC Time: August 27, 2017 1:45 PM
> > From: ravishan...@redhat.com
> > To: mabi <m...@protonmail.ch>
> > Ben Turner <btur...@redhat.com>, Gluster Users <gluster-users@gluster.org>
> >
> > Yes, the shds did pick up the file for healing (I saw messages like " got
> > entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea") but no error afterwards.
> >
> > Anyway I reproduced it by manually setting the afr.dirty bit for a zero
> > byte file on all 3 bricks. Since there are no afr pending xattrs
> > indicating good/bad copies and all files are zero bytes, the data
> > self-heal algorithm just picks the file with the latest ctime as source.
> > In your case that was the arbiter brick. In the code, there is a check to
> > prevent data heals if arbiter is the source. So heal was not happening and
> > the entries were not removed from heal-info output.
> >
> > Perhaps we should add a check in the code to just remove the entries from
> > heal-info if size is zero bytes in all bricks.
> >
> > -Ravi
> >
> > On 08/25/2017 06:33 PM, mabi wrote:
> >
> >> Hi Ravi,
> >>
> >> Did you get a chance to have a look at the log files I have attached in my
> >> last mail?
> >>
> >> Best,
> >> Mabi
> >>
> >>>  Original Message 
> >>> Subject: Re: [Gluster-users] self-heal not working
> >>> Local Time: August 24, 2017 12:08 PM
> >>> UTC Time: August 24, 2017 10:08 AM
> >>> From: m...@protonmail.ch
> >>> To: Ravishankar N
> >>> [<ravishan...@redhat.com>](mailto:ravishan...@redhat.com)
> >>> Ben Turner [<btur...@redhat.com>](mailto:btur...@redhat.com), Gluster
> >>> Users [<gluster-users@gluster.org>](mailto:gluster-users@gluster.org)
> >>>
> >>> Thanks for confirming the command. I have now enabled DEBUG
> >>> client-log-level, run a heal and then attached the glustershd log files
> >>> of all 3 nodes in this mail.
> >>>
> >>> The volume concerned is called myvol-pro, the other 3 volumes have no
> >>> problem so far.
> >>>
> >>> Also note that in the mean time it looks like the file has been deleted
> >>> by the user and as such the heal info command does not show the file
> >>> name anymore but just is GFID which is:
> >>>
> >>> gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
> >>>
> >>> Hope that helps for debugging this issue.
> >>>
> >>>>  Original Message 
> >>>> Subject: Re: [Gluster-users] self-heal not working
> >>>> Local Time: August 24, 2017 5:58 AM
> >>>> UTC Time: August 24, 2017 3:58 AM
> >>>> From: ravishan...@redhat.com
> >>>> To: mabi [<m...@protonmail.ch>](mailto:m...@protonmail.ch)
> >>>> Ben Turner [<btur...@redhat.com>](mailto:btur...@redhat.com), Gluster
> >>>> Users [<gluster-users@gluster.org>](mailto:gluster-users@gluster.org)
> >>>>
> >>>> Unlikely. In your case only the afr.dirty is set, not the
> >>>> afr.volname-client-xx xattr.
> >>>>
> >>>> `gluster volume set myvolume diagnostics.client-log-level DEBUG` is
> >>>> right.
> >>>>
> >>>> On 08/23/2017 10:31 PM, mabi wrote:
> >>>>
> >>>>> I just saw the following bug which was fixed in 3.8.15:
> >>>>>
> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1471613
> >>>>>
> >>>>> Is it possible that the problem I described in this post is related to
> >>>>> that bug?
> >>>>>

Re: [Gluster-users] self-heal not working

2017-08-21 Thread Ben Turner
Can you also provide:

gluster v heal <volname> info split-brain

If it is split-brain, just delete the incorrect file from the brick and run heal 
again.  I haven't tried this with an arbiter, but I assume the process is the same.
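
In case it helps, the manual cleanup I am thinking of looks roughly like this 
(sketch only - the brick path is a placeholder, and the GFID link under 
.glusterfs has to be removed as well; the first two pairs of hex characters of 
the GFID give the subdirectories):

# on the brick holding the bad copy only
rm <brick>/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
rm <brick>/.glusterfs/19/85/1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea

# then trigger a heal so the good copy gets synced back
gluster volume heal <volname>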

-b

- Original Message -
> From: "mabi" <m...@protonmail.ch>
> To: "Ben Turner" <btur...@redhat.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Monday, August 21, 2017 4:55:59 PM
> Subject: Re: [Gluster-users] self-heal not working
> 
> Hi Ben,
> 
> So it is really a 0 kBytes file everywhere (all nodes including the arbiter
> and from the client).
> Here below you will find the output you requested. Hopefully that will help
> to find out why this specific file is not healing... Let me know if you need
> any more information. Btw node3 is my arbiter node.
> 
> NODE1:
> 
> STAT:
>   File:
>   
> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>   Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> Device: 24h/36d Inode: 10033884Links: 2
> Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2017-08-14 17:04:55.530681000 +0200
> Modify: 2017-08-14 17:11:46.407404779 +0200
> Change: 2017-08-14 17:11:46.407404779 +0200
> Birth: -
> 
> GETFATTR:
> trusted.afr.dirty=0sAQAA
> trusted.bit-rot.version=0sAgBZhuknAAlJAg==
> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
> 
> NODE2:
> 
> STAT:
>   File:
>   
> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>   Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> Device: 26h/38d Inode: 10031330Links: 2
> Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2017-08-14 17:04:55.530681000 +0200
> Modify: 2017-08-14 17:11:46.403704181 +0200
> Change: 2017-08-14 17:11:46.403704181 +0200
> Birth: -
> 
> GETFATTR:
> trusted.afr.dirty=0sAQAA
> trusted.bit-rot.version=0sAgBZhu6wAA8Hpw==
> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
> 
> NODE3:
> STAT:
>   File:
>   
> /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>   Size: 0 Blocks: 0  IO Block: 4096   regular empty file
> Device: ca11h/51729d Inode: 405208959   Links: 2
> Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2017-08-14 17:04:55.530681000 +0200
> Modify: 2017-08-14 17:04:55.530681000 +0200
> Change: 2017-08-14 17:11:46.604380051 +0200
> Birth: -
> 
> GETFATTR:
> trusted.afr.dirty=0sAQAA
> trusted.bit-rot.version=0sAgBZe6ejAAKPAg==
> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
> 
> CLIENT GLUSTER MOUNT:
> STAT:
>   File:
>   '/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png'
>   Size: 0 Blocks: 0  IO Block: 131072 regular empty file
> Device: 1eh/30d Inode: 11897049013408443114  Links: 1
> Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2017-08-14 17:04:55.530681000 +0200
> Modify: 2017-08-14 17:11:46.407404779 +0200
> Change: 2017-08-14 17:11:46.407404779 +0200
> Birth: -
> 
> >  Original Message 
> > Subject: Re: [Gluster-users] self-heal not working
> > Local Time: August 21, 2017 9:34 PM
> > UTC Time: August 21, 2017 7:34 PM
> > From: btur...@redhat.com
> > To: mabi <m...@protonmail.ch>
> > Gluster Users <gluster-users@gluster.org>
> >
> > - Original Message -
> >> From: "mabi" <m...@protonmail.ch>
> >> To: "Gluster Users" <gluster-users@gluster.org>
> >> Sent: Monday, August 21, 2017 9:28:24 AM
> >> Subject: [Gluster-users] self-heal not working
> >>
> >> Hi,
> >>
> >> I have a replicat 2 with arbiter GlusterFS 3.8.11 cluster and there is
> >> currently one file listed to be healed as you can see below but never gets
> >> healed by the self-heal daemon:
> >>
> >> Brick node1.domain.tld:/data/myvolume/brick
> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >> Status: Connected
> >> Number of entries: 1
> >>
> >> Brick node2.domain.tld:/data/myvolume/brick
> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >> Status: Connected
> >> Number of 

Re: [Gluster-users] self-heal not working

2017-08-21 Thread Ben Turner
- Original Message -
> From: "mabi" 
> To: "Gluster Users" 
> Sent: Monday, August 21, 2017 9:28:24 AM
> Subject: [Gluster-users] self-heal not working
> 
> Hi,
> 
> I have a replicat 2 with arbiter GlusterFS 3.8.11 cluster and there is
> currently one file listed to be healed as you can see below but never gets
> healed by the self-heal daemon:
> 
> Brick node1.domain.tld:/data/myvolume/brick
> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> Status: Connected
> Number of entries: 1
> 
> Brick node2.domain.tld:/data/myvolume/brick
> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> Status: Connected
> Number of entries: 1
> 
> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> Status: Connected
> Number of entries: 1
> 
> As once recommended on this mailing list I have mounted that glusterfs volume
> temporarily through fuse/glusterfs and ran a "stat" on that file which is
> listed above but nothing happened.
> 
> The file itself is available on all 3 nodes/bricks but on the last node it
> has a different date. By the way this file is 0 kBytes big. Is that maybe
> the reason why the self-heal does not work?

Is the file actually 0 bytes or is it just 0 bytes on the arbiter (0 bytes are 
expected on the arbiter, it just stores metadata)?  Can you send us the output 
from the following on all 3 nodes:

$ stat <file on the brick>
$ getfattr -d -m - <file on the brick>
$ stat <file on the client mount>

Let's see what things look like on the back end; it should tell us why healing 
is failing.

-b

> 
> And how can I now make this file to heal?
> 
> Thanks,
> Mabi
> 
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster performance with VM's

2017-08-10 Thread Ben Turner
Have a look at this:

http://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/qemu-integration/

It may be a UID / GID issue?

Setting ownership on the volume
Set the ownership of qemu:qemu on to the volume

gluster volume set  storage.owner-uid 107
gluster volume set  storage.owner-gid 107

Have a run through that DOC and make sure everything is configured properly.
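
A quick way to sanity check that end to end would be something like this (volume 
name and brick path are placeholders, and on Xen substitute whatever user actually 
opens the images):

gluster volume get <VOLNAME> storage.owner-uid
gluster volume get <VOLNAME> storage.owner-gid
ls -ld /path/to/brick
id qemu

The uid/gid set on the volume should line up with the user the hypervisor uses to 
access the disk images; if they don't, guest I/O can stall or fail in odd ways.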

-b

- Original Message -
> From: "Alexey Zakurin" 
> To: gluster-users@gluster.org
> Sent: Wednesday, August 9, 2017 7:53:00 AM
> Subject: [Gluster-users] Gluster performance with VM's
> 
> Hi, community
> 
> Please, help me with my trouble.
> 
> I have 2 Gluster nodes, with 2 bricks on each.
> Configuration:
> Node1 brick1 replicated on Node0 brick0
> Node0 brick1 replicated on Node1 brick0
> 
> Volume Name: gm0
> Type: Distributed-Replicate
> Volume ID: 5e55f511-8a50-46e4-aa2f-5d4f73c859cf
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gl1:/mnt/brick1/gm0
> Brick2: gl0:/mnt/brick0/gm0
> Brick3: gl0:/mnt/brick1/gm0
> Brick4: gl1:/mnt/brick0/gm0
> Options Reconfigured:
> cluster.rebal-throttle: aggressive
> performance.cache-refresh-timeout: 4
> performance.cache-max-file-size: 10MB
> performance.client-io-threads: on
> diagnostics.client-log-level: WARNING
> diagnostics.brick-log-level: WARNING
> performance.write-behind-window-size: 4MB
> features.scrub: Active
> features.bitrot: on
> cluster.readdir-optimize: on
> server.event-threads: 16
> client.event-threads: 16
> cluster.lookup-optimize: on
> server.allow-insecure: on
> performance.read-ahead: disable
> performance.readdir-ahead: off
> performance.io-thread-count: 64
> performance.cache-size: 2GB
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> nfs.disable: on
> transport.address-family: inet
> cluster.self-heal-daemon: enable
> cluster.server-quorum-ratio: 51%
> 
> Each brick is software RAID10, that contains 6 disks.
> 
> 20Gb round-robin bonding between servers and clients - one network.
> 
> On storage, I have VM's images.
> VM's run on 3 clients, Xen Hypervisor.
> 
> One of the VMs is the FTP server, which contains a large number of archives.
> 
> Problem.
> When I try to upload a large file (50-60 GB) to the FTP server, the other
> VMs are throttled badly. Sometimes the FS on these VMs is automatically
> re-mounted read-only.
> Network monitoring shows a speed of ~40 MB/sec. Disk monitoring shows the
> same.
> 
> I tried starting the FTP VM on the other server (0) - the problem still persists.
> Mounting the other Gluster node - the problem still persists.
> 
> Please, help to solve this problem.
> 
> --
> С уважением, Закурин Алексей Евгеньевич.
> Telegram: @Zakurin
> Tel: +7 968 455 88 48
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Replicated volume, one slow brick

2017-07-13 Thread Ben Turner
You won't have 3x copies of the data, but you could try a replica 2 + arbiter volume?  
Other than that I am not sure how, or if it's possible, to compensate for a 
slow brick like that.
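
If you want to try it, this is the general shape of it (hostnames and brick paths 
are made up for illustration):

gluster volume create myvol replica 3 arbiter 1 \
    server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/arb1

The arbiter brick only stores file names and metadata, so it can sit on a small or 
slower disk without dragging writes down the way a full slow replica would.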

-b

- Original Message -
> From: "Øyvind Krosby" 
> To: gluster-users@gluster.org
> Sent: Thursday, July 13, 2017 4:44:09 AM
> Subject: [Gluster-users] Replicated volume, one slow brick
> 
> I have been trying to figure out how glusterfs-fuse client will handle it
> when 1 of 3 bricks in a 3-way replica is slower than the others.
> 
> It looks like a glusterfs-fuse client will send requests to all 3 bricks when
> accessing a file. But what happens when one of the bricks is not responding
> in time?
> 
> We saw an issue when we added external load to the raid volume where the
> brick was located. The disk became 100% busy, and as a result the
> glusterfs-clients hang when they access the volume.
> 
> Is there a way to avoid this, and make the clients ask the other two bricks
> for the data when one brick is too slow?
> 
> Thanks
> 
> Øyvind Krosby
> SRE, Zedge.net
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport Endpoint Not connected while running sysbench on Gluster Volume

2017-06-15 Thread Ben Turner
- Original Message -
> From: "Ben Turner" <btur...@redhat.com>
> To: "Julio Guevara" <julioguevara...@gmail.com>
> Cc: gluster-users@gluster.org
> Sent: Thursday, June 15, 2017 6:10:58 PM
> Subject: Re: [Gluster-users] Transport Endpoint Not connected while running 
> sysbench on Gluster Volume
> 
> 
> 
> - Original Message -
> > From: "Julio Guevara" <julioguevara...@gmail.com>
> > To: "Ben Turner" <btur...@redhat.com>
> > Sent: Thursday, June 15, 2017 5:52:26 PM
> > Subject: Re: [Gluster-users] Transport Endpoint Not connected while running
> > sysbench on Gluster Volume
> > 
> > I stumbled upon the problem.
> > 
> > We are using deep security agent (da_agent) as our main antivirus. When the
> > antivirus gets activated it installs kernel modules:
> >   redirfs
> >   gsch
> > 
> > Apparently when these modules are present and loaded into the kernel, I see
> > all the issues that I have described here.
> > Once I uninstall the agent and reboot the system (To make sure modules are
> > unloaded) glusterfs works without any issue.
> > This is the software version that I'm using, if it is useful for anybody:
> > 
> >   CentOS 6.8
> >   kernel2.6.32-696.3.1.el6
> >   ds_agent   9.6.2-7723.el6 tried with ds_agent 9.6.2-7888.el6
> >  same issue.
> >   glusterfs-server  3.8.12-1.el6
> > 
> > @Ben the tail I sent before includes both server and client logs, even
> > bricks.
> 
> Hmm, maybe the security SW is killing / interfering somehow with the gluster
> stack?  Do you know the expected behavior of the antivirus when it sees
> binaries and / or behavior it doesn't recognize?  Maybe FUSE being in user
> space is tripping it up?  Is there any way to configure the antivirus to
> whitelist / not interfere with the components of the gluster stack?

I just did a quick google and saw:

http://docs.trendmicro.com/all/ent/ds/v9.5_sp1/en-us/DS_Agent-Linux_9.5_SP1_readme.txt

   - Anti-Malware is unable to scan fuse-based file-system if the 
 mount owner is not root, and the mount does not allow other users to 
 access. [26265]

So it would appear that there have been some issues with FUSE based file 
systems.  It may be worth reaching out to the vendor if you have support and 
see if there are any known issues with FUSE based systems.  In the meantime you 
may want to try NFS if you NEED the antivirus; otherwise you could leave it disabled 
until you get the issue sorted.
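
As a rough sketch of the NFS route (this assumes the built-in gNFS server in 3.8 is 
still enabled on your node, and reuses the names from your mail):

gluster volume set mariadb_gluster_volume nfs.disable off
mount -t nfs -o vers=3 laeft-dccdb01p:/mariadb_gluster_volume /var/lib/mysql_backups

That takes FUSE out of the picture entirely, which should also sidestep the 
anti-malware limitation quoted above.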

-b


> 
> -b
> 
> 
> > 
> > Thanks
> > Julio Guevara
> > 
> > On Wed, Jun 14, 2017 at 11:11 PM, Ben Turner <btur...@redhat.com> wrote:
> > 
> > > - Original Message -
> > > > From: "Julio Guevara" <julioguevara...@gmail.com>
> > > > To: gluster-users@gluster.org
> > > > Sent: Tuesday, June 13, 2017 4:43:06 PM
> > > > Subject: [Gluster-users] Transport Endpoint Not connected while running
> > >  sysbench on Gluster Volume
> > > >
> > > > I'm having a hard time trying to get a gluster volume up and running. I
> > > have
> > > > setup other gluster volumes on other systems without much problems but
> > > this
> > > > one is killing me.
> > > >
> > > > The gluster vol was created with the command:
> > > > gluster volume create mariadb_gluster_volume
> > > > laeft-dccdb01p:/export/mariadb/brick
> > > >
> > > > I had to lower frame-timeout since the system would become unresponsive
> > > until
> > > > the frame failed by timeout:
> > > > gluster volume set mariadb_gluster_volume networking.frame-timeout 5
> > > >
> > > > running gluster version: glusterfs 3.8.12
> > > >
> > > > The workload i'm using is: sysbench --test=fileio --file-total-size=4G
> > > > --file-num=64 prepare
> > > >
> > > > sysbench version: sysbench 0.4.12-5.el6
> > > >
> > > > kernel version: 2.6.32-696.1.1.el6
> > > >
> > > > centos: 6.8
> > > >
> > > > Issue: Whenever I run the sysbench over the mount
> > > > /var/lib/mysql_backups
> > > I
> > > > get the error that is shown on the log output.
> > > >
> > > > It is a constant issue, I can reproduce it when I start increasing the
> > > > --file-num for sysbench above 3.
> > >
> > > It looks like you may be seeing a crash.  If you look at
> > > /var/log/messages
> >

Re: [Gluster-users] Transport Endpoint Not connected while running sysbench on Gluster Volume

2017-06-15 Thread Ben Turner


- Original Message -
> From: "Julio Guevara" <julioguevara...@gmail.com>
> To: "Ben Turner" <btur...@redhat.com>
> Sent: Thursday, June 15, 2017 5:52:26 PM
> Subject: Re: [Gluster-users] Transport Endpoint Not connected while running 
> sysbench on Gluster Volume
> 
> I stumbled upon the problem.
> 
> We are using deep security agent (da_agent) as our main antivirus. When the
> antivirus gets activated it installs kernel modules:
>   redirfs
>   gsch
> 
> Apparently when these modules are present and loaded into the kernel, I see
> all the issues that I have described here.
> Once I uninstall the agent and reboot the system (To make sure modules are
> unloaded) glusterfs works without any issue.
> This is the software version that I'm using, if it is useful for anybody:
> 
>   CentOS 6.8
>   kernel2.6.32-696.3.1.el6
>   ds_agent   9.6.2-7723.el6 tried with ds_agent 9.6.2-7888.el6
>  same issue.
>   glusterfs-server  3.8.12-1.el6
> 
> @Ben the tail I sent before includes both server and client logs, even
> bricks.

Hmm, maybe the security SW is killing / interfering somehow with the gluster 
stack?  Do you know the expected behavior of the antivirus when it sees 
binaries and / or behavior it doesn't recognize?  Maybe FUSE being in user 
space is tripping it up?  Is there any way to configure the antivirus to 
whitelist / not interfere with the components of the gluster stack?

-b


> 
> Thanks
> Julio Guevara
> 
> On Wed, Jun 14, 2017 at 11:11 PM, Ben Turner <btur...@redhat.com> wrote:
> 
> > - Original Message -
> > > From: "Julio Guevara" <julioguevara...@gmail.com>
> > > To: gluster-users@gluster.org
> > > Sent: Tuesday, June 13, 2017 4:43:06 PM
> > > Subject: [Gluster-users] Transport Endpoint Not connected while running
> >  sysbench on Gluster Volume
> > >
> > > I'm having a hard time trying to get a gluster volume up and running. I
> > have
> > > setup other gluster volumes on other systems without much problems but
> > this
> > > one is killing me.
> > >
> > > The gluster vol was created with the command:
> > > gluster volume create mariadb_gluster_volume
> > > laeft-dccdb01p:/export/mariadb/brick
> > >
> > > I had to lower frame-timeout since the system would become unresponsive
> > until
> > > the frame failed by timeout:
> > > gluster volume set mariadb_gluster_volume networking.frame-timeout 5
> > >
> > > running gluster version: glusterfs 3.8.12
> > >
> > > The workload i'm using is: sysbench --test=fileio --file-total-size=4G
> > > --file-num=64 prepare
> > >
> > > sysbench version: sysbench 0.4.12-5.el6
> > >
> > > kernel version: 2.6.32-696.1.1.el6
> > >
> > > centos: 6.8
> > >
> > > Issue: Whenever I run the sysbench over the mount /var/lib/mysql_backups
> > I
> > > get the error that is shown on the log output.
> > >
> > > It is a constant issue, I can reproduce it when I start increasing the
> > > --file-num for sysbench above 3.
> >
> > It looks like you may be seeing a crash.  If you look at /var/log/messages
> > on all of the clients / servers do you see any crashes / seg faults / ABRT
> > messages in the log?  If so can you open a BZ with the core / other info
> > here?  Here is an example of a crash on one of the bricks:
> >
> > http://lists.gluster.org/pipermail/gluster-users/2016-February/025460.html
> >
> > My guess is something is happening client side since we don't see anything
> > in the server logs, check the client mount
> > log(/var/log/glusterfs/.log
> > and the messages file on your client.  Also check messages on the servers.
> > If you see anything shoot us out the info and lets get a BZ open, if not
> > maybe someone else on the list has some other ideas.
> >
> > -b
> >
> > >
> > >
> > >
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org
> > > http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport Endpoint Not connected while running sysbench on Gluster Volume

2017-06-14 Thread Ben Turner
- Original Message -
> From: "Julio Guevara" 
> To: gluster-users@gluster.org
> Sent: Tuesday, June 13, 2017 4:43:06 PM
> Subject: [Gluster-users] Transport Endpoint Not connected while running   
> sysbench on Gluster Volume
> 
> I'm having a hard time trying to get a gluster volume up and running. I have
> setup other gluster volumes on other systems without much problems but this
> one is killing me.
> 
> The gluster vol was created with the command:
> gluster volume create mariadb_gluster_volume
> laeft-dccdb01p:/export/mariadb/brick
> 
> I had to lower frame-timeout since the system would become unresponsive until
> the frame failed by timeout:
> gluster volume set mariadb_gluster_volume networking.frame-timeout 5
> 
> running gluster version: glusterfs 3.8.12
> 
> The workload i'm using is: sysbench --test=fileio --file-total-size=4G
> --file-num=64 prepare
> 
> sysbench version: sysbench 0.4.12-5.el6
> 
> kernel version: 2.6.32-696.1.1.el6
> 
> centos: 6.8
> 
> Issue: Whenever I run the sysbench over the mount /var/lib/mysql_backups I
> get the error that is shown on the log output.
> 
> It is a constant issue, I can reproduce it when I start increasing the
> --file-num for sysbench above 3.

It looks like you may be seeing a crash.  If you look at /var/log/messages on 
all of the clients / servers do you see any crashes / seg faults / ABRT 
messages in the log?  If so can you open a BZ with the core / other info here?  
Here is an example of a crash on one of the bricks:

http://lists.gluster.org/pipermail/gluster-users/2016-February/025460.html

My guess is something is happening client side since we don't see anything in 
the server logs.  Check the client mount 
log (/var/log/glusterfs/.log) and the messages file on your client.  
Also check messages on the servers.  If you see anything shoot us out the info 
and let's get a BZ open; if not, maybe someone else on the list has some other 
ideas.
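
Something quick and dirty like the following usually turns up a crash if there 
was one (log and crash-dump paths can differ a bit by distro):

grep -iE 'segfault|abrt|oom' /var/log/messages
grep -i 'signal received' /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log
ls /var/spool/abrt 2>/dev/null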

-b

> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Slow write times to gluster disk

2017-06-12 Thread Ben Turner
- Original Message -
> From: "Pat Haley" <pha...@mit.edu>
> To: "Ben Turner" <btur...@redhat.com>, "Pranith Kumar Karampuri" 
> <pkara...@redhat.com>
> Cc: "Ravishankar N" <ravishan...@redhat.com>, gluster-users@gluster.org, 
> "Steve Postma" <spos...@ztechnet.com>
> Sent: Monday, June 12, 2017 2:35:41 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
> 
> 
> Hi Guys,
> 
> I was wondering what our next steps should be to solve the slow write times.
> 
> Recently I was debugging a large code and writing a lot of output at
> every time step.  When I tried writing to our gluster disks, it was
> taking over a day to do a single time step whereas if I had the same
> program (same hardware, network) write to our nfs disk the time per
> time-step was about 45 minutes. What we are shooting for here would be
> to have similar times to either gluster of nfs.

I can see in your test:

http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt

You averaged ~600 MB / sec (expected for replica 2 with 10G: {~1200 MB / sec} / 
#replicas{2} = 600).  Gluster does client side replication, so with replica 2 
you will only ever see 1/2 the speed of the slowest part of your stack (NW, 
disk, RAM, CPU).  This is usually NW or disk, and 600 is normally a best case.  
Now in your output I do see the instances where you went down to 200 MB / sec.  
I can only explain this in three ways:

1.  You are not using conv=fdatasync and writes are actually going to page 
cache and then being flushed to disk.  During the fsync the memory is not yet 
available and the disks are busy flushing dirty pages.
2.  Your storage RAID group is shared across multiple LUNs (like in a SAN) and 
when write times are slow the RAID group is busy servicing other LUNs.
3.  Gluster bug / config issue / some other unknown unknown.

So I see 2 issues here:

1.  NFS does in 45 minutes what gluster can do in 24 hours.
2.  Sometimes your throughput drops dramatically.

WRT #1 - have a look at my estimates above.  My formula for guesstimating 
gluster perf is: throughput = NIC throughput or storage (whichever is slower) / 
# replicas * overhead (figure .7 or .8).  Also, the larger the record size the 
better for glusterfs mounts; I normally like to be at LEAST 64k, up to 1024k:

# dd if=/dev/zero of=/gluster-mount/file bs=1024k count=1 conv=fdatasync
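
To put numbers on that guesstimate for this setup: a 10G NIC is roughly 1200 MB / 
sec, divided by 2 replicas is 600 MB / sec best case, and the .7-.8 overhead factor 
puts a realistic floor around 420-480 MB / sec.  That matches the ~600 MB / sec 
average you posted; the dips to 200 MB / sec are the part that needs explaining.  A 
quick sketch for checking record size sensitivity (adjust count so each run writes a 
comparable few GB):

for bs in 64k 256k 1024k; do
    dd if=/dev/zero of=/gluster-mount/ddtest_$bs bs=$bs count=4096 conv=fdatasync
done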

WRT #2 - Again, I question your testing and your storage config.  Try using 
conv=fdatasync for your DDs, use a larger record size, and make sure that your 
back end storage is not causing your slowdowns.  Also remember that with 
replica 2 you will take ~50% hit on writes because the client uses 50% of its 
bandwidth to write to one replica and 50% to the other.

-b



> 
> Thanks
> 
> Pat
> 
> 
> On 06/02/2017 01:07 AM, Ben Turner wrote:
> > Are you sure using conv=sync is what you want?  I normally use
> > conv=fdatasync, I'll look up the difference between the two and see if it
> > affects your test.
> >
> >
> > -b
> >
> > - Original Message -
> >> From: "Pat Haley" <pha...@mit.edu>
> >> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> >> Cc: "Ravishankar N" <ravishan...@redhat.com>, gluster-users@gluster.org,
> >> "Steve Postma" <spos...@ztechnet.com>, "Ben
> >> Turner" <btur...@redhat.com>
> >> Sent: Tuesday, May 30, 2017 9:40:34 PM
> >> Subject: Re: [Gluster-users] Slow write times to gluster disk
> >>
> >>
> >> Hi Pranith,
> >>
> >> The "dd" command was:
> >>
> >>   dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>
> >> There were 2 instances where dd reported 22 seconds. The output from the
> >> dd tests are in
> >>
> >> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>
> >> Pat
> >>
> >> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> >>> Pat,
> >>> What is the command you used? As per the following output, it
> >>> seems like at least one write operation took 16 seconds. Which is
> >>> really bad.
> >>>   96.39    1165.10 us      89.00 us   *16487014.00 us*   393212    WRITE
> >>>
> >>>
> >>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <pha...@mit.edu
> >>> <mailto:pha...@mit.edu>> wrote:
> >>>
> >>>
> >>>  Hi Pranith,
> >>>
> >>>  I ran the same 'dd' test both in the gl

Re: [Gluster-users] Slow write times to gluster disk

2017-06-01 Thread Ben Turner
Are you sure using conv=sync is what you want?  I normally use conv=fdatasync, 
I'll look up the difference between the two and see if it affects your test.


-b

- Original Message -
> From: "Pat Haley" <pha...@mit.edu>
> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> Cc: "Ravishankar N" <ravishan...@redhat.com>, gluster-users@gluster.org, 
> "Steve Postma" <spos...@ztechnet.com>, "Ben
> Turner" <btur...@redhat.com>
> Sent: Tuesday, May 30, 2017 9:40:34 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
> 
> 
> Hi Pranith,
> 
> The "dd" command was:
> 
>  dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> 
> There were 2 instances where dd reported 22 seconds. The output from the
> dd tests are in
> 
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> 
> Pat
> 
> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> > Pat,
> >What is the command you used? As per the following output, it
> > seems like at least one write operation took 16 seconds. Which is
> > really bad.
> >   96.39    1165.10 us      89.00 us   *16487014.00 us*   393212    WRITE
> >
> >
> > On Tue, May 30, 2017 at 10:36 PM, Pat Haley <pha...@mit.edu
> > <mailto:pha...@mit.edu>> wrote:
> >
> >
> > Hi Pranith,
> >
> > I ran the same 'dd' test both in the gluster test volume and in
> > the .glusterfs directory of each brick.  The median results (12 dd
> > trials in each test) are similar to before
> >
> >   * gluster test volume: 586.5 MB/s
> >   * bricks (in .glusterfs): 1.4 GB/s
> >
> > The profile for the gluster test-volume is in
> >
> > 
> > http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
> > 
> > <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt>
> >
> > Thanks
> >
> > Pat
> >
> >
> >
> >
> > On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
> >> Let's start with the same 'dd' test we were testing with to see,
> >> what the numbers are. Please provide profile numbers for the
> >> same. From there on we will start tuning the volume to see what
> >> we can do.
> >>
> >> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <pha...@mit.edu
> >> <mailto:pha...@mit.edu>> wrote:
> >>
> >>
> >> Hi Pranith,
> >>
> >> Thanks for the tip.  We now have the gluster volume mounted
> >> under /home.  What tests do you recommend we run?
> >>
> >> Thanks
> >>
> >> Pat
> >>
> >>
> >>
> >> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>>
> >>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <pha...@mit.edu
> >>> <mailto:pha...@mit.edu>> wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> Sorry for the delay.  I never saw received your reply
> >>> (but I did receive Ben Turner's follow-up to your
> >>> reply).  So we tried to create a gluster volume under
> >>> /home using different variations of
> >>>
> >>> gluster volume create test-volume
> >>> mseas-data2:/home/gbrick_test_1
> >>> mseas-data2:/home/gbrick_test_2 transport tcp
> >>>
> >>> However we keep getting errors of the form
> >>>
> >>> Wrong brick type: transport, use
> >>> :
> >>>
> >>> Any thoughts on what we're doing wrong?
> >>>
> >>>
> >>> You should give transport tcp at the beginning I think.
> >>> Anyways, transport tcp is the default, so no need to specify
> >>> so remove those two words from the CLI.
> >>>
> >>>
> >>> Also do you have a list of the test we should be running
> >>> once we get this volume created?  Given the time-zone
> >>> difference it might help if we can run a small battery
> >>> of tests and post the results rather than test-post-new

[Gluster-users] Testing for gbench.

2017-05-15 Thread Ben Turner
Hi all!  A while back I created a benchmark kit for Gluster:

https://github.com/gluster/gbench

To run it just check the help file:

[bturner@ben-laptop bt--0001]$ python GlusterBench.py -h
Gluster Benchmark Kit Options:
  -h --help   Print gbench options.
  -v  Verbose Output.
  -r --record-sizeRecord size to write in for large files in KB
  -s --seq-file-size  The size of file each IOZone thread creates in GB
  -f --files  The number of files to create for smallfile tests in KB
  -l --sm-file-size   The size of files to create for smallfile tests in KB
  -n --sample-sizeThe number of samples to collect for each test
  -t --threadsThe number of threads to run applications with
  -m --mount-pointThe mount point gbench runs against
Example: GlusterBench.py -r 1024 -s 8 -f 1 -l 1024 -n 3 -t 4 -m 
/gluster-mount -v

To run it just cd to the dir and run GlusterBench.py:

 $ git clone https://github.com/gluster/gbench.git
 $ cd gbench/bench-tests/bt--0001/
 $ python GlusterBench.py -r 1024 -s 8 -f 1 -l 1024 -n 3 -t 4 -m 
/gluster-mount -v

Gbench will install smallfile for you and create any config files, but you will 
need to have IOzone installed yourself as I haven't yet found a reliable repo 
for IOzone.

If anyone is interested in benchmarking their cluster or testing the tool I 
would appreciate it.  Also, for any problems / enhancements / whatever, either email 
me or open an issue on github.  In the future I would love to have a page where 
you can upload your results and see how your cluster's performance compares to 
others.  We are always looking for users / contributors so if you know python 
and want to contribute it would be appreciated!  Thanks!

-b
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Slow write times to gluster disk

2017-05-14 Thread Ben Turner
- Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Pat Haley" 
> Cc: gluster-users@gluster.org, "Steve Postma" 
> Sent: Friday, May 12, 2017 11:17:11 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
> 
> 
> 
> On Sat, May 13, 2017 at 8:44 AM, Pranith Kumar Karampuri <
> pkara...@redhat.com > wrote:
> 
> 
> 
> 
> 
> On Fri, May 12, 2017 at 8:04 PM, Pat Haley < pha...@mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> My question was about setting up a gluster volume on an ext4 partition. I
> thought we had the bricks mounted as xfs for compatibility with gluster?
> 
> Oh that should not be a problem. It works fine.
> 
> Just that xfs doesn't have limits for anything, whereas ext4 does for things
> like hardlinks etc. (at least last time I checked :-) ). So it is better to
> have xfs.

One of the biggest reasons to use XFS IMHO is that most of the testing / large 
scale deployments (at least that I know of) / etc. are done using XFS as a 
backend.  While EXT4 should work, I don't think it has the same level of 
testing as XFS.

-b 



> 
> 
> 
> 
> 
> 
> 
> Pat
> 
> 
> 
> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> 
> 
> On Thu, May 11, 2017 at 9:32 PM, Pat Haley < pha...@mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> The /home partition is mounted as ext4
> /home ext4 defaults,usrquota,grpquota 1 2
> 
> The brick partitions are mounted ax xfs
> /mnt/brick1 xfs defaults 0 0
> /mnt/brick2 xfs defaults 0 0
> 
> Will this cause a problem with creating a volume under /home?
> 
> I don't think the bottleneck is disk. You can do the same tests you did on
> your new volume to confirm?
> 
> 
> 
> 
> Pat
> 
> 
> 
> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> 
> 
> On Thu, May 11, 2017 at 8:57 PM, Pat Haley < pha...@mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> Unfortunately, we don't have similar hardware for a small scale test. All we
> have is our production hardware.
> 
> You said something about /home partition which has lesser disks, we can
> create plain distribute volume inside one of those directories. After we are
> done, we can remove the setup. What do you say?
> 
> 
> 
> 
> 
> Pat
> 
> 
> 
> 
> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> 
> 
> On Thu, May 11, 2017 at 2:48 AM, Pat Haley < pha...@mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> Since we are mounting the partitions as the bricks, I tried the dd test
> writing to /.glusterfs/. The
> results without oflag=sync were 1.6 Gb/s (faster than gluster but not as
> fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/ fewer
> disks).
> 
> Okay, then 1.6Gb/s is what we need to target for, considering your volume is
> just distribute. Is there any way you can do tests on similar hardware but
> at a small scale? Just so we can run the workload to learn more about the
> bottlenecks in the system? We can probably try to get the speed to 1.2Gb/s
> on your /home partition you were telling me yesterday. Let me know if that
> is something you are okay to do.
> 
> 
> 
> 
> Pat
> 
> 
> 
> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> 
> 
> On Wed, May 10, 2017 at 10:15 PM, Pat Haley < pha...@mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> Not entirely sure (this isn't my area of expertise). I'll run your answer by
> some other people who are more familiar with this.
> 
> I am also uncertain about how to interpret the results when we also add the
> dd tests writing to the /home area (no gluster, still on the same machine)
> 
> 
> * dd test without oflag=sync (rough average of multiple tests)
> 
> 
> * gluster w/ fuse mount : 570 Mb/s
> * gluster w/ nfs mount: 390 Mb/s
> * nfs (no gluster): 1.2 Gb/s
> * dd test with oflag=sync (rough average of multiple tests)
> 
> * gluster w/ fuse mount: 5 Mb/s
> * gluster w/ nfs mount: 200 Mb/s
> * nfs (no gluster): 20 Mb/s
> 
> Given that the non-gluster area is a RAID-6 of 4 disks while each brick of
> the gluster area is a RAID-6 of 32 disks, I would naively expect the writes
> to the gluster area to be roughly 8x faster than to the non-gluster.
> 
> I think a better test is to try and write to a file using nfs without any
> gluster to a location that is not inside the brick but someother location
> that is on same disk(s). If you are mounting the partition as the brick,
> then we can write to a file inside .glusterfs directory, something like
> /.glusterfs/.
> 
> 
> 
> 
> 
> I still think we have a speed issue, I can't tell if fuse vs nfs is part of
> the problem.
> 
> I got interested in the post because I read that fuse speed is lesser than
> nfs speed which is counter-intuitive to my understanding. So wanted
> clarifications. Now that I got my clarifications where fuse outperformed nfs
> without sync, we can resume testing as described above and try to find what
> it is. 

Re: [Gluster-users] Very odd performance issue

2017-05-04 Thread Ben Turner
- Original Message -
> From: "David Miller" 
> To: gluster-users@gluster.org
> Sent: Thursday, May 4, 2017 2:48:38 PM
> Subject: [Gluster-users] Very odd performance issue
> 
> Background: 4 identical gluster servers with 15 TB each in 2x2 setup.
> CentOS Linux release 7.3.1611 (Core)
> glusterfs-server-3.9.1-1.el7.x86_64
> client systems are using:
> glusterfs-client 3.5.2-2+deb8u3
> 
> The cluster has ~12 TB in use with 21 million files. Lots of jpgs. About 12
> clients are mounting gluster volumes.
> 
> Network load is light: iftop shows each server has 10-15 Mbit reads and about
> half that in writes.
> 
> What I’m seeing that concerns me is that one box, gluster4, has roughly twice
> the CPU utilization and twice or more the load average of the other three
> servers. gluster4 has a 24 hour average of about 30% CPU utilization,
> something that seems to me to be way out of line for a couple MB/sec of
> traffic.
> 
> In running volume top, the odd thing I see is that for gluster1-3 I get
> latency summaries like this:
> Brick: gluster1.publicinteractive.com :/gluster/drupal_prod
> —
> %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
>  --- --- ---  
> 
> 9.96 675.07 us 15.00 us 1067793.00 us 205060 INODELK
> 15.85 3414.20 us 16.00 us 773621.00 us 64494 READ
> 51.35 2235.96 us 12.00 us 1093609.00 us 319120 LOOKUP
> 
> … but my problem server has far more inodelk latency:
> 
> 12.01 4712.03 us 17.00 us 1773590.00 us 47214 READ
> 27.50 2390.27 us 14.00 us 1877571.00 us 213121 INODELK
> 28.70 1643.65 us 12.00 us 1837696.00 us 323407 LOOKUP
> 
> The servers are intended to be identical, and are indeed identical hardware.
> 
> Suggestions on where to look or which FM to RT very welcome indeed.

IIRC INODELK is for internal locking / synchronization:

"GlusterFS has locks translator which provides the following internal locking 
operations called  inodelk, entrylk which are used by afr to achieve 
synchronization of operations on files or directories that conflict with each 
other."

I found a bug where there was a leak:

https://bugzilla.redhat.com/show_bug.cgi?id=1405886

It was fixed in the 3.8 line, it may be worth looking into upgrading the 
gluster version on your clients to eliminate any issues that were fixed between 
3.5(your client version) and 3.9(your server version).

Also, have a look at the brick and client logs.  You could try searching them 
for "INODELK".  Are your clients accessing a lot of the same files at the same 
time?  Also, on the server where you are seeing the higher load, check the self 
heal daemon logs to see if there is any healing happening.

Sorry I don't have anything concrete, like I said it may be worth upgrading the 
clients and having a look at your logs to see if you can glean any information 
from them.
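
If it helps, a rough way to do those log checks in one go (volume name is a 
placeholder):

grep -c INODELK /var/log/glusterfs/bricks/*.log
gluster volume heal <VOLNAME> info
tail -50 /var/log/glusterfs/glustershd.log

The first shows whether one brick is logging far more lock traffic than its peers; 
the other two show whether the self heal daemon on gluster4 is busy healing in the 
background.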

-b

> 
> Thanks,
> 
> David
> 
> 
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS throughput inconsistent

2017-02-22 Thread Ben Turner
- Original Message -
> From: "Deepak Naidu" 
> To: gluster-users@gluster.org
> Sent: Wednesday, February 22, 2017 3:36:22 PM
> Subject: [Gluster-users] GlusterFS throughput inconsistent
> 
> 
> 
> Hello,
> 
> 
> 
> I have GlusterFS 3.8.8. I am using IB RDMA. I have noticed during writes or
> reads the throughput doesn't seem consistent for the same workload (fio command).
> Sometimes I get higher throughput; sometimes it quickly drops to half, then
> stays there.

This is strange, if it were me I would try to create a TCP volume transport 
instead of RDMA and see if you can reproduce there.
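
A minimal sketch of that A/B test, with made-up hostnames and brick paths:

gluster volume create testtcp transport tcp server1:/bricks/tcp1 server2:/bricks/tcp1
gluster volume start testtcp
mount -t glusterfs server1:/testtcp /mnt/testtcp

Then run the exact same fio job against /mnt/testtcp a few times and compare the 
run-to-run spread with what you see on the RDMA volume.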

> 
> 
> 
> I cannot predict a consistent behavior every time when I run the same
> workload. The time to complete varies. Is there any log file or something I
> can look into, to understand this behavior. I am single client(fuse) running
> 32 thread, 1mb block size, creating 200GB or reading 200GB files randomly
> with directIO.

Seeing up to +-5-10% variance between runs is normal (closer to 5%; 10 is a 
little high).  I haven't seen a 50% drop like you mentioned above; like I said, I 
wonder if this is reproducible on a TCP transport volume?  Can you provide some 
FIO output from a few runs for us to have a look at?
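
For reference, a fio job roughly matching what you describe (names and sizes here 
are just an example; scale --size back up if you want the full 200GB):

fio --name=glustertest --directory=/mnt/gluster --rw=write --bs=1M \
    --direct=1 --numjobs=32 --size=6G --group_reporting

Swap --rw=write for randwrite / randread to match your runs; the bw and clat 
percentile lines from a fast run and a slow run side by side would tell us a lot.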

-b


> 
> 
> 
> --
> 
> Deepak
> 
> This email message is for the sole use of the intended recipient(s) and may
> contain confidential information. Any unauthorized review, use, disclosure
> or distribution is prohibited. If you are not the intended recipient, please
> contact the sender by reply email and destroy all copies of the original
> message.
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] rsync to gluster mount: self-heal and bad performance

2015-11-16 Thread Ben Turner
- Original Message -
> From: "Tiemen Ruiten" <t.rui...@rdmedia.com>
> To: "Ben Turner" <btur...@redhat.com>
> Cc: "gluster-users" <gluster-users@gluster.org>
> Sent: Monday, November 16, 2015 5:00:20 AM
> Subject: Re: [Gluster-users] rsync to gluster mount: self-heal and bad 
> performance
> 
> Hello Ben,
> 
> Thank you for your answer. I don't see the same errors when just creating a
> number of files, eg. touch test{..}. Performance is not great, but
> after a few minutes it finishes successfully.

We have made a lot of small file perf enhancements; try the following:

-Run on RHEL 7, I have seen a good improvement on el7 over el6.
-Set lookup optimize on
-Set client and server event threads to 4
-There is a metadata perf regression that could be affecting your rsync as 
well, keep an eye on - https://bugzilla.redhat.com/show_bug.cgi?id=1250803 for 
the fix.


> 
> I'm running rsync through lsyncd, the options are:
> 
> /usr/bin/rsync --delete --ignore-errors -zslt -r $source $destination
> 
> I'm running it over a LAN network, between two VMs. The volume is indeed
> mounted with --acl, but on the directory I'm syncing to I haven't set them
> explicitly:

Do you need ACLs?  If not, can you try without that option?  I am wondering if 
there is a bug with ACLs that could be causing the self heals to happen.  If we 
don't see it without ACLs, that gives us somewhere to look.
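
A quick way to test that, leaving your fstab alone and just remounting by hand 
(names taken from the fstab earlier in this thread):

umount /mnt/lpxassets
mount -t glusterfs iron2:/lpxassets /mnt/lpxassets

Then rerun the rsync and watch "gluster volume heal lpxassets info" - if the 
spurious self heals stop, we know where to dig.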

> 
> [tiemen@iron2 test]$ getfacl stg/
> # file: stg/
> # owner: root
> # group: rdcompany
> # flags: -s-
> user::rwx
> group::rwx
> other::r-x
> 
> Volume options:
> 
> [tiemen@iron2 test]$ sudo gluster volume info lpxassets
> 
> Volume Name: lpxassets
> Type: Replicate
> Volume ID: fea00430-63b1-4a4e-bc38-b74d3732acf4
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: iron2:/data/brick/lpxassets
> Brick2: cobalt2:/data/brick/lpxassets
> Brick3: arbiter:/data/arbiter/lpxassets
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> cluster.quorum-type: auto
> cluster.enable-shared-storage: enable
> nfs-ganesha: disable
> 
> Any other info I could provide?
> 
> 
> On 15 November 2015 at 18:11, Ben Turner <btur...@redhat.com> wrote:
> 
> > - Original Message -
> > > From: "Tiemen Ruiten" <t.rui...@rdmedia.com>
> > > To: "gluster-users" <gluster-users@gluster.org>
> > > Sent: Sunday, November 15, 2015 5:22:08 AM
> > > Subject: Re: [Gluster-users] rsync to gluster mount: self-heal and bad
> >   performance
> > >
> > > Any other suggestions?
> >
> > You are correct, rsync should not cause self heal on every file.  It makes
> > me think that Ernie is correct and that something isn't correct.  If you
> > just create a bunch of files out side of rsync do you see the same
> > behavior?  What rsync command are you running, where are you syncing the
> > data from?  I see you have the acl mount option, are you using ACLs?
> >
> > -b
> >
> >
> > >
> > > On 13 November 2015 at 09:56, Tiemen Ruiten < t.rui...@rdmedia.com >
> > wrote:
> > >
> > >
> > >
> > > Hello Ernie, list,
> > >
> > > No, that's not the case. The volume is mounted through glusterfs-fuse -
> > on
> > > the same server running one of the bricks. The fstab:
> > >
> > > # /etc/fstab
> > > # Created by anaconda on Tue Aug 18 18:10:49 2015
> > > #
> > > # Accessible filesystems, by reference, are maintained under '/dev/disk'
> > > # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more
> > info
> > > #
> > > UUID=56778fed-bf3f-435e-8c32-edaa8c707f29 / xfs defaults 0 0
> > > UUID=a44e32ed-cfbe-4ba0-896f-1efff9397ba1 /boot xfs defaults 0 0
> > > UUID=a344d2bc-266d-4905-85b1-fbb7fe927659 swap swap defaults 0 0
> > > /dev/vdb1 /data/brick xfs defaults 1 2
> > > iron2:/lpxassets /mnt/lpxassets glusterfs _netdev,acl 0 0
> > >
> > >
> > >
> > >
> > > On 12 November 2015 at 22:50, Ernie Dunbar < maill...@lightspeed.ca >
> > wrote:
> > >
> > >
> > > Hi Tiemen
> > >
> > > It sounds like you're trying to rsync files onto your Gluster server,
> > rather
> > > than to the Gluster filesystem. You want to copy these files into the
> > > mounted filesystem (typically on some other system than the Gluster
> > > servers), because Gluster is designed to handle it that way.
>

Re: [Gluster-users] File changed as we read it

2015-10-20 Thread Ben Turner
- Original Message -
> From: "Gabriel Kuri" 
> To: hm...@t-hamel.fr
> Cc: "gluster-users" 
> Sent: Tuesday, October 20, 2015 12:18:19 PM
> Subject: Re: [Gluster-users] File changed as we read it
> 
> Well, that fixed it. I unmounted/remounted the client after turning on
> 'cluster.consistent-metadata' on the volume and it works fine now. What, if
> any, side effects are there for turning on cluster.consistent-metadata?

Turning on consistent-metadata will decrease metadata workload (think stat, ls 
-l, getting xattrs, etc.) performance to ~1/5 of what it is when that setting is 
disabled in my test env.  I'm sure that varies depending on your # of bricks, 
backend perf, etc.
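
If you want to put a number on it for your own data set, something simple like 
this before and after toggling the option (and remounting; mount point is a 
placeholder) gives a quick feel for the cost:

time ls -lR /mnt/volume > /dev/null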

-b

 
> On Mon, Oct 19, 2015 at 1:30 AM, < hm...@t-hamel.fr > wrote:
> 
> 
> You have to umount/mount on the client, for the parameter to work.
> 
> Thomas
> 
> On 2015-10-15 18:18, Gabriel Kuri wrote:
> 
> 
> I enabled the option on the volume, but it did not fix the issue, the
> problem is still there.
> 
> Gabe
> 
> On Tue, Oct 6, 2015 at 6:10 PM, Krutika Dhananjay
> < kdhan...@redhat.com > wrote:
> 
> 
> 
> Let me know if this helps:
> 
> 
> http://www.gluster.org/pipermail/gluster-users/2015-September/023641.html
> 
> 
> [1]
> 
> -Krutika
> 
> -
> FROM: "Gabriel Kuri" < gk...@ieee.org >
> TO: "Marco" < marco.brign...@lorentino.ns0.it >
> CC: "gluster-users" < Gluster-users@gluster.org >
> SENT: Wednesday, October 7, 2015 5:26:47 AM
> SUBJECT: Re: [Gluster-users] File changed as we read it
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1251291 [2]
> 
> I opened this bug a couple months ago. It's related to a couple
> other bugs that were closed but not fixed. I haven't heard anything
> yet, since it was opened. You're not the only one hitting this
> problem ...
> 
> On Tuesday, October 6, 2015, Marco < marco.brign...@marcobaldo.ch >
> wrote:
> Hello.
> 
> I have got a big number of messages similar to this one by using tar
> on
> a gluster volume
> 
> tar: ./games/pacman-python/screenshot-1.png: file changed as we read
> it
> 
> The files have been for sure not be changed by any other process,
> and I
> have not found any error in the logs nor heal problems.
> 
> Any hypotesis?
> 
> Marco
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org [3]
> http://www.gluster.org/mailman/listinfo/gluster-users [4]
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users [4]
> 
> 
> 
> Links:
> --
> [1] http://www.gluster.org/pipermail/gluster-users/2015-September/023641.html
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1251291
> [3] mailto:Gluster-users@gluster.org
> [4] http://www.gluster.org/mailman/listinfo/gluster-users
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] 3.6.6 issues

2015-10-19 Thread Ben Turner
Hi David.  Is the cluster still in this state?  If so, can you grab a couple of 
stack traces from the offending brick (gfs01a) process with gstack?  Make sure 
with top or something that it's the brick process spinning your CPUs; we want to 
be sure the stack traces are from the offending process.  That will give us an 
idea of what it is chewing on.  Other than that, maybe you could take a couple of 
sosreports on the servers and open a BZ.  It may be a good idea to roll back 
versions until we can get this sorted; I don't know how long you can have the 
cluster in this state.  Once you get a bugzilla open I'll try to reproduce what 
you are seeing.
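
Something like this, run a few times a minute or two apart on gfs01a, should be 
enough (the grep just picks out the brick processes for that node's homegfs bricks):

top -bn1 | head -20
ps -ef | grep glusterfsd | grep homegfs
gstack <pid of the busy glusterfsd> > /tmp/homegfs-brick.gstack.$(date +%s)

gstack ships with the gdb package, so install that first if it isn't already there.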

-b

- Original Message -
> From: "David Robinson" 
> To: gluster-users@gluster.org, "Gluster Devel" 
> Sent: Saturday, October 17, 2015 12:19:36 PM
> Subject: [Gluster-users] 3.6.6 issues
> 
> I upgraded my storage server from 3.6.3 to 3.6.6 and am now having issues. My
> setup (4x2) is shown below. One of the bricks (gfs01a) has a very high
> cpu-load even though the load on the other 3-bricks (gfs01b, gfs02a, gfs02b)
> is almost zero. The FUSE mounted partition is extremely slow and basically
> unuseable since the upgrade. I am getting a lot of the messages shown below
> in the logs on gfs01a and gfs01b. Nothing out of the ordinary is showing up
> on the gfs02a/gfs02b bricks.
> Can someone help?
> [root@gfs01b glusterfs]# gluster volume info homegfs
> 
> Volume Name: homegfs
> Type: Distributed-Replicate
> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> Options Reconfigured:
> changelog.rollover-time: 15
> changelog.fsync-interval: 3
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: off
> storage.owner-gid: 100
> network.ping-timeout: 10
> server.allow-insecure: on
> performance.write-behind-window-size: 128MB
> performance.cache-size: 128MB
> performance.io-thread-count: 32
> server.manage-gids: on
> [root@ gfs01a glusterfs]# tail -f cli.log
> [2015-10-17 16:05:44.299933] I [socket.c:2353:socket_event_handler]
> 0-transport: disconnecting now
> [2015-10-17 16:05:44.331233] I [input.c:36:cli_batch] 0-: Exiting with: 0
> [2015-10-17 16:06:33.397631] I [socket.c:2353:socket_event_handler]
> 0-transport: disconnecting now
> [2015-10-17 16:06:33.432970] I [input.c:36:cli_batch] 0-: Exiting with: 0
> [2015-10-17 16:11:22.441290] I [socket.c:2353:socket_event_handler]
> 0-transport: disconnecting now
> [2015-10-17 16:11:22.472227] I [input.c:36:cli_batch] 0-: Exiting with: 0
> [2015-10-17 16:15:44.176391] I [socket.c:2353:socket_event_handler]
> 0-transport: disconnecting now
> [2015-10-17 16:15:44.205064] I [input.c:36:cli_batch] 0-: Exiting with: 0
> [2015-10-17 16:16:33.366424] I [socket.c:2353:socket_event_handler]
> 0-transport: disconnecting now
> [2015-10-17 16:16:33.377160] I [input.c:36:cli_batch] 0-: Exiting with: 0
> [root@ gfs01a glusterfs]# tail etc-glusterfs-glusterd.vol.log
> [2015-10-17 15:56:33.177207] I
> [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume Source
> [2015-10-17 16:01:22.303635] I
> [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume Software
> [2015-10-17 16:05:44.320555] I
> [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume homegfs
> [2015-10-17 16:06:17.204783] W [rpcsvc.c:254:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330)
> [2015-10-17 16:06:17.204811] E [rpcsvc.c:544:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
> [2015-10-17 16:06:33.408695] I
> [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume Source
> [2015-10-17 16:11:22.462374] I
> [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume Software
> [2015-10-17 16:12:30.608092] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo]
> 0-: Unable to get transaction opinfo for transaction ID :
> d143b66b-2ac9-4fd9-8635-fe1eed41d56b
> [2015-10-17 16:15:44.198292] I
> [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume homegfs
> [2015-10-17 16:16:33.368170] I
> 

Re: [Gluster-users] Speed up heal performance

2015-10-14 Thread Ben Turner
- Original Message -
> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> To: "Ben Turner" <btur...@redhat.com>, "Humble Devassy Chirammal" 
> <humble.deva...@gmail.com>, "Atin Mukherjee"
> <atin.mukherje...@gmail.com>
> Cc: "gluster-users" <gluster-users@gluster.org>
> Sent: Wednesday, October 14, 2015 1:39:14 AM
> Subject: Re: [Gluster-users] Speed up heal performance
> 
> 
> 
> On 10/13/2015 07:11 PM, Ben Turner wrote:
> > - Original Message -
> >> From: "Humble Devassy Chirammal" <humble.deva...@gmail.com>
> >> To: "Atin Mukherjee" <atin.mukherje...@gmail.com>
> >> Cc: "Ben Turner" <btur...@redhat.com>, "gluster-users"
> >> <gluster-users@gluster.org>
> >> Sent: Tuesday, October 13, 2015 6:14:46 AM
> >> Subject: Re: [Gluster-users] Speed up heal performance
> >>
> >>> Good news is we already have a WIP patch review.glusterd.org/10851 to
> >> introduce multi threaded shd. Credits to Richard/Shreyas from facebook for
> >> this. IIRC, we also have a BZ for the same
> >> Isnt it the same bugzilla (
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1221737) mentioned in the
> >> commit log?
> > @Lindsay - No need for a BZ, the above BZ should suffice.
> >
> > @Anyone - In the commit I see:
> >
> >  { .key= "cluster.shd-max-threads",
> >.voltype= "cluster/replicate",
> >.option = "shd-max-threads",
> >.op_version = 1,
> >.flags  = OPT_FLAG_CLIENT_OPT
> >  },
> >  { .key= "cluster.shd-thread-batch-size",
> >.voltype= "cluster/replicate",
> >.option = "shd-thread-batch-size",
> >.op_version = 1,
> >.flags  = OPT_FLAG_CLIENT_OPT
> >  },
> >
> > So we can tune max threads and thread batch size?  I understand max
> > threads, but what is batch size?  In my testing on 10G NICs with a backend
> > that will service 10G throughput I see about 1.5 GB per minute of SH
> > throughput.  To Lindsay's other point, will this patch improve SH
> > throughput?  My systems can write at 1.5 GB / Sec and NICs can to 1.2 GB /
> > sec but I only see ~1.5 GB per _minute_ of SH throughput.  If we can not
> > only make SH multi threaded, but improve the performance of a single
> > thread that would be awesome.  Super bonus points if we can have some sort
> > of tunable that can limit the bandwidth each thread can consume.  It would
> > be great to be able to crank things up when the systems aren't busy and
> > slow things down when load increases.
> This patch is not merged because I thought we needed throttling feature
> to go in before we can merge this for better control of the self-heal
> speed. We are doing that for 3.8. So expect to see both of these for 3.8.

Great news!  You da man Pranith, next time I am on your side of the world beers 
are on me :)

-b

> 
> Pranith
> >
> > -b
> >
> >
> >> --Humble
> >>
> >>
> >> On Tue, Oct 13, 2015 at 7:26 AM, Atin Mukherjee
> >> <atin.mukherje...@gmail.com>
> >> wrote:
> >>
> >>> -Atin
> >>> Sent from one plus one
> >>> On Oct 13, 2015 3:16 AM, "Ben Turner" <btur...@redhat.com> wrote:
> >>>> - Original Message -
> >>>>> From: "Lindsay Mathieson" <lindsay.mathie...@gmail.com>
> >>>>> To: "gluster-users" <gluster-users@gluster.org>
> >>>>> Sent: Friday, October 9, 2015 9:18:11 AM
> >>>>> Subject: [Gluster-users] Speed up heal performance
> >>>>>
> >>>>> Is there any way to max out heal performance? My cluster is unused
> >>> overnight,
> >>>>> and lightly used at lunchtimes, it would be handy to speed up a heal.
> >>>>>
> >>>>> The only tuneable I found was cluster.self-heal-window-size, which
> >>> doesn't
> >>>>> seem to make much difference.
> >>>> I don't know of any way to speed this up, maybe someone else could chime
> >>> in here that knows the heal daemon better than me.  Maybe you could open
> >>> an
> >>> RFE on this?  In my testing I only see 2 files getting healed at a time
> >>> per

Re: [Gluster-users] Test results and Performance Tuning efforts ...

2015-10-12 Thread Ben Turner


- Original Message -
> From: "Lindsay Mathieson" 
> To: "gluster-users" 
> Sent: Thursday, October 8, 2015 8:10:09 PM
> Subject: [Gluster-users] Test results and Performance Tuning efforts ...
> 
> 
> 
> Morning, hope the following ramble is ok, just examining the results of some
> extensive (and destructive  ) testing of gluster 3.6.4 on some disks I had
> spare. Cluster purpose is solely for hosting qemu vm’s via Proxmox 3.4
> 
> 
> 
> Setup: 3 Nodes, well spec’d
> 
> - 64 GB RAM
> 
> - VNB & VNG
> 
> * CPU : E5-2620
> 
> - VNA
> 
> * CPU’s : Dual E5-2660
> 
> - Already in use as a Proxmox and Ceph Cluster running 30 Windows VM’s
> 
> 
> 
> Gluster Bricks.
> 
> - All bricks on ZFS with 4 GB RAM ZIL, 1GB SSD SLOG and 10GB SSD Cache
> 
> - LZ4 Compression
> 
> - Sync disabled
> 
> 
> 
> Brick 1:
> 
> - 6 Velocitoraptors in a RAID10+ (3 Mirrors)
> 
> - High performance
> 
> - Already hosting 8 VM’s
> 
> 
> 
> Bricks 2 & 3:
> 
> - Spare external USB 1TB Toshiba Drive attached via USB3
> 
> - Crap performance  About 50/100 MB/s R/W
> 
> 
> 
> 
> 
> Overall impressions – pretty good. Installation is easy and now I’ve been
> pointed to up to date docs and got the hang of the commands, I’m happy with
> the administration – vastly simpler than Ceph. The ability to access the
> files on the native filesystem is good for peace of mind and enables some
> interesting benchmark comparisons. I simulated drive failure by killing all
> the gluster processes on a node and it seemed to cope ok.
> 
> 
> 
> I would like to see better status information such as “Heal % progress”,
> “Rebalance % progress”
> 
> 
> 
> NB: Pulling a USB external drive is a * bad * idea as it has no TLER support
> and this killed an entire node, had to hard reset it. In production I would
> use something like WD Red NAS drives.
> 
> 
> 
> 
> 
> Despite all the abuse I threw at it I had no problems with split brain etc
> and the integration with proxmox is excellent. When running write tests I
> was very pleased to see it max out my bonded 2x1GB connections, something
> ceph has never been able to do. I consistently got 110+ MB/s raw write
> results inside VM’s
> 
> 
> 
> Currently running 4 VM’s off the Gluster datastore with no issues.
> 
> 
> 
> Benchmark results – done using Crystal DiskMark inside a Windows 7 VM, with
> VIRTIO drivers and writeback enabled. I tested a Gluster replica 3 setup,
> replica 1 and direct off the disk (ZFS). Multpile tests were run to get a
> feel for average results.
> 
> 
> 
> Node VNB
> 
> - Replica 3
> 
> - Local Brick: External USB Toshiba Drive
> 
> - ---
> 
> - CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
> 
> - Crystal Dew World : http://crystalmark.info/
> 
> - ---
> 
> - * MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]
> 
> -
> 
> - Sequential Read : 738.642 MB/s
> 
> - Sequential Write : 114.461 MB/s
> 
> - Random Read 512KB : 720.623 MB/s
> 
> - Random Write 512KB : 115.084 MB/s
> 
> - Random Read 4KB (QD=1) : 9.684 MB/s [ 2364.3 IOPS]
> 
> - Random Write 4KB (QD=1) : 2.511 MB/s [ 613.0 IOPS]
> 
> - Random Read 4KB (QD=32) : 24.264 MB/s [ 5923.7 IOPS]
> 
> - Random Write 4KB (QD=32) : 5.685 MB/s [ 1387.8 IOPS]
> 
> -
> 
> - Test : 1000 MB [C: 70.1% (44.8/63.9 GB)] (x5)
> 
> - Date : 2015/10/09 9:30:37
> 
> - OS : Windows 7 Professional N SP1 [6.1 Build 7601] (x64)
> 
> 
> 
> 
> 
> Node VNA
> 
> - Replica 1 (So no writing over ethernet)
> 
> - Local Brick: High performance Velocipraptors in RAID10
> 
> - Sequential Read : 735.224 MB/s
> 
> - Sequential Write : 718.203 MB/s
> 
> - Random Read 512KB : 888.090 MB/s
> 
> - Random Write 512KB : 453.174 MB/s
> 
> - Random Read 4KB (QD=1) : 11.808 MB/s [ 2882.9 IOPS]
> 
> - Random Write 4KB (QD=1) : 4.249 MB/s [ 1037.4 IOPS]
> 
> - Random Read 4KB (QD=32) : 34.787 MB/s [ 8492.8 IOPS]
> 
> - Random Write 4KB (QD=32) : 5.487 MB/s [ 1339.5 IOPS]
> 
> 
> 
> 
> 
> Node VNA
> 
> - Direct on ZFS (No Gluster)
> 
> - Sequential Read : 2841.216 MB/s
> 
> - Sequential Write : 1568.681 MB/s
> 
> - Random Read 512KB : 1753.746 MB/s
> 
> - Random Write 512KB : 1219.437 MB/s
> 
> - Random Read 4KB (QD=1) : 26.852 MB/s [ 6555.6 IOPS]
> 
> - Random Write 4KB (QD=1) : 20.930 MB/s [ 5109.8 IOPS]
> 
> - Random Read 4KB (QD=32) : 58.515 MB/s [ 14286.0 IOPS]
> 
> - Random Write 4KB (QD=32) : 46.303 MB/s [ 11304.3 IOPS]
> 
> 
> 
> 
> 
> 
> 
> Performance:
> 
> Raw read performance is excellent, averaging 700Mb/s – I’d say the ZFS &
> Cluster caches are working well.
> 
> As mentioned raw write maxed out at 110 MB/s, near the max ethernet speed.
> 
> Random I/O is pretty average, it could be the Toshba drives bring things
> down, though even when I took them out of the equation it wasn’t much
> improved.
> 
> 
> 
> Direct off the disk was more than double the replica 1 

Re: [Gluster-users] Speed up heal performance

2015-10-12 Thread Ben Turner
- Original Message -
> From: "Lindsay Mathieson" 
> To: "gluster-users" 
> Sent: Friday, October 9, 2015 9:18:11 AM
> Subject: [Gluster-users] Speed up heal performance
> 
> Is there any way to max out heal performance? My cluster is unused overnight,
> and lightly used at lunchtimes, it would be handy to speed up a heal.
> 
> The only tuneable I found was cluster.self-heal-window-size, which doesn't
> seem to make much difference.

I don't know of any way to speed this up; maybe someone else who knows the heal 
daemon better than I do could chime in here.  Maybe you could open an RFE on 
this?  In my testing I only see 2 files getting healed at a time per replica 
pair.  I would like to see this be multi-threaded (if it's not already), with 
the ability to tune it to control resource usage (similar to what we did in the 
rebalance refactoring done recently).  If you let me know the BZ # I'll add my 
data + suggestions; I have been testing this pretty extensively in recent weeks 
and have good data + some ideas on how to speed things up.
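
In the meantime, a few commands at least show what the heal is doing while it 
runs; this is just a rough sketch, with the volume name as a placeholder, so 
double-check the sub-commands against your Gluster version:

gluster volume heal <volname> info        # entries still pending heal
gluster volume heal <volname> statistics  # per-brick heal counters, if supported
top -H                                    # on a brick server, watch the heal / brick threads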

-b
 
> thanks,
> --
> Lindsay
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Tuning for small files

2015-09-30 Thread Ben Turner
- Original Message -
> From: "Iain Milne" 
> To: gluster-users@gluster.org
> Sent: Wednesday, September 30, 2015 11:00:07 AM
> Subject: Re: [Gluster-users] Tuning for small files
> 
> > Here are all 3 of the settings I was talking about:
> >
> > gluster v set testvol client.event-threads 4
> > gluster v set testvol server.event-threads 4
> > gluster v set testvol performance.lookup-optimize on
> >
> > Yes, lookup optimize needs to be enabled.
> 
> Thanks for that. The first two worked ok for me, but the third gives:
>   volume set: failed: option : performance.lookup-optimize does not exist
>   Did you mean performance.cache-size or ...lazy-open?
> 
> If I do: "gluster volume get  all | grep optimize" then I see:
>   cluster.lookup-optimize off
>   cluster.readdir-optimizeoff
> 
> Is it one of these two options perhaps?

Ya, do cluster.lookup-optimize on; it was a typo on my part.  I was going from 
memory and I always confuse that one.
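
For clarity, the corrected set of settings is:

gluster v set testvol client.event-threads 4
gluster v set testvol server.event-threads 4
gluster v set testvol cluster.lookup-optimize on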

-b

 
> We're running 3.7.4 on Centos 6.7
> 
> Thanks
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Tuning for small files

2015-09-30 Thread Ben Turner
- Original Message -
> From: "Iain Milne" 
> To: gluster-users@gluster.org
> Sent: Wednesday, September 30, 2015 2:48:57 AM
> Subject: Re: [Gluster-users] Tuning for small files
> 
> > Where you run into problems with smallfiles on gluster is latency of
> sending
> > data over the wire.  For every smallfile create there are a bunch of
> different
> > file opetations we have to do on every file.  For example we will have
> to do
> > at least 1 lookup per brick to make sure that the file doesn't exist
> anywhere
> > before we create it.  We actually got it down to 1 per brick with lookup
> > optimize on, its 2 IIRC(maybe more?) with it disabled.
> 
> Is this lookup optimize something that needs to be enabled manually with
> 3.7, and if so, how?

Here are all 3 of the settings I was talking about:

gluster v set testvol client.event-threads 4
gluster v set testvol server.event-threads 4
gluster v set testvol performance.lookup-optimize on

Yes, lookup optimize needs to be enabled.

-b

> Thanks
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Tuning for small files

2015-09-29 Thread Ben Turner
- Original Message -
> From: "Thibault Godouet" <tib...@godouet.net>
> To: "Ben Turner" <btur...@redhat.com>
> Cc: hm...@t-hamel.fr, gluster-users@gluster.org
> Sent: Tuesday, September 29, 2015 1:36:20 PM
> Subject: Re: [Gluster-users] Tuning for small files
> 
> Ben,
> 
> I suspect meta-data / 'ls -l' performance is very important for my svn
> use-case.
> 
> Having said that, what do you mean by small file performance? I thought
> what people meant by this was really the overhead of meta-data, with a 'ls
> -l' being a sort of extreme case (pure meta-data).
> Obviously if you also have to read and write actual data (albeit not much
> at all per file), then the effect of meta-data overhead would get diluted
> to a degree, bit potentially still very present.

Where you run into problems with smallfiles on gluster is the latency of sending 
data over the wire.  For every smallfile create there are a bunch of different 
file operations we have to do on every file.  For example, we have to do at 
least 1 lookup per brick to make sure that the file doesn't exist anywhere 
before we create it.  We actually got it down to 1 per brick with lookup 
optimize on; it's 2 IIRC (maybe more?) with it disabled.  So the time we spend 
waiting for those lookups to complete adds to latency, which lowers the number 
of files that can be created in a given period of time.  Lookup optimize was 
implemented in 3.7 and, like I said, it's now at the optimal 1 lookup per brick 
on creates.

The other problem with small files that we had in 3.6 is that we were using a 
single-threaded event listener (epoll is what we call it).  This single thread 
would spike a CPU to 100% (called a hot thread) and glusterfs would become CPU 
bound.  The solution here was to make the event listener multi-threaded so that 
we could spread the epoll load across CPUs, thereby eliminating the CPU 
bottleneck and allowing us to process more events in a given time.  FYI epoll 
defaults to 2 threads in 3.7, but I have seen cases where I still bottlenecked 
on CPU without 4 threads in my envs, so I usually do 4.  This was implemented 
in upstream 3.7 but was backported to RHGS 3.0.4 if you have a RH based version.

Fixing these two issues led to the performance gains I was talking about with 
smallfile creates.  You are probably thinking from a distributed FS + metadata 
server (MDS) perspective, where the MDS is the bottleneck for smallfiles.  Since 
gluster doesn't have an MDS, that load is transferred to the clients / servers, 
and this led to a CPU bottleneck when epoll was single threaded.  I think this 
is the piece you may have been missing.
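
A quick way to check whether you are hitting that CPU bottleneck is to watch 
the brick-side threads while the workload runs; just a rough sketch:

# on each brick server during the small file workload
top -H
# a single glusterfsd thread pinned near 100% CPU points at the hot epoll thread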

> 
> Would there be an easy way to tell how much time is spent on meta-data vs.
> Data in a profile output?

Yep!  Can you gather some profiling info and send it to me?
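
Something along these lines should capture it (volume name is a placeholder):

gluster volume profile <volname> start
# run the svn workload, then dump the counters
gluster volume profile <volname> info > profile-output.txt
gluster volume profile <volname> stop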

> 
> One thing I wonder: do your comments apply to both native Fuse and NFS
> mounts?
> 
> Finally, all this brings me back to my initial question really: are there
> any tuning recommendation of configuration tuning for my requirement (small
> file read/writes on a pair of nodes with replication) beyond the thread
> counts and lookup optimize?
> Or are those by far the most important in this scenario?

For creating a bunch of small files those are the only two that I know of that 
will have a large impact, maybe some others from the list can give some input 
on anything else we can do here.

-b

> 
> Thx,
> Thibault.
> - Original Message -
> > From: hm...@t-hamel.fr
> > To: aba...@magix.net
> > Cc: gluster-users@gluster.org
> > Sent: Monday, September 28, 2015 7:40:52 AM
> > Subject: Re: [Gluster-users] Tuning for small files
> >
> > I'm also quite interested by small files performances optimization, but
> > I'm a bit confused about the best option between 3.6/3.7.
> >
> > Ben Turner was saying that 3.6 might give the best performances:
> > http://www.gluster.org/pipermail/gluster-users/2015-September/023733.html
> >
> > What kind of gain is expected (with consistent-metadata) if this
> > regression is solved?
> 
> Just to be clear, the issue I am talking about is metadata only(think ls -l
> or file browsing).  It doesn't affect small file perf(well not that much,
> I'm sure a little, but I have never quantified it), with server and client
> event threads set to 4 + lookup optimize I see between a 200-300% gain on
> my systems on 3.7 vs 3.6 builds.  If I needed fast metadata I would go with
> 3.6, if I need fast smallfile I would go with 3.7.  If I needed both I
> would pick the less of the two evils and go with that one and upgrade when
> the fix is released.
> 
> -b
> 
> 
> >
> > I tried 3.6.5 (last version for debian jessie), and it's a bit better
> > than 3.7.4 

Re: [Gluster-users] Tuning for small files

2015-09-28 Thread Ben Turner
- Original Message -
> From: hm...@t-hamel.fr
> To: aba...@magix.net
> Cc: gluster-users@gluster.org
> Sent: Monday, September 28, 2015 7:40:52 AM
> Subject: Re: [Gluster-users] Tuning for small files
> 
> I'm also quite interested by small files performances optimization, but
> I'm a bit confused about the best option between 3.6/3.7.
> 
> Ben Turner was saying that 3.6 might give the best performances:
> http://www.gluster.org/pipermail/gluster-users/2015-September/023733.html
> 
> What kind of gain is expected (with consistent-metadata) if this
> regression is solved?

Just to be clear, the issue I am talking about is metadata only (think ls -l or 
file browsing).  It doesn't affect small file perf much (I'm sure a little, but 
I have never quantified it).  With server and client event threads set to 4 + 
lookup optimize I see between a 200-300% gain on my systems on 3.7 vs 3.6 
builds.  If I needed fast metadata I would go with 3.6; if I needed fast 
smallfile I would go with 3.7.  If I needed both I would pick the lesser of the 
two evils, go with that one, and upgrade when the fix is released.

-b


> 
> I tried 3.6.5 (last version for debian jessie), and it's a bit better
> than 3.7.4 but not by much (10-15%).
> 
> I was also wondering if there is recommendations for the underlying file
> system of the bricks (xfs, ext4, tuning...).
> 
> 
> Regards
> 
> Thomas HAMEL
> 
> On 2015-09-28 12:04, André Bauer wrote:
> > If you're not already on Glusterfs 3.7.x i would recommend an update
> > first.
> > 
> > Am 25.09.2015 um 17:49 schrieb Thibault Godouet:
> >> Hi,
> >> 
> >> There are quite a few tuning parameters for Gluster (as seen in
> >> Gluster
> >> volume XYZ get all), but I didn't find much documentation on those.
> >> Some people do seem to set at least some of them, so the knowledge
> >> must
> >> be somewhere...
> >> 
> >> Is there a good source of information to understand what they mean,
> >> and
> >> recommendation on how to set them to get a good small file
> >> performance?
> >> 
> >> Basically what I'm trying to optimize is for svn operations (e.g. svn
> >> checkout, or svn branch) on a replicated 2 x 1 volume (hosted on 2
> >> VMs,
> >> 16GB ram, 4 cores each, 10Gb/s network tested at full speed), using a
> >> NFS mount which appears much faster than fuse in this case (but still
> >> much slower than when served by a normal NFS server).
> >> Any recommendation for such a setup?
> >> 
> >> Thanks,
> >> Thibault.
> >> 
> >> 
> >> 
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >> 
> > 
> > 
> > --
> > Mit freundlichen Grüßen
> > André Bauer
> > 
> > MAGIX Software GmbH
> > André Bauer
> > Administrator
> > August-Bebel-Straße 48
> > 01219 Dresden
> > GERMANY
> > 
> > tel.: 0351 41884875
> > e-mail: aba...@magix.net
> > aba...@magix.net <mailto:Email>
> > www.magix.com <http://www.magix.com/>
> > 
> > 
> > Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Michael Keith
> > Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205
> > 
> > Find us on:
> > 
> > <http://www.facebook.com/MAGIX> <http://www.twitter.com/magix_de>
> > <http://www.youtube.com/wwwmagixcom> <http://www.magixmagazin.de>
> > --
> > The information in this email is intended only for the addressee named
> > above. Access to this email by anyone else is unauthorized. If you are
> > not the intended recipient of this message any disclosure, copying,
> > distribution or any action taken in reliance on it is prohibited and
> > may be unlawful. MAGIX does not warrant that any attachments are free
> > from viruses or other defects and accepts no liability for any losses
> > resulting from infected email transmissions. Please note that any
> > views expressed in this email may be those of the originator and do> > 
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow roaming profiles on top of glusterfs

2015-09-25 Thread Ben Turner
I am _pretty_sure_ it's only 3.7+ based; IIRC on upstream anything 3.6 should 
work around it.  Lemme see if I can dig up the patch that fixed the issue that 
led to this regression so we can be sure, but I would try the latest 3.6.  For 
Red Hat based distros go with 3.0.4 (which has the MT epoll patch but no lookup 
optimize, both of which REALLY help smallfile perf).

-b

- Original Message -
> From: "Thibault Godouet" <tib...@godouet.net>
> To: "Ben Turner" <btur...@redhat.com>
> Cc: gluster-users@gluster.org
> Sent: Friday, September 25, 2015 11:43:55 AM
> Subject: Re: [Gluster-users] Very slow roaming profiles on top of glusterfs
> 
> Hi Ben,
> 
> Regarding https://bugzilla.redhat.com/show_bug.cgi?id=1250241which does
> look like a serious regression for small file performance, do you know
> which versions are affected, or is there a way to find out?
> 
> Also the patch didn't make it: do you have visibility on whether another
> patch is likely to land soon?
> 
> If not I may try the version before the regression was introduced...
> 
> Thanks,
> Thibault.
> On 14 Sep 2015 4:22 pm, "Ben Turner" <btur...@redhat.com> wrote:
> 
> > - Original Message -
> > > From: "Diego Remolina" <dijur...@gmail.com>
> > > To: "Alex Crow" <ac...@integrafin.co.uk>
> > > Cc: gluster-users@gluster.org
> > > Sent: Monday, September 14, 2015 9:26:17 AM
> > > Subject: Re: [Gluster-users] Very slow roaming profiles on top of
> > glusterfs
> > >
> > > Hi Alex,
> > >
> > > Thanks for the reply, I was aware of the performance issues with small
> > > files, but never expected an order of magnitude slower. I understand
> > > some improvements were made to 3.7.x to help with low small file
> > > performance, however I did not see any big changes after upgrading
> > > from 3.6.x to 3.7.x.
> > >
> > >
> > http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf
> > >
> > > And the ssd metadata support feature seems to have not had any changes
> > > since September 2014:
> > >
> > > https://forge.gluster.org/gluster-meta-data-on-ssd
> > >
> > > Am I just totally out of luck with gluster for now?
> >
> > Are you using glusterFS mounts or SMB mounts?  As for SMB mounts we are
> > working VERY hard to improve metadata / smallfile performance but as it
> > sits right now we are limited by the number of lookup / stat calls that are
> > issued.  When we can reduce the number of lookups and prefetch the xattrs
> > that SMB / windows needs(I am working on the stat prefetch but don't have a
> > testable solution yet) I expect to see a vast perf improvement but I don't
> > have an ETA for you.
> >
> > On the glusterFS side I see ~300% improvement in smallfile create
> > performance between 3.6 and 3.7.  Try setting:
> >
> > gluster volume set testvol server.event-threads 4
> > gluster volume set testvol client.event-threads 4
> > gluster volume set testvol cluster.lookup-optimize on
> >
> > Unfortunately WRT to metadata operations a fix went in that has negatively
> > affected performance:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1250241
> >
> > I used to see about 25k metatdata operations per second, now I am only
> > seeking 6k.  It looks like there is a patch but I don't know if the fix
> > will get us back to the 25k OPs per second, maybe Pranith can comment on
> > expectations for:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1250803
> >
> > To summarize:
> >
> > SMB - no ETA for improvement
> > GlusterFS smallfile create - 300% increase in my env between 3.6 and 3.7
> > GlusterFS metadata - BZ is in POST(patch is submitted) but I am not sure
> > on the ETA of the fix and if the fix will get back to what I was seeing in
> > 3.6
> >
> > Hope this helps.
> >
> > -b
> >
> > >
> > > Diego
> > >
> > > On Mon, Sep 14, 2015 at 8:37 AM, Alex Crow <ac...@integrafin.co.uk>
> > wrote:
> > > > Hi Diego,
> > > >
> > > > I think it's the overhead of fstat() calls. Gluster keeps its metadata
> > on
> > > > the bricks themselves, and this has to be looked up for every file
> > access.
> > > > For big files this is not an issue as it only happens once, but when
> > > > accessing lots of small files this overhead rapidly builds up, the
> > smaller
> > > > the

Re: [Gluster-users] RAID vs bare drive for bricks

2015-09-23 Thread Ben Turner
- Original Message -
> From: "Gluster Admin" 
> To: gluster-users@gluster.org
> Sent: Wednesday, September 23, 2015 11:48:46 AM
> Subject: [Gluster-users] RAID vs bare drive for bricks
> 
> 
> So in most of the documentation I read from both redhat and gluster.org it
> seems to reference using RAID on the servers for the bricks. This is a nice
> failsafe but obviously has reduced capacity repurcussions as you continue to
> scale with nodes and bricks.
> 
> With Gluster 3.7+ is it still recommended to use hardware RAID for the
> underlying disks of the bricks or especially in the case of Replica-3 would
> it be better to have individual drives as bricks?

See the replica 3 section here:

https://videos.cdn.redhat.com/summit2015/presentations/13767_red-hat-gluster-storage-performance.pdf


> 
> In the scenario of many servers and bricks lets say 12 servers with 12 drives
> each that would yield a scenario where 3 servers would have a copy of the
> data on each brick and assuming a multi rack layout would lead to a fairly
> distributed fault domain.
> 
> Am I missing something here? I can see hardware raid for smaller
> implementations of a few servers but it seems counterproductive for larger
> distributed-replicated setups

The problem with JBOD vs RAID is that you will only see the performance of a 
single disk for a single client without some kind of striping / sharding.  RAID 
aggregates the disks on the back end so single file performance can take 
advantage of all the disks instead of just 1.  JBOD is definitely the way of the 
future, and when sharding is production ready you should be able to take more 
advantage of your disks for single file throughput.  Have a look at that slide 
deck I linked for the tradeoffs and some perf data.

-b 



> 
> thanks
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Very slow roaming profiles on top of glusterfs

2015-09-14 Thread Ben Turner
- Original Message -
> From: "Diego Remolina" 
> To: "Alex Crow" 
> Cc: gluster-users@gluster.org
> Sent: Monday, September 14, 2015 9:26:17 AM
> Subject: Re: [Gluster-users] Very slow roaming profiles on top of glusterfs
> 
> Hi Alex,
> 
> Thanks for the reply, I was aware of the performance issues with small
> files, but never expected an order of magnitude slower. I understand
> some improvements were made to 3.7.x to help with low small file
> performance, however I did not see any big changes after upgrading
> from 3.6.x to 3.7.x.
> 
> http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf
> 
> And the ssd metadata support feature seems to have not had any changes
> since September 2014:
> 
> https://forge.gluster.org/gluster-meta-data-on-ssd
> 
> Am I just totally out of luck with gluster for now?

Are you using glusterFS mounts or SMB mounts?  As for SMB mounts, we are working 
VERY hard to improve metadata / smallfile performance, but as it sits right now 
we are limited by the number of lookup / stat calls that are issued.  When we 
can reduce the number of lookups and prefetch the xattrs that SMB / Windows 
needs (I am working on the stat prefetch but don't have a testable solution yet) 
I expect to see a vast perf improvement, but I don't have an ETA for you.

On the glusterFS side I see ~300% improvement in smallfile create performance 
between 3.6 and 3.7.  Try setting:

gluster volume set testvol server.event-threads 4
gluster volume set testvol client.event-threads 4
gluster volume set testvol cluster.lookup-optimize on

Unfortunately, WRT metadata operations a fix went in that has negatively 
affected performance:

https://bugzilla.redhat.com/show_bug.cgi?id=1250241

I used to see about 25k metadata operations per second; now I am only seeing 
6k.  It looks like there is a patch but I don't know if the fix will get us 
back to the 25k OPs per second, maybe Pranith can comment on expectations for:

https://bugzilla.redhat.com/show_bug.cgi?id=1250803

To summarize:

SMB - no ETA for improvement
GlusterFS smallfile create - 300% increase in my env between 3.6 and 3.7
GlusterFS metadata - BZ is in POST (patch is submitted), but I am not sure of the 
ETA of the fix or whether it will get us back to what I was seeing in 3.6

Hope this helps.

-b

> 
> Diego
> 
> On Mon, Sep 14, 2015 at 8:37 AM, Alex Crow  wrote:
> > Hi Diego,
> >
> > I think it's the overhead of fstat() calls. Gluster keeps its metadata on
> > the bricks themselves, and this has to be looked up for every file access.
> > For big files this is not an issue as it only happens once, but when
> > accessing lots of small files this overhead rapidly builds up, the smaller
> > the file the worse the issue. Profiles do have hundreds of very small
> > files!
> >
> > I was looking to use GlusterFS for generic file sharing as well, but I
> > noticed the same issue while testing backups from a GlusterFS volume. On
> > one
> > vol (scanned 4-bit greyscale images and small PDFs) backups were taking
> > over
> > 16 hours whereas with a traditional FS they were completing in just over 1
> > hour.
> >
> > It may be worth trying out one of the distributed filesystems that use a
> > separate in-memory metadata server. I've tried LizardFS and MooseFS and
> > they
> > are both much faster than GlusterFS for small files, although large-file
> > sequential performance is not as good (but still plenty for a Samba
> > server).
> >
> > Alex
> >
> >
> > On 14/09/15 13:21, Diego Remolina wrote:
> >>
> >> Bump...
> >>
> >> Anybody has any clues as to how I can try and identify the cause of
> >> the slowness?
> >>
> >> Diego
> >>
> >> On Wed, Sep 9, 2015 at 7:42 PM, Diego Remolina  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am running two glusterfs servers as replicas. I have a 3rd server
> >>> which provides quorum. Since gluster was introduced, we have had an
> >>> issue where windows roaming profiles are extremely slow. The initial
> >>> setup was done on 3.6.x and since 3.7.x has small file performance
> >>> improvements, I upgraded to 3.7.3, but that has not helped.
> >>>
> >>> It seems that for some reason gluster is very slow when dealing with
> >>> lots of small files. I am not sure how to really troubleshoot this via
> >>> samba, but I have come up with other tests that produce rather
> >>> disconcerting results as shown below.
> >>>
> >>> If I run directly on the brick:
> >>> [root@ysmha01 /]# time ( find
> >>> /bricks/hdds/brick/home/jgibbs/.winprofile.V2 -type f > /dev/null )
> >>> real 0m3.683s
> >>> user 0m0.042s
> >>> sys 0m0.154s
> >>>
> >>> Now running on the gluster volume mounted via fuse:
> >>> [root@ysmha01 /]# mount | grep export
> >>> 10.0.1.6:/export on /export type fuse.glusterfs
> >>> (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)
> >>>
> >>> [root@ysmha01 /]# time ( find 

Re: [Gluster-users] Need help making a decision choosing MS DFS or Gluster+SAMBA+CTDB

2015-08-10 Thread Ben Turner
- Original Message -
 From: David david.p...@gmail.com
 To: Daniel Müller muel...@tropenklinik.de
 Cc: gluster-users gluster-users@gluster.org
 Sent: Monday, August 10, 2015 4:04:30 AM
 Subject: Re: [Gluster-users] Need help making a decision choosing MS DFS or   
 Gluster+SAMBA+CTDB
 
 Thanks everyone.
 
 So from reading all your comments, I understand that if I need an active /
 active synchronized setup for higher workloads, Gluster is for me.
 Other than that, DFS-R is a good option for data replication at the expense
 of latency of the replicated data to the secondary node, and only one server
 is active per CIFS share.

If smallfile performance is a concern I HIGHLY recommend you steer clear of 
GLUSTER + SMB + CTDB.  Large file sequential and random IO is not great but OK; 
smallfile and metadata operations (especially from Windows clients) are poor.  
To put it in perspective, I can create 3500 64k files / second on my glusterFS 
mount, while on SMB I can only do 308 (same HW / config).  This is something I 
am working on improving, but there is quite a bit to be done on the gluster 
side for smallfile workloads to make sense performance wise:

- on creates, the extra xattrs that SMB requires (ACLs, etc) cause extra round 
trips, proposed solution:

http://www.gluster.org/community/documentation/index.php/Features/composite-operations#CREATE-AND-WRITE

- lack of good, coherent client-side caching (cache invalidation enables longer 
caching of metadata)
- incomplete metadata reads (READDIRPLUS) cause per-file round trips for 
directory scans, proposed solution:

http://www.gluster.org/community/documentation/index.php/Features/composite-operations#READDIRPLUS_used_to_prefetch_xattrs

- case-insensitive file lookup semantics, proposed solution:

http://www.gluster.org/community/documentation/index.php/Features/composite-operations#case-insensitive_volume_support

- high latency of file creates and even reads at the brick level, due to 
excessive system calls, proposed solution:

http://www.gluster.org/community/documentation/index.php/Features/stat-xattr-cache

-b
 
 Does DFS-R work well with a high rate of changes?
 I found from other users' use cases that DFS-R caused server hangs and such;
 I hope it was fixed in Win2K12 server.
 
 David
 
 
 On Mon, Aug 10, 2015 at 10:34 AM, Daniel Müller  muel...@tropenklinik.de 
 wrote:
 
 
 An example of a working share on samba4:
 
 You can choose to work with vfs objects= glusterfs
 Glusterfs:volume=yourvolume
 Glusterfs:volfile.server=Your.server
 For me it turned out to be too buggy.
 
 
 I just used instead the path=/path/toyour/mountedgluster
 
 You will need this:
 posix locking =NO
 kernel share modes = No
 
 [edv]
 comment=edv s4master verzeichnis auf gluster node1
 vfs objects= recycle
 ##vfs objects= recycle, glusterfs
 recycle:repository= /%P/Papierkorb
 ##glusterfs:volume= sambacluster
 ##glusterfs:volfile_server = XXX..
 recycle:exclude = *.tmp,*.temp,*.log,*.ldb,*.TMP,?~$*,~$*,Thumbs.db
 recycle:keeptree = Yes
 recycle:exclude_dir = .Papierkorb,Papierkorb,tmp,temp,profile,.profile
 recycle:touch_mtime = yes
 recycle:versions = Yes
 recycle:minsize = 1
 msdfs root=yes
 path=/mnt/glusterfs/ads/wingroup/edv
 read only=no
 posix locking =NO
 kernel share modes = No
 access based share enum=yes
 hide unreadable=yes
 hide unwriteable files=yes
 veto files = Thumbs.db
 delete veto files = yes
 
 Greetings
 Daniel
 
 
 EDV Daniel Müller
 
 Leitung EDV
 Tropenklinik Paul-Lechler-Krankenhaus
 Paul-Lechler-Str. 24
 72076 Tübingen
 Tel.: 07071/206-463, Fax: 07071/206-499
 eMail: muel...@tropenklinik.de
 Internet: www.tropenklinik.de
 
 
 
 -Ursprüngliche Nachricht-
 Von: gluster-users-boun...@gluster.org
 [mailto: gluster-users-boun...@gluster.org ] Im Auftrag von Dan Mons
 Gesendet: Montag, 10. August 2015 09:08
 An: Mathieu Chateau
 Cc: gluster-users; David
 Betreff: Re: [Gluster-users] Need help making a decision choosing MS DFS or
 Gluster+SAMBA+CTDB
 
 If you're looking at a Gluster+Samba setup of any description for people
 extensively using Microsoft Office tools (either Windows or Mac clients), I
 *strongly* suggest exhaustive testing of Microsoft Word and Excel.
 
 I've yet to find a way to make these work 100% on Gluster. Strange
 client-side locking behaviour with these tools often makes documents
 completely unusable when hosted off Gluster. We host our large
 production files (VFX industry) off Gluster, however have a separate Windows
 Server VM purely for administration to host their legacy Microsoft Office
 documents (we've since migrated largely to Google Apps + Google Drive for
 that stuff, but the legacy requirement remains for a handful of users).
 
 -Dan
 
 
 Dan Mons - RD Sysadmin
 Cutting Edge
 http://cuttingedge.com.au
 
 
 On 10 August 2015 at 15:42, Mathieu Chateau  mathieu.chat...@lotp.fr 
 wrote:
  Hello,
  
  what do you mean by true clustering ?
  We can do a Windows Failover cluster (1 virtual ip, 1 

Re: [Gluster-users] Very slow ls

2015-08-05 Thread Ben Turner
I am seeing a pretty big perf regression with ls -l on the 3.7 branch:

https://bugzilla.redhat.com/show_bug.cgi?id=1250241

Even when running on cached results I am not seeing what I saw on 3.6:

total threads = 32
total files = 316100
 98.78% of requested files processed, minimum is  70.00
20.056840 sec elapsed time
15760.209342 files/sec

In my 3.6 tests I was seeing 20k+ files per second uncached.
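
The numbers above are the kind of thing smallfile reports; a rough by-hand 
sketch of that cached vs uncached metadata scan looks something like this 
(paths are placeholders):

echo 3 > /proc/sys/vm/drop_caches               # drop caches on clients and servers first
time ls -lR /mnt/glusterfs/testdir > /dev/null  # uncached pass
time ls -lR /mnt/glusterfs/testdir > /dev/null  # cached pass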

-b

- Original Message -
 From: Mathieu Chateau mathieu.chat...@lotp.fr
 To: Florian Oppermann gluster-us...@flopwelt.de
 Cc: gluster-users gluster-users@gluster.org
 Sent: Tuesday, August 4, 2015 4:07:13 AM
 Subject: Re: [Gluster-users] Very slow ls
 
 Sorry only read replicated in your first mail, I squashed the distributed one
 :'(
 
 
 
 Cordialement,
 Mathieu CHATEAU
 http://www.lotp.fr
 
 2015-08-04 9:47 GMT+02:00 Florian Oppermann  gluster-us...@flopwelt.de  :
 
 
 In my current configuration I have a distributed and replicated volume
 which is (to my understanding) similar to a raid 10.
 
 On 04.08.2015 08 :51, Mathieu Chateau wrote:
  In a replicated scheme, it's like a raid 1 (mirror).
  You write as slowly as the slowest disk; the client will wait for all bricks
  to confirm the write.
  
  In this scheme, you wouldn't use much more than 3 bricks.
  
  I think you mixed it up with the distributed scheme, which is like a raid 0
  (striped).
  That one gets more perf when adding bricks, but a single file is
  present on only one brick.
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Please advise for our file server cluster

2015-06-08 Thread Ben Turner
- Original Message -
 From: Gao g...@pztop.com
 To: gluster-users@gluster.org
 Sent: Monday, June 8, 2015 12:58:56 PM
 Subject: Re: [Gluster-users] Please advise for our file server cluster
 
 On 15-06-05 04:30 PM, Gao wrote:
  Hi,
 
  We are a small business and now we are planning to build a new file
  server system. I did some research and I decide to use GlusterFS as
  the cluster system to build a 2-node system. Our goals are trying to
  minimize the downtime and to avoid single point of failure. Meanwhile,
  I need keep an eye on the budget.
 
  In our office we have 20+ computers running Ubuntu. Few(6) machines
  use Windows 8. We use a SAMBA server to take care file sharing.

What file sizes / access patterns are you planning on using?  Smallfile and 
stat / metadata operations on Windows / Samba will be much slower than using 
glusterfs or NFS mounts.  Be sure to clearly identify your performance 
requirements before you go to size your HW.

 
  I did some research and here are some main components I selected for
  the system:
  M/B: Asus P9D-E/4L (It has 6 SATA ports so I can use softRAID5 for
  data storage. 4 NIC ports so I can do link aggregation)
  CPU: XEON E3-1220v3 3.1GHz (is this over kill? the MB also support i3
  though.)
  Memory: 4x8GB ECC DDR3
  SSD: 120 GB for OS
  Hard Drive: 4 (or 5) 3TB 7200RPM drive to form soft RAID5
  10GBe card: Intel X540-T1

Seems reasonable.  I would expect 40-60 MB / sec writes and 80-100 MB / sec 
reads over gigabit with sequential workloads.  Over 10G I would expect ~200-400 
MB / sec for sequential reads and writes.  Glusterfs and NFS mounts will 
perform better but it sounds like you need samba for your windows hosts.

 
  About the hardware I am not confident. One thing is the 10GBe card. Is
  it sufficient? I chose this because it's less expensive. But I don't
  want it drag the system down once I build them. Also, if I only need 2
  nodes, can I just use CAT6 cable to link them together? or I have to
  use a 10GBe switch?

It all depends on your performance requirements.  You will need a 10G switch if 
you want the clients to access the servers over 10G.  If you don't need more 
than 120 MB / sec you can use gigabit, but if you need more then you will have 
to go to the 10G NICs.  


 
  Could someone give me some advice?
 
  Thanks.
 
  Gao
 
 
 
 
 Any help? Please.
 
 --
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

2015-06-08 Thread Ben Turner
- Original Message -
 From: Geoffrey Letessier geoffrey.letess...@cnrs.fr
 To: Ben Turner btur...@redhat.com
 Cc: Pranith Kumar Karampuri pkara...@redhat.com, gluster-users@gluster.org
 Sent: Monday, June 8, 2015 8:37:08 AM
 Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
 
 Hello,
 
 Do you know any more about this?
 
 In addition, do you know how to « activate » RDMA for my volume with
 Intel/QLogic QDR? Currently, I mount my volumes with the RDMA transport-type
 option (both on the server and client side) but I notice all streams are using
 the TCP stack - and my bandwidth never exceeds 2.0-2.5Gb/s (250-300MB/s).

That is a little slow for the HW you described.  Can you check what you get 
with iperf just between the clients and servers?  https://iperf.fr/  With 
replica 2 and a 10G NW you should see ~400 MB / sec sequential writes and ~600 MB 
/ sec reads.  Can you send me the output from gluster v info?  You specify RDMA 
volumes at create time by running gluster v create blah transport rdma; did you 
specify RDMA when you created the volume?  What block size are you using in 
your tests?  1024 KB writes perform best with glusterfs, and as the block size 
gets smaller perf will drop a little bit.  I wouldn't write in anything under 
4k blocks; the sweet spot is between 64k and 1024k.
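
A rough sketch of the checks I mean (hostnames, volume name and brick paths are 
placeholders):

# network: run on one server, then from a client
iperf -s
iperf -c server1 -P 4
# confirm the transport the volume was created with
gluster v info <volname>
# RDMA has to be requested at create time, e.g.
gluster v create <volname> replica 2 transport rdma server1:/brick/path server2:/brick/path
# sequential write test with a 1024k block size
dd if=/dev/zero of=/mnt/<volname>/ddtest bs=1024k count=10240 conv=fdatasync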

-b

 
 Thanks in advance,
 Geoffrey
 --
 Geoffrey Letessier
 Responsable informatique  ingénieur système
 UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
 Institut de Biologie Physico-Chimique
 13, rue Pierre et Marie Curie - 75005 Paris
 Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr
 
  Le 2 juin 2015 à 23:45, Geoffrey Letessier geoffrey.letess...@cnrs.fr a
  écrit :
  
  Hi Ben,
  
  I just check my messages log files, both on client and server, and I dont
  find any hung task you notice on yours..
  
  As you can read below, i dont note the performance issue in a simple DD but
  I think my issue is concerning a set of small files (tens of thousands nay
  more)…
  
  [root@nisus test]# ddt -t 10g /mnt/test/
  Writing to /mnt/test/ddt.8362 ... syncing ... done.
  sleeping 10 seconds ... done.
  Reading from /mnt/test/ddt.8362 ... done.
  10240MiBKiB/s  CPU%
  Write  114770 4
  Read40675 4
  
  for info: /mnt/test concerns the single v2 GlFS volume
  
  [root@nisus test]# ddt -t 10g /mnt/fhgfs/
  Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
  sleeping 10 seconds ... done.
  Reading from /mnt/fhgfs/ddt.8380 ... done.
  10240MiBKiB/s  CPU%
  Write  102591 1
  Read98079 2
  
  Do you have a idea how to tune/optimize performance settings? and/or TCP
  settings (MTU, etc.)?
  
  ---
  | |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
  ---
  | single  |  ~3m45s |   ~43s |~47s |  ~3m10s | ~3m15s |
  ---
  | replicated  |  ~5m10s |   ~59s |   ~1m6s |  ~1m19s | ~1m49s |
  ---
  | distributed |  ~4m18s |   ~41s |~57s |  ~2m24s | ~1m38s |
  ---
  | dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s |  ~1m24s | ~2m40s |
  ---
  | native FS   |~11s |~4s | ~2s |~56s |   ~10s |
  ---
  | BeeGFS  |  ~3m43s |   ~15s | ~3s |  ~1m33s |   ~46s |
  ---
  | single (v2) |   ~3m6s |   ~14s |~32s |   ~1m2s |   ~44s |
  ---
  for info:
  -BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 
  servers)
  - single (v2): simple gluster volume with default settings
  
  I also note I obtain the same tar/untar performance issue with FhGFS/BeeGFS
  but the rest (DU, FIND, RM) looks like to be OK.
  
  Thank you very much for your reply and help.
  Geoffrey
  ---
  Geoffrey Letessier
  
  Responsable informatique  ingénieur système
  CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
  Institut de Biologie Physico-Chimique
  13, rue Pierre et Marie Curie - 75005 Paris
  Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr
  mailto:geoffrey.letess...@cnrs.fr
  Le 2 juin 2015 à 21:53, Ben Turner btur...@redhat.com
  mailto:btur...@redhat.com a écrit :
  
  I am seeing problems on 3.7 as well.  Can you check /var/log/messages on
  both the clients and servers for hung tasks like:
  
  Jun  2 15:23:14 gqac006 kernel: echo 0 
  /proc/sys/kernel/hung_task_timeout_secs disables this message.
  Jun  2 15:23:14 gqac006 kernel: iozoneD 0001 0
  21999  1 0x0080
  Jun  2

Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

2015-06-02 Thread Ben Turner
I am seeing problems on 3.7 as well.  Can you check /var/log/messages on both 
the clients and servers for hung tasks like:

Jun  2 15:23:14 gqac006 kernel: echo 0  
/proc/sys/kernel/hung_task_timeout_secs disables this message.
Jun  2 15:23:14 gqac006 kernel: iozoneD 0001 0 21999
  1 0x0080
Jun  2 15:23:14 gqac006 kernel: 880611321cc8 0082 
880611321c18 a027236e
Jun  2 15:23:14 gqac006 kernel: 880611321c48 a0272c10 
88052bd1e040 880611321c78
Jun  2 15:23:14 gqac006 kernel: 88052bd1e0f0 88062080c7a0 
880625addaf8 880611321fd8
Jun  2 15:23:14 gqac006 kernel: Call Trace:
Jun  2 15:23:14 gqac006 kernel: [a027236e] ? 
rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun  2 15:23:14 gqac006 kernel: [a0272c10] ? rpc_execute+0x50/0xa0 
[sunrpc]
Jun  2 15:23:14 gqac006 kernel: [810aaa21] ? ktime_get_ts+0xb1/0xf0
Jun  2 15:23:14 gqac006 kernel: [811242d0] ? sync_page+0x0/0x50
Jun  2 15:23:14 gqac006 kernel: [8152a1b3] io_schedule+0x73/0xc0
Jun  2 15:23:14 gqac006 kernel: [8112430d] sync_page+0x3d/0x50
Jun  2 15:23:14 gqac006 kernel: [8152ac7f] __wait_on_bit+0x5f/0x90
Jun  2 15:23:14 gqac006 kernel: [81124543] wait_on_page_bit+0x73/0x80
Jun  2 15:23:14 gqac006 kernel: [8109eb80] ? 
wake_bit_function+0x0/0x50
Jun  2 15:23:14 gqac006 kernel: [8113a525] ? 
pagevec_lookup_tag+0x25/0x40
Jun  2 15:23:14 gqac006 kernel: [8112496b] 
wait_on_page_writeback_range+0xfb/0x190
Jun  2 15:23:14 gqac006 kernel: [81124b38] 
filemap_write_and_wait_range+0x78/0x90
Jun  2 15:23:14 gqac006 kernel: [811c07ce] vfs_fsync_range+0x7e/0x100
Jun  2 15:23:14 gqac006 kernel: [811c08bd] vfs_fsync+0x1d/0x20
Jun  2 15:23:14 gqac006 kernel: [811c08fe] do_fsync+0x3e/0x60
Jun  2 15:23:14 gqac006 kernel: [811c0950] sys_fsync+0x10/0x20
Jun  2 15:23:14 gqac006 kernel: [8100b072] 
system_call_fastpath+0x16/0x1b

Do you see a perf problem with just a simple DD or do you need a more complex 
workload to hit the issue?  I think I saw an issue with metadata performance 
that I am trying to run down, let me know if you can see the problem with 
simple DD reads / writes or if we need to do some sort of dir / metadata access 
as well.

-b

- Original Message -
 From: Geoffrey Letessier geoffrey.letess...@cnrs.fr
 To: Pranith Kumar Karampuri pkara...@redhat.com
 Cc: gluster-users@gluster.org
 Sent: Tuesday, June 2, 2015 8:09:04 AM
 Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
 
 Hi Pranith,
 
 I’m sorry but I cannot bring you any comparison because comparison will be
 distorted by the fact in my HPC cluster in production the network technology
 is InfiniBand QDR and my volumes are quite different (brick in RAID6
 (12x2TB), 2 bricks per server and 4 servers into my pool)
 
 Concerning your demand, in attachments you can find all expected results
 hoping it can help you to solve this serious performance issue (maybe I need
 play with glusterfs parameters?).
 
 Thank you very much by advance,
 Geoffrey
 --
 Geoffrey Letessier
 Responsable informatique  ingénieur système
 UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
 Institut de Biologie Physico-Chimique
 13, rue Pierre et Marie Curie - 75005 Paris
 Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr
 
 
 
 
 Le 2 juin 2015 à 10:09, Pranith Kumar Karampuri  pkara...@redhat.com  a
 écrit :
 
 hi Geoffrey,
 Since you are saying it happens on all types of volumes, lets do the
 following:
 1) Create a dist-repl volume
 2) Set the options etc you need.
 3) enable gluster volume profile using gluster volume profile volname
 start
 4) run the work load
 5) give output of gluster volume profile volname info
 
 Repeat the steps above on new and old version you are comparing this with.
 That should give us insight into what could be causing the slowness.
 
 Pranith
 On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
 
 
 Dear all,
 
 I have a crash test cluster where i’ve tested the new version of GlusterFS
 (v3.7) before upgrading my HPC cluster in production.
 But… all my tests show me very very low performances.
 
 For my benches, as you can read below, I do some actions (untar, du, find,
 tar, rm) with linux kernel sources, dropping cache, each on distributed,
 replicated, distributed-replicated, single (single brick) volumes and the
 native FS of one brick.
 
 # time (echo 3  /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz;
 sync; echo 3  /proc/sys/vm/drop_caches)
 # time (echo 3  /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 
 /proc/sys/vm/drop_caches)
 # time (echo 3  /proc/sys/vm/drop_caches; find linux-4.1-rc5/|wc -l; echo 3
  /proc/sys/vm/drop_caches)
 # time (echo 3  /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz
 linux-4.1-rc5/; echo 3  /proc/sys/vm/drop_caches)
 # time (echo 3  

Re: [Gluster-users] GlusterD errors

2015-05-11 Thread Ben Turner
- Original Message -
 From: Gaurav Garg gg...@redhat.com
 To: RASTELLI Alessandro alessandro.raste...@skytv.it
 Cc: gluster-users@gluster.org
 Sent: Monday, May 11, 2015 4:42:59 AM
 Subject: Re: [Gluster-users] GlusterD errors
 
 Hi Rastelli,
 
 Could you tell us what steps you followed or what command you executed for
 getting these log.

Also, what about manually running xfs_info on your brick filesystems and 
looking for errors?  From the logs it looks like gluster can't get the inode 
size and xfs_info is returning a non-zero return code.  Is it mounted?  Do you see 
signs of FS corruption?  Look in /var/log/messages for XFS related errors.
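
Something like this on each brick server is what I have in mind (brick path is 
a placeholder):

xfs_info /path/to/brick; echo "exit status: $?"
# and check the kernel log for XFS complaints
grep -i xfs /var/log/messages | tail -50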

-b
 
 
 ~ Gaurav
 
 - Original Message -
 From: RASTELLI Alessandro alessandro.raste...@skytv.it
 To: gluster-users@gluster.org
 Sent: Monday, May 11, 2015 2:05:16 PM
 Subject: [Gluster-users] GlusterD errors
 
 Signature electronique
 
 
 Hi,
 
 we’ve got a lot of these errors in /etc-glusterfs-glusterd.vol.log in our
 Glusterfs environment.
 
 Just wanted to know if I can do anything about that, or if I can ignore them.
 
 Thank you
 
 
 
 [2015-05-11 08:22:43.848305] E
 [glusterd-utils.c:7364:glusterd_add_inode_size_to_dict] 0-management:
 xfs_info exited with non-zero exit status
 
 [2015-05-11 08:22:43.848347] E
 [glusterd-utils.c:7390:glusterd_add_inode_size_to_dict] 0-management: failed
 to get inode size
 
 [2015-05-11 08:22:52.911718] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo]
 0-: Unable to get transaction opinfo for transaction ID :
 ace2f066-1acb-4e00-9cca-721f88691dce
 
 [2015-05-11 08:23:53.26] E
 [glusterd-syncop.c:961:_gd_syncop_commit_op_cbk] 0-management: Failed to
 aggregate response from node/brick
 
 
 
 Alessandro
 
 
 
 
 From: gluster-users-boun...@gluster.org
 [mailto:gluster-users-boun...@gluster.org] On Behalf Of Pierre Léonard
 Sent: venerdì 10 aprile 2015 16:18
 To: gluster-users@gluster.org
 Subject: Re: [Gluster-users] one node change uuid in the night
 
 
 
 
 
 Hi Atin and all,
 
 
 
 
 have corrected with the data in glusterd.info and suppress the bad peers
 file.
 Could you clarify what steps did you perform here. Also could you try to
 start glusterd with -LDEBUG and share the glusterd log file with us.
 Also do you see any delta in glusterd.info file between node 10 and the
 other nodes?
 ~Atin
 
 
  The problem is solved. It came from a mix-up of the uuid files and their
  contents on node 10.
  As we say here: Ouf! - because I am on vacation next week.
  
  Maybe it would be worth backing up the peers directory, as many problems
  come from its contents.
 
  As the log mentioned the name volfile in an error line, I searched the web and
  found this page:
  http://www.gluster.org/community/documentation/index.php/Understanding_vol-file
  
  I have added some sections from the example file. Is that pertinent for our 14
  node cluster, or should I leave it out or change it, notably the number of
  threads?
 
  Many thanks to all,
 
 
  --
  
  Pierre Léonard
  Senior IT Manager
  MetaGenoPolis
  pierre.leon...@jouy.inra.fr
  Tél. : +33 (0)1 34 65 29 78
  Centre de recherche INRA
  Domaine de Vilvert – Bât. 325 R+1
  78 352 Jouy-en-Josas CEDEX
  France
  www.mgps.eu
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Improving Gluster performance through more hardware.

2015-05-07 Thread Ben Turner


- Original Message -
 From: Ernie Dunbar maill...@lightspeed.ca
 To: Gluster Users gluster-users@gluster.org
 Sent: Thursday, May 7, 2015 2:36:08 PM
 Subject: [Gluster-users] Improving Gluster performance through more hardware.
 
 Hi all.
 
 First, I have a specific question about what hardware should be used for
 Gluster, then after that I have a question about how Gluster does its
 multithreading/hyperthreading.
 
 So, we have a new Gluster cluster (currently, two servers with one
 replicated volume) serving up our files for e-mail, which has for
 years been stored in Maildir format. That works pretty well except for
 the few clients who store all their old mail on our server, and their
 cur folder contains a few tens of thousands of messages. As others
 have noticed, this isn't something that Gluster handles well. But we
 value high availability and redundancy more than we value fast, and we
 don't yet have a large enough cluster to justify going with software the
 requires a metadata server. So we're going with Gluster as a result of
 this. That doesn't mean we don't need better performance though.
 
 So I've noticed that the resources that Gluster consumes the most in our
 use case isn't the network or disk utilization - both of which remain
 *well* under full utilization - but CPU cycles. I can easily test this
 by running `ls -l` in a folder with ~20,000 files in it, and I see CPU
 usage by glusterfsd jump to between 40-200%. The glusterfs process
 usually stays around 20-30%.
 
 Both of our Gluster servers are gen III Dell 2950's with dual Xeon
 E5345's (quad-core, 2.33 GHz CPUs) in them, so we have 8 CPUs total to
 deal with this load. So far, we're only using a single mail server, but
 we'll be migrating to a load-balanced pair very soon. So my guess is
 that we can reduce the latency that's very noticeable in our webmail by
 upgrading to the fastest CPUs the 2950's can hold, evidently a 3.67 GHz
 quad-core.
 
 It would be nice to know what other users have experienced with this
 kind of upgrade, or whether they've gotten better performance from other
 hardware upgrades.
 
 Which leads to my second question. Does glusterfsd spawn multiple
 threads to handle other requests made of it? I don't see any evidence of
 this in the `top` program, but other clients don't notice at all that
 I'm running up the CPU usage with my one `ls` process. Smaller mail
 accounts can read their mail just as quickly as if the system were at
 near-idle while this operation is in progress. It's also hard for me to
 test this with only one mail server attached to the Gluster cluster. I
 can't tell if the additional load from 20 or 100 other servers makes any
 difference to CPU usage, but we want to know about what performance we
 can expect should we expand that far, and whether throwing more CPUs at
 the problem is the answer, or just throwing faster CPUs at the problem
 is what we will need to do in the future.

A lot of what you are seeing is getting addressed with:

http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf

Specifically:

http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf#multi-thread-epoll

In the past the single event listener thread would peg out a CPU (hot thread), 
and until MT epoll, throwing more CPUs at the problem wouldn't help much:

Previously, the epoll thread did socket event handling and the same thread was 
used for serving the client or processing the response received from the 
server.  Due to this, other requests sat in a queue until the current epoll 
thread completed its operation.  With multi-threaded epoll, events are 
distributed across threads, which improves performance due to the parallel 
processing of requests / responses.

Here are the guidelines for tuning them:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html

Server and client event threads are available in 3.7, and more improvements are 
in the pipe.  I would start with 4 of each and do some tuning to see what fits 
your workload best.
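
For example, starting at 4 of each looks like this (volume name is a placeholder):

gluster volume set <volname> client.event-threads 4
gluster volume set <volname> server.event-threads 4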

I just ran a test where I created ~300 GB worth of 64k files.  On 3.7 beta I got:

4917.50 files / second across 4 clients mounting a 2x2 dist rep volume.

The same test on 3.6 + same HW:

2069.28 files / second across 4 clients mounting a 2x2 dist rep volume.

I was running smallfile:

http://www.gluster.org/community/documentation/index.php/Performance_Testing#smallfile_Distributed_I.2FO_Benchmark

To confirm you are hitting the hot thread, I suggest running the benchmark of 
your choice (I like smallfile for this) and on the brick servers hit:

# top -H

If you see one of the gluster threads at 100% CPU then you are probably hitting 
the hot event thread issue that MT epoll addresses.  Here is what my top -H 
list looks like during a smallfile run:

Tasks: 640 total,   3 running, 637 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.8%us, 11.1%sy,  

Re: [Gluster-users] Write operations failing on clients

2015-05-06 Thread Ben Turner
- Original Message -
 From: Alex ale...@icecat.biz
 To: gluster-users@gluster.org
 Sent: Wednesday, May 6, 2015 3:55:15 AM
 Subject: Re: [Gluster-users] Write operations failing on clients
 
 Ben Turner bturner@... writes:
 
 
  Are all writes failing or just writes to certain places?
 When problem occur I've done basic testing such as creating directory and
 successfully copied small file to it. Looks like writes only to some places
 caused problem.
 
  To me it sounds like rebalance was run, there was a problem and now
  writes are problematic.  Is an accurate problem description?
 I'm not sure about rebalance as source of problem. But it definitely was
 running when writes begin to fail.
 
 Okay, I understand that it difficult to help with such problem via mail.
 Thank you anyway.

I'm not giving up, Alex!  I was just hoping that someone would jump in here from 
the rebal team, as IMO the failed rebal is the RCA.  I suggest finding one of 
the problematic directories / files and getting me the xattrs:

# getfattr -m . -d -e hex <bad file / dir>

And post them back here.  Susany / anyone from the rebal team, have you ever 
seen this or know how to recover from it?  Any help troubleshooting this would 
be appreciated.

-b
 

 
 Alex
 
 
 
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] performance tuning - list of available options?

2015-05-05 Thread Ben Turner


- Original Message -
 From: Mohammed Rafi K C rkavu...@redhat.com
 To: Vijay Bellur vbel...@redhat.com, Kingsley 
 glus...@gluster.dogwind.com, gluster-users@gluster.org,
 Raghavendra Gowdappa rgowd...@redhat.com
 Sent: Tuesday, May 5, 2015 9:09:31 AM
 Subject: Re: [Gluster-users] performance tuning - list of available options?
 
 
 
 On 05/05/2015 05:24 PM, Vijay Bellur wrote:
  On 05/05/2015 04:34 PM, Kingsley wrote:
  On Tue, 2015-05-05 at 15:08 +0530, Vijay Bellur wrote:
  [snip]
  I have seen this before and it primarily seems to be related to the
  readdir calls done by git clone.
 
  Turning on these options might help to some extent:
 
  gluster volume set volname performance.readdir-ahead on
 
  gluster volume set volname cluster.readdir-optimize on
 
  Is there a list of all performance tuning options, and a description of
  what they do? Even better if it details pros/cons of using a particular
  option, or circumstances in which you might want to use it.
 
  If such a list exists, it would be very useful. I've tried googling but
  can only find pages for very old versions (eg gluster 3.2) and only
  detailing /some/ options.
 
  Knowing what the value defaults to would also be handy. In a separate
  thread, someone advised me to do gluster volume set VOLNAME
  performance.flush-behind off, but when I found that didn't help (the
  issue turned out to be something I was doing, not a fault in gluster) I
  couldn't figure out what the default value was, so I didn't know whether
  to leave it off, or whether it had previously been on.
 
  Is there anything available?
 
 
  Thanks for this feedback. This is something that I would like to see
  captured too.
 
  Raghavendra, Rafi - would you be able to help create a document on
  this one?
 I would be happy to contribute for this, but I can only spend time after
 3.7 release :-) .
 
 Along with the performance turning guide, we can also enhance our
 development guide.

I have some ideas here, happy to help.

-b

 
 Rafi KC
 
  -Vijay
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Write operations failing on clients

2015-05-01 Thread Ben Turner
Geesh Alex, I don't have much for you.  Are all writes failing, or just writes to 
certain places?  Anyone else have any input on what could cause this?  To me it 
sounds like rebalance was run, there was a problem, and now writes are 
problematic.  Is that an accurate problem description?

-b

- Original Message -
 From: Alex ale...@icecat.biz
 To: gluster-users@gluster.org
 Sent: Friday, May 1, 2015 5:47:29 AM
 Subject: Re: [Gluster-users] Write operations failing on clients
 
 
  
  Are your files split brained:
  
  gluster v heal img info split-brain
  
  I see alot of problem with your self heal daemon connecting:
 
 As far as I can see nodes are not split brained:
 
 # gluster v heal img info split-brain
 Gathering list of split brain entries on volume img has been successful
 
 Brick gluster1:/var/gl/images
 Number of entries: 0
 
 Brick gluster2:/var/gl/images
 Number of entries: 0
 
 Brick gluster3:/var/gl/images
 Number of entries: 0
 
 Brick gluster4:/var/gl/images
 Number of entries: 0
 
 Brick gluster5:/var/gl/images
 Number of entries: 0
 
 Brick gluster6:/var/gl/images
 Number of entries: 0
 
  $ service glusterd stop
  $ killall glusterfs
  $ killall glusterfsd
  $ ps aux | grep glu  - Make sure evertyhing is actually cleaned up
 
 Yes, I actually did this in the first place with problematic nodes.
 Unfortunately it did'nt help. CPU load came back in about 3-4 minutes.
 
  Have you recently run a rebalance?
 
 Rebalance was running when the problem occur and I stopped it to see if it
 caused problems. I try to run it again.
 
  Are you having trouble access those directories?  It looks like the fix
 layout failed for those two.
 
 I can access those dirs via gluster-client:
 
 # grep gluster /etc/fstab
 gluster1:/img   /media   glusterfs   defaults,_netdev0 1
 
 # ls -la /media/www/ | wc -l
 47
 
 /www/thumbs have excessive amount of files so i just stat something inside:
 # ls -l /media/www/thumbs/125.jpg
 -rw-r--r-- 1 apache apache 4365 Oct  8  2009 /media/www/thumbs/125.jpg
 
 Everything looks fine.
 
 Thank you,
 Alex
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] client is terrible with large amount of small files

2015-04-30 Thread Ben Turner
- Original Message -
 From: Atin Mukherjee amukh...@redhat.com
 To: gjprabu gjpr...@zohocorp.com
 Cc: Ben Turner btur...@redhat.com, gluster-users@gluster.org
 Sent: Thursday, April 30, 2015 7:37:19 AM
 Subject: Re: [Gluster-users] client is terrible with large amount of small 
 files
 
 
 On 04/30/2015 03:09 PM, gjprabu wrote:
  Hi Amukher,
  
   How do we resolve this issue? Do we need to wait for the 3.7 release,
   or is there a workaround?
  You will have to wait, as this feature only lands in 3.7.

My apologies, I didn't realize that MT epoll didn't land in 3.6.  If you want 
to test it out there is an alpha build available:

http://download.gluster.org/pub/gluster/glusterfs/nightly/glusterfs-3.7/epel-6-x86_64

I wouldn't run this in production until 3.7 is released though.  Again sorry 
for the confusion.
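
Once you are on a 3.7 build, the tuning itself is just two volume options.  A 
minimal sketch, assuming the volume is still named integvol and using the value 
of 4 recommended in the tuning guide:

gluster volume set integvol client.event-threads 4
gluster volume set integvol server.event-threads 4
gluster volume info integvol   # the new values show up under "Options Reconfigured"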

-b

  
  Regards, Prabu
  
  
  
  
  
    On Thu, 30 Apr 2015 14:49:46 +0530 Atin Mukherjee <amukh...@redhat.com> wrote 
   
   
  On 04/30/2015 02:32 PM, gjprabu wrote:
  > Hi bturner,
  >
  >
  > I am getting below error while adding server.event
  >
  > gluster v set integvol server.event-threads 3
  > volume set: failed: option : server.event-threads does not exist
  > Did you mean server.gid-timeout or ...manage-gids?
  This option is not available in 3.6, its going to come in 3.7
   
  >
  >
  > Glusterfs version has been upgraded to 3.6.3
  > Also os kernel upgraded to 6.6 kernel
  > Yes two brick are running in KVM and one is physical machine and we
  > are not using thinp.
  >
  > Regards
  > G.J
  >
  >
  >
  >
  >
  >  On Thu, 30 Apr 2015 00:37:44 +0530 Ben Turner <btur...@redhat.com> wrote 
  >
  > - Original Message -
  > > From: gjprabu <gjpr...@zohocorp.com>
  > > To: A Ghoshal <a.ghos...@tcs.com>
  > > Cc: gluster-users@gluster.org, gluster-users-boun...@gluster.org
  > > Sent: Wednesday, April 29, 2015 9:07:07 AM
  > > Subject: Re: [Gluster-users] client is terrible with large amount of small files
  > >
  > > Hi Ghoshal,
  > >
  > > Please find the details below.
  > >
  > > A) Glusterfs version
  > > glusterfs 3.6.2
  >
  > Upgrade to 3.6.3 and set client.event-threads and server.event-threads to at
  > least 4. Here is a guide on tuning MT epoll:
  >
  > https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html
  >
  > >
  > > B) volume configuration (gluster v <volname> info)
  > > gluster volume info
  > >
  > >
  > > Volume Name: integvol
  > > Type: Replicate
  > > Volume ID: b8f3a19e-59bc-41dc-a55a-6423ec834492
  > > Status: Started
  > > Number of Bricks: 1 x 3 = 3
  > > Transport-type: tcp
  > > Bricks:
  > > Brick1: integ-gluster2:/srv/sdb1/brick
  > > Brick2: integ-gluster1:/srv/sdb1/brick
  > > Brick3: integ-gluster3:/srv/sdb1/brick
  > >
  > >
  > > C) host linux version
  > > CentOS release 6.5 (Final)
  >
  > Are your bricks on LVM? Are you using thinp? If so update to the latest
  > kernel as thinp perf was really bad in 6.5 and early 6.6 kernels.
  >
  > >
  > > D) details about the kind of network you use to connect your servers making
  > > up your storage pool.
  > > We are connecting LAN to LAN there is no special network configuration done
  > >
  > > Frome client we use to mount like below
  > > mount -t glusterfs gluster1:/integvol /mnt/gluster/
  > >
  > >
  > > Regards
  > > Prabu
  > >
  > >
  > >
  > >  On Wed, 29 Apr 2015 17:58:16 +0530 A Ghoshal <a.ghos...@tcs.com> wrote 
  > >
  > >
  > >
  > > Performance would largely depend upon setup. While I cannot think of any
  > > setup that would cause write to be this slow, if would help if you share the
  > > following details:
  > >
  > > A) Glusterfs version
  > > B) volume configuration (gluster v <volname> info)
  > > C) host linux version
  > > D) details about the kind of network you use to connect your servers making
  > > up your storage pool.
  > >
  > > Thanks,
  > > Anirban
  > >
  > >
  > >
  > > From: gjprabu <gjpr...@zohocorp.com>
  > > To: <gluster-users@gluster.org>
  > > Date: 04/29/2015 05:52 PM
  > > Subject: Re: [Gluster-users] client is terrible with large amount of small
  > > files
  > > Sent by: gluster-users-boun

Re: [Gluster-users] Write operations failing on clients

2015-04-30 Thread Ben Turner
- Original Message -
 From: Alex ale...@icecat.biz
 To: gluster-users@gluster.org
 Sent: Thursday, April 30, 2015 6:52:58 AM
 Subject: Re: [Gluster-users] Write operations failing on clients
 
 Okay, I did some digging. On the client there was many errors such as:
 
 [2015-04-29 15:47:08.700174] W [client-rpc-fops.c:2774:client3_3_lookup_cbk]
 0-img-client-0: remote operation failed: Transport endpoint is not
 connected. Path: /www/img/gallery/9722926_4130.jpg
 (----)
 [2015-04-29 15:47:08.700268] I
 [afr-self-heal-entry.c:607:afr_sh_entry_expunge_entry_cbk]
 0-img-replicate-0: looking up /www/img/gallery/9722926_4130.jpg under
 img-client-0 failed (Transport endpoint is not connected)
 
 And at the same time on the cluster:
 [2015-04-29 15:47:59.989897] W [client-rpc-fops.c:2774:client3_3_lookup_cbk]
 0-img-client-0: remote operation failed: Transport endpoint is not
 connected. Path: /www/pdf/23096091-1722.pdf
 (----)
 [2015-04-29 15:47:59.989923] I
 [afr-self-heal-entry.c:607:afr_sh_entry_expunge_entry_cbk]
 0-img-replicate-0: looking up /www/pdf/23096091-1722.pdf under img-client-0
 failed (Transport endpoint is not connected)
 
 
 What could it mean? Is there some kind of network error? BTW there was
 nothing that indicated any network connectivity problems between nodes and
 clients.

Hi Alex.  You are correct: when you see "Transport endpoint is not connected" it 
usually means that the client is unable to reach the server.  Check gluster v 
status and make sure all your bricks are online.  Try to unmount / remount the 
client if you see everything is up.  If you still can't access the volume and 
everything is online, I would do some basic network troubleshooting, make sure 
selinux is off on the servers, and check /var/log/glusterfs/bricks for errors on 
each of the servers.  If you see any error messages of severity { M | A | C | E } 
pastebin them to me and I'll have a look.
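
If it helps, here is roughly that checklist as commands (a rough sketch; it 
assumes the volume is named img and the client mounts it on /media as in your 
fstab):

# on any server: confirm every brick and the self-heal daemon are online
gluster volume status img

# on each server: check selinux and scan the brick logs for M/A/C/E messages
getenforce
grep -E ' [MACE] \[' /var/log/glusterfs/bricks/*.log | tail -50

# on the client: remount and retry a write
umount /media && mount /media
touch /media/test-write && rm /media/test-write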

-b

 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Poor performance with small files

2015-04-30 Thread Ben Turner
- Original Message -
 From: Ron Trompert ron.tromp...@surfsara.nl
 To: Ben Turner btur...@redhat.com
 Cc: gluster-users@gluster.org
 Sent: Thursday, April 30, 2015 1:25:42 AM
 Subject: Re: [Gluster-users] Poor performance with small files
 
 
 Hi Ben,
 
 Thanks for the info.

My apologies Ron I just found out that MT epoll did not land in 3.6 and it 
won't be available until 3.7.  If you want to try the alpha bits they are 
available in:

http://download.gluster.org/pub/gluster/glusterfs/nightly/glusterfs-3.7/epel-6-x86_64/

These are alpha bits and should only be used for testing, but if you want to get 
a feel for what is in the pipe, it's there for you to try.

-b

 
 Cheers,
 
 Ron
 
 
 On 29/04/15 21:03, Ben Turner wrote:
  - Original Message -
  From: Ron Trompert ron.tromp...@surfsara.nl
  To: gluster-users@gluster.org
  Sent: Wednesday, April 29, 2015 1:25:59 PM
  Subject: [Gluster-users] Poor performance with small files
 
  Hi,
 
  We run gluster as storage solution for our Owncloud-based sync and share
  service. At the moment we have about 30 million files in the system
  which addup to a little more than  30TB. Most of these files are as you
  may expect very small, i.e. in the 100KB ball park. For about a year
  everything ran perfectly fine. We run 3.6.2 by the way.
  
  Upgrade to 3.6.3 and set client.event-threads and server.event-threads to
  at least 4:
  
   Previously, a single epoll thread did socket event-handling, and the same
   thread was used for serving the client or processing the response received
   from the server. Due to this, other requests were queued until the current
   epoll thread completed its operation. With multi-threaded epoll, events are
   distributed across threads, which improves performance due to the parallel
   processing of requests/responses received.
   
   Here are the guidelines for tuning them:
   
   https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html
   
   In my testing with epoll threads at 4 I saw between a 15% and 50%
   increase depending on the workload.
   
   There are several smallfile perf enhancements in the works:
   
   *http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf
   
   *Lookup unhashed is the next feature and should be ready with 3.7 (correct
   me if I am wrong).
   
   *If you are using RAID 6 you may want to do some testing with RAID 10 or
   JBOD, but the benefits here only come into play with a lot of concurrent
   access (30+ processes / threads working with different files).
  
  *Tiering may help here if you want to add some SSDs, this is also a 3.7
  feature.
  
  HTH!
  
  -b
  
 
  Now we are trying to commission new hardware. We have done this by
  adding the new nodes to our cluster and using the add-brick and
  remove-brick procedure to get the data to the new nodes. In a week we
  have migrated only 8.5TB this way. What are we doing wrong here? Is
  there a way to improve the gluster performance on small files?
 
  I have another question. If you want to setup a gluster that will
  contain lots of very small files. What would be a good practice to set
  things up in terms configuration, sizes of bricks related tot memory and
  number of cores, number of brick per node etc.?
 
 
 
  Best regards and thanks in advance,
 
  Ron
 
 
  ___
  Gluster-users mailing list
  Gluster-users@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-users
 
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Poor performance of NFS client for large writes compared to native client

2015-04-30 Thread Ben Turner
- Original Message -
 From: Vijay Bellur vbel...@redhat.com
 To: Behrooz Shafiee shafie...@gmail.com, Gluster-users@gluster.org List 
 gluster-users@gluster.org
 Sent: Thursday, April 30, 2015 6:44:11 AM
 Subject: Re: [Gluster-users] Poor performance of NFS client for large writes 
 compared to native client
 
 On 04/30/2015 06:49 AM, Behrooz Shafiee wrote:
  Hi,
 
I was comparing GlusterFS native and NFS clients and I noticed, NFS
  client is significantly slower for large writes. I wrote about 200, 1GB
  files using a 1MB block sizes and NFS throughput was almost half of
  native client. Can anyone explain why is that?
 
 
 Depending on where the file gets scheduled, NFS might need an additional
 network hop. That can contribute to additional latency and less
 throughput than the native client.

To tag on here: GlusterFS native mounts use the hash algorithm to know which server 
to write to directly.  NFS is not aware of this, so all files get routed through the 
server that is mounted, just as Vijay said.  The server relaying the file adds this 
extra hop and contributes to the latency / slowdown.  I estimate performance like:

10G interface with 12 disk RAID 6:

GFS Read(replica 1 or 2) = 720 MB/s
GFS Write(replica 1) = 820 MB/s
GFS Write(replica 2) = 410 MB/s

NFS Read(replica 1 or 2) = 535 MB/s
NFS Write(replica 1) = 400 MB/s
NFS Write(replica 2) = 250 MB/s

So with replica 2 GlusterFS I would expect ~410 MB / sec writes, and on the 
same volume over NFS I would expect 250 MB / sec.  It's not a full 50%, but it's 
close.
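
If you want to see the extra hop for yourself, running the same dd against a 
FUSE mount and an NFS mount of the same volume usually shows the gap.  A rough 
sketch (server name, volume name, and mount points are placeholders):

mount -t glusterfs server1:/myvol /mnt/fuse
mount -t nfs -o vers=3 server1:/myvol /mnt/nfs
dd if=/dev/zero of=/mnt/fuse/ddtest bs=1M count=1024 conv=fdatasync
dd if=/dev/zero of=/mnt/nfs/ddtest bs=1M count=1024 conv=fdatasync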

HTH!

-b


 -Vijay
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users



Re: [Gluster-users] Poor performance of NFS client for large writes compared to native client

2015-04-30 Thread Ben Turner
- Original Message -
 From: Behrooz Shafiee shafie...@gmail.com
 To: Ben Turner btur...@redhat.com
 Cc: Gluster-users@gluster.org List gluster-users@gluster.org
 Sent: Thursday, April 30, 2015 9:34:31 AM
 Subject: Re: [Gluster-users] Poor performance of NFS client for large writes 
 compared to native client
 
 Thanks, that clarifies the write slowdown! But my reads with NFS are as fast as
 with the GlusterFS native client. Does it mean the server that NFS was mounted
 from is actually hosting those files, so there is no extra hop and the same
 performance?

Yep, that is probably the case.  With reads, setting read-ahead on the brick 
device is pretty important.  I recommend trying:

echo 65536 > /sys/block/$device_name/queue/read_ahead_kb

Only use this if you have a RAID; I normally use RAID 6 with 12 disks.
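
Keep in mind the sysfs setting does not survive a reboot, so reapply it at boot 
time.  A rough sketch, assuming the bricks sit on sdb and sdc (adjust the device 
names to match your RAID LUNs), e.g. dropped into /etc/rc.local:

for dev in sdb sdc; do
    echo 65536 > /sys/block/$dev/queue/read_ahead_kb
done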

-b

 
 Thanks,
 On 30 Apr 2015 8:27 am, Ben Turner btur...@redhat.com wrote:
 
  - Original Message -
   From: Vijay Bellur vbel...@redhat.com
   To: Behrooz Shafiee shafie...@gmail.com, Gluster-users@gluster.org
  List gluster-users@gluster.org
   Sent: Thursday, April 30, 2015 6:44:11 AM
   Subject: Re: [Gluster-users] Poor performance of NFS client for large
  writes compared to native client
  
   On 04/30/2015 06:49 AM, Behrooz Shafiee wrote:
Hi,
   
  I was comparing GlusterFS native and NFS clients and I noticed, NFS
client is significantly slower for large writes. I wrote about 200, 1GB
files using a 1MB block sizes and NFS throughput was almost half of
native client. Can anyone explain why is that?
   
  
   Depending on where the file gets scheduled, NFS might need an additional
   network hop. That can contribute to additional latency and less
   throughput than the native client.
 
  To tag on here GlusterFS mounts use the hash algorithm to know which
  server to write directly to.  NFS is not aware of this so all files get
  routed through the server that is mounted just like Vijay said.  The server
  relaying the file adds this extra hop and contributes to the latency /
  slowdown.  I estimate performance like:
 
  10G interface with 12 disk RAID 6:
 
  GFS Read(replica 1 or 2) = 720 MB/s
  GFS Write(replica 1) = 820 MB/s
  GFS Write(replica 2) = 410 MB/s
 
  NFS Read(replica 1 or 2) = 535 MB/s
  NFS Write(replica 1) = 400 MB/s
  NFS Write(replica 2) = 250 MB/s
 
  So with replica 2 Gluster FS I would expect ~410 MB / sec writes and on
  the same volume over NFS I would expect 250 MB / sec.  Its not a full 50%
  but its close.
 
  HTH!
 
  -b
 
 
   -Vijay
  
   ___
   Gluster-users mailing list
   Gluster-users@gluster.org
   http://www.gluster.org/mailman/listinfo/gluster-users
  
 
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Write operations failing on clients

2015-04-30 Thread Ben Turner
Are your files split brained:

gluster v heal img info split-brain

I see alot of problem with your self heal daemon connecting:

[2015-04-29 16:15:37.137215] E [socket.c:2161:socket_connect_finish] 
0-img-client-4: connection to 192.168.114.185:49154 failed (Connection refused)
[2015-04-29 16:15:37.434035] E 
[client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-0: failed to 
get the port number for remote subvolume. Please run 'gluster volume status' on 
server to see if brick process is running.
[2015-04-29 16:15:40.308730] E [afr-self-heald.c:1479:afr_find_child_position] 
0-img-replicate-2: getxattr failed on img-client-5 - (Transport endpoint is not 
connected)
[2015-04-29 16:15:40.308878] E [afr-self-heald.c:1479:afr_find_child_position] 
0-img-replicate-1: getxattr failed on img-client-3 - (Transport endpoint is not 
connected)
[2015-04-29 16:15:41.192965] E 
[client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-3: failed to 
get the port number for remote subvolume. Please run 'gluster volume status' on 
server to see if brick process is running.
[2015-04-29 16:20:23.184879] E [socket.c:2161:socket_connect_finish] 
0-img-client-1: connection to 192.168.114.182:24007 failed (Connection refused)
[2015-04-29 16:21:01.684625] E 
[client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-1: failed to 
get the port number for remote subvolume. Please run 'gluster volume status' on 
server to see if brick process is running.
[2015-04-29 16:24:14.211163] E [socket.c:2161:socket_connect_finish] 
0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:18.213126] E [socket.c:2161:socket_connect_finish] 
0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:22.212902] E [socket.c:2161:socket_connect_finish] 
0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:26.213708] E [socket.c:2161:socket_connect_finish] 
0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:30.214324] E [socket.c:2161:socket_connect_finish] 
0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:34.214816] E [socket.c:2161:socket_connect_finish] 
0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)

There looks to have been some network flapping up and down, and files may have 
become split-brained.  Whenever you are bouncing services I usually:

$ service glusterd stop
$ killall glusterfs
$ killall glusterfsd
$ ps aux | grep glu  - Make sure everything is actually cleaned up

Anytime you take a node offline and bring it back online, make sure the files get 
resynced with a self heal before you take any other nodes offline:

$ gluster v heal img full

If you do see split brained files you can resolve with:

http://blog.gluster.org/category/howtos/
https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/

LMK if you see any split brained files.
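
One more habit that helps when you are cycling nodes: wait for the heal counts to 
drop back to zero before you touch the next server.  A quick sketch for the img 
volume:

gluster v heal img full
# repeat until every brick reports "Number of entries: 0"
watch -n 30 'gluster v heal img info | grep "Number of entries"'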

-b

- Original Message -
 From: Alex ale...@icecat.biz
 To: gluster-users@gluster.org
 Sent: Thursday, April 30, 2015 9:26:04 AM
 Subject: Re: [Gluster-users] Write operations failing on clients
 
 Oh and this is output of some status commands:
 http://termbin.com/bvzz
 
 Mount\umount worked just fine.
 
 Alex
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Write operations failing on clients

2015-04-30 Thread Ben Turner
Also I see:

/var/log/glusterfs/img-rebalance.log-20150430
[2015-04-29 14:49:40.793369] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 
0-img-dht: Fix layout failed for /www/thumbs
[2015-04-29 14:49:40.793625] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 
0-img-dht: Fix layout failed for /www

Have you recently run a rebalance?  Are you having trouble accessing those 
directories?  It looks like the fix layout failed for those two.

-b


- Original Message -
 From: Ben Turner btur...@redhat.com
 To: Alex ale...@icecat.biz
 Cc: gluster-users@gluster.org
 Sent: Thursday, April 30, 2015 5:10:39 PM
 Subject: Re: [Gluster-users] Write operations failing on clients
 
 Are your files split brained:
 
 gluster v heal img info split-brain
 
 I see alot of problem with your self heal daemon connecting:
 
 [2015-04-29 16:15:37.137215] E [socket.c:2161:socket_connect_finish]
 0-img-client-4: connection to 192.168.114.185:49154 failed (Connection
 refused)
 [2015-04-29 16:15:37.434035] E
 [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-0: failed to
 get the port number for remote subvolume. Please run 'gluster volume status'
 on server to see if brick process is running.
 [2015-04-29 16:15:40.308730] E
 [afr-self-heald.c:1479:afr_find_child_position] 0-img-replicate-2: getxattr
 failed on img-client-5 - (Transport endpoint is not connected)
 [2015-04-29 16:15:40.308878] E
 [afr-self-heald.c:1479:afr_find_child_position] 0-img-replicate-1: getxattr
 failed on img-client-3 - (Transport endpoint is not connected)
 [2015-04-29 16:15:41.192965] E
 [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-3: failed to
 get the port number for remote subvolume. Please run 'gluster volume status'
 on server to see if brick process is running.
 [2015-04-29 16:20:23.184879] E [socket.c:2161:socket_connect_finish]
 0-img-client-1: connection to 192.168.114.182:24007 failed (Connection
 refused)
 [2015-04-29 16:21:01.684625] E
 [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-1: failed to
 get the port number for remote subvolume. Please run 'gluster volume status'
 on server to see if brick process is running.
 [2015-04-29 16:24:14.211163] E [socket.c:2161:socket_connect_finish]
 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection
 refused)
 [2015-04-29 16:24:18.213126] E [socket.c:2161:socket_connect_finish]
 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection
 refused)
 [2015-04-29 16:24:22.212902] E [socket.c:2161:socket_connect_finish]
 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection
 refused)
 [2015-04-29 16:24:26.213708] E [socket.c:2161:socket_connect_finish]
 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection
 refused)
 [2015-04-29 16:24:30.214324] E [socket.c:2161:socket_connect_finish]
 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection
 refused)
 [2015-04-29 16:24:34.214816] E [socket.c:2161:socket_connect_finish]
 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection
 refused)
 
 There looks to have been some network flapping up and down, and files may have
 become split-brained.  Whenever you are bouncing services I usually:
 
 $ service glusterd stop
 $ killall glusterfs
 $ killall glusterfsd
 $ ps aux | grep glu  - Make sure everything is actually cleaned up
 
 Anytime you take a node offline and bring it back online, make sure the files get
 resynced with a self heal before you take any other nodes offline:
 
 $ gluster v heal img full
 
 If you do see split brained files you can resolve with:
 
 http://blog.gluster.org/category/howtos/
 https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
 
 LMK if you see any split brained files.
 
 -b
 
 - Original Message -
  From: Alex ale...@icecat.biz
  To: gluster-users@gluster.org
  Sent: Thursday, April 30, 2015 9:26:04 AM
  Subject: Re: [Gluster-users] Write operations failing on clients
  
  Oh and this is output of some status commands:
  http://termbin.com/bvzz
  
  Mount\umount worked just fine.
  
  Alex
  
  
  ___
  Gluster-users mailing list
  Gluster-users@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-users
  
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Gluster Benchmark Kit

2015-04-29 Thread Ben Turner
- Original Message -
 From: M S Vishwanath Bhat msvb...@gmail.com
 To: Benjamin Turner bennytu...@gmail.com
 Cc: Kiran Patil ki...@fractalio.com, gluster-users@gluster.org, Gluster 
 Devel gluster-de...@gluster.org,
 btur...@redhat.com
 Sent: Wednesday, April 29, 2015 3:20:12 AM
 Subject: Re: [Gluster-devel] Gluster Benchmark Kit
 
 On 28 April 2015 at 01:03, Benjamin Turner bennytu...@gmail.com wrote:
 
  Hi Kiran, thanks for the feedback!  I already put up a repo on githib:
 
  https://github.com/bennyturns/gluster-bench
 
  On my TODO list is:
 
  -The benchmark is currently RHEL / RHGS(Red Hat Gluster Storage) specific,
  I want to make things work with at least non paid RPM distros and Ubuntu.
  -Other filesystems(like you mentioned)
  -No LVM and non thinp config options.
  -EC, tiering, snapshot capabilities.
 
  I'll probably fork things and have a Red Hat specific version and an
  upstream version.  As soon as I have everything working on Centos I'll let
  the list know and we can enhance things to do whatever we need.  I always
  thought it would be interesting if we had a page where people could submit
  their benchmark data and the HW / config used.  Having a standard tool /
  tool set will help there.
 
 
 Ben,
 
 Do you think it is a good Idea (or is it possible) to integrate these with
 distaf? (https://github.com/gluster/distaf)

When I made this, my goal was for someone to be able to build a cluster from 
scratch and run the full benchmark suite with only 3-4 commands.  I specifically 
didn't use distaf to cut down on complexity: the test tools were all already 
multi-node capable, and since I am already maintaining the rhs-system-init script 
I figured I would just reuse that for the setup.  That said, I have been seeing 
more and more reasons to move things to distaf (FIO is not multi-node capable, 
gathering profiling data from each server, etc.), and I was kicking around the 
idea of distaf-ifying it.  Let me fix everything to work with CentOS/Fedora and 
get it into git.  From there we can look at distaf-ifying things, but I agree 
that is the way things should go.

-b
 
 That would enable us to choose workloads suitable to each scenario for
 single (set of) tests.
 
 Best Regards,
 Vishwanath
 
 
 
  -b
 
 
  On Mon, Apr 27, 2015 at 3:31 AM, Kiran Patil ki...@fractalio.com wrote:
 
  Hi,
 
  I came across Gluster Benchmark Kit while reading [Gluster-users]
  Disastrous performance with rsync to mounted Gluster volume thread.
 
  http://54.82.237.211/gluster-benchmark/gluster-bench-README
 
  http://54.82.237.211/gluster-benchmark
 
  The Kit includes tools such as iozone, smallfile and fio.
 
  This Kit is not documented and need to baseline this tool for Gluster
  Benchmark testing.
 
  The community is going to benefit by adopting and extending it as per
  their needs and the kit should be hosted on Github.
 
  The init.sh script in the Kit contains only XFS filesystem which can be
  extended to BTRFS and ZFS.
 
  Thanks Ben Turner for sharing it.
 
  Kiran.
 
  ___
  Gluster-devel mailing list
  gluster-de...@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-devel
 
 
 
  ___
  Gluster-devel mailing list
  gluster-de...@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-devel
 
 
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] client is terrible with large amount of small files

2015-04-29 Thread Ben Turner
- Original Message -
 From: gjprabu gjpr...@zohocorp.com
 To: A Ghoshal a.ghos...@tcs.com
 Cc: gluster-users@gluster.org, gluster-users-boun...@gluster.org
 Sent: Wednesday, April 29, 2015 9:07:07 AM
 Subject: Re: [Gluster-users] client is terrible with large amount of small 
 files
 
 Hi Ghoshal,
 
 Please find the details below.
 
 A) Glusterfs version
 glusterfs 3.6.2

Upgrade to 3.6.3 and set client.event-threads and server.event-threads to at 
least 4.  Here is a guide on tuning MT epoll:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html

 
 B) volume configuration (gluster v volname info)
 gluster volume info
 
 
 Volume Name: integvol
 Type: Replicate
 Volume ID: b8f3a19e-59bc-41dc-a55a-6423ec834492
 Status: Started
 Number of Bricks: 1 x 3 = 3
 Transport-type: tcp
 Bricks:
 Brick1: integ-gluster2:/srv/sdb1/brick
 Brick2: integ-gluster1:/srv/sdb1/brick
 Brick3: integ-gluster3:/srv/sdb1/brick

 
 C) host linux version
 CentOS release 6.5 (Final)

Are your bricks on LVM?  Are you using thinp?  If so update to the latest 
kernel as thinp perf was really bad in 6.5 and early 6.6 kernels.
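
If you are not sure whether thinp is in play, a quick check on each brick server 
(a sketch; assumes standard LVM tooling is installed):

uname -r                           # kernel actually running
lvs -o lv_name,vg_name,segtype     # "thin" / "thin-pool" segments mean thinp is in use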

 
 D) details about the kind of network you use to connect your servers making
 up your storage pool.
 We are connecting LAN to LAN there is no special network configuration done
 
 From the client we mount it like below
 mount -t glusterfs gluster1:/integvol /mnt/gluster/
 
 
 Regards
 Prabu
 
 
 
  On Wed, 29 Apr 2015 17:58:16 +0530 A Ghoshala.ghos...@tcs.com wrote
 
 
 
 
 Performance would largely depend upon setup. While I cannot think of any
 setup that would cause writes to be this slow, it would help if you share the
 following details:
 
 A) Glusterfs version
 B) volume configuration (gluster v volname info)
 C) host linux version
 D) details about the kind of network you use to connect your servers making
 up your storage pool.
 
 Thanks,
 Anirban
 
 
 
 From: gjprabu  gjpr...@zohocorp.com 
 To:  gluster-users@gluster.org 
 Date: 04/29/2015 05:52 PM
 Subject: Re: [Gluster-users] client is terrible with large amount of small
 files
 Sent by: gluster-users-boun...@gluster.org
 
 
 
 
 Hi Team,
 
 If anybody knows the solution, please share it with us.
 
 Regards
 Prabu
 
 
 
  On Tue, 28 Apr 2015 19:32:40 +0530 gjprabu  gjpr...@zohocorp.com 
 wrote 
 Hi Team,
 
 We are new to glusterfs and are testing data transfer on a client using the
 fuse.glusterfs file system, but it is terrible with a large number of small
 files (writing about 150MB of small files takes around 18 minutes).  I can
 copy small files, and syncing between the server bricks works fine, but it is
 terrible with a large number of small files.
 
 If anybody can share a solution for the above issue, please do.
 
 Regards
 Prabu
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
 
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Poor performance with small files

2015-04-29 Thread Ben Turner
- Original Message -
 From: Ron Trompert ron.tromp...@surfsara.nl
 To: gluster-users@gluster.org
 Sent: Wednesday, April 29, 2015 1:25:59 PM
 Subject: [Gluster-users] Poor performance with small files
 
 Hi,
 
 We run gluster as storage solution for our Owncloud-based sync and share
 service. At the moment we have about 30 million files in the system
 which addup to a little more than  30TB. Most of these files are as you
 may expect very small, i.e. in the 100KB ball park. For about a year
 everything ran perfectly fine. We run 3.6.2 by the way.

Upgrade to 3.6.3 and set client.event-threads and server.event-threads to at 
least 4:

Previously, a single epoll thread did socket event-handling, and the same thread was 
used for serving the client or processing the response received from the server. Due 
to this, other requests were queued until the current epoll thread completed its 
operation. With multi-threaded epoll, events are distributed across threads, which 
improves performance due to the parallel processing of requests/responses received.

Here are the guidelines for tuning them:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html

In my testing with epoll threads at 4 I saw between a 15% and 50% increase 
depending on the workload.

There are several smallfile perf enhancements in the works:

*http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf

*Lookup unhashed is the next feature and should be ready with 3.7 (correct me if 
I am wrong).  

*If you are using RAID 6 you may want to do some testing with RAID 10 or JBOD, 
but the benefits here only come into play with a lot of concurrent access (30+ 
processes / threads working with different files).

*Tiering may help here if you want to add some SSDs, this is also a 3.7 feature.
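
On the add-brick / remove-brick migration described below: remove-brick runs its 
own rebalance, and watching its status is the easiest way to see whether the 
slowness is in the data movement itself.  A sketch with placeholder volume and 
brick names:

gluster volume remove-brick myvol oldserver1:/bricks/b1 oldserver2:/bricks/b1 status
gluster volume rebalance myvol status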

HTH!

-b

 
 Now we are trying to commission new hardware. We have done this by
 adding the new nodes to our cluster and using the add-brick and
 remove-brick procedure to get the data to the new nodes. In a week we
 have migrated only 8.5TB this way. What are we doing wrong here? Is
 there a way to improve the gluster performance on small files?
 
 I have another question. If you want to setup a gluster that will
 contain lots of very small files. What would be a good practice to set
 things up in terms configuration, sizes of bricks related tot memory and
 number of cores, number of brick per node etc.?
 
 
 
 Best regards and thanks in advance,
 
 Ron
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Disastrous performance with rsync to mounted Gluster volume.

2015-04-27 Thread Ben Turner
- Original Message -
 From: Ernie Dunbar maill...@lightspeed.ca
 To: Gluster Users gluster-users@gluster.org
 Sent: Monday, April 27, 2015 4:24:56 PM
 Subject: Re: [Gluster-users] Disastrous performance with rsync to mounted 
 Gluster volume.
 
 On 2015-04-24 11:43, Joe Julian wrote:
 
  This should get you where you need to be.  Before you start to migrate
  the data maybe do a couple DDs and send me the output so we can get an
  idea of how your cluster performs:
  
   time `dd if=/dev/zero of=<gluster-mount>/myfile bs=1024k count=1000;
   sync`
   echo 3 > /proc/sys/vm/drop_caches
   dd if=<gluster mount> of=/dev/null bs=1024k count=1000
   
   If you are using gigabit and glusterfs mounts with replica 2 you
   should get ~55 MB / sec writes and ~110 MB / sec reads.  With NFS you
   will take a bit of a hit since NFS doesn't know where files live like
   glusterfs does.
 
 After copying our data and doing a couple of very slow rsyncs, I did
 your speed test and came back with these results:
 
 1048576 bytes (1.0 MB) copied, 0.0307951 s, 34.1 MB/s
 root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
 count=1024 bs=1024; sync
 1024+0 records in
 1024+0 records out
 1048576 bytes (1.0 MB) copied, 0.0298592 s, 35.1 MB/s
 root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
 count=1024 bs=1024; sync
 1024+0 records in
 1024+0 records out
 1048576 bytes (1.0 MB) copied, 0.0501495 s, 20.9 MB/s
 root@backup:/home/webmailbak# echo 3 > /proc/sys/vm/drop_caches
 root@backup:/home/webmailbak# # dd if=/mnt/testfile of=/dev/null
 bs=1024k count=1000
 1+0 records in
 1+0 records out
 1048576 bytes (1.0 MB) copied, 0.0124498 s, 84.2 MB/s
 
 
 Keep in mind that this is an NFS share over the network.
 
 I've also noticed that if I increase the count of those writes, the
 transfer speed increases as well:
 
 2097152 bytes (2.1 MB) copied, 0.036291 s, 57.8 MB/s
 root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
 count=2048 bs=1024; sync
 2048+0 records in
 2048+0 records out
 2097152 bytes (2.1 MB) copied, 0.0362724 s, 57.8 MB/s
 root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
 count=2048 bs=1024; sync
 2048+0 records in
 2048+0 records out
 2097152 bytes (2.1 MB) copied, 0.0360319 s, 58.2 MB/s
 root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
 count=10240 bs=1024; sync
 10240+0 records in
 10240+0 records out
 10485760 bytes (10 MB) copied, 0.127219 s, 82.4 MB/s
 root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
 count=10240 bs=1024; sync
 10240+0 records in
 10240+0 records out
 10485760 bytes (10 MB) copied, 0.128671 s, 81.5 MB/s

This is correct: there is overhead with small files, and the smaller the file the 
less throughput you get.  That said, since the files are smaller you should get 
more files / second but fewer MB / second.  I have found that when you go under 
16k, changing the file size doesn't matter; you will get the same number of 16k 
files / second as you do 1k files.
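
If you want to put rough numbers on that, a crude way to compare files / second 
at two file sizes (a sketch; the /mnt/testdir path is a placeholder, point it at 
the NFS mount you are testing):

cd /mnt/testdir
time for i in $(seq 1 1000); do dd if=/dev/zero of=f16k.$i bs=16k count=1 2>/dev/null; done
time for i in $(seq 1 1000); do dd if=/dev/zero of=f1k.$i bs=1k count=1 2>/dev/null; done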

 
 
 However, the biggest stumbling block for rsync seems to be changes to
 directories. I'm unsure about what exactly it's doing (probably changing
 last access times?) but these minor writes seem to take a very long time
 when normally they would not. Actual file copies (as in the very files
 that are actually new within those same directories) appear to take
 quite a lot less time than the directory updates.

Dragons be here!  Access time is not kept in sync across the replicas (IIRC, 
someone correct me if I am wrong!), and each time a dir is read from a different 
brick I bet the access time is different.

 
 For example:
 
 # time rsync -av --inplace --whole-file --ignore-existing --delete-after
 gromm/* /mnt/gromm/
 building file list ... done
 Maildir/## This part takes a long time.
 Maildir/.INBOX.Trash/
 Maildir/.INBOX.Trash/cur/
 Maildir/.INBOX.Trash/cur/1429836077.H817602P21531.pop.lightspeed.ca:2,S
 Maildir/.INBOX.Trash/tmp/   ## The previous three lines took nearly
 no time at all.
 Maildir/cur/## This takes a long time.
 Maildir/cur/1430160436.H952679P13870.pop.lightspeed.ca:2,S
 Maildir/new/
 Maildir/tmp/## The previous lines again take no time
 at all.
 deleting Maildir/cur/1429836077.H817602P21531.pop.lightspeed.ca:2,S
 ## This delete did take a while.
 sent 1327634 bytes  received 75 bytes  59009.29 bytes/sec
 total size is 624491648  speedup is 470.35
 
 real  0m26.110s
 user  0m0.140s
 sys   0m1.596s
 
 
 So, rsync reports that it wrote 1327634 bytes at 59 kBytes/sec, and the
 whole operation took 26 seconds. To write 2 files that were around 20-30
 kBytes each and delete 1.
 
 The last rsync took around 56 minutes, when normally such an rsync would
 have taken 5-10 minutes, writing over the network via ssh.

It may have something to do with the access times not being in sync across 
replicated pairs.  Maybe someone has experience with this; could it be tripping 
up rsync?
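
One way to test the theory: pick a directory rsync keeps revisiting and compare 
its timestamps straight off each brick.  A sketch, using one of the Maildir 
directories from the rsync output above; the brick path is a placeholder:

# run on each replica server against the same directory
stat -c '%x %y %n' /bricks/b1/gromm/Maildir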

-b

 

Re: [Gluster-users] Disastrous performance with rsync to mounted Gluster volume.

2015-04-27 Thread Ben Turner
- Original Message -
 From: David Robinson david.robin...@corvidtec.com
 To: Ben Turner btur...@redhat.com, Ernie Dunbar maill...@lightspeed.ca
 Cc: Gluster Users gluster-users@gluster.org
 Sent: Monday, April 27, 2015 5:21:08 PM
 Subject: Re[2]: [Gluster-users] Disastrous performance with rsync to mounted 
 Gluster volume.
 
 I am also having a terrible time with rsync and gluster.  The vast
 majority of my time is spent figuring out what to sync...  This sync
 takes 17-hours even though very little data is being transferred.
 
 sent 120,523 bytes  received 74,485,191,265 bytes  1,210,720.02
 bytes/sec
 total size is 27,589,660,889,910  speedup is 370.40
 

Maybe we could try something to confirm / deny my theory.  What about asking 
rsync to ignore anything that could differ between bricks in a replicated pair?  
A couple of options I see are:

--size-only means that rsync will skip files that match in size, even if the 
timestamps differ. This means it will synchronise less files than the default 
behaviour. It will miss any file with changes that don't affect the overall 
file size.

--ignore-times means that rsync will checksum every file, even if the 
timestamps and file sizes match. This means it will synchronise more files than 
the default behaviour. It will include changes to files even where the file 
size is the same and the modification date/time has been reset to the original 
value (resetting the date/time is unlikely to be done in practise, but it could 
happen).

These may also help, but it looks more to be for recovering from brick failures:

http://blog.gluster.org/category/rsync/
https://mjanja.ch/2014/07/parallelizing-rsync/?utm_source=rssutm_medium=rssutm_campaign=parallelizing-rsync#sync_brick
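
Something like this would be a low-risk first test: a sketch based on the 
invocation Ernie posted earlier, with --size-only swapped in for 
--ignore-existing (paths as in his example):

rsync -av --inplace --whole-file --size-only --delete-after gromm/* /mnt/gromm/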

I'll try some stuff in the lab and see if I can come up with RCA or something 
that helps.

-b
 
 
 -- Original Message --
 From: Ben Turner btur...@redhat.com
 To: Ernie Dunbar maill...@lightspeed.ca
 Cc: Gluster Users gluster-users@gluster.org
 Sent: 4/27/2015 4:52:35 PM
 Subject: Re: [Gluster-users] Disastrous performance with rsync to
 mounted Gluster volume.
 
 - Original Message -
   From: Ernie Dunbar maill...@lightspeed.ca
   To: Gluster Users gluster-users@gluster.org
   Sent: Monday, April 27, 2015 4:24:56 PM
   Subject: Re: [Gluster-users] Disastrous performance with rsync to
 mounted Gluster volume.
 
   On 2015-04-24 11:43, Joe Julian wrote:
 
This should get you where you need to be.  Before you start to
 migrate
the data maybe do a couple DDs and send me the output so we can
 get an
idea of how your cluster performs:
   
     time `dd if=/dev/zero of=<gluster-mount>/myfile bs=1024k count=1000;
     sync`
     echo 3 > /proc/sys/vm/drop_caches
     dd if=<gluster mount> of=/dev/null bs=1024k count=1000
    
     If you are using gigabit and glusterfs mounts with replica 2 you
     should get ~55 MB / sec writes and ~110 MB / sec reads.  With NFS you
     will take a bit of a hit since NFS doesn't know where files live like
     glusterfs does.
 
   After copying our data and doing a couple of very slow rsyncs, I did
   your speed test and came back with these results:
 
   1048576 bytes (1.0 MB) copied, 0.0307951 s, 34.1 MB/s
   root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
   count=1024 bs=1024; sync
   1024+0 records in
   1024+0 records out
   1048576 bytes (1.0 MB) copied, 0.0298592 s, 35.1 MB/s
   root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
   count=1024 bs=1024; sync
   1024+0 records in
   1024+0 records out
   1048576 bytes (1.0 MB) copied, 0.0501495 s, 20.9 MB/s
    root@backup:/home/webmailbak# echo 3 > /proc/sys/vm/drop_caches
   root@backup:/home/webmailbak# # dd if=/mnt/testfile of=/dev/null
   bs=1024k count=1000
   1+0 records in
   1+0 records out
   1048576 bytes (1.0 MB) copied, 0.0124498 s, 84.2 MB/s
 
 
   Keep in mind that this is an NFS share over the network.
 
   I've also noticed that if I increase the count of those writes, the
   transfer speed increases as well:
 
   2097152 bytes (2.1 MB) copied, 0.036291 s, 57.8 MB/s
   root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
   count=2048 bs=1024; sync
   2048+0 records in
   2048+0 records out
   2097152 bytes (2.1 MB) copied, 0.0362724 s, 57.8 MB/s
   root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
   count=2048 bs=1024; sync
   2048+0 records in
   2048+0 records out
   2097152 bytes (2.1 MB) copied, 0.0360319 s, 58.2 MB/s
   root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
   count=10240 bs=1024; sync
   10240+0 records in
   10240+0 records out
   10485760 bytes (10 MB) copied, 0.127219 s, 82.4 MB/s
   root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
   count=10240 bs=1024; sync
   10240+0 records in
   10240+0 records out
   10485760 bytes (10 MB) copied, 0.128671 s, 81.5 MB/s
 
 This is correct, there is overhead that happens with small files and
 the smaller the file the less throughput

Re: [Gluster-users] unusual gluster-fuse client load

2015-04-27 Thread Ben Turner
- Original Message -
 From: Khoi Mai khoi...@up.com
 To: gluster-users@gluster.org
 Sent: Monday, April 27, 2015 10:43:27 AM
 Subject: [Gluster-users] unusual gluster-fuse client load
 
 All,
 
 I have an unusual situation.
 
 I have a client whose 1 fuse mount to a volume adds increased load to the
 machine when its mounted. When it is not mounted the server is well below
 1.00 in top/uptime for load avg.
 
 The strange part is, this gluster volume is mounted across multiple servers
 in my environment where the load is not observed.
 
 I've tried strace, and saw nothing usual. I've even rebooted the client
 thinking, that there is might be some odd cache but there was no change as
 the load started to climb after the reboot, and mounting of the fuse
 filesystem.
 
 Nothing in the client/server logs regarding that fuse volume.
 
 Would the community have any suggestions that I might investigate more of? I
 haven't tried to wireshark yet. But i'm not certain I know what to look for
 if I did capture the tcp dump.

What process is causing the load to increase?  Normally I look at ps or top and see 
what is stuck in D state or is using all my CPU.  Do you have anything scanning the 
gluster mount or accessing it in the background?  Just mounting on a client 
shouldn't cause the system to get much busier, so I would think that something is 
accessing the mount.  What is the load you see when you mount?
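
A quick way to catch the culprit when the load climbs (a rough sketch):

# processes stuck in uninterruptible sleep (D state) usually point at the I/O path
ps -eo state,pid,pcpu,wchan:32,cmd | awk '$1=="D"'
# and the top CPU consumers
top -b -n 1 | head -20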

-b
 
 Thanks,
 
 Khoi Mai
 Union Pacific Railroad
 Distributed Engineering  Architecture
 Senior Project Engineer
 
 
 
 
 **
 
 
 
 This email and any attachments may contain information that is confidential
 and/or privileged for the sole use of the intended recipient. Any use,
 review, disclosure, copying, distribution or reliance by others, and any
 forwarding of this email or its contents, without the express permission of
 the sender is strictly prohibited by law. If you are not the intended
 recipient, please contact the sender immediately, delete the e-mail and
 destroy all copies.
 
 **
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] SSD Cache Without RAID Controller

2015-04-24 Thread Ben Turner
- Original Message -
 From: Alex Crow ac...@integrafin.co.uk
 To: gluster-users@gluster.org
 Sent: Friday, April 24, 2015 4:36:01 AM
 Subject: Re: [Gluster-users] SSD Cache Without RAID Controller
 
 
 On 24/04/15 09:14, Punit Dambiwal wrote:
  Hi,
 
  I want to use the glusterfs with the following architecture :-
 
  1. 3* Supermicro servers As storage node.
  2. Every server has 10 SATA HDD (JBOD) and 2 SSD for caching (2
  Additional on back pane for OS).
  3. Gluster should be replica=3
  4. 10G network Connection
 
  The question is how to configure the SSD caching for 10 Bricks,just
  like CEPH (write the journal on SSD drives)...I didn't find any
  article for the same..

You should look at tiering, it is currently in development but will be ready 
soon-ish:

http://www.gluster.org/community/documentation/index.php/Features/data-classification

The hot tier would be the SSDs and the cold tier would be your spinning disks.  It 
should be available with 3.7; if you are interested it would be awesome to have an 
early adopter :)

-b

 
  Thanks,
  Punit
 
 You have at least 2 choices for SSD caching with current GlusterFS.
 
 1. Use Centos 7 and use LVM cache on your SSDs.
 2. Use ZFS as your brick filesystem and set up an SLOG, say 4GB and an
 L2ARC (rest of the space) on the SSDs. Note that the SLOG would need to
 be a mirror if you only have 2 SSDs. L2ARC can be a stripe.
 
 Native Glusterfs tiering is being developed at the moment.
 
 Alex
 
 --
 This message is intended only for the addressee and may contain
 confidential information. Unless you are that person, you may not
 disclose its contents or use it in any way and are requested to delete
 the message along with any attachments and notify us immediately.
 Transact is operated by Integrated Financial Arrangements plc. 29
 Clement's Lane, London EC4N 7AE. Tel: (020) 7608 4900 Fax: (020) 7608
 5300. (Registered office: as above; Registered in England and Wales
 under number: 3727592). Authorised and regulated by the Financial
 Conduct Authority (entered on the Financial Services Register; no. 190856).
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Disastrous performance with rsync to mounted Gluster volume.

2015-04-24 Thread Ben Turner


- Original Message -
 From: Joe Julian j...@julianfamily.org
 To: gluster-users@gluster.org
 Sent: Thursday, April 23, 2015 9:10:59 PM
 Subject: Re: [Gluster-users] Disastrous performance with rsync to mounted 
 Gluster volume.
 
 
 
 On 04/23/2015 04:41 PM, Ernie Dunbar wrote:
  On 2015-04-23 12:58, Ben Turner wrote:
 
 
  +1, lets nuke everything and start from a known good.  Those error
  messages make me think something is really wrong with how we are
  copying the data.  Gluster does NFS by default so you shouldn't have
  have to reconfigure anything after you recreate the volume.
 
 
  Okay... this is a silly question. How do I do that? Deleting the
  volume doesn't affect the files in the underlying filesystem, and I
  get the impression that trying to delete the files in the underlying
  filesystem without shutting down or deleting the volume would result
  in Gluster trying to write the files back where they belong.
 
  Should I stop the volume, delete it, then delete the files and start
  from scratch, re-creating the volume?
 
 That's what I would do.

Here is how I clean up gluster; scroll down to the bottom for the commands:

Look for: "** If you make a mistake or want to recreate things in a different 
config here are the commands to cleanup"
http://54.82.237.211/gluster-benchmark/gluster-bench-README


 
 
  At this point, the cluster isn't live, so this is an entirely feasible
  thing to do. All the data exists somewhere else already, and I just
  need to copy it to the NFS share to get things going.
 
  ___
  Gluster-users mailing list
  Gluster-users@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-users
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Disastrous performance with rsync to mounted Gluster volume.

2015-04-24 Thread Ben Turner
- Original Message -
 From: Ernie Dunbar maill...@lightspeed.ca
 To: Gluster Users gluster-users@gluster.org
 Sent: Friday, April 24, 2015 1:15:32 PM
 Subject: Re: [Gluster-users] Disastrous performance with rsync to mounted 
 Gluster volume.
 
 On 2015-04-23 18:10, Joe Julian wrote:
  On 04/23/2015 04:41 PM, Ernie Dunbar wrote:
  On 2015-04-23 12:58, Ben Turner wrote:
  
  
  +1, lets nuke everything and start from a known good.  Those error
  messages make me think something is really wrong with how we are
  copying the data.  Gluster does NFS by default so you shouldn't have
  have to reconfigure anything after you recreate the volume.
  
  
  Okay... this is a silly question. How do I do that? Deleting the
  volume doesn't affect the files in the underlying filesystem, and I
  get the impression that trying to delete the files in the underlying
  filesystem without shutting down or deleting the volume would result
  in Gluster trying to write the files back where they belong.
  
  Should I stop the volume, delete it, then delete the files and start
  from scratch, re-creating the volume?
  
  That's what I would do.
  
 
 Well, apparently removing the .glusterfs directory from the brick is an
 exceptionally bad thing, and breaks gluster completely, rendering it
 inoperable. I'm going to have to post another thread about how to fix
 this mess now.

You are correct and I would just start from scratch Ernie.  Creating a gluster 
cluster is only about 3-4 commands and should only take a minute or two.  Also 
with all the problems you are having I am not confident in your data integrity. 
 All you need to do to clear EVERYTHING out is:

service glusterd stop
killall glusterfsd
killall glusterfs
sleep 1
for file in /var/lib/glusterd/*; do if ! echo $file | grep 'hooks' > /dev/null 
2>&1; then rm -rf $file; fi; done

From there restart the gluster service and recreate everything:

service glusterd restart
make a new filesystem on your bricks, mount
gluster peer probe <my peer>
gluster v create <my vol>
gluster v start <my vol>
gluster v info 

From there mount the new volume on your system with the data  you want to 
migrate:

mount -t nfs -o vers=3 <my vol> <my mount>
rsync <your rsync command>

This should get you where you need to be.  Before you start to migrate the data 
maybe do a couple DDs and send me the output so we can get an idea of how your 
cluster performs:

time `dd if=/dev/zero of=<gluster-mount>/myfile bs=1024k count=1000; sync`
echo 3 > /proc/sys/vm/drop_caches
dd if=<gluster mount> of=/dev/null bs=1024k count=1000

If you are using gigabit and glusterfs mounts with replica 2 you should get ~55 
MB / sec writes and ~110 MB / sec reads.  With NFS you will take a bit of a hit 
since NFS doesn't know where files live like glusterfs does.

-b

 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Disastrous performance with rsync to mounted Gluster volume.

2015-04-23 Thread Ben Turner
- Original Message -
 From: Ernie Dunbar maill...@lightspeed.ca
 To: Gluster Users gluster-users@gluster.org
 Sent: Thursday, April 23, 2015 1:54:35 PM
 Subject: [Gluster-users] Disastrous performance with rsync to mounted Gluster 
 volume.
 
 Hello everyone.
 
 I've built a replicated Gluster cluster (volume info shown below) of two
 Dell servers on a 1 GB switch, plus a second NIC on each server for
 replication data. But when I try to copy our mail store from our backup
 server onto the Gluster volume, I've been having nothing but trouble.
 
 I may have messed this right up the first time, as I just used rsync to
 copy all the files to the Linux filesystem on the primary Gluster
 server, instead of copying the data to an NFS or Gluster mount.
 Attempting to get Gluster to synchronize the files to the second Gluster
 server hasn't worked out very well at all, with about half the data
 actually copied to the second Gluster server. Attempts to force Gluster
 to synchronize this data have all failed (Gluster appears to think the
 data is already synchronized). This might be the best way of
 accomplishing this in the end, but in the meantime I've tried a
 different tack.

Gluster writes to both bricks in the pair at the same time, bricks should never 
be out of sync when they are both online.

 
 Now, I'm trying to mount the Gluster volume over the network from the
 backup server, using NFS (the backup server doesn't and can't have a
 compatible version of GlusterFS on it, I plan to nuke it and install an
 OS that does support it, but first we have to get this mail store copied
 over!). Then I use rsync to copy only missing files to the NFS share and
 let Gluster do its own replication. This has been many, many times
 slower than just using rsync to copy the files, even considering the
 amount of data (439 GB). CPU usage on the Gluster servers is fairly
 high, with a server load value of about 4 on an 8 CPU system. Network
 usage is... well, not that high. Maybe topping about 50-70 Mbps. This
 same story is true whether I'm looking at the network usage for the
 primary, server-facing network or the secondary, Gluster-only network,
 so I don't think the bottleneck is there. Hard drive utilization peaks
 at around 40% but doesn't really stay that high.

Hmm, I am confused, are you using kernel NFS or gluster NFS to copy the data 
over?  The only way I can think of for a file to get on gluster without a GFID is 
if you put it directly on the brick without going through glusterfs.  Be sure 
not to do work on the backend bricks, always access the data through gluster.  
The logs below definitely indicate a problem.  Here is what I would do:

create gluster volume
On the server with the data:
mount -t nfs -o vers=3 <my gluster volume> /gluster-volume
cp -R <my data> <my gluster mount>
go home, drink beer, come back the next day

If you need to use rsync I would look at the --whole-file option and/or forcing 
it to write in larger block sizes.  The rsync workload is one of the 
worst I can think of for glusterfs, it does a ton of stat calls and copies the 
files in really small block sizes, which creates tons of round trips for 
gluster.  How much data are you copying?  How many files?  Which version
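
If it helps, a sketch of the sort of rsync flags I mean (paths are placeholders 
and the flags are from the rsync man page, so double check them on your version):

# -W / --whole-file skips the delta-transfer algorithm, --inplace avoids the
# temp file + rename dance; both cut down on round trips over gluster
rsync -avW --inplace /your/source/dir/ /your/gluster-nfs-mount/dir/

rsync still issues fairly small writes on its own, so this helps but it won't 
turn it into a streaming workload.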

To address:

Maybe topping about 50-70 Mbps.

This is as fast as gigabit + replica 2 will do from a single client, each 
client writes to both bricks at the same time which cuts your throughput in 
half.  Gigabit theoretical is 125 MB / sec, most gigabit NICs get ~120 MB / sec 
in the real world, half of that is ~60 MB / sec which is what gluster + replica 
2 will do from a single client.
 
 One possible clue may lie in Gluster's logs. I see millions of log
 entries like this:
 
 [2015-04-23 16:40:50.122007] I
 [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done]
 0-gv2-replicate-0:
 gfid:912eec51-89dc-40ea-9dfd-072404d306a2/1355401127.H542717P24276.pop.lightspeed.ca:2,:
 Skipping entry self-heal because of gfid absence
 [2015-04-23 16:40:50.123327] I
 [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done]
 0-gv2-replicate-0:
 gfid:912eec51-89dc-40ea-9dfd-072404d306a2/1355413874.H20794P22730.pop.lightspeed.ca:2,:
 Skipping entry self-heal because of gfid absence
 [2015-04-23 16:40:50.123705] I
 [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done]
 0-gv2-replicate-0:
 gfid:912eec51-89dc-40ea-9dfd-072404d306a2/1355420013.H176322P3859.pop.lightspeed.ca:2,:
 Skipping entry self-heal because of gfid absence
 [2015-04-23 16:40:50.124030] I
 [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done]
 0-gv2-replicate-0:
 gfid:912eec51-89dc-40ea-9dfd-072404d306a2/1355429494.H263072P14676.pop.lightspeed.ca:2,:
 Skipping entry self-heal because of gfid absence
 [2015-04-23 16:40:50.124423] I
 [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done]
 0-gv2-replicate-0:
 gfid:912eec51-89dc-40ea-9dfd-072404d306a2/1355436426.H973617P29804.pop.lightspeed.ca:2,:
 Skipping entry self-heal because of gfid 

Re: [Gluster-users] Glusterfs performance tweaks

2015-04-09 Thread Ben Turner
- Original Message -
 From: Punit Dambiwal hypu...@gmail.com
 To: Vijay Bellur vbel...@redhat.com
 Cc: gluster-users@gluster.org
 Sent: Wednesday, April 8, 2015 9:55:38 PM
 Subject: Re: [Gluster-users] Glusterfs performance tweaks
 
 Hi Vijay,
 
 If i run the same command directly on the brick...
 
 [root@cpu01 1]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
 4096+0 records in
 4096+0 records out
 268435456 bytes (268 MB) copied, 16.8022 s, 16.0 MB/s
 [root@cpu01 1]# pwd
 /bricks/1
 [root@cpu01 1]#
 

This is your problem.  Gluster is only as fast as its slowest piece, and here 
your storage is the bottleneck.  Being that you get 16 MB to the brick and 12 
to gluster that works out to about 25% overhead which is what I would expect 
with a single thread, single brick, single client scenario.  This may have 
something to do with the way SSDs write?  On my SSD at my desk I only get 11.4 
MB / sec when I run that DD command:

# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 23.065 s, 11.4 MB/s

My thought is that maybe using dsync is forcing the SSD to clean the data or 
something else before writing to it:

http://www.blog.solidstatediskshop.com/2012/how-does-an-ssd-write/

Do your drives support fstrim?  It may be worth it to trim before you run and 
see what results you get.  Other than tuning the SSD / OS to perform better on 
the back end there isn't much we can do from the gluster perspective on that 
specific DD w/ the dsync flag.
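
If you want to check, a quick sketch (the device name is an example, /bricks/1 
is your brick mount from above):

lsblk --discard /dev/sdb    # non-zero DISC-GRAN / DISC-MAX means TRIM is supported
fstrim -v /bricks/1         # trim the mounted brick filesystem, then rerun the dd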

-b

 
 On Wed, Apr 8, 2015 at 6:44 PM, Vijay Bellur  vbel...@redhat.com  wrote:
 
 
 
 On 04/08/2015 02:57 PM, Punit Dambiwal wrote:
 
 
 
 Hi,
 
 I am getting very slow throughput in the glusterfs (dead slow...even
 SATA is better) ... i am using all SSD in my environment.
 
 I have the following setup :-
 A. 4* host machine with Centos 7(Glusterfs 3.6.2 | Distributed
 Replicated | replica=2)
 B. Each server has 24 SSD as bricks…(Without HW Raid | JBOD)
 C. Each server has 2 Additional ssd for OS…
 D. Network 2*10G with bonding…(2*E5 CPU and 64GB RAM)
 
 Note :- Performance/Throughput slower then Normal SATA 7200 RPM…even i
 am using all SSD in my ENV..
 
 Gluster Volume options :-
 
 +++
 Options Reconfigured:
 performance.nfs.write-behind-window-size: 1024MB
 performance.io-thread-count: 32
 performance.cache-size: 1024MB
 cluster.quorum-type: auto
 cluster.server-quorum-type: server
 diagnostics.count-fop-hits: on
 diagnostics.latency-measurement: on
 nfs.disable: on
 user.cifs: enable
 auth.allow: *
 performance.quick-read: off
 performance.read-ahead: off
 performance.io-cache: off
 performance.stat-prefetch: off
 cluster.eager-lock: enable
 network.remote-dio: enable
 storage.owner-uid: 36
 storage.owner-gid: 36
 server.allow-insecure: on
 network.ping-timeout: 0
 diagnostics.brick-log-level: INFO
 +++
 
 Test with SATA and Glusterfs SSD….
 ———
 Dell EQL (SATA disk 7200 RPM)
 —-
 [root@mirror ~]#
 4096+0 records in
 4096+0 records out
 268435456 bytes (268 MB) copied, 20.7763 s, 12.9 MB/s
 [root@mirror ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
 4096+0 records in
 4096+0 records out
 268435456 bytes (268 MB) copied, 23.5947 s, 11.4 MB/s
 
 GlusterFS SSD
 —
 [root@sv-VPN1 ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
 4096+0 records in
 4096+0 records out
 268435456 bytes (268 MB) copied, 66.2572 s, 4.1 MB/s
 [root@sv-VPN1 ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
 4096+0 records in
 4096+0 records out
 268435456 bytes (268 MB) copied, 62.6922 s, 4.3 MB/s
 
 
 Please let me know what i should do to improve the performance of my
 glusterfs…
 
 
 What is the throughput that you get when you run these commands on the disks
 directly without gluster in the picture?
 
 By running dd with dsync you are ensuring that there is no buffering anywhere
 in the stack and that is the reason why low throughput is being observed.
 
 -Vijay
 
 -Vijay
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Glusterfs performance tweaks

2015-04-08 Thread Ben Turner
- Original Message -
 From: Vijay Bellur vbel...@redhat.com
 To: Punit Dambiwal hypu...@gmail.com, gluster-users@gluster.org
 Sent: Wednesday, April 8, 2015 6:44:42 AM
 Subject: Re: [Gluster-users] Glusterfs performance tweaks
 
 On 04/08/2015 02:57 PM, Punit Dambiwal wrote:
  Hi,
 
  I am getting very slow throughput in the glusterfs (dead slow...even
  SATA is better) ... i am using all SSD in my environment.
 
  I have the following setup :-
  A. 4* host machine with Centos 7(Glusterfs 3.6.2 | Distributed
  Replicated | replica=2)
  B. Each server has 24 SSD as bricks…(Without HW Raid | JBOD)
  C. Each server has 2 Additional ssd for OS…
  D. Network 2*10G with bonding…(2*E5 CPU and 64GB RAM)
 
  Note :- Performance/Throughput slower then Normal SATA 7200 RPM…even i
  am using all SSD in my ENV..
 
  Gluster Volume options :-
 
  +++
  Options Reconfigured:
  performance.nfs.write-behind-window-size: 1024MB
  performance.io-thread-count: 32
  performance.cache-size: 1024MB
  cluster.quorum-type: auto
  cluster.server-quorum-type: server
  diagnostics.count-fop-hits: on
  diagnostics.latency-measurement: on
  nfs.disable: on
  user.cifs: enable
  auth.allow: *
  performance.quick-read: off
  performance.read-ahead: off
  performance.io-cache: off
  performance.stat-prefetch: off
  cluster.eager-lock: enable
  network.remote-dio: enable
  storage.owner-uid: 36
  storage.owner-gid: 36
  server.allow-insecure: on
  network.ping-timeout: 0
  diagnostics.brick-log-level: INFO
  +++
 
  Test with SATA and Glusterfs SSD….
  ———
  Dell EQL (SATA disk 7200 RPM)
  —-
  [root@mirror ~]#
  4096+0 records in
  4096+0 records out
  268435456 bytes (268 MB) copied, 20.7763 s, 12.9 MB/s
  [root@mirror ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
  4096+0 records in
  4096+0 records out
  268435456 bytes (268 MB) copied, 23.5947 s, 11.4 MB/s
 
  GlusterFS SSD
  —
  [root@sv-VPN1 ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
  4096+0 records in
  4096+0 records out
  268435456 bytes (268 MB) copied, 66.2572 s, 4.1 MB/s
  [root@sv-VPN1 ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
  4096+0 records in
  4096+0 records out
  268435456 bytes (268 MB) copied, 62.6922 s, 4.3 MB/s
  
 
  Please let me know what i should do to improve the performance of my
  glusterfs…
 
 
 What is the throughput that you get when you run these commands on the
 disks directly without gluster in the picture?
 
 By running dd with dsync you are ensuring that there is no buffering
 anywhere in the stack and that is the reason why low throughput is being
 observed.

This is slow for the env you described.  Are you sure you are using your 10G 
NICs?  What do you see with iperf between the client and server?  In my env 
with 12 spinning disks in a RAID 6 + single 10G NIC I get:

[root@gqac025 gluster-mount]# dd if=/dev/zero of=test bs=64k count=4k 
oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 9.88752 s, 27.1 MB/s

A couple things to check with your SSDs:

-Scheduler {noop or deadline }
-No read ahead!
-No RAID!
-Make sure the kernel sees them as SSDs (quick checks for all of the above are sketched below)
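
A quick sketch of how I'd verify each of those (sdb is just an example device name):

cat /sys/block/sdb/queue/scheduler              # should show [noop] or [deadline]
echo deadline > /sys/block/sdb/queue/scheduler
blockdev --getra /dev/sdb                       # read ahead, set it to 0 with:
blockdev --setra 0 /dev/sdb
cat /sys/block/sdb/queue/rotational             # 0 means the kernel sees it as an SSD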

As Vijay said you will see WAY better throughput if you get rid of the dsync 
flag.  Maybe try something like:

$ time `dd if=/dev/zero of=/gluster-mount/myfile bs=1024k count=1000; sync`

That will give you an idea of what it takes to write to RAM then sync the dirty 
pages to disk.

-b
 
 -Vijay
 
 -Vijay
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Re: Gluster/NFS mount issues

2015-03-26 Thread Ben Turner
NP!  Happy to help!  Yep, if you see that message you can assume that the NIC 
wasn't fully up when it tried to contact the server, so it wasn't yet able to 
route traffic.

-b

- Original Message -
 From: 何亦军 heyi...@greatwall.com.cn
 To: Ben Turner btur...@redhat.com
 Cc: gluster-users@gluster.org
 Sent: Thursday, March 26, 2015 9:23:31 PM
 Subject: Re: [Gluster-users] Gluster/NFS mount issues
 
 Thanks Ben Turner,
 
 The LINKDELAY=<time> fixed my fstab problem, thanks very much.
 
 BTW, my issue message in log:
 
 [socket.c:2267:socket_connect_finish] 0-glusterfs: connection to
 192.168.0.61:24007 failed (No route to host)
 [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect
 with remote-host: gwgfs01 (Transport endpoint is not connected)
 
 From: Ben Turner btur...@redhat.commailto:btur...@redhat.com
 To: Alun James aja...@tibus.commailto:aja...@tibus.com
 Cc: gluster-users@gluster.orgmailto:gluster-users@gluster.org
 Sent: Wednesday, 25 March, 2015 9:21:14 PM
 Subject: Re: [Gluster-users] Gluster/NFS mount issues
 
 Normally when I see this the NICs are not fully initialized.  I have done a
 couple different things to work around this:
 
 -Try adding the linkdelay parameter to the ifcfg script:
 
 LINKDELAY=<time>
 where <time> is the number of seconds to wait for link negotiation before
 configuring the device.
 
 -Try turning on portfast on your switch to speed up negotiation.
 
 -Try putting a sleep in your init scripts just before it goes to mount your
 fstab items
 
 -Try putting the mount command in rc.local or whatever is the last thing your
 system does before it boots.
 
 Last time I looked at the _netdev code it only looked for an active link, it
 didn't ensure that the NIC was up and able to send traffic.  I would start
 with the linkdelay and go from there.  LMK how this works out for ya, I am
 not very well versed on the Ubuntu boot process :/
 
 -b
 
 
 
 
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster/NFS mount issues

2015-03-25 Thread Ben Turner
Normally when I see this the NICs are not fully initialized.  I have done a 
couple different things to work around this:

-Try adding the linkdelay parameter to the ifcfg script:

LINKDELAY=<time>
where <time> is the number of seconds to wait for link negotiation before 
configuring the device.

-Try turning on portfast on your switch to speed up negotiation.

-Try putting a sleep in your init scripts just before it goes to mount your 
fstab items

-Try putting the mount command in rc.local or whatever is the last thing your 
system does before it boots.

Last time I looked at the _netdev code it only looked for an active link, it 
didn't ensure that the NIC was up and able to send traffic.  I would start with 
the linkdelay and go from there.  LMK how this works out for ya, I am not very 
well versed on the Ubuntu boot process :/
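
For reference, on a RHEL style box the linkdelay piece of the ifcfg script looks 
something like this (10 seconds is just an example value, and I'm not sure of 
the exact Ubuntu equivalent):

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
LINKDELAY=10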

-b

- Original Message -
 From: Alun James aja...@tibus.com
 To: gluster-users@gluster.org
 Sent: Wednesday, March 25, 2015 6:33:05 AM
 Subject: [Gluster-users] Gluster/NFS mount issues
 
 Hi folks,
 
 I am having some issues getting NFS to mount the glusterfs volume on boot-up.
 I have tried all the usual mount options in fstab, but thus far none have
 helped.  I am using NFS as it seems to give better performance for my workload
 compared with the glusterfs client.
 
 [Node Setup]
 
 3 x Nodes mounting vol locally.
 Ubuntu 14.04 3.13.0-45-generic
 GlusterFS: 3.6.2-ubuntu1~trusty3
 nfs-common 1:1.2.8-6ubuntu1.1
 
 Type: Replicate
 Status: Started
 Number of Bricks: 1 x 3 = 3
 Transport-type: tcp
 Bricks:
 Brick1: node01:/export/brick0
 Brick2: node02:/export/brick0
 Brick3: node03:/export/brick0
 
 /etc/fstab:
 
 /dev/mapper/gluster--vg-brick0 /export/brick0 xfs defaults 0 0
 
 localhost:/my_filestore_vol /data nfs
 defaults,nobootwait,noatime,_netdev,nolock,mountproto=tcp,vers=3 0 0
 
 
 [Issue]
 
 On boot, the /data partition is not mounted, however, I can jump on each node
 and simply run mount /data without any problems, so I assume my fstab
 options are OK. I have noticed the following log:
 
 /var/log/upstart/mountall.log:
 
 mount.nfs: requested NFS version or transport protocol is not supported
 mountall: mount /data [1178] terminated with status 32
 
 I have attempted the following fstab options without success and similar log
 message:
 
 localhost:/my_filestore_vol /data nfs
 defaults,nobootwait,noatime,_netdev,nolock 0 0
 localhost:/my_filestore_vol /data nfs
 defaults,nobootwait,noatime,_netdev,nolock,mountproto=tcp,vers=3 0 0
 localhost:/my_filestore_vol /data nfs
 defaults,nobootwait,noatime,_netdev,nolock,vers=3 0 0
 localhost:/my_filestore_vol /data nfs
 defaults,nobootwait,noatime,_netdev,nolock,nfsvers=3 0 0
 localhost:/my_filestore_vol /data nfs
 defaults,nobootwait,noatime,_netdev,nolock,mountproto=tcp,nfsvers=3 0 0
 
 Anything else I can try?
 
 Regards,
 
 ALUN JAMES
 Senior Systems Engineer
 Tibus
 
 T: +44 (0)28 9033 1122
 E: aja...@tibus.com
 W: www.tibus.com
 
 Follow us on Twitter @tibus
 
 Tibus is a trading name of The Internet Business Ltd, a company limited by
 share capital and registered in Northern Ireland, NI31235. It is part of UTV
 Media Plc.
 
 This email and any attachment may contain confidential information for the
 sole use of the intended recipient. Any review, use, distribution or
 disclosure by others is strictly prohibited. If you are not the intended
 recipient (or authorised to receive for the recipient), please contact the
 sender by reply email and delete all copies of this message.
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] What should I do to improve performance ?

2015-03-23 Thread Ben Turner
- Original Message -
 From: marianna cattani marianna.catt...@gmail.com
 To: gluster-users@gluster.org
 Sent: Monday, March 23, 2015 6:09:41 AM
 Subject: [Gluster-users] What should I do to improve performance ?
 
 Dear all,
 I followed the tutorial I read at this link :
 http://www.gluster.org/documentation/use_cases/Virt-store-usecase/
 
 I have 4 nodes configured as a linked list , each node also performs virtual
 machines with KVM and mounts on its ip address, like this:
 
 172.16.155.12:/nova /var/lib/nova/instances glusterfs defaults,_netdev 0 0
 
 Each node has two nic (ten giga) bonded in mode 4.
 
 What can I do to further improve the speed ?

What kind of disks are back ending your 10G NICs?  Are you using FUSE or 
libgfapi to connect to gluster from your hypervisor?  What kind of speeds are 
you expecting vs seeing in your environment?  We need to understand what your 
HW can do first then gather some data running on gluster and compare the two.  
As a rule of thumb with replica 2 you should see about:

throughput = ( NIC line speed / 2 ) - 20% overhead

As long as your disks can service it.  If you are seeing about that on the 
gluster mounts then go inside one of the VMs and run the same test, the VM 
should get something similar.  If you aren't seeing at least 400 MB / sec on 
sequential writes and 500-700 MB /sec on reads then there may be something off 
in your storage stack.
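
Worked out for a single 10G link (mode 4 bonding usually hashes one stream onto 
one NIC anyway):

10GbE line speed           ~1250 MB / sec
divided by 2 (replica 2)    ~625 MB / sec
minus ~20% overhead         ~500 MB / sec expected sequential write from one client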

-b

 BR.
 
 M.
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] NFS gluster mount on Windows

2015-03-12 Thread Ben Turner
Hi Jeremy.  A couple things first: as of today gluster only supports NFS v3, so 
make sure you are using v3 and tcp mounts.  I haven't personally tested Windows 
NFS with gluster, which version of Windows are you using?  I would be happy to 
setup a test in our lab in hopes of helping get this working, I have seen 
issues with other OSs mounting gNFS that had to be resolved, maybe we are 
hitting something similar here.

-b

- Original Message -
 From: Jeremy Jarvis jeremy.jar...@hexagongeospatial.com
 To: gluster-users@gluster.org
 Sent: Thursday, March 12, 2015 3:57:00 AM
 Subject: [Gluster-users] NFS gluster mount on Windows
 
 
 
 Hi,
 
 
 
 I’m trying to use Windows Services for NFS to expose a gluster node and
 access this folder with an Application Pool Identity. I can map the drive
 correctly and read access works but I can’t get write working.
 
 
 
 
 
 The procedure I’m using is:
 
 Set the DefaultApplicationPoolIdentity to use user “CustomUser”
 
 
 
 Set the following environment variables to 0 (for root)
 
 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ClientForNFS
 \CurrentVersion\Default\AnonymousUid
 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ClientForNFS
 \CurrentVersion\Default\AnonymousGid
 
 
 
 Open a cmd with “CustomUser”
 
 mount -o nolock ip-of-gluster:/datapoint Z:
 
 SUCCESS
 
 
 
 Then the iis application is supposed to write log files out to Z:\ but always
 fails. I wrote a small test app that stats the folder and I can successfully
 stat the directory with “CustomUser”. Right click > Properties > NFS
 Attributes shows RWX for all and the NFS Mount Options shows UID 0 Primary
 GID 0.
 
 
 
 I can’t think of anything else that would be standing in the way. Any advice
 is greatly appreciated.
 
 
 
 Thanks, Jeremy.
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Need advice: GlusterFS replicated on arrays with different speed!!!

2015-03-12 Thread Ben Turner
- Original Message -
 From: wodel youchi wodel.you...@gmail.com
 To: gluster-users gluster-users@gluster.org
 Sent: Thursday, March 12, 2015 6:15:20 PM
 Subject: [Gluster-users] Need advice: GlusterFS replicated on arrays with 
 different speed!!!
 
 Hi,
 
 Please I need some advice.
 
 We have two iSCSI disk arrays, the first with 16 (2.5") disks, 600 GB SAS 10K
 each, the second 5 (3.5") disks, 4 TB SATA 7200 each.
 
 is it wise to configure GlusterFS replicated volumes upon arrays with
 different speed?

I would try to match similar speed drives in replicated pairs.  When you create 
the volume do something like:

gluster v create testvol transport tcp replica 2 node1:/fast node2:/fast 
node3:/slow node4:/slow

I can't think of anything functionally that would cause a problem, you would 
only write as fast as your slowest brick in the pair though.  I see something 
similar when a disk in my RAID goes bad and starts running in degraded mode.  
Things work you are just limited to the speed of the slowest brick in the 
replicated pair.

-b
 
 Thanks in advance.
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Poor Gluster performance

2015-02-18 Thread Ben Turner
- Original Message -
 From: Lars Hanke deb...@lhanke.de
 To: gluster-users@gluster.org
 Sent: Wednesday, February 18, 2015 3:01:54 PM
 Subject: [Gluster-users] Poor Gluster performance
 
 I set up a distributed, replicated volume consisting of just 2 bricks on
 two physical nodes. The nodes are peered using a dedicated GB ethernet
 and can be accessed from the clients using a separate GB ethernet NIC.
 
 Doing a simple dd performance test I see about 11 MB/s for read and
 write. Running a local setup, i.e. both bricks on the same machine and
 local mount, I saw even 500 MB/s. So network sould be the limiting
 factor. But using NFS or CIFS on the same network I see 110 MB/s.
 
 Is gluster 10 times slower than NFS?

Something is going on there.  On my gigabit setups I see 100-120 MB / sec 
writes for pure distribute and about 45-55 MB / sec with replica 2.  What block 
size are you using?  I could see that if you were writing something like 4k or 
under but 64k and up you should be getting about what I said.  Can you tell me 
more about your test?

-b

 
 Gluster version is 3.6.2.
 
 Thanks for your help,
   - lars.
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Poor Gluster performance

2015-02-18 Thread Ben Turner
- Original Message -
 From: Lars Hanke deb...@lhanke.de
 To: Ben Turner btur...@redhat.com
 Cc: gluster-users@gluster.org
 Sent: Wednesday, February 18, 2015 5:09:19 PM
 Subject: Re: [Gluster-users] Poor Gluster performance
 
 Am 18.02.2015 um 22:05 schrieb Ben Turner:
  - Original Message -
  From: Lars Hanke deb...@lhanke.de
  To: gluster-users@gluster.org
  Sent: Wednesday, February 18, 2015 3:01:54 PM
  Subject: [Gluster-users] Poor Gluster performance
 
  I set up a distributed, replicated volume consisting of just 2 bricks on
  two physical nodes. The nodes are peered using a dedicated GB ethernet
  and can be accessed from the clients using a separate GB ethernet NIC.
 
  Doing a simple dd performance test I see about 11 MB/s for read and
  write. Running a local setup, i.e. both bricks on the same machine and
  local mount, I saw even 500 MB/s. So network sould be the limiting
  factor. But using NFS or CIFS on the same network I see 110 MB/s.
 
  Is gluster 10 times slower than NFS?
 
  Something is going on there.  On my gigabit setups I see 100-120 MB / sec
  writes for pure distribute and about 45-55 MB / sec with replica 2.  What
  block size are you using?  I could see that if you were writing something
  like 4k or under but 64k and up you should be getting about what I said.
  Can you tell me more about your test?
 
 Block size is 50M:
 
 root@gladsheim:/# mount -t glusterfs node2:/test ~/mnt
 root@gladsheim:/# dd if=/dev/zero of=~/mnt/testfile.null bs=50M count=10
 10+0 records in
 10+0 records out
 524288000 bytes (524 MB) copied, 46.6079 s, 11.2 MB/s
 root@gladsheim:/# dd if=~/mnt/testfile.null of=/dev/null bs=50M count=10
 10+0 records in
 10+0 records out
 524288000 bytes (524 MB) copied, 45.7487 s, 11.5 MB/s
 
 It doesn't depend on whether I use node1 or node2 for the mount.

Here is how I usually run:

[root@gqac022 gluster-mount]# time `dd if=/dev/zero of=/gluster-mount/test.txt 
bs=1024k count=1000; sync`
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 9.12639 s, 115 MB/s
real    0m9.205s
user    0m0.000s
sys     0m0.670s

[root@gqac022 gluster-mount]# sync; echo 3 > /proc/sys/vm/drop_caches 

[root@gqac022 gluster-mount]# dd if=./test.txt of=/dev/null bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 9.04464 s, 116 MB/s

And with your commands:

[root@gqac022 gluster-mount]# dd if=/dev/zero of=/gluster-mount/testfile.null 
bs=50M count=10
10+0 records in
10+0 records out
524288000 bytes (524 MB) copied, 5.00876 s, 105 MB/s

[root@gqac022 gluster-mount]# sync; echo 3 > /proc/sys/vm/drop_caches 

[root@gqac022 gluster-mount]# dd if=./testfile.null of=/dev/null bs=1024k 
count=1000
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 4.51992 s, 116 MB/s

Normally to troubleshoot these issues I break the storage stack into its 
individual pieces and test each one.  Try running on the bricks outside gluster 
and see what you are getting.  What all tuning are you using?  Is anything 
nonstandard?  What are the disks?
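
For example, something like this directly on the brick filesystem takes gluster 
out of the picture entirely (the brick path is a placeholder for wherever your 
bricks are mounted):

dd if=/dev/zero of=/your/brick/path/ddtest bs=1024k count=1000 conv=fdatasync
echo 3 > /proc/sys/vm/drop_caches
dd if=/your/brick/path/ddtest of=/dev/null bs=1024k count=1000

If the bricks are slow on their own, gluster on top of them will never be faster.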

-b

 
 BTW: does the cut of the bandwidth to half in replicated mode mean that
 the client writes to both nodes, i.e. doubles the network load on the
 client side network? I hoped that replication would be run on the server
 side network.

Correct, replication is done client side by writing to both bricks.
 
 Regards,
   - lars.
 
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Poor Gluster performance

2015-02-18 Thread Ben Turner
- Original Message -
 From: Lars Hanke deb...@lhanke.de
 To: Ben Turner btur...@redhat.com
 Cc: gluster-users@gluster.org
 Sent: Wednesday, February 18, 2015 5:09:19 PM
 Subject: Re: [Gluster-users] Poor Gluster performance
 
 Am 18.02.2015 um 22:05 schrieb Ben Turner:
  - Original Message -
  From: Lars Hanke deb...@lhanke.de
  To: gluster-users@gluster.org
  Sent: Wednesday, February 18, 2015 3:01:54 PM
  Subject: [Gluster-users] Poor Gluster performance
 
  I set up a distributed, replicated volume consisting of just 2 bricks on
  two physical nodes. The nodes are peered using a dedicated GB ethernet
  and can be accessed from the clients using a separate GB ethernet NIC.
 
  Doing a simple dd performance test I see about 11 MB/s for read and
  write. Running a local setup, i.e. both bricks on the same machine and
  local mount, I saw even 500 MB/s. So network sould be the limiting
  factor. But using NFS or CIFS on the same network I see 110 MB/s.
 
  Is gluster 10 times slower than NFS?
 
  Something is going on there.  On my gigabit setups I see 100-120 MB / sec
  writes for pure distribute and about 45-55 MB / sec with replica 2.  What
  block size are you using?  I could see that if you were writing something
  like 4k or under but 64k and up you should be getting about what I said.
  Can you tell me more about your test?
 
 Block size is 50M:
 
 root@gladsheim:/# mount -t glusterfs node2:/test ~/mnt
 root@gladsheim:/# dd if=/dev/zero of=~/mnt/testfile.null bs=50M count=10
 10+0 records in
 10+0 records out
 524288000 bytes (524 MB) copied, 46.6079 s, 11.2 MB/s
 root@gladsheim:/# dd if=~/mnt/testfile.null of=/dev/null bs=50M count=10
 10+0 records in
 10+0 records out
 524288000 bytes (524 MB) copied, 45.7487 s, 11.5 MB/s

This looks like the NICs may only be negotiating to 100Mb (max theoretical of 
12.5 MB / sec), can you check ethtool on all of your NICs?  Also I like to run 
iperf between servers and clients and servers and servers before I do anything 
with gluster, if you aren't getting ~line speed with iperf gluster won't be able 
to either.  Double check your NICs and your backend and see if you can spot the 
bottleneck at either of those layers.
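
Something like this should tell you quickly (interface name is an example):

ethtool eth0 | grep -i speed      # should report 1000Mb/s, not 100Mb/s
iperf -s                          # on the server
iperf -c <server> -t 30           # on the client

~940 Mbits/sec from iperf means you really have gigabit end to end; ~94 Mbits/sec 
means something in the path negotiated 100Mb.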

-b
 
 It doesn't depend on whether I use node1 or node2 for the mount.
 
 BTW: does the cut of the bandwidth to half in replicated mode mean that
 the client writes to both nodes, i.e. doubles the network load on the
 client side network? I hoped that replication would be run on the server
 side network.
 
 Regards,
   - lars.
 
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Looking for volunteer to write up official How to do GlusterFS in the Cloud: The Right Way for Rackspace...

2015-02-18 Thread Ben Turner
- Original Message -
 From: Justin Clift jus...@gluster.org
 To: Benjamin Turner bennytu...@gmail.com
 Cc: Gluster Users gluster-users@gluster.org, Gluster Devel 
 gluster-de...@gluster.org, Jesse Noller
 jesse.nol...@rackspace.com
 Sent: Tuesday, February 17, 2015 6:52:48 PM
 Subject: Re: [Gluster-users] [Gluster-devel] Looking for volunteer to write   
 up official How to do GlusterFS in the
 Cloud: The Right Way for Rackspace...
 
 Yeah, that'd be pretty optimal.  How full on do you want to go?

I have some benchmark kits that can easily be enhanced to work with Ubuntu, it 
should just be a matter of checking /etc/redhat-release and the Ubuntu 
equivalent and running apt instead of yum.  I was thinking I could do the 
testing on the RPM based systems and the other contributor could run on deb?  
We can share scripts / tuning / recommendations and at the end of this could 
have nice setup / benchmark kits that work on a variety of OSs + the DOC we are 
looking for.  The tool uses IOZone, smallfile, and fio, runs each test N times 
to get a sample set, and does some statistical analysis on the samples (avg, 
std dev, delta between samples) before moving on to the next test.
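
The distro check I have in mind is nothing fancy, roughly this (package names 
are illustrative, not the exact list the kit installs):

if [ -f /etc/redhat-release ]; then
    yum -y install fio iozone
else
    apt-get -y install fio iozone3
fi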

 Should we look into arranging some OnMetal stuff, in addition to the
 VM offerings?

I have all the BM numbers we can handle, happy to provide them.  1G, 10G, RDMA, 
spinning and SSDs.  I will be rerunning everything with the mt epoll + RDMA 
changes also.

 
 Jesse, Ben is our *very best* GlusterFS and RHS performance tuning
 expert, so capturing his interest is a *very* good thing. ;)

A little overstated but the sentiment is appreciated :)

 
 + Justin
 
 On 17 Feb 2015, at 23:10, Benjamin Turner bennytu...@gmail.com wrote:
  This is interesting to me, I'd like the chance to run my performance tests
  on a cloud provider's systems.  We could put together some recommendations
  for configuration, tuning, and performance numbers?  Also it would be cool
  to enhance my setup scripts to work with cloud instances.  Sound like what
  you are looking for ish?
  
  -b
  
  On Tue, Feb 17, 2015 at 5:06 PM, Justin Clift jus...@gluster.org wrote:
  On 17 Feb 2015, at 21:49, Josh Boon glus...@joshboon.com wrote:
   Do we have use cases to focus on? Gluster is part of the answer to many
   different questions so if it's things like simple replication and
   distribution and basic performance tuning I could help. I also have a
   heavy Ubuntu tilt so if it's Red Hat oriented I'm not much help :)
  
  Jesse, thoughts on this?
  
  I kinda think it would be useful to have instructions which give
  correct steps for Ubuntu + Red Hat (and anything else suitable).
  
  Josh, if Jesse agrees, then your Ubuntu knowledge will probably
  be useful for this. ;)
  
  + Justin
  
  
  
   - Original Message -
   From: Justin Clift jus...@gluster.org
   To: Gluster Users gluster-users@gluster.org, Gluster Devel
   gluster-de...@gluster.org
   Cc: Jesse Noller jesse.nol...@rackspace.com
   Sent: Tuesday, February 17, 2015 9:37:05 PM
   Subject: [Gluster-devel] Looking for volunteer to write up official How
   to   do GlusterFS in the Cloud: The Right Way for Rackspace...
  
   Yeah, huge subject line.  :)
  
   But it gets the message across... Rackspace provide us a *bunch* of
   online VM's
   which we have our infrastructure in + run the majority of our regression
   tests
   with.
  
   They've asked us if we could write up a How to do GlusterFS in the
   Cloud: The
   Right Way (technical) doc, for them to add to their doc collection.
   They get asked for this a lot by customers. :D
  
   Sooo... looking for volunteers to write this up.  And yep, you're welcome
   to
   have your name all over it (eg this is good promo/CV material :)
  
   VM's (in Rackspace obviously) will be provided of course.
  
   Anyone interested?
  
   (Note - not suitable for a GlusterFS newbie. ;))
  
   Regards and best wishes,
  
   Justin Clift
  
   --
   GlusterFS - http://www.gluster.org
  
   An open source, distributed file system scaling to several
   petabytes, and handling thousands of clients.
  
   My personal twitter: twitter.com/realjustinclift
  
   ___
   Gluster-devel mailing list
   gluster-de...@gluster.org
   http://www.gluster.org/mailman/listinfo/gluster-devel
  
  --
  GlusterFS - http://www.gluster.org
  
  An open source, distributed file system scaling to several
  petabytes, and handling thousands of clients.
  
  My personal twitter: twitter.com/realjustinclift
  
  ___
  Gluster-devel mailing list
  gluster-de...@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-devel
  
 
 --
 GlusterFS - http://www.gluster.org
 
 An open source, distributed file system scaling to several
 petabytes, and handling thousands of clients.
 
 My personal twitter: twitter.com/realjustinclift
 
 ___
 

Re: [Gluster-users] Simulate split-brain on gluster 3.6

2015-02-16 Thread Ben Turner
- Original Message -
 From: Félix de Lelelis felix.deleli...@gmail.com
 To: gluster-users@gluster.org
 Sent: Monday, February 16, 2015 3:59:05 AM
 Subject: [Gluster-users] Simulate split-brain on gluster 3.6
 
 Hi,
 
 I am simulating a split-brain condition on my cluster but I am not able to.
 I have disconnected the nodes and created a file with the same name and
 different contents, but the self-heal process always takes the last copy of
 the file.
 
 How can I create this condition?

1.  Kill both brick processes
2.  Edit the changelog on both files on the backend, make them different:

getfattr -d -e hex -m trusted.afr. <my file>
You should see something like:

trusted.afr.GLUSTER-SHARE-client-0=0x000000000000000000000000
trusted.afr.GLUSTER-SHARE-client-1=0x000000000000000000000000

Next set them to values that clash:

setfattr -n trusted.afr.GLUSTER-SHARE-client-0 -v 0x0002 <my file>
setfattr -n trusted.afr.GLUSTER-SHARE-client-1 -v 0x0003 <my file>

setfattr -n trusted.afr.GLUSTER-SHARE-client-0 -v 0x0003 <my file>
setfattr -n trusted.afr.GLUSTER-SHARE-client-1 -v 0x0002 <my file>

3.  Restart both brick processes.

4.  gluster v heal <my vol> info split-brain

I haven't done this for a while but IIRC those commands will get you there.  The 
-v syntax may not be correct on the setfattr, but this should get you going.
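
If the short values don't take, a sketch using the full 12 byte changelog (4 
bytes each for the data / metadata / entry pending counters; volume name and 
file path are whatever yours are):

# on node 1's brick - blame the other copy for pending data ops
setfattr -n trusted.afr.GLUSTER-SHARE-client-1 -v 0x000000020000000000000000 <file on brick1>
# on node 2's brick - blame back
setfattr -n trusted.afr.GLUSTER-SHARE-client-0 -v 0x000000030000000000000000 <file on brick2>

Once both bricks accuse each other of having pending operations AFR has no way 
to pick a source, which is the split-brain state you are after.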

-b
 
 thanks
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster performance on the small files

2015-02-16 Thread Ben Turner
- Original Message -
 From: Joe Julian j...@julianfamily.org
 To: Punit Dambiwal hypu...@gmail.com, gluster-users@gluster.org, Humble 
 Devassy Chirammal
 humble.deva...@gmail.com
 Sent: Monday, February 16, 2015 3:32:31 PM
 Subject: Re: [Gluster-users] Gluster performance on the small files
 
 
 On 02/12/2015 10:58 PM, Punit Dambiwal wrote:
 
 
 
 Hi,
 
 I have seen the gluster performance is dead slow on the small files... even i
 am using the SSD... it's too bad performance... even i am getting better
 performance in my SAN with normal SATA disk...
 
 I am using distributed replicated glusterfs with replica count=2...i have all
 SSD disks on the brick...
 
 
 
 root@vm3:~# dd bs=64k count=4k if=/dev/zero of=test oflag=dsync
 
 4096+0 records in
 
 4096+0 records out
 
 268435456 bytes (268 MB) copied, 57.3145 s, 4.7 MB/s
 

This seems pretty slow, even if you are using gigabit.  Here is what I get:

[root@gqac031 smallfile]# dd bs=64k count=4k if=/dev/zero 
of=/gluster-emptyvol/test oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 10.5965 s, 25.3 MB/s

FYI this is on my 2 node pure replica + spinning disks (RAID 6, this is not 
setup for smallfile workloads.  For smallfile I normally use RAID 10) + 10G.

The single-threaded DD process is definitely a bottleneck here, the power in 
distributed systems is doing things in parallel across clients / threads.  You 
may want to try smallfile:

http://www.gluster.org/community/documentation/index.php/Performance_Testing

Smallfile command used - python /small-files/smallfile/smallfile_cli.py 
--operation create --threads 8 --file-size 64 --files 1 --top 
/gluster-emptyvol/ --pause 1000 --host-set client1, client2

total threads = 16
total files = 157100
total data = 9.589 GB
 98.19% of requested files processed, minimum is  70.00
41.271602 sec elapsed time
3806.491454 files/sec
3806.491454 IOPS
237.905716 MB/sec

If you wanted to do something similar with DD you could do:

<my script>
for i in `seq 1 4`
do
    dd bs=64k count=4k if=/dev/zero of=/gluster-emptyvol/test$i oflag=dsync &
done
for pid in $(pidof dd); do
    while kill -0 $pid; do
        sleep 0.1
    done
done

# time myscript.sh

Then do the math to figure out the MB / sec of the system.
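
i.e. with the numbers above: 4 dd's x 268435456 bytes = 1073741824 bytes = 1024 MB, 
so MB / sec = 1024 / <real seconds reported by time>.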

-b 

 
 
 root@vm3:~# dd bs=64k count=4k if=/dev/zero of=test conv=fdatasync
 
 4096+0 records in
 
 4096+0 records out
 
 268435456 bytes (268 MB) copied, 1.80093 s, 149 MB/s
 
 
 
 How small is your VM image? The image is the file that GlusterFS is serving,
 not the small files within it. Perhaps the filesystem you're using within
 your VM is inefficient with regard to how it handles disk writes.
 
 I believe your concept of small file performance is misunderstood, as is
 often the case with this phrase. The small file issue has to do with the
 overhead of finding and checking the validity of any file, but with a small
 file the percentage of time doing those checks is proportionally greater.
 With your VM image, that file is already open. There are no self-heal checks
 or lookups that are happening in your tests, so that overhead is not the
 problem.
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] missing files

2015-02-05 Thread Ben Turner
- Original Message -
 From: Pranith Kumar Karampuri pkara...@redhat.com
 To: Xavier Hernandez xhernan...@datalab.es, David F. Robinson 
 david.robin...@corvidtec.com, Benjamin Turner
 bennytu...@gmail.com
 Cc: gluster-users@gluster.org, Gluster Devel gluster-de...@gluster.org
 Sent: Thursday, February 5, 2015 5:30:04 AM
 Subject: Re: [Gluster-users] [Gluster-devel] missing files
 
 
 On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
  I believe David already fixed this. I hope this is the same issue he
  told about permissions issue.
 Oops, it is not. I will take a look.

Yes David exactly like these:

data-brick02a-homegfs.log:[2015-02-03 19:09:34.568842] I 
[server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection 
from gfs02a.corvidtec.com-18563-2015/02/03-19:07:58:519134-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 19:09:41.286551] I 
[server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection 
from gfs01a.corvidtec.com-12804-2015/02/03-19:09:38:497808-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 19:16:35.906412] I 
[server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection 
from gfs02b.corvidtec.com-27190-2015/02/03-19:15:53:458467-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 19:51:22.761293] I 
[server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection 
from gfs01a.corvidtec.com-25926-2015/02/03-19:51:02:89070-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I 
[server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection 
from gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1

You can 100% verify my theory if you can correlate the time on the disconnects 
to the time that the missing files were healed.  Can you have a look at 
/var/log/glusterfs/glustershd.log?  That has all of the healed files + 
timestamps, if we can see a disconnect during the rsync and a self heal of the 
missing file I think we can safely assume that the disconnects may have caused 
this.  I'll try this on my test systems, how much data did you rsync?  Roughly 
what size of files / an idea of the dir layout?  
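
A hypothetical example of the correlation I mean (adjust the timestamps to match 
your disconnects):

grep 'disconnecting connection' /var/log/glusterfs/bricks/*.log
grep 'Completed .* selfheal' /var/log/glusterfs/glustershd.log | grep '2015-02-03 19:'

If the healed files show up in glustershd.log shortly after each disconnect, that 
lines up with the theory.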

@Pranith - Could bricks flapping up and down during the rsync be a possible 
cause here: the files were missing on the first ls (written to 1 subvol but not 
the other because it was down), the ls triggered SH, and that's why the files 
were there for the second ls?

-b

 
 Pranith
 
  Pranith
  On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
  Is the failure repeatable ? with the same directories ?
 
  It's very weird that the directories appear on the volume when you do
  an 'ls' on the bricks. Could it be that you only made a single 'ls'
  on fuse mount which not showed the directory ? Is it possible that
  this 'ls' triggered a self-heal that repaired the problem, whatever
  it was, and when you did another 'ls' on the fuse mount after the
  'ls' on the bricks, the directories were there ?
 
  The first 'ls' could have healed the files, causing that the
  following 'ls' on the bricks showed the files as if nothing were
  damaged. If that's the case, it's possible that there were some
  disconnections during the copy.
 
  Added Pranith because he knows better replication and self-heal details.
 
  Xavi
 
  On 02/04/2015 07:23 PM, David F. Robinson wrote:
  Distributed/replicated
 
  Volume Name: homegfs
  Type: Distributed-Replicate
  Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
  Status: Started
  Number of Bricks: 4 x 2 = 8
  Transport-type: tcp
  Bricks:
  Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
  Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
  Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
  Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
  Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
  Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
  Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
  Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
  Options Reconfigured:
  performance.io-thread-count: 32
  performance.cache-size: 128MB
  performance.write-behind-window-size: 128MB
  server.allow-insecure: on
  network.ping-timeout: 10
  storage.owner-gid: 100
  geo-replication.indexing: off
  geo-replication.ignore-pid-check: on
  changelog.changelog: on
  changelog.fsync-interval: 3
  changelog.rollover-time: 15
  server.manage-gids: on
 
 
  -- Original Message --
  From: Xavier Hernandez xhernan...@datalab.es
  To: David F. Robinson david.robin...@corvidtec.com; Benjamin
  Turner bennytu...@gmail.com
  Cc: gluster-users@gluster.org gluster-users@gluster.org; Gluster
  Devel gluster-de...@gluster.org
  Sent: 2/4/2015 6:03:45 AM
  Subject: Re: [Gluster-devel] missing files
 
  On 02/04/2015 01:30 AM, David F. Robinson wrote:
  Sorry. Thought about this a little more. I should have been clearer.
  The files were on both bricks of the replica, not just one side. So,
  both 

Re: [Gluster-users] [Gluster-devel] missing files

2015-02-05 Thread Ben Turner
- Original Message -
 From: David F. Robinson david.robin...@corvidtec.com
 To: Ben Turner btur...@redhat.com
 Cc: Pranith Kumar Karampuri pkara...@redhat.com, Xavier Hernandez 
 xhernan...@datalab.es, Benjamin Turner
 bennytu...@gmail.com, gluster-users@gluster.org, Gluster Devel 
 gluster-de...@gluster.org
 Sent: Thursday, February 5, 2015 5:01:13 PM
 Subject: Re: [Gluster-users] [Gluster-devel] missing files
 
 I'll send you the emails I sent Pranith with the logs. What causes these
 disconnects?

Thanks David!  Disconnects happen when there are interruptions in communication 
between peers, normally there is a ping timeout that happens.  It could be 
anything from a flaky NW to the system being too busy to respond to the pings.  My 
initial take is more towards the latter as rsync is absolutely the worst use 
case for gluster - IIRC it writes in 4kb blocks.  I try to keep my writes at 
least 64KB as in my testing that is the smallest block size I can write with 
before perf starts to really drop off.  I'll try something similar in the lab.
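
If you want to see the block size effect for yourself, a quick sketch on any 
gluster mount (same 256MB total, different write sizes; the path is a placeholder):

dd if=/dev/zero of=/gluster-mount/bs4k  bs=4k  count=65536 conv=fdatasync
dd if=/dev/zero of=/gluster-mount/bs64k bs=64k count=4096  conv=fdatasync

The 4k run should come in well under the 64k run, which is roughly the penalty 
rsync pays on every file it copies.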

-b
 
 David  (Sent from mobile)
 
 ===
 David F. Robinson, Ph.D.
 President - Corvid Technologies
 704.799.6944 x101 [office]
 704.252.1310  [cell]
 704.799.7974  [fax]
 david.robin...@corvidtec.com
 http://www.corvidtechnologies.com
 
  On Feb 5, 2015, at 4:55 PM, Ben Turner btur...@redhat.com wrote:
  
  - Original Message -
  From: Pranith Kumar Karampuri pkara...@redhat.com
  To: Xavier Hernandez xhernan...@datalab.es, David F. Robinson
  david.robin...@corvidtec.com, Benjamin Turner
  bennytu...@gmail.com
  Cc: gluster-users@gluster.org, Gluster Devel gluster-de...@gluster.org
  Sent: Thursday, February 5, 2015 5:30:04 AM
  Subject: Re: [Gluster-users] [Gluster-devel] missing files
  
  
  On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
  I believe David already fixed this. I hope this is the same issue he
  told about permissions issue.
  Oops, it is not. I will take a look.
  
  Yes David exactly like these:
  
  data-brick02a-homegfs.log:[2015-02-03 19:09:34.568842] I
  [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
  connection from
  gfs02a.corvidtec.com-18563-2015/02/03-19:07:58:519134-homegfs-client-2-0-0
  data-brick02a-homegfs.log:[2015-02-03 19:09:41.286551] I
  [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
  connection from
  gfs01a.corvidtec.com-12804-2015/02/03-19:09:38:497808-homegfs-client-2-0-0
  data-brick02a-homegfs.log:[2015-02-03 19:16:35.906412] I
  [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
  connection from
  gfs02b.corvidtec.com-27190-2015/02/03-19:15:53:458467-homegfs-client-2-0-0
  data-brick02a-homegfs.log:[2015-02-03 19:51:22.761293] I
  [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
  connection from
  gfs01a.corvidtec.com-25926-2015/02/03-19:51:02:89070-homegfs-client-2-0-0
  data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I
  [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
  connection from
  gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
  
  You can 100% verify my theory if you can correlate the time on the
  disconnects to the time that the missing files were healed.  Can you have
  a look at /var/log/glusterfs/glustershd.log?  That has all of the healed
  files + timestamps, if we can see a disconnect during the rsync and a self
  heal of the missing file I think we can safely assume that the disconnects
  may have caused this.  I'll try this on my test systems, how much data did
  you rsync?  What size ish of files / an idea of the dir layout?
  
  @Pranith - Could bricks flapping up and down during the rsync cause the
  files to be missing on the first ls(written to 1 subvol but not the other
  cause it was down), the ls triggered SH, and thats why the files were
  there for the second ls be a possible cause here?
  
  -b
  
  
  Pranith
  
  Pranith
  On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
  Is the failure repeatable ? with the same directories ?
  
  It's very weird that the directories appear on the volume when you do
  an 'ls' on the bricks. Could it be that you only made a single 'ls'
  on fuse mount which not showed the directory ? Is it possible that
  this 'ls' triggered a self-heal that repaired the problem, whatever
  it was, and when you did another 'ls' on the fuse mount after the
  'ls' on the bricks, the directories were there ?
  
  The first 'ls' could have healed the files, causing that the
  following 'ls' on the bricks showed the files as if nothing were
  damaged. If that's the case, it's possible that there were some
  disconnections during the copy.
  
  Added Pranith because he knows better replication and self-heal details.
  
  Xavi
  
  On 02/04/2015 07:23 PM, David F. Robinson wrote:
  Distributed/replicated
  
  Volume Name: homegfs
  Type: Distributed-Replicate
  Volume ID: 1e32672a-f1b7-4b58-ba94

Re: [Gluster-users] [Gluster-devel] missing files

2015-02-05 Thread Ben Turner
- Original Message -
 From: Ben Turner btur...@redhat.com
 To: David F. Robinson david.robin...@corvidtec.com
 Cc: Pranith Kumar Karampuri pkara...@redhat.com, Xavier Hernandez 
 xhernan...@datalab.es, Benjamin Turner
 bennytu...@gmail.com, gluster-users@gluster.org, Gluster Devel 
 gluster-de...@gluster.org
 Sent: Thursday, February 5, 2015 5:22:26 PM
 Subject: Re: [Gluster-users] [Gluster-devel] missing files
 
 - Original Message -
  From: David F. Robinson david.robin...@corvidtec.com
  To: Ben Turner btur...@redhat.com
  Cc: Pranith Kumar Karampuri pkara...@redhat.com, Xavier Hernandez
  xhernan...@datalab.es, Benjamin Turner
  bennytu...@gmail.com, gluster-users@gluster.org, Gluster Devel
  gluster-de...@gluster.org
  Sent: Thursday, February 5, 2015 5:01:13 PM
  Subject: Re: [Gluster-users] [Gluster-devel] missing files
  
  I'll send you the emails I sent Pranith with the logs. What causes these
  disconnects?
 
 Thanks David!  Disconnects happen when there are interruptions in
 communication between peers, normally there is a ping timeout that happens.
 It could be anything from a flaky NW to the system being too busy to respond
 to the pings.  My initial take is more towards the latter as rsync is
 absolutely the worst use case for gluster - IIRC it writes in 4kb blocks.  I
 try to keep my writes at least 64KB as in my testing that is the smallest
 block size I can write with before perf starts to really drop off.  I'll try
 something similar in the lab.

Ok I do think that the file being self healed is the RCA for what you were seeing.  
Let's look at one of the disconnects:

data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I 
[server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection 
from gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1

And in the glustershd.log from the gfs01b_glustershd.log file:

[2015-02-03 20:55:48.001797] I 
[afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: 
performing entry selfheal on 6c79a368-edaa-432b-bef9-ec690ab42448
[2015-02-03 20:55:49.341996] I [afr-self-heal-common.c:476:afr_log_selfheal] 
0-homegfs-replicate-0: Completed entry selfheal on 
6c79a368-edaa-432b-bef9-ec690ab42448. source=1 sinks=0 
[2015-02-03 20:55:49.343093] I 
[afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: 
performing entry selfheal on 792cb0d6-9290-4447-8cd7-2b2d7a116a69
[2015-02-03 20:55:50.463652] I [afr-self-heal-common.c:476:afr_log_selfheal] 
0-homegfs-replicate-0: Completed entry selfheal on 
792cb0d6-9290-4447-8cd7-2b2d7a116a69. source=1 sinks=0 
[2015-02-03 20:55:51.465289] I 
[afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do] 0-homegfs-replicate-0: 
performing metadata selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c
[2015-02-03 20:55:51.466515] I [afr-self-heal-common.c:476:afr_log_selfheal] 
0-homegfs-replicate-0: Completed metadata selfheal on 
403e661a-1c27-4e79-9867-c0572aba2b3c. source=1 sinks=0 
[2015-02-03 20:55:51.467098] I 
[afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: 
performing entry selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c
[2015-02-03 20:55:55.257808] I [afr-self-heal-common.c:476:afr_log_selfheal] 
0-homegfs-replicate-0: Completed entry selfheal on 
403e661a-1c27-4e79-9867-c0572aba2b3c. source=1 sinks=0 
[2015-02-03 20:55:55.258548] I 
[afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do] 0-homegfs-replicate-0: 
performing metadata selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541
[2015-02-03 20:55:55.259367] I [afr-self-heal-common.c:476:afr_log_selfheal] 
0-homegfs-replicate-0: Completed metadata selfheal on 
c612ee2f-2fb4-4157-a9ab-5a2d5603c541. source=1 sinks=0 
[2015-02-03 20:55:55.259980] I 
[afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: 
performing entry selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541

As you can see the self heal logs are just spammed with files being healed, and 
I looked at a couple of disconnects and I see self heals getting run shortly 
after on the bricks that were down.  Now we need to find the cause of the 
disconnects; I am thinking once the disconnects are resolved the files should 
be properly copied over without SH having to fix things.  Like I said I'll give 
this a go on my lab systems and see if I can repro the disconnects, I'll have 
time to run through it tomorrow.  If in the meantime anyone else has a theory 
/ anything to add here it would be appreciated.

-b
 
 -b
  
  David  (Sent from mobile)
  
  ===
  David F. Robinson, Ph.D.
  President - Corvid Technologies
  704.799.6944 x101 [office]
  704.252.1310  [cell]
  704.799.7974  [fax]
  david.robin...@corvidtec.com
  http://www.corvidtechnologies.com
  
   On Feb 5, 2015, at 4:55 PM, Ben Turner btur...@redhat.com wrote:
   
   - Original Message -
   From: Pranith Kumar Karampuri pkara...@redhat.com
   To: Xavier Hernandez xhernan...@datalab.es, David F
