Re: [Gluster-users] ls performance on directories with small number of items

2017-11-29 Thread Aaron Roberts
Thanks Joe,
   Just to clarify, I’m seeing 8 seconds to run ls -l in a dir 
containing 2 files.  I mentioned that the _parent_ dir contains 123k items, in 
case it was relevant.  That said, the fact that we are hitting the dir with 
many requests does seem to be the key factor.

Aaron


From: Joe Julian [mailto:j...@julianfamily.org]
Sent: 29 November 2017 16:16
To: gluster-users@gluster.org; Aaron Roberts
Subject: Re: [Gluster-users] ls performance on directories with small number of 
items

The -l flag is causing a metadata lookup for every file in the directory. The 
way the ls command does that is with individual fstat calls to each directory 
entry. That's a lot of tiny network round trips with fops that don't even fill 
a standard frame, so each frame has a high percentage of TCP overhead. Add to 
that the replica check to ensure you're not getting stale data and you have 
another round trip for each file. Your 123k directory entries require several 
frames of getdents and over 492k frames for the individual fstat calls. That's 
roughly 16us per frame.
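
A back-of-envelope check of those figures, assuming two round trips per entry 
(one for the fstat, one for the replica check) and two frames per round trip:

echo $(( 123000 * 2 * 2 ))                             # ~492000 frames
awk 'BEGIN { printf "%.1f us/frame\n", 8e6 / 492000 }' # ~16.3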

Can you eliminate the fstat calls? If you only get the directory listing, that 
should be significantly better. To prove this, do "echo *". You will instantly 
see your 123k entries.
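
To see where the time actually goes on the slow directory, a quick diagnostic 
sketch (ls may issue lstat or statx depending on the distro, and color aliases 
can add stat calls of their own):

cd dir1/get/
time bash -c 'echo *'                      # glob needs only the getdents pass
strace -c /bin/ls -l . 2>&1 | tail -n 20   # per-syscall counts; note the stat family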
On November 27, 2017 5:18:56 AM PST, Aaron Roberts <arobe...@domicilium.com> wrote:
Hi,
   I have a situation where an apache web server is trying to 
locate the IndexDocument for a directory on a gluster volume.  This URL is 
being hit roughly 20 times per second.  There is only 1 file in this directory. 
 However, the parent directory does have a large number of items (+123,000 
files and dirs) and we are performing operations to move these files into 2 
levels of subdirs.


We are seeing very slow response times (around 8 seconds) in apache and also 
when trying to ls on this dir.  Before we started the migrations to move files 
on the large parent dir into 2 sub levels, we weren’t aware of a problem.


[root@web-02 images]# time ls -l dir1/get/ | wc -l
2


real    0m8.114s
user    0m0.002s
sys     0m0.014s


Other directories with only 1 item return very quickly (<1 sec).


[root@Web-01 images]# time ls -l dir1/tmp1/ | wc -l
2


real    0m0.014s
user    0m0.003s
sys     0m0.006s


I’m just trying to understand what would slow down this operation so much.  Is 
it the high frequency of attempts to read the directory (apache hits to 
dir1/get/)?  Do the move operations on items in the parent directory have any 
impact?


Some background info:


[root@web-02 images]# gluster --version
glusterfs 3.7.20 built on Jan 30 2017 15:39:29
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. 
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General 
Public License.


[root@web-02 images]# gluster vol info


Volume Name: web_vol1
Type: Replicate
Volume ID: 0d63de20-c9c2-4931-b4a3-6aed5ae28057
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: web-01:/export/brick1/web_vol1_brick1
Brick2: web-02:/export/brick1/web_vol1_brick1
Options Reconfigured:
performance.readdir-ahead: on
performance.io-thread-count: 32
performance.cache-size: 512MB
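
If it helps, Gluster's built-in profiler can show which FOPs dominate the 
latency on this volume (standard gluster CLI; output details vary by version):

gluster volume profile web_vol1 start
# reproduce the slow ls, then:
gluster volume profile web_vol1 info   # per-brick, per-FOP call counts and latencies
gluster volume profile web_vol1 stop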




Any insight would be gratefully received.


Thanks,
   Aaron



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [ovirt-users] slow performance with export storage on glusterfs

2017-11-29 Thread Jiří Sléžka
Hello,

> 
> If you use Gluster as a FUSE mount it's always slower than you expect it
> to be.
> If you want to get better performance out of your oVirt/Gluster storage,
> try the following: 
> 
> - create a Linux VM in your oVirt environment, assign 4/8/12 virtual
> disks (the virtual disks are located on your Gluster storage volume).
> - Boot/configure the VM, then use LVM to create a VG/LV with 4 stripes
> (lvcreate -i 4) and use all 4/8/12 virtual disks as PVs.
> - then install an NFS server and export the LV you created in the previous
> step; use the NFS export as the export domain in oVirt/RHEV.
> 
> You should get wire speed when you use multiple stripes on Gluster
> storage; the FUSE mount on the oVirt host will fan out requests to all 4
> servers. Gluster is very good at distributed/parallel workloads, but when
> you use a direct Gluster FUSE mount for the export domain you only have
> one data stream, which is fragmented even further by the multiple
> writes/reads that Gluster needs to do to save your data on all member
> servers.
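
For reference, a minimal sketch of the striped-LV-plus-NFS approach described 
above (device names, VG/LV names, and the export path are hypothetical):

# inside the helper VM, with 4 virtual disks backed by the Gluster volume
pvcreate /dev/vdb /dev/vdc /dev/vdd /dev/vde
vgcreate exportvg /dev/vdb /dev/vdc /dev/vdd /dev/vde
lvcreate -i 4 -I 64 -l 100%FREE -n exportlv exportvg   # 4 stripes, 64K stripe size
mkfs.xfs /dev/exportvg/exportlv
mkdir -p /export && mount /dev/exportvg/exportlv /export
echo '/export *(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra   # then attach the NFS export as an export domain in oVirt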

Thanks for the explanation, it is an interesting solution.

Cheers,

Jiri

> 
> 
> 
> On Mon, Nov 27, 2017 at 8:41 PM, Donny Davis wrote:
> 
> What about mounting over NFS instead of the fuse client? Or maybe
> libgfapi? Is that available for export domains?
> 
> On Fri, Nov 24, 2017 at 3:48 AM Jiří Sléžka wrote:
> 
> On 11/24/2017 06:41 AM, Sahina Bose wrote:
> >
> >
> > On Thu, Nov 23, 2017 at 4:56 PM, Jiří Sléžka <jiri.sle...@slu.cz> wrote:
> >
> >     Hi,
> >
> >     On 11/22/2017 07:30 PM, Nir Soffer wrote:
> >     > On Mon, Nov 20, 2017 at 5:22 PM Jiří Sléžka <jiri.sle...@slu.cz> wrote:
> >     >     Hi,
> >     >
> >     >     I am trying to work out why exporting a vm to export storage
> >     >     on glusterfs is so slow.
> >     >
> >     >     I am using oVirt and RHV, both installations on version 4.1.7.
> >     >
> >     >     Hosts have dedicated nics for the rhevm network - 1gbps; data
> >     >     storage itself is on FC.
> >     >
> >     >     The GlusterFS cluster lives separately on 4 dedicated hosts. It
> >     >     has slow disks, but I can achieve about 200-400mbit throughput
> >     >     in other applications (we are using it for "cold" data, backups
> >     >     mostly).
> >     >
> >     >     I am using this glusterfs cluster as the backend for export
> >     >     storage. When I am exporting a vm I can see only about
> >     >     60-80mbit throughput.
> >     >
> >     >     What could be the bottleneck here?
> >     >
> >     >     Could it be the qemu-img utility?
> >     >
> >     >     vdsm      97739  0.3  0.0 354212 29148 ?        S    15:43   0:06
> >     >     /usr/bin/qemu-img convert -p -t none -T none -f raw
> >     >     /rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e
> >     >     -O raw
> >     >     /rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e
> >     >
> >     >     Any idea how to make it work faster, or what throughput I
> >     >     should expect?
> >     >
> >     >
> >     > gluster storage operations are using fuse mount - so every write:
> >     > - travel to the kernel
> >     > - travel back to the gluster fuse helper process
> >     > - travel to all 3 replicas - replication is done on client side
> >     > - return to kernel when all writes succeeded
> >     > - return to caller
> >     >
> >     > So gluster will never set any speed record.
> >     >
> >     > Additionally, you are copying from a raw lv on FC - qemu-img cannot
> >     > do anything smart to avoid copying unused clusters. Instead it
> >     > copies gigabytes of zeros from FC.
> >
> >     ok, it does make sense
> >
> >     > However 7.5-10 MiB/s sounds too slow.
> >     >
> 

Re: [Gluster-users] ls performance on directories with small number of items

2017-11-29 Thread Aaron Roberts
Thanks Sam/Julian,
   I understand, roughly, that readdir() and friends are simply hard 
problems for a distributed filesystem and that NFS can do this part faster.  
I’d like to see if gluster can be tweaked a bit to get this working.

performance.stat-prefetch is set to ‘on’.

Would performance.md-cache-timeout help me?
It is set to 1 on my volume (the default).  Would raising this help with 
servicing a large number of hits for a single file/dir?
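
For what it's worth, raising it is a single volume set; on newer releases 
(3.9+) it can be paired with upcall-based cache invalidation so a longer 
timeout stays consistent across clients (a sketch - the invalidation options 
below may not exist on 3.7.x):

gluster volume set web_vol1 performance.md-cache-timeout 60
# 3.9+ only: let bricks invalidate cached client metadata on change
gluster volume set web_vol1 features.cache-invalidation on
gluster volume set web_vol1 features.cache-invalidation-timeout 600
gluster volume set web_vol1 performance.cache-invalidation on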

Thanks,
   Aaron

From: Joe Julian [mailto:j...@julianfamily.org]
Sent: 27 November 2017 23:45
To: gluster-users@gluster.org; Sam McLeod; Aaron Roberts
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] ls performance on directories with small number of 
items

Also note, Sam's example is comparing apples and orchards. Feeding one person 
from an orchard is not as efficient as feeding one person an apple, but if 
you're feeding 1,000 people...

Also, some questions about the NFS example: how long until that chown was 
flushed? How long until another client could see those changes? And that is 
ignoring the biggie: what happens when the NFS server goes down?
On November 27, 2017 2:49:23 PM PST, Sam McLeod wrote:
Hi Aaron,

We also find that Gluster is perhaps not the most performant when performing 
actions on directories containing large numbers of files.
For example, with a single NFS server, a recursive chown on (many!) files took 
about 18 seconds on the client side; our simple two-replica gluster servers 
took over 15 minutes.
Having said that, while I'm new to the gluster world, things seem to be 
progressing quite quickly with regard to performance improvements.

I noticed you're running a _very_ old version of Gluster; I'd first suggest 
upgrading to the latest stable (3.12.x) and, FYI, 3.13 is to be released shortly.

I'd also recommend ensuring the following setting is enabled:

performance.stat-prefetch
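
For reference, enabling it is a one-liner (volume name taken from earlier in 
the thread; gluster volume get needs 3.8+, so the check may not work on 3.7.x):

gluster volume set web_vol1 performance.stat-prefetch on
gluster volume get web_vol1 performance.stat-prefetch   # confirm the current value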

Further to this, additional information about the cluster / volume topology 
and configuration would help others assist you (but I still think you should 
upgrade!).

--
Sam McLeod
https://smcleod.net
https://twitter.com/s_mcleod




--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users