Re: [Gluster-users] small files performance

2017-10-15 Thread Ben Turner
I get well over 2k IOPs on my OLD 12-disk RAID 6 HW in the lab (4 nodes, 2x2
volume):

https://access.redhat.com/sites/default/files/attachments/rhgs3_1_perfbrief_portalv1.pdf

That data is from 3.1; things have improved a lot since then (I think closer to
3.2k IOPs on the same HW?).  I have a total of 48 disks, though (20 data, 8
parity, 20 redundancy) - I'm not sure what you have.  I can extract a kernel in
between 4 minutes and 1 min 30 secs depending on tunables and whether I use the
multi-threaded tar tools developed by Ben England.  If you don't have access to
the RH paywall you will just have to trust me, since the perf brief requires a
subscription.  The key to getting smallfile perf out of Gluster is to use
multiple threads and multiple clients.
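For reference, a run of Ben England's smallfile benchmark driven from several clients at once might look like the sketch below; the host names, mount point, thread count, and file counts are all assumptions, and --host-set requires passwordless ssh from the driver host to each client:

```shell
# Drive the create workload from three client machines in parallel;
# for Gluster the number that matters is the aggregate rate across
# clients, not the throughput of a single mount.
python smallfile_cli.py --operation create \
    --threads 8 --file-size 4 --files 10000 \
    --host-set client1,client2,client3 \
    --top /mnt/glusterfs/smallfile-test
```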

What is your back end like?

-b

- Original Message -
> From: "Gandalf Corvotempesta" <gandalf.corvotempe...@gmail.com>
> To: "Szymon Miotk" <szymon.mi...@gmail.com>, "gluster-users" 
> <Gluster-users@gluster.org>
> Sent: Friday, October 13, 2017 3:56:14 AM
> Subject: Re: [Gluster-users] small files performance
> 
> Where did you read 2k IOPS?
> 
> Each disk can do about 75 IOPS, as I'm using SATA disks; getting even
> close to 2000 seems impossible.
> 
> On 13 Oct 2017 at 9:42 AM, "Szymon Miotk" <szymon.mi...@gmail.com> wrote:
> 
> 
> Depends what you need.
> 2K iops for small file writes is not a bad result.
> In my case I had a system that was just poorly written and it was
> using 300-1000 iops for constant operations and was choking on
> cleanup.
> 
> 
> On Thu, Oct 12, 2017 at 6:23 PM, Gandalf Corvotempesta
> < gandalf.corvotempe...@gmail.com > wrote:
> > So, even with the latest version, is Gluster still unusable with small files?
> > 
> > 2017-10-12 10:51 GMT+02:00 Szymon Miotk < szymon.mi...@gmail.com >:
> >> I analyzed small-file performance a few months ago, because I had
> >> huge performance problems with small-file writes on Gluster.
> >> Read performance has been improved in many ways in recent releases
> >> (md-cache, parallel-readdir, hot-tier).
> >> But write performance is more or less the same, and you cannot go above
> >> 10K small-file creates per second - even with SSD or Optane drives.
> >> Even a ramdisk does not help much here, because the bottleneck is not
> >> storage performance.
> >> Key problems I've noticed:
> >> - LOOKUPs are expensive, because there is a separate query for every
> >> depth level of the destination directory (md-cache helps here a bit,
> >> unless you are creating a lot of directories). So the deeper the
> >> directory structure, the worse.
> >> - for every file created, Gluster creates another entry in the
> >> .glusterfs directory, doubling the required I/O and network latency.
> >> What's worse, XFS, the recommended filesystem, doesn't like a flat
> >> directory structure with thousands of files per directory. But that's
> >> exactly how Gluster stores its metadata in .glusterfs, so performance
> >> decreases by 40-50% after 10M files.
> >> - the complete directory structure is created on each of the bricks, so
> >> every mkdir results in I/O on every brick you have in the volume.
> >> - hot-tier may be great for improving reads, but for small-file writes
> >> it actually kills performance even more.
> >> - the FUSE driver requires a context switch between userspace and
> >> kernel each time you create a file, so with small files the context
> >> switches also take their toll.
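The .glusterfs point above can be made concrete: each file on a brick gets a hard link under .glusterfs, fanned out only two levels deep by the first four hex characters of its GFID, so those leaf directories accumulate very large numbers of entries. A sketch of the path derivation (the GFID value here is made up):

```shell
# .glusterfs hard-link path for a GFID:
#   .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
gfid="f1e2d3c4-0000-4000-8000-123456789abc"   # hypothetical GFID
meta_path=".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
echo "$meta_path"   # .glusterfs/f1/e2/f1e2d3c4-0000-4000-8000-123456789abc
```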
> >> 
> >> The best results I got were:
> >> - create a big file on Gluster and mount it as XFS over a loopback
> >> device - 13.5K small-file writes/sec. Drawback: you can use it on only
> >> one server, as XFS will be corrupted if two servers write to it.
> >> - use libgfapi - 20K small-file writes/sec. Drawback: no nice POSIX
> >> filesystem, and huge CPU usage on the Gluster server.
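The loopback trick above can be sketched as follows; the image size and paths are assumptions, and, as noted, it is only safe from a single server:

```shell
# Create one big file on the Gluster mount, format it as XFS, and
# loop-mount it; small-file writes then reach Gluster as I/O against a
# single large file. DANGER: loop-mounting the same image from two
# servers at once will corrupt the XFS filesystem.
truncate -s 100G /mnt/glusterfs/smallfile.img
mkfs.xfs /mnt/glusterfs/smallfile.img
mkdir -p /mnt/smallfiles
mount -o loop /mnt/glusterfs/smallfile.img /mnt/smallfiles
```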
> >> 
> >> I was testing with 1KB files, so really small.
> >> 
> >> Best regards,
> >> Szymon Miotk
> >> 
> >> On Fri, Oct 6, 2017 at 4:43 PM, Gandalf Corvotempesta
> >> < gandalf.corvotempe...@gmail.com > wrote:
> >>> Any update about this?
> >>> I've seen some work on optimizing performance for small files; is
> >>> Gluster now "usable" for storing, for example, Maildirs or git
> >>> sources?
> >>> 
> >>> At least in 3.7 (or 3.8, I don't remember exactly), extracting the
> >>> kernel sources took about 4-5 minutes.
> >>> ___
> >>> Gluster-users mailing list
> >>> Gluster-users@gluster.org
> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
> 


Re: [Gluster-users] small files performance

2017-10-13 Thread Gandalf Corvotempesta
Where did you read 2k IOPS?

Each disk can do about 75 IOPS, as I'm using SATA disks; getting even
close to 2000 seems impossible.

On 13 Oct 2017 at 9:42 AM, "Szymon Miotk" wrote:

> Depends what you need.
> 2K iops for small file writes is not a bad result.
> In my case I had a system that was just poorly written and it was
> using 300-1000 iops for constant operations and was choking on
> cleanup.
>
>
> On Thu, Oct 12, 2017 at 6:23 PM, Gandalf Corvotempesta
>  wrote:
> > So, even with the latest version, is Gluster still unusable with small
> > files?
> >
> > 2017-10-12 10:51 GMT+02:00 Szymon Miotk :
> >> I analyzed small-file performance a few months ago, because I had
> >> huge performance problems with small-file writes on Gluster.
> >> Read performance has been improved in many ways in recent releases
> >> (md-cache, parallel-readdir, hot-tier).
> >> But write performance is more or less the same, and you cannot go above
> >> 10K small-file creates per second - even with SSD or Optane drives.
> >> Even a ramdisk does not help much here, because the bottleneck is not
> >> storage performance.
> >> Key problems I've noticed:
> >> - LOOKUPs are expensive, because there is a separate query for every
> >> depth level of the destination directory (md-cache helps here a bit,
> >> unless you are creating a lot of directories). So the deeper the
> >> directory structure, the worse.
> >> - for every file created, Gluster creates another entry in the
> >> .glusterfs directory, doubling the required I/O and network latency.
> >> What's worse, XFS, the recommended filesystem, doesn't like a flat
> >> directory structure with thousands of files per directory. But that's
> >> exactly how Gluster stores its metadata in .glusterfs, so performance
> >> decreases by 40-50% after 10M files.
> >> - the complete directory structure is created on each of the bricks, so
> >> every mkdir results in I/O on every brick you have in the volume.
> >> - hot-tier may be great for improving reads, but for small-file writes
> >> it actually kills performance even more.
> >> - the FUSE driver requires a context switch between userspace and
> >> kernel each time you create a file, so with small files the context
> >> switches also take their toll.
> >>
> >> The best results I got were:
> >> - create a big file on Gluster and mount it as XFS over a loopback
> >> device - 13.5K small-file writes/sec. Drawback: you can use it on only
> >> one server, as XFS will be corrupted if two servers write to it.
> >> - use libgfapi - 20K small-file writes/sec. Drawback: no nice POSIX
> >> filesystem, and huge CPU usage on the Gluster server.
> >>
> >> I was testing with 1KB files, so really small.
> >>
> >> Best regards,
> >> Szymon Miotk
> >>
> >> On Fri, Oct 6, 2017 at 4:43 PM, Gandalf Corvotempesta
> >>  wrote:
> >>> Any update about this?
> >>> I've seen some work on optimizing performance for small files; is
> >>> Gluster now "usable" for storing, for example, Maildirs or git
> >>> sources?
> >>>
> >>> At least in 3.7 (or 3.8, I don't remember exactly), extracting the
> >>> kernel sources took about 4-5 minutes.
>

Re: [Gluster-users] small files performance

2017-10-11 Thread Poornima Gurusiddaiah
Hi, 

Parallel-readdir is an experimental feature in 3.10; can you disable the 
performance.parallel-readdir option and see if the files become visible? Does 
an unmount and remount help? 
Also, if you want to use parallel-readdir in production, please use 3.11 or 
greater. 

Regards, 
Poornima 

- Original Message -

> From: "Alastair Neil" <ajneil.t...@gmail.com>
> To: "gluster-users" <Gluster-users@gluster.org>
> Sent: Wednesday, October 11, 2017 3:29:10 AM
> Subject: Re: [Gluster-users] small files performance

> I just tried setting:

> performance.parallel-readdir on
> features.cache-invalidation on
> features.cache-invalidation-timeout 600
> performance.stat-prefetch
> performance.cache-invalidation
> performance.md-cache-timeout 600
> network.inode-lru-limit 5
> performance.cache-invalidation on

> and clients could not see their files with ls when accessing via a fuse
> mount. The files and directories were there, however, if you accessed them
> directly. Servers are 3.10.5 and the clients are 3.10 and 3.12.

> Any ideas?

> On 10 October 2017 at 10:53, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com > wrote:

> > 2017-10-10 8:25 GMT+02:00 Karan Sandha < ksan...@redhat.com > :
> 

> > > Hi Gandalf,
> > 
> 

> > > We have multiple tunings for small files which decrease the time for
> > > negative lookups: metadata caching and parallel readdir. Bumping the
> > > server and client event threads will also help increase small-file
> > > performance.
> > 
> 

> > > gluster v set  group metadata-cache
> > 
> 
> > > gluster v set  group nl-cache
> > 
> 
> > > gluster v set  performance.parallel-readdir on (Note : readdir
> > > should be on)
> > 
> 

> > This is what I'm getting with the suggested parameters.
> 
> > I'm running "fio" from a mounted gluster client:
> 
> > 172.16.0.12:/gv0 on /mnt2 type fuse.glusterfs
> > (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> 

> > # fio --ioengine=libaio --filename=fio.test --size=256M --direct=1
> > --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70
> > --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=fio-test
> 
> > fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio,
> > iodepth=16
> > ...
> > fio-2.16
> > Starting 16 processes
> > fio-test: Laying out IO file(s) (1 file(s) / 256MB)
> > Jobs: 14 (f=13): [m(5),_(1),m(8),f(1),_(1)] [33.9% done] [1000KB/440KB/0KB
> > /s] [125/55/0 iops] [eta 01m:59s]
> > fio-test: (groupid=0, jobs=16): err= 0: pid=2051: Tue Oct 10 16:51:46 2017
> >   read : io=43392KB, bw=733103B/s, iops=89, runt= 60610msec
> >     slat (usec): min=14, max=1992.5K, avg=177873.67, stdev=382294.06
> >     clat (usec): min=768, max=6016.8K, avg=1871390.57, stdev=1082220.06
> >      lat (usec): min=872, max=6630.6K, avg=2049264.23, stdev=1158405.41
> >     clat percentiles (msec):
> >      |  1.00th=[   20],  5.00th=[  208], 10.00th=[  457], 20.00th=[  873],
> >      | 30.00th=[ 1237], 40.00th=[ 1516], 50.00th=[ 1795], 60.00th=[ 2073],
> >      | 70.00th=[ 2442], 80.00th=[ 2835], 90.00th=[ 3326], 95.00th=[ 3785],
> >      | 99.00th=[ 4555], 99.50th=[ 4948], 99.90th=[ 5211], 99.95th=[ 5800],
> >      | 99.99th=[ 5997]
> >   write: io=18856KB, bw=318570B/s, iops=38, runt= 60610msec
> >     slat (usec): min=17, max=3428, avg=212.62, stdev=287.88
> >     clat (usec): min=59, max=6015.6K, avg=1693729.12, stdev=1003122.83
> >      lat (usec): min=79, max=6015.9K, avg=1693941.74, stdev=1003126.51
> >     clat percentiles (usec):
> >      |  1.00th=[  724],  5.00th=[144384], 10.00th=[403456], 20.00th=[765952],
> >      | 30.00th=[1105920], 40.00th=[1368064], 50.00th=[1630208], 60.00th=[1875968],
> >      | 70.00th=[2179072], 80.00th=[2572288], 90.00th=[3031040], 95.00th=[3489792],
> >      | 99.00th=[4227072], 99.50th=[4423680], 99.90th=[4751360], 99.95th=[5210112],
> >      | 99.99th=[5996544]
> >     lat (usec) : 100=0.15%, 250=0.05%, 500=0.06%, 750=0.09%, 1000=0.05%
> >     lat (msec) : 2=0.28%, 4=0.09%, 10=0.15%, 20=0.39%, 50=1.81%
> >     lat (msec) : 100=1.02%, 250=1.63%, 500=5.59%, 750=6.03%, 1000=7.31%
> >     lat (msec) : 2000=35.61%, >=2000=39.67%
> >   cpu : usr=0.01%, sys=0.01%, ctx=8218, majf=11, minf=295
> >   IO depths : 1=0.2%, 2=0.4%, 4=0.8%, 8=1.6%, 16=96.9%, 32=0.0%, >=64=0.0%
> >   submit : 0=

Re: [Gluster-users] small files performance

2017-10-10 Thread Alastair Neil
I just tried setting:

performance.parallel-readdir on
features.cache-invalidation on
features.cache-invalidation-timeout 600
performance.stat-prefetch
performance.cache-invalidation
performance.md-cache-timeout 600
network.inode-lru-limit 5
performance.cache-invalidation on

and clients could not see their files with ls when accessing via a fuse
mount.  The files and directories were there, however, if you accessed them
directly. Servers are 3.10.5 and the clients are 3.10 and 3.12.

Any ideas?


On 10 October 2017 at 10:53, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2017-10-10 8:25 GMT+02:00 Karan Sandha :
>
>> Hi Gandalf,
>>
> >> We have multiple tunings for small files which decrease the time for
> >> negative lookups: metadata caching and parallel readdir. Bumping the
> >> server and client event threads will also help increase small-file
> >> performance.
>>
>> gluster v set   group metadata-cache
>> gluster v set  group nl-cache
>> gluster v set  performance.parallel-readdir on (Note : readdir
>> should be on)
>>
>
> This is what I'm getting with the suggested parameters.
> I'm running "fio" from a mounted gluster client:
> 172.16.0.12:/gv0 on /mnt2 type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,
> allow_other,max_read=131072)
>
>
>
> # fio --ioengine=libaio --filename=fio.test --size=256M
> --direct=1 --rw=randrw --refill_buffers --norandommap
> --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16
> --runtime=60 --group_reporting --name=fio-test
> fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio,
> iodepth=16
> ...
> fio-2.16
> Starting 16 processes
> fio-test: Laying out IO file(s) (1 file(s) / 256MB)
> Jobs: 14 (f=13): [m(5),_(1),m(8),f(1),_(1)] [33.9% done] [1000KB/440KB/0KB
> /s] [125/55/0 iops] [eta 01m:59s]
> fio-test: (groupid=0, jobs=16): err= 0: pid=2051: Tue Oct 10 16:51:46 2017
>   read : io=43392KB, bw=733103B/s, iops=89, runt= 60610msec
> slat (usec): min=14, max=1992.5K, avg=177873.67, stdev=382294.06
> clat (usec): min=768, max=6016.8K, avg=1871390.57, stdev=1082220.06
>  lat (usec): min=872, max=6630.6K, avg=2049264.23, stdev=1158405.41
> clat percentiles (msec):
>  |  1.00th=[   20],  5.00th=[  208], 10.00th=[  457], 20.00th=[  873],
>  | 30.00th=[ 1237], 40.00th=[ 1516], 50.00th=[ 1795], 60.00th=[ 2073],
>  | 70.00th=[ 2442], 80.00th=[ 2835], 90.00th=[ 3326], 95.00th=[ 3785],
>  | 99.00th=[ 4555], 99.50th=[ 4948], 99.90th=[ 5211], 99.95th=[ 5800],
>  | 99.99th=[ 5997]
>   write: io=18856KB, bw=318570B/s, iops=38, runt= 60610msec
> slat (usec): min=17, max=3428, avg=212.62, stdev=287.88
> clat (usec): min=59, max=6015.6K, avg=1693729.12, stdev=1003122.83
>  lat (usec): min=79, max=6015.9K, avg=1693941.74, stdev=1003126.51
> clat percentiles (usec):
>  |  1.00th=[  724],  5.00th=[144384], 10.00th=[403456],
> 20.00th=[765952],
>  | 30.00th=[1105920], 40.00th=[1368064], 50.00th=[1630208],
> 60.00th=[1875968],
>  | 70.00th=[2179072], 80.00th=[2572288], 90.00th=[3031040],
> 95.00th=[3489792],
>  | 99.00th=[4227072], 99.50th=[4423680], 99.90th=[4751360],
> 99.95th=[5210112],
>  | 99.99th=[5996544]
> lat (usec) : 100=0.15%, 250=0.05%, 500=0.06%, 750=0.09%, 1000=0.05%
> lat (msec) : 2=0.28%, 4=0.09%, 10=0.15%, 20=0.39%, 50=1.81%
> lat (msec) : 100=1.02%, 250=1.63%, 500=5.59%, 750=6.03%, 1000=7.31%
> lat (msec) : 2000=35.61%, >=2000=39.67%
>   cpu  : usr=0.01%, sys=0.01%, ctx=8218, majf=11, minf=295
>   IO depths: 1=0.2%, 2=0.4%, 4=0.8%, 8=1.6%, 16=96.9%, 32=0.0%,
> >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.2%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  issued: total=r=5424/w=2357/d=0, short=r=0/w=0/d=0,
> drop=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=16
>
> Run status group 0 (all jobs):
>READ: io=43392KB, aggrb=715KB/s, minb=715KB/s, maxb=715KB/s,
> mint=60610msec, maxt=60610msec
>   WRITE: io=18856KB, aggrb=311KB/s, minb=311KB/s, maxb=311KB/s,
> mint=60610msec, maxt=60610msec
>
>
>
>

Re: [Gluster-users] small files performance

2017-10-10 Thread Gandalf Corvotempesta
2017-10-10 8:25 GMT+02:00 Karan Sandha :

> Hi Gandalf,
>
> We have multiple tunings for small files which decrease the time for
> negative lookups: metadata caching and parallel readdir. Bumping the
> server and client event threads will also help increase small-file
> performance.
>
> gluster v set   group metadata-cache
> gluster v set  group nl-cache
> gluster v set  performance.parallel-readdir on (Note : readdir
> should be on)
>

This is what I'm getting with the suggested parameters.
I'm running "fio" from a mounted gluster client:
172.16.0.12:/gv0 on /mnt2 type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)



# fio --ioengine=libaio --filename=fio.test --size=256M
--direct=1 --rw=randrw --refill_buffers --norandommap
--bs=8k --rwmixread=70 --iodepth=16 --numjobs=16
--runtime=60 --group_reporting --name=fio-test
fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio,
iodepth=16
...
fio-2.16
Starting 16 processes
fio-test: Laying out IO file(s) (1 file(s) / 256MB)
Jobs: 14 (f=13): [m(5),_(1),m(8),f(1),_(1)] [33.9% done] [1000KB/440KB/0KB
/s] [125/55/0 iops] [eta 01m:59s]
fio-test: (groupid=0, jobs=16): err= 0: pid=2051: Tue Oct 10 16:51:46 2017
  read : io=43392KB, bw=733103B/s, iops=89, runt= 60610msec
slat (usec): min=14, max=1992.5K, avg=177873.67, stdev=382294.06
clat (usec): min=768, max=6016.8K, avg=1871390.57, stdev=1082220.06
 lat (usec): min=872, max=6630.6K, avg=2049264.23, stdev=1158405.41
clat percentiles (msec):
 |  1.00th=[   20],  5.00th=[  208], 10.00th=[  457], 20.00th=[  873],
 | 30.00th=[ 1237], 40.00th=[ 1516], 50.00th=[ 1795], 60.00th=[ 2073],
 | 70.00th=[ 2442], 80.00th=[ 2835], 90.00th=[ 3326], 95.00th=[ 3785],
 | 99.00th=[ 4555], 99.50th=[ 4948], 99.90th=[ 5211], 99.95th=[ 5800],
 | 99.99th=[ 5997]
  write: io=18856KB, bw=318570B/s, iops=38, runt= 60610msec
slat (usec): min=17, max=3428, avg=212.62, stdev=287.88
clat (usec): min=59, max=6015.6K, avg=1693729.12, stdev=1003122.83
 lat (usec): min=79, max=6015.9K, avg=1693941.74, stdev=1003126.51
clat percentiles (usec):
 |  1.00th=[  724],  5.00th=[144384], 10.00th=[403456],
20.00th=[765952],
 | 30.00th=[1105920], 40.00th=[1368064], 50.00th=[1630208],
60.00th=[1875968],
 | 70.00th=[2179072], 80.00th=[2572288], 90.00th=[3031040],
95.00th=[3489792],
 | 99.00th=[4227072], 99.50th=[4423680], 99.90th=[4751360],
99.95th=[5210112],
 | 99.99th=[5996544]
lat (usec) : 100=0.15%, 250=0.05%, 500=0.06%, 750=0.09%, 1000=0.05%
lat (msec) : 2=0.28%, 4=0.09%, 10=0.15%, 20=0.39%, 50=1.81%
lat (msec) : 100=1.02%, 250=1.63%, 500=5.59%, 750=6.03%, 1000=7.31%
lat (msec) : 2000=35.61%, >=2000=39.67%
  cpu  : usr=0.01%, sys=0.01%, ctx=8218, majf=11, minf=295
  IO depths: 1=0.2%, 2=0.4%, 4=0.8%, 8=1.6%, 16=96.9%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.2%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=5424/w=2357/d=0, short=r=0/w=0/d=0,
drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: io=43392KB, aggrb=715KB/s, minb=715KB/s, maxb=715KB/s,
mint=60610msec, maxt=60610msec
  WRITE: io=18856KB, aggrb=311KB/s, minb=311KB/s, maxb=311KB/s,
mint=60610msec, maxt=60610msec

[Gluster-users] small files performance

2017-10-06 Thread Gandalf Corvotempesta
Any update about this?
I've seen some work on optimizing performance for small files; is
Gluster now "usable" for storing, for example, Maildirs or git sources?

At least in 3.7 (or 3.8, I don't remember exactly), extracting the
kernel sources took about 4-5 minutes.


[Gluster-users] Small files performance on SSD

2017-04-28 Thread Szymon Miotk
Hello,

I have problems with tuning Gluster for optimal small files performance.
My usage scenario is, as I've learned, worst possible scenario, but
it's not up to me to change it:
- small 1KB files
- at least 20M of those
- approx. 10 files/directory
- mostly writes
- average speed 1000 files/sec with peaks up to 10K files/sec.

I must be doing something wrong, because I cannot get past these numbers:
- 4K files/sec for a distributed volume (2 bricks)
- 2K files/sec for a replicated volume (2 bricks).
I've been experimenting with various XFS formatting and mounting
options and with Gluster tuning, but no luck.
I've learned that it's not disk IO that is the bottleneck (direct
tests on mounted XFS partition show waaay better results, like 100K
files/sec).

As I've learned from
http://blog.gluster.org/2014/03/experiments-using-ssds-with-gluster/
it's possible to get 24K files/sec performance (and that was three years ago).

Test I'm using, run on one server (2 x Xeons, 256 GB RAM, 10GbE network):
smallfile_cli.py --operation create --threads 32 --file-size 1 --files
15625 --top /mnt/testdir/test
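As a sanity check on the workload size above: 32 threads times 15625 files is 500000 creates in total, so at the ~2K files/sec observed on the replicated volume a full run takes over four minutes of wall time. The arithmetic:

```shell
# Workload size and rough runtime for the smallfile command above
threads=32
files_per_thread=15625
rate=2000                              # files/sec on the replicated volume
total=$((threads * files_per_thread))  # 500000 creates in total
runtime=$((total / rate))              # ~250 seconds of wall time
echo "$total files, ~${runtime}s at ${rate} files/sec"
```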

Setup:
2 servers with 2 Xeons each, 256GB RAM, 8 x 800GB SSD drives in RAID6,
10GbE network
Ubuntu 14.04
Gluster 3.7.3

Do you have any hints where I should start investigating for the bottleneck?

Best regards,
Szymon


Re: [Gluster-users] Small files performance on SSD

2017-04-28 Thread Amar Tumballi
On Fri, Apr 28, 2017 at 3:44 PM, Szymon Miotk 
wrote:

> Dear Gluster community,
>
> I have problems with tuning Gluster for small files performance on SSD.
>
> My usage scenario is, as I've learned, worst possible scenario, but
> it's not up to me to change it:
> - small 1KB files
> - at least 20M of those
> - approx. 10 files/directory
> - mostly writes
> - average speed 1000 files/sec with peaks up to 10K files/sec.
>
> I must be doing something wrong, because I cannot get past these numbers:
> - 4K files/sec for a distributed volume (2 bricks)
> - 2K files/sec for a replicated volume (2 bricks).
> I've been experimenting with various XFS formatting and mounting
> options and with Gluster tuning (md-cache, lookup optimize, thread,
> writeback, tiering), but no luck.
>
> I've learned that it's not disk IO that is the bottleneck (direct
> tests on mounted XFS partition show waaay better results, like 100K
> files/sec).
>
> As I've learned from
> http://blog.gluster.org/2014/03/experiments-using-ssds-with-gluster/
> it's possible to get 24K files/sec performance (and that was three years
> ago).
>
>
How many clients are you running? Since Gluster is a distributed
solution, performance should be measured as the aggregate across all the
clients.


> Test I'm using, run on one server (2 x Xeons, 256 GB RAM, 10GbE network):
> smallfile_cli.py --operation create --threads 32 --file-size 1 --files
> 15625 --top /mnt/testdir/test
>
> Setup:
> 2 servers with 2 Xeons each, 256GB RAM, 8 x 800GB SSD drives in RAID6,
> 10GbE network
> Ubuntu 14.04
> Gluster 3.7.3
>
> Do you have any hints where I should start investigating for the
> bottleneck?
>
>
Currently the single FUSE mount may be the bottleneck, so I recommend
running multiple clients (from different machines) doing these operations
in parallel to get the best results out of Gluster.

-Amar

> Best regards,
> Szymon



-- 
Amar Tumballi (amarts)

[Gluster-users] Small files performance on SSD

2017-04-28 Thread Szymon Miotk
Dear Gluster community,

I have problems with tuning Gluster for small files performance on SSD.

My usage scenario is, as I've learned, worst possible scenario, but
it's not up to me to change it:
- small 1KB files
- at least 20M of those
- approx. 10 files/directory
- mostly writes
- average speed 1000 files/sec with peaks up to 10K files/sec.

I must be doing something wrong, because I cannot get past these numbers:
- 4K files/sec for a distributed volume (2 bricks)
- 2K files/sec for a replicated volume (2 bricks).
I've been experimenting with various XFS formatting and mounting
options and with Gluster tuning (md-cache, lookup optimize, thread,
writeback, tiering), but no luck.

I've learned that it's not disk IO that is the bottleneck (direct
tests on mounted XFS partition show waaay better results, like 100K
files/sec).

As I've learned from
http://blog.gluster.org/2014/03/experiments-using-ssds-with-gluster/
it's possible to get 24K files/sec performance (and that was three years ago).

Test I'm using, run on one server (2 x Xeons, 256 GB RAM, 10GbE network):
smallfile_cli.py --operation create --threads 32 --file-size 1 --files
15625 --top /mnt/testdir/test

Setup:
2 servers with 2 Xeons each, 256GB RAM, 8 x 800GB SSD drives in RAID6,
10GbE network
Ubuntu 14.04
Gluster 3.7.3

Do you have any hints where I should start investigating for the bottleneck?

Best regards,
Szymon


Re: [Gluster-users] Small files performance

2016-06-24 Thread Luciano Giacchetta
About 40~60 MB/s with a 30% IOWait...

--
Regards, LG

On Wed, Jun 22, 2016 at 10:04 AM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> On 21 Jun 2016 at 19:02, "Luciano Giacchetta" wrote:
> >
> > Hi,
> >
> > I have a similar scenario: a car classifieds site with millions of small
> > files, mounted with the Gluster native client in a replica config.
> > The Gluster server has 16 GB RAM and 4 cores and mounts the volume with
> > direct-io-mode=enable. Then I export it to all servers (Windows included,
> > via CIFS).
> >
> > performance.cache-refresh-timeout: 60
> > performance.read-ahead: enable
> > performance.write-behind-window-size: 4MB
> > performance.io-thread-count: 64
> > performance.cache-size: 12GB
> > performance.quick-read: on
> > performance.flush-behind: on
> > performance.write-behind: on
> > nfs.disable: on
>
> Which performance are you getting?
>

Re: [Gluster-users] Small files performance

2016-06-24 Thread Luciano Giacchetta
This is my fstab

localhost:/root /mnt/root glusterfs defaults,direct-io-mode=enable 0 0
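The same option works for a one-off mount as well; a sketch assuming the volume and mount point from the fstab line above:

```shell
# Manual equivalent of the fstab entry, with direct I/O on the FUSE client
mount -t glusterfs -o direct-io-mode=enable localhost:/root /mnt/root
```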

--
Regards, LG

On Wed, Jun 22, 2016 at 9:49 AM, ML mail  wrote:

> Luciano, how do you enable direct-io-mode?
>
>
> On Wednesday, June 22, 2016 7:09 AM, Luciano Giacchetta <
> ldgiacche...@gmail.com> wrote:
>
>
> Hi,
>
> I have a similar scenario: a car classifieds site with millions of small
> files, mounted with the Gluster native client in a replica config.
> The Gluster server has 16 GB RAM and 4 cores and mounts the volume with
> direct-io-mode=enable. Then I export it to all servers (Windows included,
> via CIFS).
>
> performance.cache-refresh-timeout: 60
> performance.read-ahead: enable
> performance.write-behind-window-size: 4MB
> performance.io-thread-count: 64
> performance.cache-size: 12GB
> performance.quick-read: on
> performance.flush-behind: on
> performance.write-behind: on
> nfs.disable: on
>
>
> --
> Regards, LG
>
> On Sat, May 28, 2016 at 6:46 AM, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
> If I remember properly, each stat() on a file needs to be sent to all hosts
> in the replica to check if they are in sync.
> Is this true for both gluster native client and nfs ganesha?
> Which is the best for a shared hosting storage with many millions of small
> files? About 15.000.000 small files in 800gb ? Or even for Maildir hosting
> Ganesha can be configured for HA and loadbalancing so the biggest issue
> that was present in standard NFS now is gone
> Any advantage about native gluster over Ganesha? Removing the fuse
> requirement should also be a performance advantage for Ganesha over native
> client
>

Re: [Gluster-users] Small files performance

2016-06-22 Thread Gandalf Corvotempesta
On 21 Jun 2016 at 19:02, "Luciano Giacchetta" wrote:
>
> Hi,
>
> I have a similar scenario: a car classifieds site with millions of small
> files, mounted with the Gluster native client in a replica config.
> The Gluster server has 16 GB RAM and 4 cores and mounts the volume with
> direct-io-mode=enable. Then I export it to all servers (Windows included,
> via CIFS).
>
> performance.cache-refresh-timeout: 60
> performance.read-ahead: enable
> performance.write-behind-window-size: 4MB
> performance.io-thread-count: 64
> performance.cache-size: 12GB
> performance.quick-read: on
> performance.flush-behind: on
> performance.write-behind: on
> nfs.disable: on

Which performance are you getting?

Re: [Gluster-users] Small files performance

2016-06-22 Thread ML mail
Luciano, how do you enable direct-io-mode?

On Wednesday, June 22, 2016 7:09 AM, Luciano Giacchetta wrote:

Hi,

I have a similar scenario: a car classifieds site with millions of small
files, mounted with the Gluster native client in a replica config.
The Gluster server has 16 GB RAM and 4 cores and mounts the volume with
direct-io-mode=enable. Then I export it to all servers (Windows included,
via CIFS).

performance.cache-refresh-timeout: 60
performance.read-ahead: enable
performance.write-behind-window-size: 4MB
performance.io-thread-count: 64
performance.cache-size: 12GB
performance.quick-read: on
performance.flush-behind: on
performance.write-behind: on
nfs.disable: on


--
Regards, LG
On Sat, May 28, 2016 at 6:46 AM, Gandalf Corvotempesta 
 wrote:

If I remember properly, each stat() on a file needs to be sent to all hosts
in the replica to check if they are in sync.
Is this true for both the gluster native client and NFS-Ganesha?
Which is the best for a shared hosting storage with many millions of small
files? About 15.000.000 small files in 800 GB? Or even for Maildir hosting?
Ganesha can be configured for HA and load balancing, so the biggest issue
that was present in standard NFS is now gone.
Any advantage of native gluster over Ganesha? Removing the fuse
requirement should also be a performance advantage for Ganesha over the
native client.

Re: [Gluster-users] Small files performance

2016-06-21 Thread Luciano Giacchetta
Hi,

I have a similar scenario: a car classifieds site with millions of small
files, mounted with the Gluster native client in a replica config.
The Gluster server has 16 GB of RAM and 4 cores, and mounts the GlusterFS volume with
direct-io-mode=enable. I then export it to all servers (Windows included, via
CIFS).

performance.cache-refresh-timeout: 60
performance.read-ahead: enable
performance.write-behind-window-size: 4MB
performance.io-thread-count: 64
performance.cache-size: 12GB
performance.quick-read: on
performance.flush-behind: on
performance.write-behind: on
nfs.disable: on
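For readers who want to reproduce this, the option list above can be applied with the gluster CLI roughly as follows. This is a hedged sketch: the volume name "gv0", the server hostname, and the mount point are placeholders, not details from the thread.

```shell
# Apply the options listed above to a hypothetical volume "gv0".
gluster volume set gv0 performance.cache-refresh-timeout 60
gluster volume set gv0 performance.read-ahead enable
gluster volume set gv0 performance.write-behind-window-size 4MB
gluster volume set gv0 performance.io-thread-count 64
gluster volume set gv0 performance.cache-size 12GB
gluster volume set gv0 performance.quick-read on
gluster volume set gv0 performance.flush-behind on
gluster volume set gv0 performance.write-behind on
gluster volume set gv0 nfs.disable on

# Mount with direct I/O enabled, as described in the message:
mount -t glusterfs -o direct-io-mode=enable server1:/gv0 /mnt/gv0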


--
Saludos, LG

On Sat, May 28, 2016 at 6:46 AM, Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com> wrote:

> if i remember properly, each stat() on a file needs to be sent to all host
> in replica to check if are in sync
>
> Is this true for both gluster native client and nfs ganesha?
>
> Which is the best for a shared hosting storage with many millions of small
> files? About 15.000.000 small files in 800gb ? Or even for Maildir hosting
>
> Ganesha can be configured for HA and loadbalancing so the biggest issue
> that was present in standard NFS now is gone
>
> Any advantage about native gluster over Ganesha? Removing the fuse
> requirement should also be a performance advantage for Ganesha over native
> client
>

Re: [Gluster-users] Small files performance

2016-06-01 Thread Gmail

> On Jun 1, 2016, at 1:41 PM, Gandalf Corvotempesta wrote:
> 
> On 1 Jun 2016 22:34, "Gmail" wrote:
> >
> >
> >> On Jun 1, 2016, at 1:25 PM, Gandalf Corvotempesta wrote:
> >> with nfs replication is made directly by gluster servers with no client 
> >> involved?
> >
> > correct
> 
> This is good
> what i really don't like in gluster is the client doing all the replication.
> replication and cluster management should be done directly by servers, not by 
> clients
> client side the resources used for replication and cluster management could 
> be used for something else like virtualizations and so on.
> 
> > the NFS client talks to only one NFS server (the one which it mounts), the 
> > NFS HA setup is only to failover a virtual IP to another healthy node. so 
> > the NFS client will just do 3 minor timeouts then it will do a major 
> > timeout, when that happens, the virtual IP failover will be already done.
> 
> The same is for native client.
> even the native client has to wait for a timeout before changing the storage 
> node, right?
> 
No, there is no such timeout with the native client: the client knows how to talk to all 
the nodes at the same time, so if a node goes down it is not a big deal; it can 
still reach the others.
> what happens to a virtual machine writing to disk during this timeout? 
If two out of three storage nodes acknowledged the writes (in the case of replica 
3), it will be OK: the node failure will not affect write performance. But 
when two nodes go down, the third node will turn read-only (RO), as there is no quorum.


Re: [Gluster-users] Small files performance

2016-06-01 Thread Gandalf Corvotempesta
On 1 Jun 2016 22:34, "Gmail" wrote:
>
>
>> On Jun 1, 2016, at 1:25 PM, Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com> wrote:
>> with nfs replication is made directly by gluster servers with no client
involved?
>
> correct

This is good.
What I really don't like in Gluster is the client doing all the replication:
replication and cluster management should be done directly by the servers, not
by the clients.
On the client side, the resources used for replication and cluster management could
be used for something else, like virtualization and so on.

> the NFS client talks to only one NFS server (the one which it mounts),
the NFS HA setup is only to failover a virtual IP to another healthy node.
so the NFS client will just do 3 minor timeouts then it will do a major
timeout, when that happens, the virtual IP failover will be already done.

The same applies to the native client: even the native client has to wait for a
timeout before changing the storage node, right?
What happens to a virtual machine writing to disk during this timeout?

Re: [Gluster-users] Small files performance

2016-06-01 Thread Gmail

> On Jun 1, 2016, at 1:25 PM, Gandalf Corvotempesta wrote:
> 
> On 1 Jun 2016 22:06, "Gmail" wrote:
> > stat() on NFS, is just a single stat() from the client to the storage node, 
> > then all the storage nodes in the same replica group talk to each other 
> > using libgfapi (no FUSE overhead)
> >
> > conclusion, I’d prefer NFS over FUSE with small files.
> > drawback, NFS HA is more complicated to setup and maintain than FUSE.
> 
> NFS HA with ganesha should be easier than kernel NFS
> 
> Skipping the whole fuse stack should be good also for big files
> 
with big files, I don’t notice much difference in performance between NFS and FUSE
> with nfs replication is made directly by gluster servers with no client 
> involved?
> 
correct
> In this case would be possibile to split the gluster networks with 10gb used 
> for replication and multiple 1gb bonded for clients.
> 
don’t forget the complication of the Ganesha HA setup: Pacemaker is a pain in the 
butt.
> I can see only advantage for nfs over native gluster
> 
> One question: with no gluster client that always know on which node a single 
> file is located, who is telling nfs where to find the required file? Is nfs 
> totally distributed with no "gateway"/"proxy" or any centralized server?
> 
the NFS client talks to only one NFS server (the one it mounts); the NFS 
HA setup is only there to fail over a virtual IP to another healthy node. So the NFS 
client will just see 3 minor timeouts and then a major timeout, and by the time that 
happens, the virtual IP failover will already be done.
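For reference, the NFS-Ganesha side of such a setup needs an EXPORT block using the Gluster FSAL. Below is a minimal sketch; the volume name "gv0", the hostname, and the export id are assumptions for illustration, not details from the thread.

```
EXPORT {
    Export_Id = 1;
    Path = "/gv0";
    Pseudo = "/gv0";
    Access_Type = RW;
    FSAL {
        Name = GLUSTER;
        Hostname = "localhost";   # a node running glusterd
        Volume = "gv0";
    }
}
```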


Re: [Gluster-users] Small files performance

2016-06-01 Thread Gandalf Corvotempesta
On 1 Jun 2016 22:06, "Gmail" wrote:
> stat() on NFS, is just a single stat() from the client to the storage
node, then all the storage nodes in the same replica group talk to each
other using libgfapi (no FUSE overhead)
>
> conclusion, I’d prefer NFS over FUSE with small files.
> drawback, NFS HA is more complicated to setup and maintain than FUSE.

NFS HA with ganesha should be easier than kernel NFS

Skipping the whole fuse stack should be good also for big files
With NFS, is replication made directly by the Gluster servers, with no client
involved?
In this case it would be possible to split the Gluster networks, with 10 Gb
used for replication and multiple bonded 1 Gb links for the clients.
I can see only advantages for NFS over native Gluster.

One question: without the Gluster client, which always knows on which node a
single file is located, who tells NFS where to find the required file?
Is NFS totally distributed, with no "gateway"/"proxy" or any centralized
server?
___

Re: [Gluster-users] Small files performance

2016-06-01 Thread Gmail
Find my answers inline.
> On Jun 1, 2016, at 12:30 PM, Gandalf Corvotempesta wrote:
> 
> On 28/05/2016 11:46, Gandalf Corvotempesta wrote:
>> 
>> if i remember properly, each stat() on a file needs to be sent to all host 
>> in replica to check if are in sync
>> 
>> Is this true for both gluster native client and nfs ganesha?
stat() on a FUSE mount is done from the client to all the bricks in the same 
replica group carrying the file. The data flow is as follows: the FUSE mount 
point does the call using libgfapi (FUSE overhead), libgfapi talks to the client 
kernel, then the client kernel talks to the kernels of all the storage nodes in 
the same replica group, the storage node kernel talks to the Gluster daemon, 
then Gluster talks to the underlying filesystem, etc.

stat() on NFS is just a single stat() from the client to the storage node; 
then all the storage nodes in the same replica group talk to each other using 
libgfapi (no FUSE overhead).

Conclusion: I’d prefer NFS over FUSE with small files.
Drawback: NFS HA is more complicated to set up and maintain than FUSE.
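The FUSE-vs-NFS difference described above is easy to measure empirically: time stat() over a directory of small files on each mount. A minimal sketch follows; the paths are placeholders, and you would point it at your own FUSE and NFS mount points of the same volume to compare.

```python
import os
import tempfile
import time

def avg_stat_latency(path: str) -> float:
    """Average per-file os.stat() time in seconds over the files in `path`."""
    names = os.listdir(path)
    start = time.perf_counter()
    for name in names:
        os.stat(os.path.join(path, name))
    return (time.perf_counter() - start) / max(len(names), 1)

if __name__ == "__main__":
    # Demo on a local temp dir; point this at /mnt/glusterfs or /mnt/nfs instead.
    with tempfile.TemporaryDirectory() as d:
        for i in range(1000):
            open(os.path.join(d, "f%04d" % i), "w").close()
        print("avg stat latency: %.1f us/file" % (avg_stat_latency(d) * 1e6))
```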
>> 
>> Which is the best for a shared hosting storage with many millions of small 
>> files? About 15.000.000 small files in 800gb ? Or even for Maildir hosting
>> 
>> Ganesha can be configured for HA and loadbalancing so the biggest issue that 
>> was present in standard NFS now is gone
>> 
>> Any advantage about native gluster over Ganesha? Removing the fuse 
>> requirement should also be a performance advantage for Ganesha over native 
>> client
>> 
> 
> bump


Re: [Gluster-users] Small files performance

2016-06-01 Thread Gandalf Corvotempesta

On 28/05/2016 11:46, Gandalf Corvotempesta wrote:


if i remember properly, each stat() on a file needs to be sent to all 
host in replica to check if are in sync


Is this true for both gluster native client and nfs ganesha?

Which is the best for a shared hosting storage with many millions of 
small files? About 15.000.000 small files in 800gb ? Or even for 
Maildir hosting


Ganesha can be configured for HA and loadbalancing so the biggest 
issue that was present in standard NFS now is gone


Any advantage about native gluster over Ganesha? Removing the fuse 
requirement should also be a performance advantage for Ganesha over 
native client




bump


[Gluster-users] Small files performance

2016-05-28 Thread Gandalf Corvotempesta
If I remember properly, each stat() on a file needs to be sent to all hosts
in the replica to check whether they are in sync.

Is this true for both the Gluster native client and NFS-Ganesha?

Which is best for shared-hosting storage with many millions of small
files? About 15,000,000 small files in 800 GB? Or even for Maildir hosting?

Ganesha can be configured for HA and load balancing, so the biggest issue
that was present in standard NFS is now gone.

Any advantage of native Gluster over Ganesha? Removing the FUSE
requirement should also be a performance advantage for Ganesha over the native
client.