Harry,

Have you seen this post?
http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/

Be sure to read all the comments as well; Ben England, one of the
performance engineers at Red Hat, chimes in there.

-JM

----- Harry Mangalam <hjmanga...@gmail.com> wrote:

> This is a continuation of my previous posts about improving write
> performance when trapping millions of small writes to a gluster
> filesystem. I was able to improve write performance by ~30x by
> running STDOUT through gzip to consolidate and reduce the output
> stream.
>
> Today brought another, similar problem, this time with yet another
> bioinformatics program. (These programs typically handle the 'short
> reads' that come out of most current sequencing hardware: each read
> is 30-150 characters, plus some metadata, stored in an ASCII file
> containing millions of such entries.) Reading them doesn't seem to
> be a problem, at least on our systems, but writing them is quite
> awful.
>
> The program is called 'art_illumina', from the Broad Institute's
> 'ALLPATHS' suite, and it generates an artificial Illumina data set
> from an input genome -- in this case, about 5GB of the type of data
> described above. As before, the gluster process goes to >100% CPU
> and the program itself slows to ~20-30% of a CPU. This time the
> app's output cannot be trapped externally by redirecting through
> gzip, since the output flag specifies the base filename for 2 files
> that are created internally and then written directly. That
> prevents even setting up a named pipe to trap and process the
> output.
>
> Since this gluster storage was set up specifically for
> bioinformatics, this is a recurring problem, and while some of the
> issues can be dealt with by trapping and converting output, it
> would be VERY NICE if we could deal with it at the OS level.
>
> The gluster volume is running over IPoIB on QDR IB and looks like
> this:
>
> Volume Name: gl
> Type: Distribute
> Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
> Status: Started
> Number of Bricks: 8
> Transport-type: tcp,rdma
> Bricks:
> Brick1: bs2:/raid1
> Brick2: bs2:/raid2
> Brick3: bs3:/raid1
> Brick4: bs3:/raid2
> Brick5: bs4:/raid1
> Brick6: bs4:/raid2
> Brick7: bs1:/raid1
> Brick8: bs1:/raid2
> Options Reconfigured:
> performance.write-behind-window-size: 1024MB
> performance.flush-behind: on
> performance.cache-size: 268435456
> nfs.disable: on
> performance.io-cache: on
> performance.quick-read: on
> performance.io-thread-count: 64
> auth.allow: 10.2.*.*,10.1.*.*
>
> I've tried to increase every caching option that might improve this
> kind of performance, but it doesn't seem to help. At this point,
> I'm wondering whether changing the client (or server) kernel
> parameters will help.
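Re: trapping output -- for anyone else following the thread, the gzip
trick and its named-pipe variant look something like this (the program
name 'simulate_reads', its flags, and the paths are made up here, just
to illustrate the pattern):

  # 1) If the app writes to STDOUT, compress the stream before it
  #    hits gluster -- millions of small writes become fewer, larger
  #    ones:
  simulate_reads --genome ref.fa | gzip -1 > /gl/out/reads.fq.gz

  # 2) If the app insists on writing to a named file but lets you
  #    pick the name, a FIFO on local disk can still intercept the
  #    stream:
  mkfifo /tmp/reads.fq
  gzip -1 < /tmp/reads.fq > /gl/out/reads.fq.gz &
  simulate_reads --genome ref.fa --out /tmp/reads.fq
  wait                  # let gzip drain the pipe before cleaning up
  rm /tmp/reads.fq

As Harry notes, neither works for art_illumina, since it derives both
of its output filenames internally from a base name.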
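For the archives: the "Options Reconfigured" values above are the
kind set at runtime with 'gluster volume set', e.g.

  gluster volume set gl performance.write-behind-window-size 1024MB
  gluster volume set gl performance.flush-behind on

so they can be adjusted and re-tested without recreating the volume.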
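On the kernel-parameter question: if I remember right, much of the
article linked above comes down to the VM dirty-page knobs. With
512GB of RAM on the client, the default ratios allow an enormous
amount of dirty page cache to accumulate before writeback kicks in,
and then it flushes in large bursts. Untested on your setup, but as
a first guess along the lines that article suggests:

  # start writeback sooner and cap dirty memory lower; exact values
  # are per-workload -- these are only a starting point to test
  sysctl -w vm.dirty_background_ratio=5    # default is typically 10
  sysctl -w vm.dirty_ratio=10              # default is typically 20-40

(or set the equivalents in /etc/sysctl.conf to make them persistent).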
> The client's meminfo is:
>
> cat /proc/meminfo
> MemTotal: 529425924 kB
> MemFree: 241833188 kB
> Buffers: 355248 kB
> Cached: 279699444 kB
> SwapCached: 0 kB
> Active: 2241580 kB
> Inactive: 278287248 kB
> Active(anon): 190988 kB
> Inactive(anon): 287952 kB
> Active(file): 2050592 kB
> Inactive(file): 277999296 kB
> Unevictable: 16856 kB
> Mlocked: 16856 kB
> SwapTotal: 563198732 kB
> SwapFree: 563198732 kB
> Dirty: 1656 kB
> Writeback: 0 kB
> AnonPages: 486876 kB
> Mapped: 19808 kB
> Shmem: 164 kB
> Slab: 1475476 kB
> SReclaimable: 1205944 kB
> SUnreclaim: 269532 kB
> KernelStack: 5928 kB
> PageTables: 27312 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 827911692 kB
> Committed_AS: 536852 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 1227732 kB
> VmallocChunk: 33888774404 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 376832 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 201088 kB
> DirectMap2M: 15509504 kB
> DirectMap1G: 521142272 kB
>
> and the server's meminfo is:
>
> $ cat /proc/meminfo
> MemTotal: 32861400 kB
> MemFree: 1232172 kB
> Buffers: 29116 kB
> Cached: 30017272 kB
> SwapCached: 44 kB
> Active: 18840852 kB
> Inactive: 11772428 kB
> Active(anon): 492928 kB
> Inactive(anon): 75264 kB
> Active(file): 18347924 kB
> Inactive(file): 11697164 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 16382900 kB
> SwapFree: 16382680 kB
> Dirty: 8 kB
> Writeback: 0 kB
> AnonPages: 566876 kB
> Mapped: 14212 kB
> Shmem: 1276 kB
> Slab: 429164 kB
> SReclaimable: 324752 kB
> SUnreclaim: 104412 kB
> KernelStack: 3528 kB
> PageTables: 16956 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 32813600 kB
> Committed_AS: 3053096 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 340196 kB
> VmallocChunk: 34342345980 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 200704 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 6656 kB
> DirectMap2M: 2072576 kB
> DirectMap1G: 31457280 kB
>
> Does this suggest any approach? Is there a doc that suggests
> optimal kernel parameters for gluster?
>
> I guess the only other option is to use the glusterfs as an NFS
> mount and use the NFS client's caching? That would help a single
> process, but would decrease the overall cluster bandwidth
> considerably.
>
> --
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
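A footnote on the NFS idea: nfs.disable is 'on' for this volume, so
Gluster's built-in NFS server (which speaks NFSv3) would have to be
re-enabled before a client could mount it that way. Untested sketch,
with a made-up mount point (bs1 is just the first server from the
brick list above):

  # on one of the gluster servers
  gluster volume set gl nfs.disable off

  # on the client
  mount -t nfs -o vers=3,proto=tcp bs1:/gl /mnt/gl-nfs

The client-side page cache should then coalesce the small writes,
but, as Harry says, at the cost of funneling all of that client's
traffic through the single server it mounts from.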