On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
In addition, I notice a very big difference between the sum of du on each brick and the "quota list" display, as you can read below:

[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team
cl-storage1: 1,6T    /export/brick_home/brick1/amyloid_team
cl-storage3: 1,6T    /export/brick_home/brick1/amyloid_team
cl-storage1: 1,6T    /export/brick_home/brick2/amyloid_team
cl-storage3: 1,6T    /export/brick_home/brick2/amyloid_team

[root@lucifer ~]# gluster volume quota vol_home list /amyloid_team
                  Path                   Hard-limit  Soft-limit     Used  Available
-----------------------------------------------------------------------------------
/amyloid_team                                 9.0TB         90%    7.8TB      1.2TB

As you can notice, the sum over all bricks gives roughly 6.4TB while "quota list" reports around 7.8TB, so there is a difference of 1.4TB I am not able to explain… Do you have any idea?
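To help narrow this down, the size recorded by quota accounting can be compared with du directly on a brick. A minimal sketch, assuming quota's accounting xattr trusted.glusterfs.quota.size (brick paths as above):

# On each brick server: dump the quota accounting xattr (hex-encoded byte
# count) and compare it with what du reports for the same directory
getfattr -d -m 'trusted.glusterfs.quota.size' -e hex /export/brick_home/brick1/amyloid_team
du -sb /export/brick_home/brick1/amyloid_team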
There were a few issues with quota's accounting of sizes; we have fixed some of these issues in 3.7. Also, 'df -h' rounds off the values, so can you please provide the output of 'df' without the -h option?

We suspect 'vi' might have created a tmp file before writing to the file. We are working on re-creating this problem and will update you on the same.
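One way to attempt a re-creation, assuming a small test volume with quota already enabled (the volume, directory and file names below are illustrative):

# Set a small quota, fill it past the limit, then edit an existing file with vi
gluster volume quota vol_test limit-usage /userdir 10MB
dd if=/dev/zero of=/mnt/vol_test/userdir/filler bs=1M count=20
vi /mnt/vol_test/userdir/script.sh      # modify, attempt :wq, quit after the write fails
ls -l /mnt/vol_test/userdir/script.sh   # check whether the file ended up truncated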
Thanks,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr

On 8 June 2015 at 14:30, Geoffrey Letessier <geoffrey.letess...@cnrs.fr> wrote:

Hello,

Concerning version 3.5.3 of GlusterFS, I ran into a strange issue this morning when writing to a file while quota is exceeded.

One person in my lab, whose quota was exceeded (but she didn't know it), tried to modify a file; because of the exceeded quota she was unable to save it and decided to exit vi. Now her file is empty/blank, as you can read below:
pdsh@lucifer: cl-storage3: ssh exited with exit code 2
cl-storage1: ---------T 2 tarus amyloid_team 0 Feb 19 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 Jun  8 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh

In addition, I don't understand why, my volume being a distributed volume inside a replica (cl-storage[1,3] is replicated only onto cl-storage[2,4]), I have two "same" files (same complete path) on two different bricks, as you can read above.

Thanks in advance for your help and clarification.

Geoffrey
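The zero-byte entry with mode ---------T is presumably a DHT link file: a placeholder DHT leaves on the brick the file name hashes to when the data itself lives on another brick. Assuming the usual trusted.glusterfs.dht.linkto xattr, such entries can be listed on a brick like this (paths as above):

# Zero-byte sticky-bit files are DHT pointers; the xattr names the subvolume
# that actually holds the data
find /export/brick_home/brick*/amyloid_team -type f -size 0 -perm /01000 \
    -exec getfattr -n trusted.glusterfs.dht.linkto {} \;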
On 2 June 2015 at 23:45, Geoffrey Letessier <geoffrey.letess...@cnrs.fr> wrote:

Hi Ben,

I just checked my messages log files, both on client and server, and I don't find any of the hung tasks you noticed on yours.

As you can read below, I don't see the performance issue with a simple DD, but I think my issue concerns sets of small files (tens of thousands, maybe more)…

[root@nisus test]# ddt -t 10g /mnt/test/
Writing to /mnt/test/ddt.8362 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/test/ddt.8362 ... done.
  10240MiB    KiB/s   CPU%
  Write      114770      4
  Read        40675      4

For info: /mnt/test is the single (v2) GlusterFS volume.

[root@nisus test]# ddt -t 10g /mnt/fhgfs/
Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/fhgfs/ddt.8380 ... done.
  10240MiB    KiB/s   CPU%
  Write      102591      1
  Read        98079      2

Do you have an idea how to tune/optimize the performance settings, and/or the TCP settings (MTU, etc.)?

---------------------------------------------------------------
|             | UNTAR  |  DU   |  FIND  |  TAR   |   RM   |
---------------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
| BeeGFS      | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
| single (v2) | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
---------------------------------------------------------------

For info:
- BeeGFS is a distributed FS (4 bricks: 2 bricks per server, 2 servers)
- single (v2): a simple gluster volume with default settings

I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (DU, FIND, RM) looks OK.

Thank you very much for your reply and help.

Geoffrey

On 2 June 2015 at 21:53, Ben Turner <btur...@redhat.com> wrote:

I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:

Jun 2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 15:23:14 gqac006 kernel: iozone D 0000000000000001 0 21999 1 0x00000080
Jun 2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
Jun 2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
Jun 2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
Jun 2 15:23:14 gqac006 kernel: Call Trace:
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Do you see a perf problem with just a simple DD or do you need a more complex workload to hit the issue?
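For example, a plain sequential pair like this (mount point illustrative) exercises only streaming reads/writes, with no metadata load:

# 10GiB sequential write flushed to disk, then a cold-cache sequential read
dd if=/dev/zero of=/mnt/test/ddfile bs=1M count=10240 conv=fsync
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/test/ddfile of=/dev/null bs=1M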
I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple DD reads/writes or if we need to do some sort of dir/metadata access as well.

-b

----- Original Message -----
From: "Geoffrey Letessier" <geoffrey.letess...@cnrs.fr>
To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
Cc: gluster-users@gluster.org
Sent: Tuesday, June 2, 2015 8:09:04 AM
Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

Hi Pranith,

I'm sorry but I cannot give you any comparison, because it would be distorted by the fact that in my HPC cluster in production the network technology is InfiniBand QDR and my volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in the pool).

Concerning your request, you can find in the attachments all the expected results; I hope it helps you solve this serious performance issue (maybe I need to play with some GlusterFS parameters?).

Thank you very much in advance,

Geoffrey

On 2 June 2015 at 10:09, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

Hi Geoffrey,

Since you are saying it happens on all types of volumes, let's do the following (a command-level sketch follows below):
1) Create a dist-repl volume
2) Set the options etc. you need
3) Enable gluster volume profile using "gluster volume profile <volname> start"
4) Run the workload
5) Give the output of "gluster volume profile <volname> info"

Repeat the steps above on the new and old versions you are comparing. That should give us insight into what could be causing the slowness.

Pranith
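A sketch of steps 1-5, with illustrative volume name and brick paths (a 2x2 distributed-replicated layout is assumed):

# 1) Create and start a dist-repl volume (2 servers, 2 bricks each)
gluster volume create vol_test replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server1:/bricks/b2 server2:/bricks/b2
gluster volume start vol_test
# 3) Enable profiling, 4) run the workload, 5) read back the counters
gluster volume profile vol_test start
# ... run the workload against a client mount point ...
gluster volume profile vol_test info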
On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:

Dear all,

I have a crash-test cluster where I have tested the new version of GlusterFS (v3.7) before upgrading my HPC cluster in production. But… all my tests show very, very low performance.

For my benches, as you can read below, I run a few operations (untar, du, find, tar, rm) on the Linux kernel sources, dropping the caches, on distributed, replicated, distributed-replicated and single (single-brick) volumes, as well as on the native FS of one brick:

# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)

And here are the process times:

---------------------------------------------------------------
|             | UNTAR  |  DU   |  FIND  |  TAR   |   RM   |
---------------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
---------------------------------------------------------------

I get the same results with default configurations as with custom configurations.

Looking at the ifstat command, I note that my I/O write processes never exceed 3MB/s. The EXT4 native FS seems to be faster than the XFS one (roughly 15-20%, but no more).

My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).

My volume settings:
single: 1 server, 1 brick
replicated: 2 servers, 1 brick each
distributed: 2 servers, 2 bricks each
dist-repl: 2 bricks on the same server, with replica 2

All seems to be OK in the gluster status command-line output.

Do you have an idea why I obtain such bad results?

Thanks in advance.

Geoffrey
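For repeatability, the five timed commands above can be wrapped in one small script run against each mount point (a sketch; the script name and argument handling are illustrative):

#!/bin/bash
# Usage: ./bench.sh /mnt/<volume>
# Runs the untar/du/find/tar/rm sequence, syncing and dropping the page
# cache around each step, and prints the elapsed time of each one
cd "$1" || exit 1
drop() { sync; echo 3 > /proc/sys/vm/drop_caches; }
time ( drop; tar xJf ~/linux-4.1-rc5.tar.xz; drop )
time ( drop; du -sh linux-4.1-rc5/; drop )
time ( drop; find linux-4.1-rc5/ | wc -l; drop )
time ( drop; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; drop )
time ( drop; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; drop )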
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users