Re: [Gluster-devel] Core from gNFS process
On 01/15/2016 08:38 AM, Soumya Koduri wrote:
> On 01/15/2016 06:52 PM, Soumya Koduri wrote:
>> On 01/14/2016 08:41 PM, Vijay Bellur wrote:
>>> On 01/14/2016 04:11 AM, Jiffin Tony Thottan wrote:
>>>> On 14/01/16 14:28, Jiffin Tony Thottan wrote:
>>>>> Hi,
>>>>>
>>>>> The core is generated when the encryption xlator is enabled:
>>>>>
>>>>> [2016-01-14 08:13:15.740835] E [crypt.c:4298:master_set_master_vol_key] 0-test1-crypt: FATAL: missing master key
>>>>> [2016-01-14 08:13:15.740859] E [MSGID: 101019] [xlator.c:429:xlator_init] 0-test1-crypt: Initialization of volume 'test1-crypt' failed, review your volfile again
>>>>> [2016-01-14 08:13:15.740890] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-test1-crypt: initializing translator failed
>>>>> [2016-01-14 08:13:15.740904] E [MSGID: 101176] [graph.c:670:glusterfs_graph_activate] 0-graph: init failed
>>>>> [2016-01-14 08:13:15.741676] W [glusterfsd.c:1231:cleanup_and_exit] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x307) [0x40d287] -->/usr/sbin/glusterfs(glusterfs_process_volfp+0x117) [0x4086c7] -->/usr/sbin/glusterfs(cleanup_and_exit+0x4d) [0x407e1d] ) 0-: received signum (0), shutting down
>>>>
>>>> Forgot to mention this in the last mail: the crypt xlator needs a master key before the translator is enabled, which is what causes the issue.
>>>
>>> Irrespective of the problem, the nfs process should not crash. Can we check why there is memory corruption during cleanup_and_exit()?
>>
>> That's right. This issue was reported quite a few times earlier on gluster-devel, and it is not specific to the gluster-nfs process. As noted in [1], we have raised bug 1293594 [2] against the lib-gcc team to investigate this further.

The segmentation fault in gcc occurs while attempting to print a backtrace after glusterfs receives a SIGSEGV. It would be good to isolate the reason for the initial SIGSEGV whose signal handler causes the further crash.

-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of file
Another observation: if rsyncing is resumed after the hang, rsync itself hangs much faster, because it stats the already-copied files. So the cause may not be the writing itself but the massive stat load on the GlusterFS volume as well.

15.01.2016 09:40, Oleksandr Natalenko wrote:
> While doing rsync over millions of files from an ordinary partition to a
> GlusterFS volume, the first hang happens just after approximately the first
> 2 million files, and the following info appears in dmesg:
>
> ===
> [17075038.924481] INFO: task rsync:10310 blocked for more than 120 seconds.
> [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [17075038.940748] rsync D 88207fc13680 0 10310 10309 0x0080
> [17075038.940752] 8809c578be18 0086 8809c578bfd8 00013680
> [17075038.940756] 8809c578bfd8 00013680 880310cbe660 881159d16a30
> [17075038.940759] 881e3aa25800 8809c578be48 881159d16b10 88087d553980
> [17075038.940762] Call Trace:
> [17075038.940770] [] schedule+0x29/0x70
> [17075038.940797] [] __fuse_request_send+0x13d/0x2c0 [fuse]
> [17075038.940801] [] ? fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse]
> [17075038.940805] [] ? wake_up_bit+0x30/0x30
> [17075038.940809] [] fuse_request_send+0x12/0x20 [fuse]
> [17075038.940813] [] fuse_flush+0xff/0x150 [fuse]
> [17075038.940817] [] filp_close+0x34/0x80
> [17075038.940821] [] __close_fd+0x78/0xa0
> [17075038.940824] [] SyS_close+0x23/0x50
> [17075038.940828] [] system_call_fastpath+0x16/0x1b
> ===
>
> rsync blocks in D state, and to kill it I have to do umount --lazy on the
> GlusterFS mountpoint and then kill the corresponding glusterfs client
> process. Then rsync exits.
> Here is the GlusterFS volume info:
>
> ===
> Volume Name: asterisk_records
> Type: Distributed-Replicate
> Volume ID: dc1fe561-fa3a-4f2e-8330-ec7e52c75ba4
> Status: Started
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: server1:/bricks/10_megaraid_0_3_9_x_0_4_3_hdd_r1_nolvm_hdd_storage_01/asterisk/records
> Brick2: server2:/bricks/10_megaraid_8_5_14_x_8_6_16_hdd_r1_nolvm_hdd_storage_01/asterisk/records
> Brick3: server1:/bricks/11_megaraid_0_5_4_x_0_6_5_hdd_r1_nolvm_hdd_storage_02/asterisk/records
> Brick4: server2:/bricks/11_megaraid_8_7_15_x_8_8_20_hdd_r1_nolvm_hdd_storage_02/asterisk/records
> Brick5: server1:/bricks/12_megaraid_0_7_6_x_0_13_14_hdd_r1_nolvm_hdd_storage_03/asterisk/records
> Brick6: server2:/bricks/12_megaraid_8_9_19_x_8_13_24_hdd_r1_nolvm_hdd_storage_03/asterisk/records
> Options Reconfigured:
> cluster.lookup-optimize: on
> cluster.readdir-optimize: on
> client.event-threads: 2
> network.inode-lru-limit: 4096
> server.event-threads: 4
> performance.client-io-threads: on
> storage.linux-aio: on
> performance.write-behind-window-size: 4194304
> performance.stat-prefetch: on
> performance.quick-read: on
> performance.read-ahead: on
> performance.flush-behind: on
> performance.write-behind: on
> performance.io-thread-count: 2
> performance.cache-max-file-size: 1048576
> performance.cache-size: 33554432
> features.cache-invalidation: on
> performance.readdir-ahead: on
> ===
>
> The issue reproduces each time I rsync such an amount of files.
>
> How could I debug this issue better?

___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
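On the "how could I debug this" question, a common first step is to capture a statedump of the hung client and a backtrace of its threads at hang time. The commands below are a sketch assuming root access on the client and server nodes; the pgrep pattern is illustrative and should be adjusted to match the actual mount process.

```shell
# The glusterfs client dumps its internal state (pending frames, locks,
# mem pools) on SIGUSR1; the dump lands under /var/run/gluster by default.
kill -USR1 "$(pgrep -f 'glusterfs.*asterisk_records')"

# On a server node, dump brick-side state for the volume as well.
gluster volume statedump asterisk_records

# Capture what every thread of the client process is doing while hung.
gdb -p "$(pgrep -f 'glusterfs.*asterisk_records')" \
    -batch -ex 'thread apply all bt'
```

The pending-frames section of the statedump shows which fop is stuck and in which xlator, which narrows down whether the hang is in write-behind, the network layer, or the bricks.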
Re: [Gluster-devel] Core from gNFS process
On 01/15/2016 06:52 PM, Soumya Koduri wrote:
> [snip: quoted crypt-xlator log and earlier discussion, unchanged from the rest of this thread]
>
> That's right. This issue was reported quite a few times earlier on gluster-devel, and it is not specific to the gluster-nfs process. As noted in [1], we have raised bug 1293594 [2] against the lib-gcc team to investigate this further.
>
> As requested in [1], kindly upload the core to the bug along with a backtrace taken with gcc debuginfo packages installed. It might help to get their attention and bring closure on this issue sooner.

Here is the bug link: https://bugzilla.redhat.com/show_bug.cgi?id=1293594

Request Raghavendra/Ravi to update it.
Thanks,
Soumya

[1] http://article.gmane.org/gmane.comp.file-systems.gluster.devel/13298
Re: [Gluster-devel] Core from gNFS process
On 01/14/2016 08:41 PM, Vijay Bellur wrote:
> [snip: quoted crypt-xlator log, unchanged from the first message in this thread]
>
> Irrespective of the problem, the nfs process should not crash. Can we check why there is memory corruption during cleanup_and_exit()?

That's right. This issue was reported quite a few times earlier on gluster-devel, and it is not specific to the gluster-nfs process. As noted in [1], we have raised bug 1293594 [2] against the lib-gcc team to investigate this further.

As requested in [1], kindly upload the core to the bug along with a backtrace taken with gcc debuginfo packages installed. It might help to get their attention and bring closure on this issue sooner.

Thanks,
Soumya

[1] http://article.gmane.org/gmane.comp.file-systems.gluster.devel/13298
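For completeness, the kind of backtrace being requested can be produced roughly as follows. This is a sketch: the package names assume an RPM-based distribution with debuginfo repos enabled, and the core path is a placeholder.

```shell
# Pull in debug symbols so backtrace frames resolve to source lines.
debuginfo-install -y glusterfs glusterfs-libs glibc

# Extract a full backtrace of every thread from the core file.
gdb /usr/sbin/glusterfs /path/to/core \
    -batch -ex 'thread apply all bt full' > bt-full.txt
```

Attaching bt-full.txt to the bugzilla entry gives the lib-gcc maintainers resolved symbols instead of raw addresses like the 0x40d287 frames in the log above.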
[Gluster-devel] Blog post on GlusterFS-Quota
Hi all,

I have written an initial blog post [1] on GlusterFS quota. I will update the blog, or add a few more posts, to cover the remaining content. It has also been shared on planet.gluster.org.

Suggestions/comments are welcome :-)

[1] https://manikandanselvaganesh.wordpress.com/category/glusterfs/

--
Thanks & Regards,
Manikandan Selvaganesh.
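For readers who want to try the feature alongside the post, the basic quota workflow on the CLI looks like this (the volume name and directory path here are illustrative):

```shell
# Enable the quota feature on a volume.
gluster volume quota myvol enable

# Set a 10GB usage limit on a directory (path is relative to the volume root).
gluster volume quota myvol limit-usage /projects 10GB

# Show configured limits and current usage.
gluster volume quota myvol list
```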
[Gluster-devel] Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev
Setting GLUSTERFS_WRITE_IS_APPEND in the afr_writev function on the glusterfs client causes posix_writev on the server to handle IO write fops serially rather than in parallel. That is, multiple io-worker threads carrying out write fops are blocked in posix_writev and execute the final pwrite/pwritev in the __posix_writev function ONE AFTER ANOTHER. For example:

thread1: iot_worker -> ... -> posix_writev() |
thread2: iot_worker -> ... -> posix_writev() |
thread3: iot_worker -> ... -> posix_writev() -> __posix_writev()
thread4: iot_worker -> ... -> posix_writev() |

Here four iot_worker threads are performing 128KB write fops as above, but only one at a time can execute the __posix_writev function; the others have to wait.

However, if the AFR volume is configured with storage.linux-aio (off by default), the iot_worker uses posix_aio_writev instead of posix_writev to write data. The posix_aio_writev function is not affected by GLUSTERFS_WRITE_IS_APPEND, and AFR volume write performance goes up.

So my questions are: can an AFR volume work correctly with the storage.linux-aio configuration, which bypasses the GLUSTERFS_WRITE_IS_APPEND setting made in afr_writev, and why does glusterfs keep posix_aio_writev behaving differently from posix_writev?

Any replies to clear up my confusion would be appreciated; thanks in advance.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel