Including devel

Pranith
On 06/14/2014 02:37 AM, David F. Robinson wrote:
Another update... The previous tests have shown that I can kill gluster with even a moderate load on the storage system. One thing we noticed with previous versions of gluster was that the failure was sensitive to TCP parameters. I have seen other postings on the web noting similar behavior, along with recommendations for TCP tuning parameters.

When I use the default TCP parameters, the job dies during i/o and gluster hangs during the heals with each of the bricks showing "crawl in progress". This never clears and the i/o gets killed...
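For context, the per-brick heal/crawl state described here is typically inspected with the gluster CLI; a minimal sketch, assuming the homegfs volume from the logs below (exact output varies by release):

    # list entries pending self-heal on each brick of the volume
    gluster volume heal homegfs info
    # show self-heal crawl statistics, where a stuck "crawl in progress" state shows up
    gluster volume heal homegfs statistics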

When I set the following parameters in /etc/sysctl.conf, the job runs to completion without any issues and I don't get hung heal processes...

# Set by T. Young May 22 2014
net.core.netdev_max_backlog = 2500
net.ipv4.tcp_max_syn_backlog = 4096
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
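Settings in /etc/sysctl.conf only take effect at boot; to load them into the running kernel and verify, the usual approach is along these lines (a sketch; run as root on the affected servers and clients):

    # reload all settings from /etc/sysctl.conf into the running kernel
    sysctl -p
    # spot-check that a value actually took effect
    sysctl net.ipv4.tcp_rmem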

I do still get many thousands of the following messages in the log files:

[2014-06-13 21:05:22.164073] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241947: LOOKUP (null) (89371586-2e16-4623-bc9b-feb069b5c982) ==> (Stale file handle)
[2014-06-13 21:05:22.165627] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241948: LOOKUP (null) (8589b53e-f8b5-4bf9-9f54-f550e4e768c0) ==> (Stale file handle)
[2014-06-13 21:05:22.166395] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241949: LOOKUP (null) (2ad6bcce-4842-4c29-a319-39f276239b8b) ==> (Stale file handle)
[2014-06-13 21:05:22.166989] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241950: LOOKUP (null) (71b013f7-d508-41ee-8bc8-c8b328ff9f3a) ==> (Stale file handle)
[2014-06-13 21:05:22.167653] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241951: LOOKUP (null) (1d0c99a8-b2ab-402c-a8b2-33f55bcf6123) ==> (Stale file handle)
[2014-06-13 21:05:22.168270] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241952: LOOKUP (null) (c4f8b979-cbf3-4d6b-bcf9-6d5150521e19) ==> (Stale file handle)
[2014-06-13 21:05:22.168797] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241953: LOOKUP (null) (81da3d62-49fc-4465-9fb2-baa6a3278ce3) ==> (Stale file handle)
[2014-06-13 21:05:22.169420] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241954: LOOKUP (null) (dc9e9c2b-f801-452c-8ef7-009e600d23ca) ==> (Stale file handle)

David


------ Original Message ------
From: "Justin Clift" <jus...@gluster.org>
To: "David F. Robinson" <david.robin...@corvidtec.com>
Cc: "Ravishankar N" <ravishan...@redhat.com>; "Pranith Kumar Karampuri" <pkara...@redhat.com>; "Tom Young" <tom.yo...@corvidtec.com>
Sent: 6/13/2014 11:16:38 AM
Subject: Re: gluster 3.5.1 beta

Thanks, that's good news on the positive progress front. :)

+ Justin

On 13/06/2014, at 4:12 PM, David F. Robinson wrote:
FYI... The 3.5.1beta2 completed the large rsync... The last time I tried this, the rsync died after about 3 TB; this time it completed the 8 TB transfer... The only messages that seem strange in the logs after the rsync completed are:

[2014-06-13 15:09:30.080574] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227104: LOOKUP (null) (3cf20fd1-ce27-4fbd-aaa6-cd31aa6a13e5) ==> (Stale file handle)
[2014-06-13 15:09:30.969218] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227105: LOOKUP (null) (b7353434-32a4-4674-9f62-f373d3d1d4f2) ==> (Stale file handle)
[2014-06-13 15:10:32.814144] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227114: LOOKUP (null) (ad34cd69-0c90-4de9-9688-34199f6a3ae1) ==> (Stale file handle)

 David


 ------ Original Message ------
 From: "Ravishankar N" <ravishan...@redhat.com>
 To: "Justin Clift" <jus...@gluster.org>
Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>; "Tom Young" <tom.yo...@corvidtec.com>; "David F. Robinson" <david.robin...@corvidtec.com>
 Sent: 6/13/2014 12:22:58 AM
 Subject: Re: gluster 3.5.1 beta

 On 06/13/2014 04:03 AM, Justin Clift wrote:
 Testing feedback for 3.5.1 beta2 (was in a different email chain).

 Some strange looking messages in the logs (scroll down for the
 better details):

[2014-06-12 22:09:54.482481] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
This would be fixed once http://review.gluster.org/#/c/7897/ gets accepted.
 and:

[2014-06-12 21:49:54.326014] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-1: lookup failed on index dir on Software-client-2 - (Stale file handle)
 We still need to root cause this...

 + Justin


 Begin forwarded message:
 From: "David F. Robinson" <david.robin...@corvidtec.com>
 <snip>
FYI. I am retesting the gluster 3.5.1-beta2 using the same approach as before. I gluster-mounted my homegfs partition to a workstation and am doing an rsync of roughly 8 TB of data. The 3.5.1 version died after transferring roughly 3-4 TB with the errors shown in the previous emails. It seems to be doing fine and has already transferred 2.5 TB (a sketch of this mount-and-rsync setup appears after the log excerpts below). The log messages that seemed strange are:

[2014-06-12 22:01:59.872521] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:01:59.872540] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:01:59.872545] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:02.872835] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:02:02.872855] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:02:02.872860] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:05.873151] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:02:05.873171] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:02:05.873176] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:08.873483] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:02:08.873504] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:02:08.873509] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:11.873806] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:02:11.873827] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:02:11.873832] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:46.073341] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-06-12 22:02:46.073369] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread

[2014-06-12 21:29:54.225860] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 21:39:54.276236] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 21:49:54.325532] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 21:59:54.374955] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 22:09:54.482350] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 22:09:54.482481] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder

I am also still seeing these messages (very strange, because there are no files on the Software volume. That volume is completely empty...):

[2014-06-12 21:49:54.326014] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-1: lookup failed on index dir on Software-client-2 - (Stale file handle)
[2014-06-12 21:49:54.327077] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-0: lookup failed on index dir on Software-client-0 - (Stale file handle)
[2014-06-12 21:59:54.373724] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Source-replicate-0: lookup failed on index dir on Source-client-0 - (Stale file handle)
[2014-06-12 21:59:54.373950] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Source-replicate-1: lookup failed on index dir on Source-client-2 - (Stale file handle)
[2014-06-12 21:59:54.375302] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-1: lookup failed on index dir on Software-client-2 - (Stale file handle)
[2014-06-12 21:59:54.376673] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-0: lookup failed on index dir on Software-client-0 - (Stale file handle)
[2014-06-12 22:09:54.424471] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Source-replicate-0: lookup failed on index dir on Source-client-0 - (Stale file handle)
[2014-06-12 22:09:54.424667] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Source-replicate-1: lookup failed on index dir on Source-client-2 - (Stale file handle)
[2014-06-12 22:09:54.482812] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-1: lookup failed on index dir on Software-client-2 - (Stale file handle)
[2014-06-12 22:09:54.482910] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-0: lookup failed on index dir on Software-client-0 - (Stale file handle)
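For reference, the test being rerun above is a FUSE mount of the volume plus a large rsync into it; a minimal sketch with hypothetical hostname and paths (server1, /mnt/homegfs, and /data/source are placeholders, not from the original report):

    # mount the homegfs volume on a workstation over FUSE (hypothetical server and mount point)
    mount -t glusterfs server1:/homegfs /mnt/homegfs
    # copy ~8 TB into the mounted volume, preserving permissions/times/links
    rsync -a --progress /data/source/ /mnt/homegfs/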
 David


 On 12/06/2014, at 3:16 AM, David F. Robinson wrote:
Roger that. Thanks for the feedback. For testing, this approach would work fine. If we put gluster into production, it would not be optimal. Taking the entire data storage offline for the upgrade would be difficult given the number of machines and the cluster jobs that are always running.

If you get the rolling upgrade working and need someone to test, let me know. Happy to test and provide feedback.

 Thanks...

 David (Sent from mobile)

 ===============================
 David F. Robinson, Ph.D.
 President - Corvid Technologies
 704.799.6944 x101 [office]
 704.252.1310 [cell]
 704.799.7974 [fax]
 david.robin...@corvidtec.com
 http://www.corvidtechnologies.com
 --
 GlusterFS - http://www.gluster.org

 An open source, distributed file system scaling to several
 petabytes, and handling thousands of clients.

 My personal twitter: twitter.com/realjustinclift




--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
