Re: [Gluster-devel] [Gluster-users] Not able to start glusterd
Abhishek,

We need the information below to investigate this issue:

1. gluster --version
2. Please run glusterd in gdb so that we can capture the backtrace. I see some rpc errors in the log, but a backtrace will be more helpful. To run glusterd in gdb, start glusterd under gdb (i.e. "gdb glusterd", then give the command "run -N"). When you see the segmentation fault, please capture the backtrace and paste it here.

On Wed, Mar 6, 2019 at 10:07 AM ABHISHEK PALIWAL wrote:
> Hi Team,
>
> I am facing an issue where a segmentation fault is reported at the time of starting glusterd.
>
> Below are the logs:
>
> root@128:/usr/sbin# ./glusterd --debug
> [1970-01-01 15:19:43.940386] I [MSGID: 100030] [glusterfsd.c:2691:main] 0-./glusterd: Started running ./glusterd version 5.0 (args: ./glusterd --debug)
> [1970-01-01 15:19:43.940855] D [logging.c:1833:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
> [1970-01-01 15:19:43.941736] D [MSGID: 0] [glusterfsd.c:747:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
> [1970-01-01 15:19:43.945796] D [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/lib64/glusterfs/5.0/xlator/mgmt/glusterd.so: undefined symbol: xlator_api. Fall back to old symbols
> [1970-01-01 15:19:43.946279] I [MSGID: 106478] [glusterd.c:1435:init] 0-management: Maximum allowed open file descriptors set to 65536
> [1970-01-01 15:19:43.946419] I [MSGID: 106479] [glusterd.c:1491:init] 0-management: Using /var/lib/glusterd as working directory
> [1970-01-01 15:19:43.946515] I [MSGID: 106479] [glusterd.c:1497:init] 0-management: Using /var/run/gluster as pid file working directory
> [1970-01-01 15:19:43.946968] D [MSGID: 0] [glusterd.c:458:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog value: 10
> [1970-01-01 15:19:43.947139] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: RPC service inited.
> [1970-01-01 15:19:43.947241] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
> [1970-01-01 15:19:43.947379] D [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/5.0/rpc-transport/socket.so
> [1970-01-01 15:19:43.955198] D [socket.c:4464:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0
> [1970-01-01 15:19:43.955316] D [socket.c:4482:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9
> [1970-01-01 15:19:43.955415] D [socket.c:4167:ssl_setup_connection_params] 0-socket.management: SSL support on the I/O path is NOT enabled
> [1970-01-01 15:19:43.955504] D [socket.c:4170:ssl_setup_connection_params] 0-socket.management: SSL support for glusterd is NOT enabled
> [1970-01-01 15:19:43.955612] D [name.c:572:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet6
> [1970-01-01 15:19:43.955928] D [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so
> [1970-01-01 15:19:43.956079] E [rpc-transport.c:273:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
> [1970-01-01 15:19:43.956177] W [rpc-transport.c:277:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
> [1970-01-01 15:19:43.956270] W [rpcsvc.c:1789:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
> [1970-01-01 15:19:43.956362] E [MSGID: 106244] [glusterd.c:1798:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
> [1970-01-01 15:19:43.956459] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0
> [1970-01-01 15:19:43.956561] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: 1238463, Ver: 2, Port: 0
> [1970-01-01 15:19:43.95] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0
> [1970-01-01 15:19:43.956758] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 1238433, Ver: 3, Port: 0
> [1970-01-01 15:19:43.956853] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0
> [1970-01-01 15:19:43.956946] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, Ver: 2, Port: 0
> [1970-01-01 15:19:43.957062] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster MGMT Handshake, Num: 1239873,
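For reference, the gdb procedure described above can also be scripted so the backtrace is captured non-interactively. This is only a sketch: it assumes glusterd lives at /usr/sbin/glusterd and that gdb is installed, which may differ on your build.

```shell
# Write a gdb command file that runs glusterd in the foreground ("run -N")
# and, once it stops on the segfault, dumps a full backtrace of every thread.
cat > /tmp/glusterd-bt.gdb <<'EOF'
run -N
thread apply all bt full
EOF
echo "Now run: gdb -x /tmp/glusterd-bt.gdb /usr/sbin/glusterd"
```

The backtrace printed after the crash is what we would need pasted into this thread.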
[Gluster-devel] Not able to start glusterd
Hi Team,

I am facing an issue where a segmentation fault is reported at the time of starting glusterd.

Below are the logs:

root@128:/usr/sbin# ./glusterd --debug
[1970-01-01 15:19:43.940386] I [MSGID: 100030] [glusterfsd.c:2691:main] 0-./glusterd: Started running ./glusterd version 5.0 (args: ./glusterd --debug)
[1970-01-01 15:19:43.940855] D [logging.c:1833:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[1970-01-01 15:19:43.941736] D [MSGID: 0] [glusterfsd.c:747:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
[1970-01-01 15:19:43.945796] D [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/lib64/glusterfs/5.0/xlator/mgmt/glusterd.so: undefined symbol: xlator_api. Fall back to old symbols
[1970-01-01 15:19:43.946279] I [MSGID: 106478] [glusterd.c:1435:init] 0-management: Maximum allowed open file descriptors set to 65536
[1970-01-01 15:19:43.946419] I [MSGID: 106479] [glusterd.c:1491:init] 0-management: Using /var/lib/glusterd as working directory
[1970-01-01 15:19:43.946515] I [MSGID: 106479] [glusterd.c:1497:init] 0-management: Using /var/run/gluster as pid file working directory
[1970-01-01 15:19:43.946968] D [MSGID: 0] [glusterd.c:458:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog value: 10
[1970-01-01 15:19:43.947139] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: RPC service inited.
[1970-01-01 15:19:43.947241] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[1970-01-01 15:19:43.947379] D [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/5.0/rpc-transport/socket.so
[1970-01-01 15:19:43.955198] D [socket.c:4464:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0
[1970-01-01 15:19:43.955316] D [socket.c:4482:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9
[1970-01-01 15:19:43.955415] D [socket.c:4167:ssl_setup_connection_params] 0-socket.management: SSL support on the I/O path is NOT enabled
[1970-01-01 15:19:43.955504] D [socket.c:4170:ssl_setup_connection_params] 0-socket.management: SSL support for glusterd is NOT enabled
[1970-01-01 15:19:43.955612] D [name.c:572:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet6
[1970-01-01 15:19:43.955928] D [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so
[1970-01-01 15:19:43.956079] E [rpc-transport.c:273:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[1970-01-01 15:19:43.956177] W [rpc-transport.c:277:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[1970-01-01 15:19:43.956270] W [rpcsvc.c:1789:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[1970-01-01 15:19:43.956362] E [MSGID: 106244] [glusterd.c:1798:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[1970-01-01 15:19:43.956459] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0
[1970-01-01 15:19:43.956561] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: 1238463, Ver: 2, Port: 0
[1970-01-01 15:19:43.95] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0
[1970-01-01 15:19:43.956758] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 1238433, Ver: 3, Port: 0
[1970-01-01 15:19:43.956853] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0
[1970-01-01 15:19:43.956946] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, Ver: 2, Port: 0
[1970-01-01 15:19:43.957062] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster MGMT Handshake, Num: 1239873, Ver: 1, Port: 0
[1970-01-01 15:19:43.957205] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: RPC service inited.
[1970-01-01 15:19:43.957303] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[1970-01-01 15:19:43.957408] D [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/5.0/rpc-transport/socket.so
[1970-01-01 15:19:43.957563] D [socket.c:4424:socket_init] 0-socket.management: disabling nodelay
[1970-01-01 15:19:43.957650] D [socket.c:4464:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0
[1970-01-01
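(A side note on the rdma lines in the log above: they come from glusterd.vol requesting an rdma transport whose shared object is not installed. They are warnings, not the crash itself; glusterd continues with the tcp listener. On builds without rdma support, a glusterd.vol along the following lines avoids the warning. The fragment is illustrative and may differ from the file your package ships:)

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
end-volume
```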
Re: [Gluster-devel] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb)
On 3/4/19 12:33 PM, Shyam Ranganathan wrote:
> On 3/4/19 10:08 AM, Atin Mukherjee wrote:
>> On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan <atumb...@redhat.com> wrote:
>>
>>     Thanks to those who participated.
>>
>>     Update at present:
>>
>>     We found 3 blocker bugs in upgrade scenarios, and hence have marked the release as pending upon them. We will keep these lists updated about progress.
>>
>> I'd like to clarify that upgrade testing is blocked. So just fixing these test blocker(s) isn't enough to call release-6 green. We need to continue and finish the rest of the upgrade tests once the respective bugs are fixed.
>
> Based on fixes expected by tomorrow for the upgrade failures, we will build an RC1 candidate on Wednesday (6-Mar) (tagging early Wed. Eastern TZ). This RC can be used for further testing.

There have been no backports yet for the upgrade failures; I request folks working on them to post a list of the bugs that need to be fixed, to enable tracking. (Also, ensure they are marked against the release-6 tracker [1].)

Also, we need to start writing the upgrade guide for release-6. Any volunteers for the same?

Thanks,
Shyam

[1] Release-6 tracker bug: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.0

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Bitrot: Time of signing depending on the file size???
Hi David,

Thanks for raising the bug. From the above validation it's clear that bitrot is not directly involved: bitrot waits for the last fd to be closed. We will have to investigate the reason for the fd not being closed for large files.

Thanks,
Kotresh HR

On Mon, Mar 4, 2019 at 3:13 PM David Spisla wrote:
> Hello Kotresh,
>
> Yes, the fd was still open for larger files. I could verify this with a 500MiB file and some smaller files. After a specific time, only the fd for the 500MiB file was still up and that file still had no signature; for the smaller files there were no fds and they already had a signature. I don't know the reason for this. Maybe the client still keeps the fd open? I opened a bug for this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1685023
>
> Regards
> David
>
> On Fri, 1 Mar 2019 at 18:29, Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
>
>> Interesting observation! But as discussed in the thread, the bitrot signing process depends on a 2 min timeout (by default) after the last fd closes. It doesn't have any correlation with the size of the file. Did you happen to verify that the fd was still open for the large files for some reason?
>>
>> On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote:
>>
>>> Hello folks,
>>>
>>> I made some observations concerning the bitrot daemon. It seems that the bitrot signer signs files at a time depending on the file size. I copied files of different sizes into a volume and was wondering why the files do not get their signature at the same time (I kept the expiry time at the default of 120). Here are some examples:
>>>
>>> 300 KB file ~ 2-3 m
>>> 70 MB file ~ 40 m
>>> 115 MB file ~ 1.5 h
>>> 800 MB file ~ 4.5 h
>>>
>>> What is the expected behaviour here? Why does it take so long to sign an 800 MB file? What about 500 GB or 1 TB? Is there a way to speed up the signing process?
>>>
>>> My ambition is to understand this observation.
>>>
>>> Regards
>>> David Spisla

--
Thanks and Regards,
Kotresh H R
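For anyone reproducing David's observation, the "is the fd still open" question can be checked via /proc. The snippet below only simulates the check in a plain shell by holding a file open in the current process; on a real setup you would inspect the fd table of the brick process's pid rather than $$.

```shell
# Hold an fd open on a scratch file, then look for it in this process's fd
# table. "Still has an open fd" is exactly the condition that keeps the
# bitrot signer waiting: signing starts only after the last fd is closed
# plus the expiry timeout (120s by default).
tmpf=$(mktemp)
exec 3<"$tmpf"
if ls -l /proc/$$/fd | grep -q "$tmpf"; then
    echo "fd still open, signer would keep waiting"
fi
exec 3<&-    # close the fd; the 120s expiry timer would start now
rm -f "$tmpf"
```

If a large file shows a lingering fd in the brick process long after the copy finished, the delay is in whatever holds that fd open, not in bitrot itself.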
[Gluster-devel] Gluster : Improvements on "heal info" command
Hi All,

We have observed, and heard from gluster users, that the "heal info" command takes a long time. Even when all we want to know is whether a gluster volume is healthy, the command lists all the pending files from all the bricks before we can be sure. Here we propose some options for the "heal info" command that provide the report quickly and reliably:

gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]

Problem: "gluster v heal vol info" picks each subvolume and checks the .glusterfs/indices/xattrop directory of every brick of that subvolume to find entries which may need to be healed. It picks each entry and takes a lock on it to check its xattrs and decide whether it actually needs heal. This LOCK->CHECK-XATTR->UNLOCK cycle takes a lot of time per file.

Let's consider the two most common cases in which we use "heal info" and look at the improvements.

Case 1: Consider a 4+2 EC volume with the bricks on 6 different nodes. One brick of the volume is down, and a client has written 10K files on a mount point of this volume. Entries for these 10K files will be created in ".glusterfs/indices/xattrop" on each of the remaining 5 bricks. Now the brick comes back UP, and when we use "heal info" on this volume, it goes to all the bricks, picks up these 10K entries, and goes through the LOCK->CHECK-XATTR->UNLOCK cycle for all of them. This happens on every brick, which means we check 50K files and perform the cycle 50K times, while checking 10K entries would have been sufficient. It is a very time consuming operation. If IOs are happening on some new files, we check those files as well, which adds to the time. Yet all we wanted to know was whether the volume is healthy.
Solution: Whenever a brick goes down and comes back up and we use the "heal info" command, our *main intention* is to find out whether the volume is *healthy* or *unhealthy*. A volume is unhealthy even if one file is unhealthy. So we should scan the bricks one by one, and as soon as we find one brick with entries that need to be healed, we can stop, list those files, and report that the volume is not healthy; there is no need to scan the remaining bricks. That is where the "--brick=[one,all]" option comes in.

"gluster v heal vol info --brick=[one,all]"
"one" - Scan the bricks sequentially, and as soon as any unhealthy entries are found, list them and stop scanning the other bricks.
"all" - Behave exactly like the current implementation and list all the entries from all the bricks.
If the option is not provided, the default (current) behavior applies.

Case 2: Consider a 24 x (4+2) EC volume. Say one brick from *only one* of the subvolumes has been replaced and a heal has been triggered. To know whether the volume is in a healthy state, we go to each brick of *each and every subvolume* and check whether there are entries in the ".glusterfs/indices/xattrop" directory that need heal. If we know which subvolume participated in the brick replacement, we only need to check the health of that subvolume, not query the other subvolumes. If several clients are writing files to this volume, an entry for each of these files is created in .glusterfs/indices/xattrop, and "heal info" goes through the LOCK->CHECK-XATTR->UNLOCK cycle on each of them to decide whether it needs heal, which takes a lot of time. In addition, the clients see a performance drop, since they have to release and re-take the locks.

Solution: Provide an option to specify the subvolume for which we want heal info.
"gluster v heal vol info --subvol=[number of the subvol]"
Here, --subvol is given the number of the subvolume we want to check.
Example: "gluster v heal vol info --subvol=1"

===
Performance Data: a quick performance test done on a standalone system.

Type: Distributed-Disperse
Volume ID: ea40eb13-d42c-431c-9c89-0153e834e67e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: apandey:/home/apandey/bricks/gluster/vol-1
Brick2: apandey:/home/apandey/bricks/gluster/vol-2
Brick3: apandey:/home/apandey/bricks/gluster/vol-3
Brick4: apandey:/home/apandey/bricks/gluster/vol-4
Brick5: apandey:/home/apandey/bricks/gluster/vol-5
Brick6: apandey:/home/apandey/bricks/gluster/vol-6
Brick7: apandey:/home/apandey/bricks/gluster/new-1
Brick8: apandey:/home/apandey/bricks/gluster/new-2
Brick9: apandey:/home/apandey/bricks/gluster/new-3
Brick10: apandey:/home/apandey/bricks/gluster/new-4
Brick11: apandey:/home/apandey/bricks/gluster/new-5
Brick12: apandey:/home/apandey/bricks/gluster/new-6

Disabled the shd to get the data. Killed one brick each from the two subvolumes and wrote 2000 files on the mount point.
[root@apandey
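The early-exit scan behind the proposed --brick=one can be sketched in shell against a simulated brick layout. The directory names below are made up for the demonstration, and a real xattrop index also contains a base xattrop-<gfid> file, which is ignored here for simplicity.

```shell
# Three fake bricks; brick2 has one pending index entry, so the scan should
# report "unhealthy" there and never look at brick3 at all.
root=$(mktemp -d)
for b in brick1 brick2 brick3; do
    mkdir -p "$root/$b/.glusterfs/indices/xattrop"
done
touch "$root/brick2/.glusterfs/indices/xattrop/00000000-dead-beef"

status=healthy
for b in brick1 brick2 brick3; do
    if [ -n "$(ls "$root/$b/.glusterfs/indices/xattrop")" ]; then
        echo "unhealthy: $b has pending heal entries"
        status=unhealthy
        break    # --brick=one: the first unhealthy brick is enough
    fi
done
echo "volume is $status"
rm -rf "$root"
```

The same loop with the break removed (and all entries run through the xattr check) corresponds to the current --brick=all behavior.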