Re: [Gluster-users] RPC program not available (req 1298437 330)
Have you checked the brick logs to see if there's anything unusual there?

Regards,
Vijay

On Thu, Sep 15, 2016 at 5:24 PM, Danny Lee wrote:
> Hi,
>
> Environment:
> Gluster Version: 3.8.3
> Operating System: CentOS Linux 7 (Core)
> Kernel: Linux 3.10.0-327.28.3.el7.x86_64
> Architecture: x86-64
> Replicated 3-Node Volume
> ~400GB of around a million files
>
> Description of Problem:
> One of the bricks dies. The only suspect log I see is in
> etc-glusterfs-glusterd.vol.log (shown below). Trying to get an idea of why
> the brick died and how it could be prevented in the future.
>
> During this time, I was forcing replication (find . | xargs stat on the
> mount). There were also some services starting up that were using the
> gluster mount.
>
> [2016-09-13 20:01:50.033369] W [socket.c:590:__socket_rwv] 0-management:
> readv on /var/run/gluster/cfc57a83cf9864900aa08380be93.socket failed (No
> data available)
> [2016-09-13 20:01:50.033830] I [MSGID: 106005]
> [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick
> 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected
> from glusterd.
> [2016-09-13 20:01:50.121316] W [rpcsvc.c:265:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 172.17.32.28:49146
> [2016-09-13 20:01:50.121339] E [rpcsvc.c:560:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
> [2016-09-13 20:01:50.121383] W [rpcsvc.c:265:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 172.17.32.28:49146
> [2016-09-13 20:01:50.121392] E [rpcsvc.c:560:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
>
> The message "I [MSGID: 106005]
> [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick
> 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected
> from glusterd." repeated 34 times between [2016-09-13 20:01:50.033830] and
> [2016-09-13 20:03:40.010862]

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
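[Editor's note: a quick way to see how often glusterd lost its brick connections, and for which bricks, is to tally the "has disconnected" messages in the log the poster quotes. A minimal sketch; `count_disconnects` is a made-up helper name, and the log path in the comment is the usual default, so adjust for your install.]

```shell
# Count "has disconnected from glusterd" events per brick in a glusterd log.
# Usage: count_disconnects /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
count_disconnects() {
  grep 'has disconnected from glusterd' "$1" \
    | sed 's/.*Brick \(.*\) has disconnected.*/\1/' \
    | sort | uniq -c | sort -rn
}
```

Each output line is a count followed by a brick path, most frequently disconnected first, which helps tell one flapping brick apart from a node-wide problem.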
Re: [Gluster-users] geo-rep: -1 (Directory not empty) warning - STATUS Faulty
Thanks for the command, this is the result of that ls:

lrwxrwxrwx 1 root root 60 Jul 26 16:12 /data/cloud-pro/brick/.glusterfs/f7/eb/f7eb9d21-d39a-4dd6-941c-46d430e18aa2 -> ../../aa/63/aa63d3f7-656e-4e92-a016-a878714e89f8/Booking

On Friday, September 16, 2016 11:58 AM, Aravinda wrote:

We can check in the brick backend:

ls -ld $BRICK_ROOT/.glusterfs/f7/eb/f7eb9d21-d39a-4dd6-941c-46d430e18aa2

regards
Aravinda

On Thursday 15 September 2016 09:12 PM, ML mail wrote:
> So I ran on my master a "find /mybrick -name 'File 2016.xlsx'" and got the
> following two entries:
>
> 355235 19 drwxr-xr-x 3 www-data www-data     3 Sep 15 12:48
> ./data/username/files_encryption/keys/files/Dir/File\ 2016.xlsx
> 355234 56 -rw-r--r-- 2 www-data www-data 44308 Sep 15 12:48
> ./data/username/files/Dir/File\ 2016.xlsx
>
> As you can see, there is one file and one directory named 'File 2016.xlsx'.
> This is actually how the web application Nextcloud works when it uses
> encryption: the file is the encrypted file, and the directory named like
> the file contains the encryption keys to encrypt/decrypt that specific file.
>
> Now the next thing would be to find out whether geo-rep failed on the file
> or on the directory named 'File 2016.xlsx'. Any ideas how I can check that?
> And what else would you need to debug this issue?
>
> Regards
> ML
>
> On Thursday, September 15, 2016 9:12 AM, Aravinda wrote:
> Thanks for the logs from Master. The error below says "directory not
> empty", but the names look like files. Please confirm from the Master
> Volume whether that path is a directory or a file.
>
> [2016-09-13 19:30:57.475649] W [fuse-bridge.c:1787:fuse_rename_cbk]
> 0-glusterfs-fuse: 25: /.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/File
> 2016.xlsx.ocTransferId1333449197.part ->
> /.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/File 2016.xlsx => -1
> (Directory not empty)
>
> regards
> Aravinda
>
> On Wednesday 14 September 2016 12:49 PM, ML mail wrote:
>> As requested you will find below the last few hundred lines of the
>> geo-rep log file from the master node.
>>
>> FILE:
>> /var/log/glusterfs/geo-replication/cloud-pro/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Acloud-pro-geo.log
>>
>> [2016-09-13 19:39:48.686060] I [syncdutils(/data/cloud-pro/brick):220:finalize] : exiting.
>> [2016-09-13 19:39:48.688112] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
>> [2016-09-13 19:39:48.688461] I [syncdutils(agent):220:finalize] : exiting.
>> [2016-09-13 19:39:49.518510] I [monitor(monitor):343:monitor] Monitor: worker(/data/cloud-pro/brick) died in startup phase
>> [2016-09-13 19:39:59.675405] I [monitor(monitor):266:monitor] Monitor:
>> [2016-09-13 19:39:59.675740] I [monitor(monitor):267:monitor] Monitor: starting gsyncd worker
>> [2016-09-13 19:39:59.768181] I [gsyncd(/data/cloud-pro/brick):710:main_i] : syncing: gluster://localhost:cloud-pro -> ssh://r...@gfs1geo.domain.tld:gluster://localhost:cloud-pro-geo
>> [2016-09-13 19:39:59.768640] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
>> [2016-09-13 19:40:02.554076] I [master(/data/cloud-pro/brick):83:gmaster_builder] : setting up xsync change detection mode
>> [2016-09-13 19:40:02.554500] I [master(/data/cloud-pro/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
>> [2016-09-13 19:40:02.555332] I [master(/data/cloud-pro/brick):83:gmaster_builder] : setting up changelog change detection mode
>> [2016-09-13 19:40:02.555600] I [master(/data/cloud-pro/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
>> [2016-09-13 19:40:02.556711] I [master(/data/cloud-pro/brick):83:gmaster_builder] : setting up changeloghistory change detection mode
>> [2016-09-13 19:40:02.556983] I [master(/data/cloud-pro/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
>> [2016-09-13 19:40:04.692655] I [master(/data/cloud-pro/brick):1249:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/cloud-pro/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Acloud-pro-geo/6b844f56a11ecd24d5e36242f045e58c/xsync
>> [2016-09-13 19:40:04.692945] I [resource(/data/cloud-pro/brick):1491:service_loop] GLUSTER: Register time: 1473795604
>> [2016-09-13 19:40:04.724827] I [master(/data/cloud-pro/brick):510:crawlwrap] _GMaster: primary master with volume id d99af2fa-439b-4a21-bf3a-38f3849f87ec ...
>> [2016-09-13 19:40:04.734505] I [master(/data/cloud-pro/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
>> [2016-09-13 19:40:04.754487] I [master(/data/cloud-pro/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1473688054, 0), etime: 1473795604
>> [2016-09-13 19:40:05.757395] I [master(/data/cloud-pro/brick):1192:crawl] _GMaster: slave's time: (1473688054, 0)
>> [2016-09-13 19:40:05.
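[Editor's note: the `.glusterfs` path Aravinda asks for can be derived mechanically, since a GFID's backend location is built from its first two pairs of hex characters. A minimal sketch; `gfid_backend_path` is a made-up helper name.]

```shell
# Build the .glusterfs backend path for a GFID: the first two and the next
# two characters of the GFID select the two subdirectory levels.
gfid_backend_path() {
  brick=$1
  gfid=$2
  printf '%s/.glusterfs/%s/%s/%s\n' "$brick" \
    "$(printf '%s' "$gfid" | cut -c1-2)" \
    "$(printf '%s' "$gfid" | cut -c3-4)" \
    "$gfid"
}

gfid_backend_path /data/cloud-pro/brick f7eb9d21-d39a-4dd6-941c-46d430e18aa2
# -> /data/cloud-pro/brick/.glusterfs/f7/eb/f7eb9d21-d39a-4dd6-941c-46d430e18aa2
```

For a directory GFID the resulting entry is a symlink into the parent's GFID directory (as in the `ls` output above), so `ls -ld` on it tells you whether the GFID refers to a directory or a regular file.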
Re: [Gluster-users] RPC program not available (req 1298437 330)
On Friday 16 September 2016, Danny Lee wrote:
> Hi,
>
> Environment:
> Gluster Version: 3.8.3
> Operating System: CentOS Linux 7 (Core)
> Kernel: Linux 3.10.0-327.28.3.el7.x86_64
> Architecture: x86-64
> Replicated 3-Node Volume
> ~400GB of around a million files
>
> Description of Problem:
> One of the bricks dies. The only suspect log I see is in
> etc-glusterfs-glusterd.vol.log (shown below). Trying to get an idea of why
> the brick died and how it could be prevented in the future.
>
> During this time, I was forcing replication (find . | xargs stat on the
> mount). There were also some services starting up that were using the
> gluster mount.
>
> [2016-09-13 20:01:50.033369] W [socket.c:590:__socket_rwv] 0-management:
> readv on /var/run/gluster/cfc57a83cf9864900aa08380be93.socket failed
> (No data available)
> [2016-09-13 20:01:50.033830] I [MSGID: 106005]
> [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick
> 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected
> from glusterd.
> [2016-09-13 20:01:50.121316] W [rpcsvc.c:265:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 172.17.32.28:49146
> [2016-09-13 20:01:50.121339] E [rpcsvc.c:560:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
> [2016-09-13 20:01:50.121383] W [rpcsvc.c:265:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 172.17.32.28:49146
> [2016-09-13 20:01:50.121392] E [rpcsvc.c:560:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully

I haven't checked the code yet, but at a guess a brick op (in transit) failed here when the brick went down.

> The message "I [MSGID: 106005]
> [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick
> 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected
> from glusterd." repeated 34 times between [2016-09-13 20:01:50.033830] and
> [2016-09-13 20:03:40.010862]

--
--Atin
[Gluster-users] GlusterFS 3.8.4 is available, Gluster users are advised to update
[from http://blog.nixpanic.net/2016/09/glusterfs-384-is-available.html]

Packages have been built for many distributions and will become available in the standard repositories during the next few days, if they are not there already.

Kind regards,
Niels

Even though the last 3.8 release was just two weeks ago, we're sticking to the release schedule and have 3.8.4 ready for all our current and future users. As with all updates, we advise users of previous versions to upgrade to the latest and greatest. Several bugs have been fixed, and upgrading is one way to prevent hitting known problems in the future.

Release notes for Gluster 3.8.4

This is a bugfix release. The release notes for 3.8.0, 3.8.1, 3.8.2 and 3.8.3 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

A total of 23 patches have been merged, addressing 22 bugs:

* #1332424: geo-rep: address potential leak of memory
* #1357760: Geo-rep silently ignores config parser errors
* #1366496: 1 mkdir generates tons of log messages from dht xlator
* #1366746: EINVAL errors while aggregating the directory size by quotad
* #1368841: Applications not calling glfs_h_poll_upcall() have upcall events cached for no use
* #1368918: tests/bugs/cli/bug-1320388.t: Infrequent failures
* #1368927: Error: quota context not set inode (gfid:nnn) [Invalid argument]
* #1369042: thread CPU saturation limiting throughput on write workloads
* #1369187: fix bug in protocol/client lookup callback
* #1369328: [RFE] Add a count of snapshots associated with a volume to the output of the vol info command
* #1369372: gluster snap status xml output shows incorrect details when the snapshots are in deactivated state
* #1369517: rotated FUSE mount log is using to populate the information after log rotate.
* #1369748: Memory leak with a replica 3 arbiter 1 configuration
* #1370172: protocol/server: readlink rsp xdr failed while readlink got an error
* #1370390: Locks xlators is leaking fdctx in pl_release()
* #1371194: segment fault while join thread reaper_thr in fini()
* #1371650: [Open SSL] : Unable to mount an SSL enabled volume via SMB v3/Ganesha v4
* #1371912: gluster system:: uuid get hangs
* #1372728: Node remains in stopped state in pcs status with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in logs.
* #1373530: Minor improvements and cleanup for the build system
* #1374290: "gluster vol status all clients --xml" doesn't generate xml if there is a failure in between
* #1374565: [Bitrot]: Recovery fails of a corrupted hardlink (and the corresponding parent file) in a disperse volume

___
Announce mailing list
annou...@gluster.org
http://www.gluster.org/mailman/listinfo/announce
[Gluster-users] Production cluster planning
Next year I'll start with our first production cluster. I'll put on it many VM images (XenServer, Proxmox, ...). Currently I have 3 SuperMicro 6028R-E1CR12T to be used as storage nodes. I'll put 2 more 10GbT cards in each.

Primary goal is to have MAXIMUM data redundancy and protection. We can live with less than top-notch performance, but data protection must be assured all the time, in all conditions.

Some questions:

1) Should I create any RAID on each server? If yes, which level and how many?
   - 4 RAID-5 arrays with 3 disks each? With this I'd create 4 bricks on each server.
   - 6 RAID-1 arrays with 2 disks each? With this I'd create 6 bricks on each server (but the space wasted is too high).
   - 2 RAID-6 arrays with 6 disks each? With this I'd create 2 bricks on each server.

2) Should I use standard replication (replica 3) or EC?

3) Our servers have 2 SSDs in the back. Can I use these for tiering?

4) Our servers have 1 SSD inside (not hotplug) for the OS. What would happen in case of a crash of this SSD? Is Gluster able to recover the whole failed node?
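[Editor's note: for the replica-3 option in question 2, volume creation looks roughly like the sketch below. Host names, brick paths, and the volume name are illustrative placeholders, not from this thread; the commands are printed rather than executed since they need a live cluster.]

```shell
# Print (not run) a sketch of a replica-3 VM-image volume with one brick
# per node, e.g. one RAID-6 set per node. All names are placeholders.
replica3_plan() {
  cat <<'EOF'
gluster peer probe node2
gluster peer probe node3
gluster volume create vmstore replica 3 \
    node1:/bricks/raid6a/brick \
    node2:/bricks/raid6a/brick \
    node3:/bricks/raid6a/brick
gluster volume set vmstore group virt   # option group commonly applied for VM images
gluster volume start vmstore
EOF
}

replica3_plan
```

With replica 3 across three nodes, each byte is stored three times, and the volume keeps quorum with any single node down, which matches the "maximum protection over performance" goal better than EC for VM workloads.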
Re: [Gluster-users] 3.8.3 Bitrot signature process
Hi,

Can anyone reply to this mail?

On Tue, Sep 13, 2016 at 12:49 PM, Amudhan P wrote:
> Hi,
>
> I am testing the bitrot feature in Gluster 3.8.3 with a disperse EC volume (4+1).
>
> When I write a single small file (< 10MB), after 2 seconds I can see the
> bitrot signature in the bricks for the file, but when I write multiple
> files with different sizes (> 10MB) it takes a long time (> 24hrs) to see
> the bitrot signature on all the files.
>
> My questions are:
> 1. I have enabled the scrub schedule as hourly and throttle as normal;
>    does this make any impact in delaying the bitrot signature?
> 2. Other than "bitd.log", where else can I watch the current status of
>    bitrot, like the number of files queued for signing and each file's status?
> 3. Where can I confirm that all the files in the brick are bitrot-signed?
> 4. Is there any file read size limit in bitrot?
> 5. Are there options for tuning bitrot for faster signing of files?
>
> Thanks
> Amudhan
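[Editor's note: one way to approach question 3 from the brick side is that the signer stores its result in the `trusted.bit-rot.signature` extended attribute on each brick file, so files still awaiting a signature can be listed. A sketch, printed rather than executed since it needs a live brick with the attr tools installed; `/brick` and `VOLNAME` are placeholders.]

```shell
# Print (not run) brick-side checks for bitrot signing progress.
bitrot_checks() {
  cat <<'EOF'
# Show the signature (if present) on one brick file:
getfattr -n trusted.bit-rot.signature -e hex /brick/path/to/file

# List brick files that do not yet carry a signature
# (skipping gluster's internal .glusterfs directory):
find /brick -path '*/.glusterfs' -prune -o -type f -print |
while read -r f; do
  getfattr -n trusted.bit-rot.signature "$f" >/dev/null 2>&1 || echo "$f"
done

# Scrubber-side status for the whole volume:
gluster volume bitrot VOLNAME scrub status
EOF
}

bitrot_checks
```

Running the `find` loop periodically and watching the unsigned count shrink gives a rough view of signing progress across a brick.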
Re: [Gluster-users] geo-rep: -1 (Directory not empty) warning - STATUS Faulty
We can check in the brick backend:

ls -ld $BRICK_ROOT/.glusterfs/f7/eb/f7eb9d21-d39a-4dd6-941c-46d430e18aa2

regards
Aravinda

On Thursday 15 September 2016 09:12 PM, ML mail wrote:

So I ran on my master a "find /mybrick -name 'File 2016.xlsx'" and got the following two entries:

355235 19 drwxr-xr-x 3 www-data www-data     3 Sep 15 12:48 ./data/username/files_encryption/keys/files/Dir/File\ 2016.xlsx
355234 56 -rw-r--r-- 2 www-data www-data 44308 Sep 15 12:48 ./data/username/files/Dir/Ein\ und\ Ausgaben\ File\ 2016.xlsx

As you can see, there is one file and one directory named 'File 2016.xlsx'. This is actually how the web application Nextcloud works when it uses encryption: the file is the encrypted file, and the directory named like the file contains the encryption keys to encrypt/decrypt that specific file.

Now the next thing would be to find out whether geo-rep failed on the file or on the directory named 'File 2016.xlsx'. Any ideas how I can check that? And what else would you need to debug this issue?

Regards
ML

On Thursday, September 15, 2016 9:12 AM, Aravinda wrote:

Thanks for the logs from Master. The error below says "directory not empty", but the names look like files. Please confirm from the Master Volume whether that path is a directory or a file.

[2016-09-13 19:30:57.475649] W [fuse-bridge.c:1787:fuse_rename_cbk] 0-glusterfs-fuse: 25: /.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/File 2016.xlsx.ocTransferId1333449197.part -> /.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/File 2016.xlsx => -1 (Directory not empty)

regards
Aravinda

On Wednesday 14 September 2016 12:49 PM, ML mail wrote:

As requested you will find below the last few hundred lines of the geo-rep log file from the master node.

FILE: /var/log/glusterfs/geo-replication/cloud-pro/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Acloud-pro-geo.log

[2016-09-13 19:39:48.686060] I [syncdutils(/data/cloud-pro/brick):220:finalize] : exiting.
[2016-09-13 19:39:48.688112] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-09-13 19:39:48.688461] I [syncdutils(agent):220:finalize] : exiting.
[2016-09-13 19:39:49.518510] I [monitor(monitor):343:monitor] Monitor: worker(/data/cloud-pro/brick) died in startup phase
[2016-09-13 19:39:59.675405] I [monitor(monitor):266:monitor] Monitor:
[2016-09-13 19:39:59.675740] I [monitor(monitor):267:monitor] Monitor: starting gsyncd worker
[2016-09-13 19:39:59.768181] I [gsyncd(/data/cloud-pro/brick):710:main_i] : syncing: gluster://localhost:cloud-pro -> ssh://r...@gfs1geo.domain.tld:gluster://localhost:cloud-pro-geo
[2016-09-13 19:39:59.768640] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
[2016-09-13 19:40:02.554076] I [master(/data/cloud-pro/brick):83:gmaster_builder] : setting up xsync change detection mode
[2016-09-13 19:40:02.554500] I [master(/data/cloud-pro/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
[2016-09-13 19:40:02.555332] I [master(/data/cloud-pro/brick):83:gmaster_builder] : setting up changelog change detection mode
[2016-09-13 19:40:02.555600] I [master(/data/cloud-pro/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
[2016-09-13 19:40:02.556711] I [master(/data/cloud-pro/brick):83:gmaster_builder] : setting up changeloghistory change detection mode
[2016-09-13 19:40:02.556983] I [master(/data/cloud-pro/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
[2016-09-13 19:40:04.692655] I [master(/data/cloud-pro/brick):1249:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/cloud-pro/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Acloud-pro-geo/6b844f56a11ecd24d5e36242f045e58c/xsync
[2016-09-13 19:40:04.692945] I [resource(/data/cloud-pro/brick):1491:service_loop] GLUSTER: Register time: 1473795604
[2016-09-13 19:40:04.724827] I [master(/data/cloud-pro/brick):510:crawlwrap] _GMaster: primary master with volume id d99af2fa-439b-4a21-bf3a-38f3849f87ec ...
[2016-09-13 19:40:04.734505] I [master(/data/cloud-pro/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
[2016-09-13 19:40:04.754487] I [master(/data/cloud-pro/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1473688054, 0), etime: 1473795604
[2016-09-13 19:40:05.757395] I [master(/data/cloud-pro/brick):1192:crawl] _GMaster: slave's time: (1473688054, 0)
[2016-09-13 19:40:05.832189] E [repce(/data/cloud-pro/brick):207:__call__] RepceClient: call 28167:140561674450688:1473795605.78 (entry_ops) failed on peer with OSError
[2016-09-13 19:40:05.832458] E [syncdutils(/data/cloud-pro/brick):276:log_raise_exception] : FAIL:
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
    main_i()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 720, in main_i
    local.service_loop(*[r for r in [rem