Re: [Gluster-users] RPC program not available (req 1298437 330)

2016-09-16 Thread Vijay Bellur
Have you checked the brick logs to see if there's anything unusual there?
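
For reference, brick logs normally live under /var/log/glusterfs/bricks/ on
the node hosting the brick, named after the brick path with slashes replaced
by dashes, so in your case something like:

less /var/log/glusterfs/bricks/usr-local-volname-local-data-mirrored-data.log

If the brick process crashed, the tail of that file usually ends with a
backtrace.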

Regards,
Vijay

On Thu, Sep 15, 2016 at 5:24 PM, Danny Lee  wrote:
> Hi,
>
> Environment:
> Gluster Version: 3.8.3
> Operating System: CentOS Linux 7 (Core)
> Kernel: Linux 3.10.0-327.28.3.el7.x86_64
> Architecture: x86-64
> Replicated 3-Node Volume
> ~400GB in around a million files
>
> Description of Problem:
> One of the bricks died.  The only suspect log I see is in the
> etc-glusterfs-glusterd.vol.log (shown below).  Trying to get an idea of why
> the brick died and how it could be prevented in the future.
>
> During this time, I was forcing replication (find . | xargs stat on the
> mount).  There were also some services starting up that were using the
> gluster mount.
>
> [2016-09-13 20:01:50.033369] W [socket.c:590:__socket_rwv] 0-management:
> readv on /var/run/gluster/cfc57a83cf9864900aa08380be93.socket failed (No
> data available)
> [2016-09-13 20:01:50.033830] I [MSGID: 106005]
> [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick
> 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected
> from glusterd.
> [2016-09-13 20:01:50.121316] W [rpcsvc.c:265:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 172.17.32.28:49146
> [2016-09-13 20:01:50.121339] E [rpcsvc.c:560:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
> [2016-09-13 20:01:50.121383] W [rpcsvc.c:265:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 172.17.32.28:49146
> [2016-09-13 20:01:50.121392] E [rpcsvc.c:560:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
> The message "I [MSGID: 106005]
> [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick
> 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected
> from glusterd." repeated 34 times between [2016-09-13 20:01:50.033830] and
> [2016-09-13 20:03:40.010862]
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] geo-rep: -1 (Directory not empty) warning - STATUS Faulty

2016-09-16 Thread ML mail
Thanks for the command, this is the result of that ls:

lrwxrwxrwx 1 root root 60 Jul 26 16:12 
/data/cloud-pro/brick/.glusterfs/f7/eb/f7eb9d21-d39a-4dd6-941c-46d430e18aa2 -> 
../../aa/63/aa63d3f7-656e-4e92-a016-a878714e89f8/Booking
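
If I understand the .glusterfs layout correctly, a symlink here means the
gfid belongs to a directory (directories are stored as symlinks to
../../<parent-gfid>/<basename>, while regular files are hardlinked into
.glusterfs). That would fit the error: rename(2) over an existing non-empty
directory fails with exactly "Directory not empty". I guess I could compare
with the slave side by mounting the slave volume with the aux-gfid-mount
option (the same mechanism gsyncd uses for the /.gfid/ paths in the log) and
listing the same parent, e.g. (mount point made up):

mount -t glusterfs -o aux-gfid-mount gfs1geo.domain.tld:/cloud-pro-geo /mnt/slave
ls -la /mnt/slave/.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/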




On Friday, September 16, 2016 11:58 AM, Aravinda  wrote:
We can check in the brick backend.

ls -ld $BRICK_ROOT/.glusterfs/f7/eb/f7eb9d21-d39a-4dd6-941c-46d430e18aa2

regards
Aravinda


On Thursday 15 September 2016 09:12 PM, ML mail wrote:
> So I ran on my master a "find /mybrick -name 'File 2016.xlsx'" and got the
> following two entries:
>
> 355235   19 drwxr-xr-x   3 www-data www-data        3 Sep 15 12:48
> ./data/username/files_encryption/keys/files/Dir/File\ 2016.xlsx
> 355234   56 -rw-r--r--   2 www-data www-data    44308 Sep 15 12:48
> ./data/username/files/Dir/File\ 2016.xlsx
>
>
> As you can see, there is one file and one directory named 'File 2016.xlsx'.
> This is how the web application Nextcloud works when it uses encryption: the
> file is the encrypted file, and the directory named after the file contains
> the encryption keys used to encrypt/decrypt that specific file.
>
> Now the next thing would be to find out whether geo-rep failed on the file or
> the directory named 'File 2016.xlsx'. Any ideas how I can check that? And what
> else would you need to debug this issue?
>
> Regards
> ML
>
>
>
>
>
> On Thursday, September 15, 2016 9:12 AM, Aravinda  wrote:
> Thanks for the logs from the Master. The error below says the directory is
> not empty, but the names look like files. Please confirm from the Master
> Volume whether that path is a directory or a file.
>
> [2016-09-13 19:30:57.475649] W [fuse-bridge.c:1787:fuse_rename_cbk]
> 0-glusterfs-fuse: 25: /.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/File
> 2016.xlsx.ocTransferId1333449197.part ->
> /.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/File 2016.xlsx => -1
> (Directory not empty)
>
>
> regards
> Aravinda
>
>
> On Wednesday 14 September 2016 12:49 PM, ML mail wrote:
>> As requested, you will find below the last few hundred lines of the geo-rep
>> log file from the master node.
>>
>>
>> FILE: 
>> /var/log/glusterfs/geo-replication/cloud-pro/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Acloud-pro-geo.log
>> 
>>
>> [2016-09-13 19:39:48.686060] I 
>> [syncdutils(/data/cloud-pro/brick):220:finalize] <top>: exiting.
>> [2016-09-13 19:39:48.688112] I [repce(agent):92:service_loop] RepceServer: 
>> terminating on reaching EOF.
>> [2016-09-13 19:39:48.688461] I [syncdutils(agent):220:finalize] <top>: 
>> exiting.
>> [2016-09-13 19:39:49.518510] I [monitor(monitor):343:monitor] Monitor: 
>> worker(/data/cloud-pro/brick) died in startup phase
>> [2016-09-13 19:39:59.675405] I [monitor(monitor):266:monitor] Monitor: 
>> ------------------------------------------------------------
>> [2016-09-13 19:39:59.675740] I [monitor(monitor):267:monitor] Monitor: 
>> starting gsyncd worker
>> [2016-09-13 19:39:59.768181] I [gsyncd(/data/cloud-pro/brick):710:main_i] 
>> <top>: syncing: gluster://localhost:cloud-pro -> 
>> ssh://r...@gfs1geo.domain.tld:gluster://localhost:cloud-pro-geo
>> [2016-09-13 19:39:59.768640] I [changelogagent(agent):73:__init__] 
>> ChangelogAgent: Agent listining...
>> [2016-09-13 19:40:02.554076] I 
>> [master(/data/cloud-pro/brick):83:gmaster_builder] <top>: setting up xsync 
>> change detection mode
>> [2016-09-13 19:40:02.554500] I [master(/data/cloud-pro/brick):367:__init__] 
>> _GMaster: using 'rsync' as the sync engine
>> [2016-09-13 19:40:02.555332] I 
>> [master(/data/cloud-pro/brick):83:gmaster_builder] <top>: setting up 
>> changelog change detection mode
>> [2016-09-13 19:40:02.555600] I [master(/data/cloud-pro/brick):367:__init__] 
>> _GMaster: using 'rsync' as the sync engine
>> [2016-09-13 19:40:02.556711] I 
>> [master(/data/cloud-pro/brick):83:gmaster_builder] <top>: setting up 
>> changeloghistory change detection mode
>> [2016-09-13 19:40:02.556983] I [master(/data/cloud-pro/brick):367:__init__] 
>> _GMaster: using 'rsync' as the sync engine
>> [2016-09-13 19:40:04.692655] I [master(/data/cloud-pro/brick):1249:register] 
>> _GMaster: xsync temp directory: 
>> /var/lib/misc/glusterfsd/cloud-pro/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Acloud-pro-geo/6b844f56a11ecd24d5e36242f045e58c/xsync
>> [2016-09-13 19:40:04.692945] I 
>> [resource(/data/cloud-pro/brick):1491:service_loop] GLUSTER: Register time: 
>> 1473795604
>> [2016-09-13 19:40:04.724827] I [master(/data/cloud-pro/brick):510:crawlwrap] 
>> _GMaster: primary master with volume id d99af2fa-439b-4a21-bf3a-38f3849f87ec 
>> ...
>> [2016-09-13 19:40:04.734505] I [master(/data/cloud-pro/brick):519:crawlwrap] 
>> _GMaster: crawl interval: 1 seconds
>> [2016-09-13 19:40:04.754487] I [master(/data/cloud-pro/brick):1163:crawl] 
>> _GMaster: starting history crawl... turns: 1, stime: (1473688054, 0), etime: 
>> 1473795604
>> [2016-09-13 19:40:05.757395] I [master(/data/cloud-pro/brick):1192:crawl] 
>> _GMaster: slave's time: (1473688054, 0)
>> [2016-09-13 19:40:05.

Re: [Gluster-users] RPC program not available (req 1298437 330)

2016-09-16 Thread Atin Mukherjee
On Friday 16 September 2016, Danny Lee  wrote:

> Hi,
>
> Environment:
> Gluster Version: 3.8.3
> Operating System: CentOS Linux 7 (Core)
> Kernel: Linux 3.10.0-327.28.3.el7.x86_64
> Architecture: x86-64
> Replicated 3-Node Volume
> ~400GB in around a million files
>
> Description of Problem:
> One of the bricks died.  The only suspect log I see is in the
> etc-glusterfs-glusterd.vol.log (shown below).  Trying to get an idea of why
> the brick died and how it could be prevented in the future.
>
> During this time, I was forcing replication (find . | xargs stat on the
> mount).  There were also some services starting up that were using the
> gluster mount.
>
> [2016-09-13 20:01:50.033369] W [socket.c:590:__socket_rwv] 0-management:
> readv on /var/run/gluster/cfc57a83cf9864900aa08380be93.socket failed
> (No data available)
> [2016-09-13 20:01:50.033830] I [MSGID: 106005] 
> [glusterd-handler.c:5050:__glusterd_brick_rpc_notify]
> 0-management: Brick 172.17.32.28:/usr/local/volname/local-data/mirrored-data
> has disconnected from glusterd.
> [2016-09-13 20:01:50.121316] W [rpcsvc.c:265:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 172.17.32.28:49146
> [2016-09-13 20:01:50.121339] E [rpcsvc.c:560:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
> [2016-09-13 20:01:50.121383] W [rpcsvc.c:265:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 172.17.32.28:49146
> [2016-09-13 20:01:50.121392] E [rpcsvc.c:560:rpcsvc_check_and_reply_error]
> 0-rpcsvc: rpc actor failed to complete successfully
>

I haven't checked the code yet, but at a guess a brick op (in transit)
failed here when the brick went down.
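
To narrow that down, it may be worth checking whether the brick process was
killed (OOM, for instance) or crashed on its own, along these lines (volume
name assumed):

gluster volume status volname       # the dead brick should show Online: N
dmesg | grep -i oom                 # did the kernel's OOM killer fire?
cat /proc/sys/kernel/core_pattern   # where a core would land after a crash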

> The message "I [MSGID: 106005]
> [glusterd-handler.c:5050:__glusterd_brick_rpc_notify] 0-management: Brick
> 172.17.32.28:/usr/local/volname/local-data/mirrored-data has disconnected
> from glusterd." repeated 34 times between [2016-09-13 20:01:50.033830] and
> [2016-09-13 20:03:40.010862]
>


-- 
--Atin
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] GlusterFS 3.8.4 is available, Gluster users are advised to update

2016-09-16 Thread Niels de Vos
[from http://blog.nixpanic.net/2016/09/glusterfs-384-is-available.html]

   Packages have been built for many distributions and will become
available in the standard repositories during the next few days if they
are not there already.

Kind regards,
Niels


   Even though the last 3.8 release was just two weeks ago, we're
sticking to the release schedule and have 3.8.4 ready for all our
current and future users. As with all updates, we advise users of
previous versions to upgrade to the latest and greatest. Several bugs
have been fixed, and upgrading is one way to avoid hitting known
problems in the future.
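
   For example, on CentOS 7 with the Storage SIG repository the update could
look like this (assuming the centos-release-gluster38 release package):

   yum install centos-release-gluster38   # enables the Storage SIG repo
   yum update glusterfs-server
   systemctl restart glusterd             # one node at a time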

Release notes for Gluster 3.8.4

   This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2
and 3.8.3 contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

   A total of 23 patches have been merged, addressing 22 bugs:
 * #1332424: geo-rep: address potential leak of memory
 * #1357760: Geo-rep silently ignores config parser errors
 * #1366496: 1 mkdir generates tons of log messages from dht xlator
 * #1366746: EINVAL errors while aggregating the directory size by quotad
 * #1368841: Applications not calling glfs_h_poll_upcall() have upcall 
events cached for no use
 * #1368918: tests/bugs/cli/bug-1320388.t: Infrequent failures
 * #1368927: Error: quota context not set inode (gfid:nnn) [Invalid 
argument]
 * #1369042: thread CPU saturation limiting throughput on write workloads
 * #1369187: fix bug in protocol/client lookup callback
 * #1369328: [RFE] Add a count of snapshots associated with a volume to the 
output of the vol info command
 * #1369372: gluster snap status xml output shows incorrect details when 
the snapshots are in deactivated state
 * #1369517: rotated FUSE mount log is using to populate the information 
after log rotate.
 * #1369748: Memory leak with a replica 3 arbiter 1 configuration
 * #1370172: protocol/server: readlink rsp xdr failed while readlink got an 
error
 * #1370390: Locks xlators is leaking fdctx in pl_release()
 * #1371194: segment fault while join thread reaper_thr in fini()
 * #1371650: [Open SSL] : Unable to mount an SSL enabled volume via SMB 
v3/Ganesha v4
 * #1371912: gluster system:: uuid get hangs
 * #1372728: Node remains in stopped state in pcs status with 
"/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments 
]" messages in logs.
 * #1373530: Minor improvements and cleanup for the build system
 * #1374290: "gluster vol status all clients --xml" doesn't generate xml if 
there is a failure in between
 * #1374565: [Bitrot]: Recovery fails of a corrupted hardlink (and the 
corresponding parent file) in a disperse volume


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Production cluster planning

2016-09-16 Thread Gandalf Corvotempesta
Next year I'll start with our first production cluster.
I'll put many VM images on it (XenServer, Proxmox, ...).

Currently I have 3 SuperMicro 6028R-E1CR12T to be used as storage nodes.

I'll put 2 more 10GbT cards on each.

The primary goal is to have MAXIMUM data redundancy and protection.
We can live with less than top-notch performance, but data protection
must be assured at all times and in all conditions.

Some questions:
1) Should I create any RAID on each server? If yes, which level and how many?
 - 4 RAID-5 arrays with 3 disks each? With this I'd create 4 bricks on each server
 - 6 RAID-1 arrays with 2 disks each? With this I'd create 6 bricks on each
server (but the wasted space is too high)
 - 2 RAID-6 arrays with 6 disks each? With this I'd create 2 bricks on each server

2) Should I use standard replication (replica 3) or EC? (a rough sketch of
what I mean is below)

3) Our servers have 2 SSDs in the back. Can I use these for tiering?

4) Our servers have 1 SSD inside (not hot-swappable) for the OS. What would
happen if this SSD crashed? Is Gluster able to recover the whole failed
node?
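
For reference, the kind of layout I have in mind for the RAID-6 option would
be a 2 x 3 distributed-replicated volume, roughly like this (hostnames and
brick paths are just placeholders):

gluster volume create vmstore replica 3 \
  node1:/bricks/r6a/brick node2:/bricks/r6a/brick node3:/bricks/r6a/brick \
  node1:/bricks/r6b/brick node2:/bricks/r6b/brick node3:/bricks/r6b/brick
gluster volume set vmstore group virt   # profile usually recommended for VM images
gluster volume start vmstore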
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] 3.8.3 Bitrot signature process

2016-09-16 Thread Amudhan P
Hi,

Can anyone reply to this mail?
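
In the meantime, this is how I am checking signatures manually, directly on
the brick backend (volume name is a placeholder):

gluster volume bitrot volname scrub status
# a signed file carries the trusted.bit-rot.signature xattr:
getfattr -d -m . -e hex /brick/path/to/file | grep bit-rot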

On Tue, Sep 13, 2016 at 12:49 PM, Amudhan P  wrote:

> Hi,
>
> I am testing the bitrot feature in Gluster 3.8.3 with a disperse EC volume (4+1).
>
> When I write a single small file (< 10MB), after 2 seconds I can see the
> bitrot signature on the file in the bricks, but when I write multiple files
> of different sizes (> 10MB) it takes a long time (> 24hrs) for the bitrot
> signature to appear on all the files.
>
> My questions are:
> 1. I have set the scrub schedule to hourly and the throttle to normal; does
> this have any impact in delaying the bitrot signature?
> 2. Other than "bitd.log", where else can I watch the current status of bitrot,
> like the number of files queued for signing and each file's status?
> 3. Where can I confirm that all the files in the brick are bitrot-signed?
> 4. Is there any file read size limit in bitrot?
> 5. Are there options for tuning bitrot for faster signing of files?
>
> Thanks
> Amudhan
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-rep: -1 (Directory not empty) warning - STATUS Faulty

2016-09-16 Thread Aravinda

We can check in the brick backend.

ls -ld $BRICK_ROOT/.glusterfs/f7/eb/f7eb9d21-d39a-4dd6-941c-46d430e18aa2

regards
Aravinda

On Thursday 15 September 2016 09:12 PM, ML mail wrote:

So I ran on my master a "find /mybrick -name 'File 2016.xlsx'" and got the
following two entries:

355235   19 drwxr-xr-x   3 www-data www-data        3 Sep 15 12:48
./data/username/files_encryption/keys/files/Dir/File\ 2016.xlsx
355234   56 -rw-r--r--   2 www-data www-data    44308 Sep 15 12:48
./data/username/files/Dir/Ein\ und\ Ausgaben\ File\ 2016.xlsx


As you can see, there is one file and one directory named 'File 2016.xlsx'.
This is how the web application Nextcloud works when it uses encryption: the
file is the encrypted file, and the directory named after the file contains
the encryption keys used to encrypt/decrypt that specific file.

Now the next thing would be to find out whether geo-rep failed on the file or
the directory named 'File 2016.xlsx'. Any ideas how I can check that? And what
else would you need to debug this issue?

Regards
ML





On Thursday, September 15, 2016 9:12 AM, Aravinda  wrote:
Thanks for the logs from the Master. The error below says the directory is
not empty, but the names look like files. Please confirm from the Master
Volume whether that path is a directory or a file.

[2016-09-13 19:30:57.475649] W [fuse-bridge.c:1787:fuse_rename_cbk]
0-glusterfs-fuse: 25: /.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/File
2016.xlsx.ocTransferId1333449197.part ->
/.gfid/f7eb9d21-d39a-4dd6-941c-46d430e18aa2/File 2016.xlsx => -1
(Directory not empty)


regards
Aravinda


On Wednesday 14 September 2016 12:49 PM, ML mail wrote:

As requested, you will find below the last few hundred lines of the geo-rep
log file from the master node.


FILE: 
/var/log/glusterfs/geo-replication/cloud-pro/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Acloud-pro-geo.log


[2016-09-13 19:39:48.686060] I [syncdutils(/data/cloud-pro/brick):220:finalize] 
<top>: exiting.
[2016-09-13 19:39:48.688112] I [repce(agent):92:service_loop] RepceServer: 
terminating on reaching EOF.
[2016-09-13 19:39:48.688461] I [syncdutils(agent):220:finalize] <top>: exiting.
[2016-09-13 19:39:49.518510] I [monitor(monitor):343:monitor] Monitor: 
worker(/data/cloud-pro/brick) died in startup phase
[2016-09-13 19:39:59.675405] I [monitor(monitor):266:monitor] Monitor: 
------------------------------------------------------------
[2016-09-13 19:39:59.675740] I [monitor(monitor):267:monitor] Monitor: starting 
gsyncd worker
[2016-09-13 19:39:59.768181] I [gsyncd(/data/cloud-pro/brick):710:main_i] <top>: 
syncing: gluster://localhost:cloud-pro -> 
ssh://r...@gfs1geo.domain.tld:gluster://localhost:cloud-pro-geo
[2016-09-13 19:39:59.768640] I [changelogagent(agent):73:__init__] 
ChangelogAgent: Agent listining...
[2016-09-13 19:40:02.554076] I [master(/data/cloud-pro/brick):83:gmaster_builder] 
<top>: setting up xsync change detection mode
[2016-09-13 19:40:02.554500] I [master(/data/cloud-pro/brick):367:__init__] 
_GMaster: using 'rsync' as the sync engine
[2016-09-13 19:40:02.555332] I [master(/data/cloud-pro/brick):83:gmaster_builder] 
<top>: setting up changelog change detection mode
[2016-09-13 19:40:02.555600] I [master(/data/cloud-pro/brick):367:__init__] 
_GMaster: using 'rsync' as the sync engine
[2016-09-13 19:40:02.556711] I [master(/data/cloud-pro/brick):83:gmaster_builder] 
<top>: setting up changeloghistory change detection mode
[2016-09-13 19:40:02.556983] I [master(/data/cloud-pro/brick):367:__init__] 
_GMaster: using 'rsync' as the sync engine
[2016-09-13 19:40:04.692655] I [master(/data/cloud-pro/brick):1249:register] 
_GMaster: xsync temp directory: 
/var/lib/misc/glusterfsd/cloud-pro/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Acloud-pro-geo/6b844f56a11ecd24d5e36242f045e58c/xsync
[2016-09-13 19:40:04.692945] I 
[resource(/data/cloud-pro/brick):1491:service_loop] GLUSTER: Register time: 
1473795604
[2016-09-13 19:40:04.724827] I [master(/data/cloud-pro/brick):510:crawlwrap] 
_GMaster: primary master with volume id d99af2fa-439b-4a21-bf3a-38f3849f87ec ...
[2016-09-13 19:40:04.734505] I [master(/data/cloud-pro/brick):519:crawlwrap] 
_GMaster: crawl interval: 1 seconds
[2016-09-13 19:40:04.754487] I [master(/data/cloud-pro/brick):1163:crawl] 
_GMaster: starting history crawl... turns: 1, stime: (1473688054, 0), etime: 
1473795604
[2016-09-13 19:40:05.757395] I [master(/data/cloud-pro/brick):1192:crawl] 
_GMaster: slave's time: (1473688054, 0)
[2016-09-13 19:40:05.832189] E [repce(/data/cloud-pro/brick):207:__call__] 
RepceClient: call 28167:140561674450688:1473795605.78 (entry_ops) failed on 
peer with OSError
[2016-09-13 19:40:05.832458] E 
[syncdutils(/data/cloud-pro/brick):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 
201, in main
main_i()
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 
720, in main_i
local.service_loop(*[r for r in [rem