Thanks for the update!
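For anyone who hits the same "No space left on device" symptom later: the fix Pat
describes below comes down to mounting the XFS bricks with the inode64 option. This
is a rough sketch only (the path and device are taken from the df output further
down in the thread; check your own /etc/fstab before changing anything):

--------------------------------------------------------
# Sketch, assuming an XFS brick on /dev/sda mounted at /mnt/brick1.
# 1. See what the brick is currently mounted with (look for inode32/inode64):
grep /mnt/brick1 /proc/mounts

# 2. Make the option persistent in /etc/fstab, e.g.:
#    /dev/sda  /mnt/brick1  xfs  defaults,inode64  0 0

# 3. Apply it; older kernels may need a full umount/mount instead of a remount:
mount -o remount,inode64 /mnt/brick1
--------------------------------------------------------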
On Fri, 13 Mar, 2020, 9:40 PM Pat Haley, <[email protected]> wrote:

> Hi All,
>
> After performing Strahil's checks and poking around some more, we found
> that the problem was with the underlying filesystem thinking it was full
> when it wasn't. Following the information in the links below, we found
> that mounting with 64bit inodes fixed this problem.
>
> https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available
> https://support.microfocus.com/kb/doc.php?id=7014318
>
> Thanks
>
> Pat
>
> On 3/12/20 4:24 PM, Strahil Nikolov wrote:
> > On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <[email protected]> wrote:
> >> Hi
> >>
> >> Yesterday we seemed to clear an issue with erroneous "No space left on
> >> device" messages
> >> (https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html)
> >>
> >> I am now seeing "Stale file handle" messages coming from directories
> >> I've just created.
> >>
> >> We are running gluster 3.7.11 in a distributed volume across 2 servers
> >> (2 bricks each). For the "Stale file handle" for a newly created
> >> directory, I've noticed that the directory does not appear in brick1
> >> (it is in the other 3 bricks).
> >>
> >> In the cli.log on the server with brick1 I'm seeing messages like
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
> >> [2020-03-12 17:21:36.604587] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
> >> [2020-03-12 17:21:36.605100] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> >> [2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
> >> [2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with: 0
> >> --------------------------------------------------------
> >>
> >> I'm not sure why I would be getting any geo-replication messages; we
> >> aren't using replication.
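A note on the earlier point about the new directory missing from brick1: it can
help to compare the path directly on each brick. A rough sketch, using the brick
paths from the volume info further down; "some/new/dir" is only a placeholder:

--------------------------------------------------------
# On mseas-data2:
stat /mnt/brick1/some/new/dir
stat /mnt/brick2/some/new/dir

# On mseas-data3:
stat /export/sda/brick3/some/new/dir
stat /export/sdc/brick4/some/new/dir

# Wherever the directory does exist, its gfid xattr should be identical:
getfattr -n trusted.gfid -e hex /mnt/brick1/some/new/dir
--------------------------------------------------------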
> >> The cli.log on the other server is showing
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
> >> [2020-03-12 17:27:08.302564] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> >> [2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
> >> [2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with: 0
> >> --------------------------------------------------------
> >>
> >> On the server with brick1, the etc-glusterfs-glusterd.vol.log is showing
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:21:25.925394] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
> >> [2020-03-12 17:21:25.946240] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
> >> [2020-03-12 17:21:25.946282] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
> >> [2020-03-12 17:21:36.617090] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> >> [2020-03-12 17:21:15.577829] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
> >> --------------------------------------------------------
> >>
> >> On the other server I'm seeing similar messages
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:26:57.024168] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
> >> [2020-03-12 17:26:57.037269] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
> >> [2020-03-12 17:26:57.037299] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
> >> [2020-03-12 17:26:42.025200] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
> >> [2020-03-12 17:27:08.304267] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> >> --------------------------------------------------------
> >>
> >> And I've just noticed that I'm again seeing "No space left on device" in
> >> the logs of brick1 (although there is 3.5 TB free)
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:19:54.576597] E [MSGID: 113027] [posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 failed [No space left on device]
> >> [2020-03-12 17:19:54.576681] E [MSGID: 115056] [server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698: MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 (96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left on device) [No space left on device]
> >> --------------------------------------------------------
> >>
> >> Any thoughts would be greatly appreciated. (Some additional information
> >> below)
> >> Thanks
> >>
> >> Pat
> >>
> >> --------------------------------------------------------
> >> server 1:
> >> [root@mseas-data2 ~]# df -h
> >> Filesystem      Size  Used  Avail  Use%  Mounted on
> >> /dev/sdb        164T  161T  3.5T   98%   /mnt/brick2
> >> /dev/sda        164T  159T  5.4T   97%   /mnt/brick1
> >>
> >> [root@mseas-data2 ~]# df -i
> >> Filesystem      Inodes      IUsed     IFree       IUse%  Mounted on
> >> /dev/sdb        7031960320  31213790  7000746530  1%     /mnt/brick2
> >> /dev/sda        7031960320  28707456  7003252864  1%     /mnt/brick1
> >> --------------------------------------------------------
> >>
> >> --------------------------------------------------------
> >> server 2:
> >> [root@mseas-data3 ~]# df -h
> >> Filesystem                     Size  Used  Avail  Use%  Mounted on
> >> /dev/sda                       91T   88T   3.9T   96%   /export/sda/brick3
> >> /dev/mapper/vg_Data4-lv_Data4  91T   89T   2.6T   98%   /export/sdc/brick4
> >>
> >> [root@mseas-data3 glusterfs]# df -i
> >> Filesystem                     Inodes      IUsed     IFree       IUse%  Mounted on
> >> /dev/sda                       1953182464  10039172  1943143292  1%     /export/sda/brick3
> >> /dev/mapper/vg_Data4-lv_Data4  3906272768  11917222  3894355546  1%     /export/sdc/brick4
> >> --------------------------------------------------------
> >>
> >> --------------------------------------------------------
> >> [root@mseas-data2 ~]# gluster volume info
> >> --------------------------------------------------------
> >> Volume Name: data-volume
> >> Type: Distribute
> >> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >> Status: Started
> >> Number of Bricks: 4
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: mseas-data2:/mnt/brick1
> >> Brick2: mseas-data2:/mnt/brick2
> >> Brick3: mseas-data3:/export/sda/brick3
> >> Brick4: mseas-data3:/export/sdc/brick4
> >> Options Reconfigured:
> >> cluster.min-free-disk: 1%
> >> nfs.export-volumes: off
> >> nfs.disable: on
> >> performance.readdir-ahead: on
> >> diagnostics.brick-sys-log-level: WARNING
> >> nfs.exports-auth-enable: on
> >> server.allow-insecure: on
> >> auth.allow: *
> >> disperse.eager-lock: off
> >> performance.open-behind: off
> >> performance.md-cache-timeout: 60
> >> network.inode-lru-limit: 50000
> >> diagnostics.client-log-level: ERROR
> >>
> >> --------------------------------------------------------
> >> [root@mseas-data2 ~]# gluster volume status data-volume detail
> >> --------------------------------------------------------
> >> Status of volume: data-volume
> >> ------------------------------------------------------------------------------
> >> Brick                : Brick mseas-data2:/mnt/brick1
> >> TCP Port             : 49154
> >> RDMA Port            : 0
> >> Online               : Y
> >> Pid                  : 4601
> >> File System          : xfs
> >> Device               : /dev/sda
> >> Mount Options        : rw
> >> Inode Size           : 256
> >> Disk Space Free      : 5.4TB
> >> Total Disk Space     : 163.7TB
> >> Inode Count          : 7031960320
> >> Free Inodes          : 7003252864
> >> ------------------------------------------------------------------------------
> >> Brick                : Brick mseas-data2:/mnt/brick2
> >> TCP Port             : 49155
> >> RDMA Port            : 0
> >> Online               : Y
> >> Pid                  : 7949
> >> File System          : xfs
> >> Device               : /dev/sdb
> >> Mount Options        : rw
> >> Inode Size           : 256
> >> Disk Space Free      : 3.4TB
> >> Total Disk Space     : 163.7TB
> >> Inode Count          : 7031960320
> >> Free Inodes          : 7000746530
> >> ------------------------------------------------------------------------------
> >> Brick                : Brick mseas-data3:/export/sda/brick3
> >> TCP Port             : 49153
> >> RDMA Port            : 0
> >> Online               : Y
> >> Pid                  : 4650
> >> File System          : xfs
> >> Device               : /dev/sda
> >> Mount Options        : rw
> >> Inode Size           : 512
> >> Disk Space Free      : 3.9TB
> >> Total Disk Space     : 91.0TB
> >> Inode Count          : 1953182464
> >> Free Inodes          : 1943143292
> >> ------------------------------------------------------------------------------
> >> Brick                : Brick mseas-data3:/export/sdc/brick4
> >> TCP Port             : 49154
> >> RDMA Port            : 0
> >> Online               : Y
> >> Pid                  : 23772
> >> File System          : xfs
> >> Device               : /dev/mapper/vg_Data4-lv_Data4
> >> Mount Options        : rw
> >> Inode Size           : 256
> >> Disk Space Free      : 2.6TB
> >> Total Disk Space     : 90.9TB
> >> Inode Count          : 3906272768
> >> Free Inodes          : 3894355546
> >>
> >> --
> >> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> >> Pat Haley                          Email:  [email protected]
> >> Center for Ocean Engineering       Phone:  (617) 253-6824
> >> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> >> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> >> 77 Massachusetts Avenue
> >> Cambridge, MA 02139-4301
> >
> > Hey Pat,
> >
> > The logs are not providing much information, but the following seems strange:
> > 'Failed uuid to hostname conversion'
> >
> > Have you checked dns resolution (both short name and fqdn)?
> > Also, check that the systems' ntp/chrony is in sync, and check
> > 'gluster peer status' on all nodes.
> >
> > Is it possible that the client is not reaching all bricks?
> >
> > P.S.: Consider increasing the log level, as the current level is not
> > sufficient.
> >
> > Best Regards,
> > Strahil Nikolov
>
> --
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  [email protected]
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
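For completeness, the checks suggested in the quoted reply above map to roughly the
following commands (the short hostnames come from the volume info; substitute your
real FQDNs, which are not shown in this thread, and use whichever time-sync daemon
you actually run):

--------------------------------------------------------
# Name resolution, short name and FQDN, from every server and client:
getent hosts mseas-data2
getent hosts mseas-data3

# Time sync (one of these, depending on the daemon in use):
chronyc tracking
ntpq -p

# Peer view from every node:
gluster peer status

# Temporarily raise the log verbosity on the volume:
gluster volume set data-volume diagnostics.client-log-level DEBUG
gluster volume set data-volume diagnostics.brick-log-level DEBUG
# ...reproduce the problem, collect the logs, then put the levels back:
gluster volume set data-volume diagnostics.client-log-level ERROR
gluster volume reset data-volume diagnostics.brick-log-level
--------------------------------------------------------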
________

Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
