1. Could you share the output of `gluster volume heal <VOL> info`?
2. Output of `gluster volume info`?
3. Fuse mount logs of the affected volume(s)?
4. glustershd logs?
5. Brick logs?
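For anyone gathering the same data, a rough sketch of collecting these five items follows. The volume name GLUSTER1 is an assumption taken from the client logs quoted below, and the log locations are the package defaults under /var/log/glusterfs; adjust both for your deployment.

```shell
#!/bin/sh
# Hedged sketch: collect the five requested items into ./gluster-debug/.
# Assumptions: volume name GLUSTER1 (from the logs quoted below) and
# default log paths under /var/log/glusterfs.
VOL=GLUSTER1
OUT=gluster-debug
mkdir -p "$OUT"

# 1 and 2: heal info and volume info (skipped cleanly if the gluster CLI
# is absent, e.g. when run from a non-gluster host)
if command -v gluster >/dev/null 2>&1; then
    gluster volume heal "$VOL" info > "$OUT/heal-info.txt"
    gluster volume info "$VOL"      > "$OUT/volume-info.txt"
fi

# 3, 4, 5: fuse mount log (its file name is the mount point with slashes
# replaced by dashes), the self-heal daemon log (glustershd.log, matched
# by *.log), and the per-brick logs; copy only files that exist here
for f in /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log; do
    if [ -e "$f" ]; then
        cp "$f" "$OUT/"
    fi
done
echo "collected into $OUT"
```

The `command -v` guard lets the same script run on a client that only mounts the volume, where items 1 and 2 would come from a server node instead.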
-Krutika

On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <dgoss...@carouselchecks.com> wrote:

> On Fri, Aug 12, 2016 at 4:25 PM, Dan Lavu <d...@redhat.com> wrote:
>
>> David,
>>
>> I'm seeing similar behavior in my lab, but it has been caused by healing
>> files in the gluster cluster, though I attribute my problems to problems
>> with the storage fabric. See if 'gluster volume heal $VOL info' indicates
>> files that are being healed, and if those reduce in number, can the VM
>> start?
>>
> I haven't had any files in a state of being healed according to any of
> the 3 storage nodes.
>
> I shut down one VM that has been around a while, then told it to start on
> the one oVirt server that complained previously. It ran fine, and I was
> able to migrate it off and back onto that host with no issues.
>
> I told one of the new VMs to migrate to the one node, and within seconds
> it paused from unknown storage errors: no shards showing heals, nothing
> with an error on the storage node. Same stale file handle issues.
>
> I'll probably put this node in maintenance later and reboot it. Other than
> that, I may re-clone those 2 recent VMs. Maybe the images just got
> corrupted, though I'm not sure why it would only fail on one node of 3 if
> an image were bad.
>
>> Dan
>>
>> On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>
>>> Figured I would repost here as well. One client out of 3 is complaining
>>> of stale file handles on a few new VMs I migrated over. No errors on the
>>> storage nodes, just the client. Maybe just put that one in maintenance
>>> and restart the gluster mount?
>>>
>>> *David Gossage*
>>> *Carousel Checks Inc. | System Administrator*
>>> *Office* 708.613.2284
>>>
>>> ---------- Forwarded message ----------
>>> From: David Gossage <dgoss...@carouselchecks.com>
>>> Date: Thu, Aug 11, 2016 at 12:17 AM
>>> Subject: vm paused unknown storage error one node out of 3 only
>>> To: users <us...@ovirt.org>
>>>
>>> In a 3-node cluster running oVirt 3.6.6.2-1.el7.centos on a 3-way
>>> replicate Gluster 3.7.14 volume, starting a VM I just copied in fails on
>>> one node of the 3 with the errors below; on the other 2 the VM starts
>>> fine. All oVirt and Gluster hosts are CentOS 7 based. When the VM
>>> defaults to the problem node on its own, it is immediately put into
>>> paused state for an unknown reason; telling it to start on a different
>>> node works. The node with the issue already has 5 VMs running fine on it
>>> from the same gluster storage, plus the hosted engine on a different
>>> volume.
>>>
>>> The gluster nodes' logs did not have any errors for the volume.
>>> The node's own gluster log had this:
>>>
>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8: couldn't find it in .glusterfs,
>>> .shard, or images/
>>>
>>> 7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable drive
>>> of the VM.
>>>
>>> [2016-08-11 04:31:39.982952] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.983683] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.984182] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.984221] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.985941] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
>>> [2016-08-11 04:31:39.986633] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.987644] E [MSGID: 109040] [dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
>>> [2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43 fd=0x7f00a80bdb64 (Stale file handle)
>>> [2016-08-11 04:31:39.986567] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.986567] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.210145] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
>>> [2016-08-11 04:35:21.210873] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.210888] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.210947] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.213270] E [MSGID: 109040] [dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
>>> [2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43 fd=0x7f00a80bf6d0 (Stale file handle)
>>> [2016-08-11 04:35:21.211516] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
>>> [2016-08-11 04:35:21.212013] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.212081] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.212121] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
>>>
>>> I attached vdsm.log starting from when I spun up the VM on the offending node.
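On the note above that dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldn't be found in .glusterfs: GlusterFS keeps a hard link for every file on each brick at .glusterfs/<first two hex chars>/<next two>/<full gfid>, so a small helper for building that path when searching a brick might look like the sketch below (the brick path used is a placeholder, not one from this thread):

```shell
#!/bin/sh
# Sketch: compute the hard-link path GlusterFS keeps for a gfid under a
# brick's .glusterfs directory: .glusterfs/<chars 1-2>/<chars 3-4>/<gfid>.
# The brick path passed in below is hypothetical.
gfid_to_path() {
    brick=$1
    gfid=$2
    printf '%s/.glusterfs/%s/%s/%s\n' \
        "$brick" \
        "$(printf '%s' "$gfid" | cut -c1-2)" \
        "$(printf '%s' "$gfid" | cut -c3-4)" \
        "$gfid"
}

gfid_to_path /bricks/brick1 dfb8777a-7e8c-40ff-8faa-252beabba5f8
# prints /bricks/brick1/.glusterfs/df/b8/dfb8777a-7e8c-40ff-8faa-252beabba5f8
```

If that path exists on each brick, comparing its trusted.afr.* extended attributes across bricks with `getfattr -d -m . -e hex <path>` is the usual way to confirm or rule out the split-brain that the afr_read_txn warnings hint at.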
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users