On 19-Jul-2014 11:06 pm, "Niels de Vos" <nde...@redhat.com> wrote:
>
> On Sat, Jul 19, 2014 at 08:23:29AM +0530, Pranith Kumar Karampuri wrote:
> > Guys,
> >     Does anyone know why the device-id can be different even though it
> > is all a single XFS filesystem? We see the following log in the brick-log.
> >
> > [2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]
> > 0-home-posix: mismatching ino/dev between file
>
> The device-id (major:minor number) of a block-device can change, but
> will not change while the device is in use. Device-mapper (DM) is part
> of the stack that includes multipath and LVM (and more, but these are
> the most common). The stack for the block-devices is built dynamically,
> and the device-id is assigned when the block-device is made active. The
> ordering of making devices active can change, and hence the device-id
> too. It is also possible to deactivate some logical-volumes and activate
> them in a different order. (You cannot deactivate a dm-device while it
> is in use, for example while mounted.)
>
> Without device-mapper in the io-stack, re-ordering disks is possible
> too, but requires a little more (advanced sysadmin) work.
>
> So, the main questions I'd ask would be:
> 1. What kind of block storage is used: LVM, multipath, ...?
A single RAID10 XFS partition

> 2. Were there any issues on the block-layer: SCSI errors, reconnects?

Yes, one of the servers had a bad disk that was replaced.

> 3. Were there changes in the underlying disks or their structure? Disks
>    added, removed or new partitions created?

No

> 4. Were disks deactivated+activated again, for example for creating
>    backups or snapshots on a level below the (XFS) filesystem?

No

> HTH,
> Niels
>
> > /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> > (1077282838/2431) and handle
> > /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0
> > (1077282836/2431)
> > [2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:
> > setting gfid on
> > /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> > failed
> >
> > Pranith
> >
> > On 07/17/2014 07:06 PM, Nilesh Govindrajan wrote:
> > > log1 was the log from the client of node2. The filesystems are mounted
> > > locally. /data is a RAID10 array and /data/gluster contains 4 volumes,
> > > one of which is home, a high read/write one (the log of which was
> > > attached here).
> > >
> > > On Thu, Jul 17, 2014 at 11:54 AM, Pranith Kumar Karampuri
> > > <pkara...@redhat.com> wrote:
> > >> On 07/17/2014 08:41 AM, Nilesh Govindrajan wrote:
> > >>> log1 and log2 are brick logs. The others are client logs.
> > >> I see a lot of logs like the one below in the 'log1' you attached. It
> > >> seems that the device ID of where the file is actually stored and of
> > >> where the gfid-link of the same file is stored (i.e. inside
> > >> <brick-dir>/.glusterfs/) are different. What devices/filesystems are
> > >> present inside the brick represented by 'log1'?
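Pranith's question can be answered directly on the brick: the (ino/dev) pair in the warning is the st_ino/st_dev that stat(2) reports. A minimal sketch of the check, using a throwaway temp file so the commands run anywhere; the brick path from this thread appears only in comments:

```shell
# Sketch: inspect the st_dev (device-id) and st_ino values that the posix
# translator compares. A temp file stands in for the real brick contents.
f=$(mktemp)
stat -c 'dev=%d ino=%i  %n' "$f"
# On the real setup you would run, for example:
#   stat -c 'dev=%d ino=%i  %n' /data/gluster/home
#   lsblk -o NAME,MAJ:MIN,MOUNTPOINT   # current major:minor assignments
rm -f "$f"
```

Any file stored on the same filesystem should report the same `dev=` value; two different values under one brick would confirm that more than one filesystem is present inside it.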
> > >>
> > >> [2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]
> > >> 0-home-posix: mismatching ino/dev between file
> > >> /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> > >> (1077282838/2431) and handle
> > >> /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0
> > >> (1077282836/2431)
> > >> [2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:
> > >> setting gfid on
> > >> /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> > >> failed
> > >>
> > >> Pranith
> > >>
> > >>> On Thu, Jul 17, 2014 at 8:08 AM, Pranith Kumar Karampuri
> > >>> <pkara...@redhat.com> wrote:
> > >>>> On 07/17/2014 07:28 AM, Nilesh Govindrajan wrote:
> > >>>>> On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan <m...@nileshgr.com>
> > >>>>> wrote:
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> I'm having a weird issue.
> > >>>>>> I have this config:
> > >>>>>>
> > >>>>>> node2 ~ # gluster peer status
> > >>>>>> Number of Peers: 1
> > >>>>>>
> > >>>>>> Hostname: sto1
> > >>>>>> Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5
> > >>>>>> State: Peer in Cluster (Connected)
> > >>>>>>
> > >>>>>> node1 ~ # gluster peer status
> > >>>>>> Number of Peers: 1
> > >>>>>>
> > >>>>>> Hostname: sto2
> > >>>>>> Port: 24007
> > >>>>>> Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9
> > >>>>>> State: Peer in Cluster (Connected)
> > >>>>>>
> > >>>>>> Volume Name: home
> > >>>>>> Type: Replicate
> > >>>>>> Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7
> > >>>>>> Status: Started
> > >>>>>> Number of Bricks: 1 x 2 = 2
> > >>>>>> Transport-type: tcp
> > >>>>>> Bricks:
> > >>>>>> Brick1: sto1:/data/gluster/home
> > >>>>>> Brick2: sto2:/data/gluster/home
> > >>>>>> Options Reconfigured:
> > >>>>>> performance.write-behind-window-size: 2GB
> > >>>>>> performance.flush-behind: on
> > >>>>>> performance.cache-size: 2GB
> > >>>>>> cluster.choose-local: on
> > >>>>>> storage.linux-aio: on
> > >>>>>> transport.keepalive: on
> > >>>>>> performance.quick-read: on
> > >>>>>> performance.io-cache: on
> > >>>>>> performance.stat-prefetch: on
> > >>>>>> performance.read-ahead: on
> > >>>>>> cluster.data-self-heal-algorithm: diff
> > >>>>>> nfs.disable: on
> > >>>>>>
> > >>>>>> sto1/2 are aliases to node1/2 respectively.
> > >>>>>>
> > >>>>>> As you can see, NFS is disabled, so I'm using the native FUSE mount
> > >>>>>> on both nodes. The volume contains files and PHP scripts that are
> > >>>>>> served on various websites. When both nodes are active, I get
> > >>>>>> split-brain on many files, and the mount on node2 returns
> > >>>>>> 'input/output error' on many of them, which causes HTTP 500 errors.
> > >>>>>>
> > >>>>>> I delete the files from the brick using find -samefile. That fixes
> > >>>>>> it for a few minutes and then the problem is back.
> > >>>>>>
> > >>>>>> What could be the issue? This happens even if I use the NFS mounting
> > >>>>>> method.
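The manual cleanup Nilesh describes (deleting a split-brained file from one brick with `find -samefile`) works because a file on a brick and its gfid handle under `.glusterfs/` are hard links, so `-samefile` finds both. A sketch with a throwaway directory standing in for the brick; all paths and the gfid name are illustrative:

```shell
# Sketch of the manual split-brain cleanup: the gfid handle is a hard link
# of the data file, so "find -samefile" lists both copies on the brick.
brick=$(mktemp -d)                      # stand-in for /data/gluster/home
mkdir -p "$brick/.glusterfs/ae/f0"
echo data > "$brick/file.html"
ln "$brick/file.html" "$brick/.glusterfs/ae/f0/aef0404b"   # gfid hard link
find "$brick" -xdev -samefile "$brick/file.html"   # prints both paths
# Appending -delete (on ONE brick only, after checking the listed paths)
# removes the bad copy together with its gfid link.
rm -rf "$brick"
```

`-xdev` keeps the search on the brick's own filesystem, which matters when other mounts live below the brick root.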
> > >>>>>>
> > >>>>>> Gluster 3.4.4 on Gentoo.
> > >>>>>
> > >>>>> And yes, network connectivity is not an issue between them, as both
> > >>>>> of them are located in the same DC. They're connected via a 1 Gbit
> > >>>>> line (common for internal and external network), but external traffic
> > >>>>> doesn't cross 200-500 Mbit/s, leaving quite a good window for gluster.
> > >>>>> I also tried enabling quorum, but that doesn't help either.
> > >>>>> _______________________________________________
> > >>>>> Gluster-users mailing list
> > >>>>> gluster-us...@gluster.org
> > >>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > >>>> hi Nilesh,
> > >>>>      Could you attach the mount and brick logs so that we can inspect
> > >>>> what is going on in the setup?
> > >>>>
> > >>>> Pranith
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
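To close the loop on the "mismatching ino/dev" warning itself: because the gfid handle is a hard link, the file and its handle must report identical (st_dev, st_ino) pairs; the brick log above shows the same device (2431) but different inodes, which is exactly the condition posix_handle_hard flags. A self-contained demonstration with illustrative temp paths:

```shell
# The invariant behind the "mismatching ino/dev" warning: a hard link
# always shares both st_dev and st_ino with its target, so a file and its
# gfid handle should print identical dev/ino pairs.
tmp=$(mktemp -d)
echo data > "$tmp/file"
ln "$tmp/file" "$tmp/handle"            # like <brick>/.glusterfs/<gfid>
stat -c '%d/%i  %n' "$tmp/file" "$tmp/handle"   # two identical dev/ino pairs
rm -rf "$tmp"
```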