Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-19 Thread Pranith Kumar Karampuri


On 07/19/2014 11:25 AM, Andrew Lau wrote:



On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri 
pkara...@redhat.com wrote:



On 07/18/2014 05:43 PM, Andrew Lau wrote:

​ ​

On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur
vbel...@redhat.com wrote:

[Adding gluster-devel]


On 07/18/2014 05:20 PM, Andrew Lau wrote:

Hi all,

As most of you have got hints from previous messages,
hosted engine
won't work on gluster. A quote from BZ1097639:

Using hosted engine with Gluster backed storage is
currently something
we really warn against.


I think this bug should be closed or re-targeted at
documentation, because there is nothing we can do here.
Hosted engine assumes that all writes are atomic and
(immediately) available for all hosts in the cluster.
Gluster violates those assumptions.
​

I tried going through BZ1097639 but could not find much
detail with respect to gluster there.

A few questions around the problem:

1. Can somebody please explain in detail the scenario that
causes the problem?

2. Is hosted engine performing synchronous writes to ensure
that writes are durable?

Also, if there is any documentation that details the hosted
engine architecture, that would help us better understand its
interactions with gluster.
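
(For context on question 2: below is a minimal, hypothetical sketch of what
a durable metadata write could look like, using O_SYNC so the write only
returns once the data has reached stable storage. This is not the
hosted-engine code, only an illustration of the term; the path and payload
are made up.)

import os

def durable_write(path, data):
    # Hypothetical sketch, not the hosted-engine implementation: O_SYNC
    # makes write() return only after the data has reached stable storage,
    # which is what "synchronous/durable writes" refers to in question 2.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

durable_write('/tmp/hosted-engine-metadata.test', b'example payload\n')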


​

Now my question: does this theory prevent a scenario of perhaps
something like a gluster replicated volume being mounted as a glusterfs
filesystem and then re-exported as the native kernel NFS share for the
hosted-engine to consume? It could then be possible to chuck ctdb in
there to provide a last-resort failover solution. I have tried this myself
and suggested it to two people who are running a similar setup. They are
now using the native kernel NFS server for hosted-engine and haven't
reported as many issues. Curious, could anyone validate my theory on this?


If we obtain more details on the use case and obtain gluster
logs from the failed scenarios, we should be able to
understand the problem better. That could be the first step
in validating your theory or evolving further recommendations :).


​ I'm not sure how useful this is, but ​Jiri Moskovcak tracked
this down in an off list message.

​ Message Quote:​

​ ==​

​We were able to track it down to this (thanks Andrew for
providing the testing setup):

-b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'
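
(Purely as an illustration, not a proposed fix: a hypothetical retry wrapper
around the os.open() call shown in the traceback, assuming the ESTALE
(errno 116) returned by the mount is transient. The real question remains
why the handle goes stale at all.)

import errno
import os
import time

def open_with_estale_retry(path, flags, retries=3, delay=1.0):
    # Hypothetical sketch, not the broker code: retry os.open() a few times
    # when the mount returns ESTALE (errno 116), in case the stale handle
    # clears once the storage settles.
    for attempt in range(retries):
        try:
            return os.open(path, flags)
        except OSError as e:
            if e.errno != errno.ESTALE or attempt == retries - 1:
                raise
            time.sleep(delay)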

Andrew/Jiri,
Would it be possible to post the gluster logs of both the
mount and the bricks on the bz? I can take a look at them. If I
can't find anything, I will probably ask for your help in
re-creating the issue.

Pranith


​Unfortunately, I don't have the logs for that setup any more.. ​I'll 
try replicate when I get a chance. If I understand the comment from 
the BZ, I don't think it's a gluster bug per-say, more just how 
gluster does its replication.

Hi Andrew,
 Thanks for that. I couldn't come to any conclusions because no 
logs were available. It is unlikely that self-heal is involved because 
there were no bricks going down/up according to the bug description.


Pranith





It's definitely connected to the storage, which leads us to
gluster. I'm not very familiar with gluster, so I need to
check this with our gluster gurus.

​== ​

Thanks,
Vijay




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-19 Thread Andrew Lau
On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri 
pkara...@redhat.com wrote:


 On 07/18/2014 05:43 PM, Andrew Lau wrote:

  ​ ​

  On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur vbel...@redhat.com
 wrote:

 [Adding gluster-devel]


 On 07/18/2014 05:20 PM, Andrew Lau wrote:

 Hi all,

 As most of you have got hints from previous messages, hosted engine
 won't work on gluster. A quote from BZ1097639:

 Using hosted engine with Gluster backed storage is currently something
 we really warn against.


 I think this bug should be closed or re-targeted at documentation,
 because there is nothing we can do here. Hosted engine assumes that all
 writes are atomic and (immediately) available for all hosts in the cluster.
 Gluster violates those assumptions.
 ​

  I tried going through BZ1097639 but could not find much detail with
 respect to gluster there.

 A few questions around the problem:

 1. Can somebody please explain in detail the scenario that causes the
 problem?

 2. Is hosted engine performing synchronous writes to ensure that writes
 are durable?

  Also, if there is any documentation that details the hosted engine
 architecture, that would help us better understand its interactions with
 gluster.


 ​

 Now my question: does this theory prevent a scenario of perhaps
 something like a gluster replicated volume being mounted as a glusterfs
 filesystem and then re-exported as the native kernel NFS share for the
 hosted-engine to consume? It could then be possible to chuck ctdb in
 there to provide a last-resort failover solution. I have tried this myself
 and suggested it to two people who are running a similar setup. They are
 now using the native kernel NFS server for hosted-engine and haven't
 reported as many issues. Curious, could anyone validate my theory on this?


  If we obtain more details on the use case and obtain gluster logs from
 the failed scenarios, we should be able to understand the problem better.
 That could be the first step in validating your theory or evolving further
 recommendations :).


  ​I'm not sure how useful this is, but ​Jiri Moskovcak tracked this down
 in an off list message.

  ​Message Quote:​

  ​==​

   ​We were able to track it down to this (thanks Andrew for providing the
 testing setup):

 -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
 Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
     response = "success " + self._dispatch(data)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
     .get_all_stats_for_service_type(**options)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
     f = os.open(path, direct_flag | os.O_RDONLY)
 OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'

 Andrew/Jiri,
 Would it be possible to post the gluster logs of both the mount and
 the bricks on the bz? I can take a look at them. If I can't find anything,
 I will probably ask for your help in re-creating the issue.

 Pranith


​Unfortunately, I don't have the logs for that setup any more.. ​I'll try
replicate when I get a chance. If I understand the comment from the BZ, I
don't think it's a gluster bug per-say, more just how gluster does its
replication.





 It's definitely connected to the storage, which leads us to gluster.
 I'm not very familiar with gluster, so I need to check this with our
 gluster gurus.

  ​==​



 Thanks,
 Vijay




 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume

2014-07-19 Thread Benjamin Turner
On Fri, Jul 18, 2014 at 10:43 PM, Pranith Kumar Karampuri 
pkara...@redhat.com wrote:


 On 07/18/2014 07:57 PM, Anders Blomdell wrote:

 During testing of a 3*4 gluster (from master as of yesterday), I
 encountered
 two major weirdnesses:

1. A 'rm -rf some_dir' needed several invocations to finish, each
 time
   reporting a number of lines like these:
rm: cannot remove 'a/b/c/d/e/f': Directory not empty


This is reproducible for me when running dbench on nfs mounts.  I think I
may have seen it on glusterfs mounts as well but it seems more reproducible
on nfs.  I should have caught it sooner, but it doesn't error out client
side when cleaning up, and on the next test run the deletes are successful.
 When this happens, in the nfs.log I see:

This spams the log; from what I can tell, it happens when dbench is creating
the files:
[2014-07-19 13:37:03.271651] I [MSGID: 109036]
[dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht:
Setting layout of /clients/client3/~dmtmp/SEED with [Subvol_name:
testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ],
[Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ],

Then when the deletes fail I see the following when the client is removing
the files:
[2014-07-18 23:31:44.272465] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 74a6541a: /run8063_dbench/clients = -1 (Directory not empty)
.
.
[2014-07-18 23:31:44.452988] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 7ea9541a: /run8063_dbench/clients = -1 (Directory not empty)
[2014-07-18 23:31:45.262651] W [client-rpc-fops.c:1354:client3_3_access_cbk] 0-testvol-client-0: remote operation failed: Stale file handle
[2014-07-18 23:31:45.263151] W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2014-07-18 23:31:45.264196] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 32ac541a: gfid:b073a189-91ea-46b2-b757-5b320591b848 = -1 (Stale file handle)
[2014-07-18 23:31:45.264217] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 32ac541a, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
[2014-07-18 23:31:45.266818] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 33ac541a: gfid:b073a189-91ea-46b2-b757-5b320591b848 = -1 (Stale file handle)
[2014-07-18 23:31:45.266853] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 33ac541a, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)

Occasionally I see:
[2014-07-19 13:50:46.091429] W [socket.c:529:__socket_rwv] 0-NLM-client:
readv on 192.168.11.102:45823 failed (No data available)
[2014-07-19 13:50:46.091570] E [rpc-transport.c:485:rpc_transport_unref]
(--/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a)
[0x7f53775128ea]
(--/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75)
[0x7f537750e3e5] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63)
[0x7f5388914693]))) 0-rpc_transport: invalid argument: this

I'm opening a BZ now; I'll leave the systems up and put the repro steps +
hostnames in the BZ in case anyone wants to poke around.

-b




2. After having successfully deleted all files from the volume,
   I have a single directory that is duplicated in gluster-fuse,
   like this:
 # ls -l /mnt/gluster
  total 24
  drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
  drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/

 any idea on how to debug this issue?

 What are the steps to recreate? We first need to find what led to this,
 and then probably which xlator leads to it.


I have not seen this but I am running on a 6x2 volume.  I wonder if this
may only happen with replica > 2?



 Pranith


 /Anders



 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Random and frequent split brain

2014-07-19 Thread Niels de Vos
On Sat, Jul 19, 2014 at 08:23:29AM +0530, Pranith Kumar Karampuri wrote:
 Guys,
  Does anyone know why the device-id can be different even though it
 is all a single xfs filesystem?
 We see the following log in the brick-log.
 
 [2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]
 0-home-posix: mismatching ino/dev between file

The device-id (major:minor number) of a block-device can change, but 
will not change while the device is in use. Device-mapper (DM) is part 
of the stack that includes multipath and lvm (and more, but these are 
most common). The stack for the block-devices is built dynamically, and 
the device-id is assigned when the block-device is made active. The 
ordering of making devices active can change, hence the device-id too.  
It is also possible to deactivate some logical-volumes, and activate 
them in a different order. (You can not deactivate a dm-device when it 
is in use, for example mounted.)

Without device-mapper in the io-stack, re-ordering disks is possible 
too, but requires a little more (advanced sysadmin) work.
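
(As a hypothetical illustration of the check behind the "mismatching
ino/dev" warning quoted in this thread, here is a small sketch that compares
the device/inode pair of a file on the brick with that of its gfid hard-link
under .glusterfs; the example paths are made up.)

import os

def check_ino_dev(file_path, handle_path):
    # Hypothetical sketch: the brick warns when the (inode, device) pair of
    # the file differs from that of its handle under .glusterfs.
    st_file = os.stat(file_path)
    st_handle = os.stat(handle_path)
    if (st_file.st_ino, st_file.st_dev) != (st_handle.st_ino, st_handle.st_dev):
        print('mismatching ino/dev: file (%d/%d) vs handle (%d/%d)' % (
            st_file.st_ino, st_file.st_dev, st_handle.st_ino, st_handle.st_dev))

# Made-up example paths, following the brick layout shown in the log below:
# check_ino_dev('/data/gluster/home/some/file',
#               '/data/gluster/home/.glusterfs/ae/f0/aef0404b-example')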

So, the main questions I'd ask would be:
1. What kind of block storage is used, LVM, multipath, ...?
2. Were there any issues on the block-layer, scsi-errors, reconnects?
3. Were there changes in the underlying disks or their structure? Disks 
   added, removed or new partitions created.
4. Were disks deactivated+activated again, for example for creating 
   backups or snapshots on a level below the (XFS) filesystem?

HTH,
Niels

 /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
 (1077282838/2431) and handle
 /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0
 (1077282836/2431)
 [2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:
 setting gfid on
 /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
 failed
 
 
 Pranith
 On 07/17/2014 07:06 PM, Nilesh Govindrajan wrote:
 log1 was the log from client of node2. The filesystems are mounted
 locally. /data is a raid10 array and /data/gluster contains 4 volumes,
 one of which is home, a volume with heavy read/write activity (whose log
 was attached here).
 
 On Thu, Jul 17, 2014 at 11:54 AM, Pranith Kumar Karampuri
 pkara...@redhat.com wrote:
 On 07/17/2014 08:41 AM, Nilesh Govindrajan wrote:
 log1 and log2 are brick logs. The others are client logs.
 I see a lot of logs like the one below in the 'log1' you attached. It seems
 that the device ID of where the file is actually stored and of where the
 gfid-link of the same file is stored (i.e. inside brick-dir/.glusterfs/) are
 different. Which devices/filesystems are present inside the brick
 represented by 'log1'?
 
 [2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]
 0-home-posix: mismatching ino/dev between file
 /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
 (1077282838/2431) and handle
 /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0
 (1077282836/2431)
 [2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:
 setting gfid on
 /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
 failed
 
 Pranith
 
 
 On Thu, Jul 17, 2014 at 8:08 AM, Pranith Kumar Karampuri
 pkara...@redhat.com wrote:
 On 07/17/2014 07:28 AM, Nilesh Govindrajan wrote:
 On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan m...@nileshgr.com
 wrote:
 Hello,
 
 I'm having a weird issue. I have this config:
 
 node2 ~ # gluster peer status
 Number of Peers: 1
 
 Hostname: sto1
 Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5
 State: Peer in Cluster (Connected)
 
 node1 ~ # gluster peer status
 Number of Peers: 1
 
 Hostname: sto2
 Port: 24007
 Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9
 State: Peer in Cluster (Connected)
 
 Volume Name: home
 Type: Replicate
 Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: sto1:/data/gluster/home
 Brick2: sto2:/data/gluster/home
 Options Reconfigured:
 performance.write-behind-window-size: 2GB
 performance.flush-behind: on
 performance.cache-size: 2GB
 cluster.choose-local: on
 storage.linux-aio: on
 transport.keepalive: on
 performance.quick-read: on
 performance.io-cache: on
 performance.stat-prefetch: on
 performance.read-ahead: on
 cluster.data-self-heal-algorithm: diff
 nfs.disable: on
 
 sto1/2 is alias to node1/2 respectively.
 
 As you see, NFS is disabled so I'm using the native fuse mount on both
 nodes.
 The volume contains files and php scripts that 

Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume

2014-07-19 Thread Niels de Vos
On Sat, Jul 19, 2014 at 10:02:33AM -0400, Benjamin Turner wrote:
 On Fri, Jul 18, 2014 at 10:43 PM, Pranith Kumar Karampuri 
 pkara...@redhat.com wrote:
 
 
  On 07/18/2014 07:57 PM, Anders Blomdell wrote:
 
  During testing of a 3*4 gluster (from master as of yesterday), I
  encountered
  two major weirdnesses:
 
 1. A 'rm -rf some_dir' needed several invocations to finish, each
  time
reporting a number of lines like these:
   rm: cannot remove 'a/b/c/d/e/f': Directory not empty
 
 
 This is reproducible for me when running dbench on nfs mounts.  I think I
 may have seen it on glusterfs mounts as well but it seems more reproducible
 on nfs.  I should have caught it sooner but it doesn't error out client
 side when cleaning up, and the next test I run the deletes are successful.
  When this happens in the nfs.log I see:
 
 This spams the log, from what I can tell it happens when dbench is creating
 the files:
 [2014-07-19 13:37:03.271651] I [MSGID: 109036]
 [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht:
 Setting layout of /clients/client3/~dmtmp/SEED with [Subvol_name:
 testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ],
 [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ],

My guess is that DHT/AFR fail to create the whole directory structure on 
all bricks (remember that directories should get created on all bricks, 
even for a dht only volume). If creating a directory fails on 
a particular brick, self-heal should pick it up... But well, maybe 
self-heal is not run when deleting directories, causing some directories 
on some bricks to be non-empty, but empty on others. It may be that this 
conflict is not handled correctly.
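
(A hypothetical diagnostic along those lines, with made-up brick paths: list
the directory that rm complained about on each local brick and report where
it is missing or still has entries.)

import os

bricks = ['/bricks/brick1', '/bricks/brick2']   # made-up example brick paths
subdir = 'a/b/c/d/e/f'                          # the directory rm could not remove

for brick in bricks:
    path = os.path.join(brick, subdir)
    if not os.path.isdir(path):
        print('%s: missing' % path)
    else:
        # A non-empty listing on one brick combined with a missing or empty
        # directory on another would point at the suspected inconsistency.
        print('%s: %d entries' % (path, len(os.listdir(path))))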

You could maybe test with different volumes, and narrow down where the 
issue occurs:
- a volume of one brick
- a replicate volume with two bricks
- a distribute volume with two bricks

Potentially increase the number of bricks when a 2-brick afr-only, or 
dht-only volume does not trigger the issue reliably or quickly.

 Occasionally I see:
 [2014-07-19 13:50:46.091429] W [socket.c:529:__socket_rwv] 0-NLM-client:
 readv on 192.168.11.102:45823 failed (No data available)
 [2014-07-19 13:50:46.091570] E [rpc-transport.c:485:rpc_transport_unref]
 (--/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a)
 [0x7f53775128ea]
 (--/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75)
 [0x7f537750e3e5] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63)
 [0x7f5388914693]))) 0-rpc_transport: invalid argument: this

This looks like a bug in the NFS-server; I suggest filing it independently
of the directory tree create/delete issue.

 I'm opening a BZ now, I'll leave systems up and put the repro steps +
 hostnames in the BZ in case anyone wants to poke around.

Thanks! The NFS problem does not need any checking on the running 
system.

Niels

 
 -b
 
 
 
 
 2. After having successfully deleted all files from the volume,
i have a single directory that is duplicated in gluster-fuse,
like this:
  # ls -l /mnt/gluster
   total 24
   drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
   drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
 
  any idea on how to debug this issue?
 
  What are the steps to recreate? We first need to find what led to this,
  and then probably which xlator leads to it.
 
 
 I have not seen this but I am running on a 6x2 volume.  I wonder if this
 may only happen with replica > 2?
 
 
 
  Pranith
 
 
  /Anders
 
 
 
  ___
  Gluster-devel mailing list
  Gluster-devel@gluster.org
  http://supercolony.gluster.org/mailman/listinfo/gluster-devel
 

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel