Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Jason Brooks


- Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: users users@ovirt.org
 Sent: Friday, July 18, 2014 4:50:31 AM
 Subject: [ovirt-users] Can we debug some truths/myths/facts about 
 hosted-engine and gluster?
 
 Hi all,
 
 As most of you have got hints from previous messages, hosted engine won't
 work on gluster. A quote from BZ1097639:

 "Using hosted engine with Gluster backed storage is currently something we
 really warn against."

My current setup is hosted engine, configured w/ gluster storage as described
in my blog post, but with three hosts and replica 3 volumes.
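
For anyone wanting to reproduce a layout like that, here is a rough sketch of
creating a replica 3 volume across three hosts. Hostnames and brick paths are
placeholders, not the actual values from the blog post:

  # run once on any one host, after the other two have been peer-probed
  gluster peer probe host2.example.com
  gluster peer probe host3.example.com
  gluster volume create engine replica 3 \
      host1.example.com:/bricks/engine \
      host2.example.com:/bricks/engine \
      host3.example.com:/bricks/engine
  gluster volume start engine
  gluster volume info engine   # should report Type: Replicate, 1 x 3 = 3 bricks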

Only issue I've seen is an errant message about the Hosted Engine being down 
following an engine migration. The engine does migrate successfully, though.

RE your bug, what do you use for a mount point for the nfs storage?

Jason


 
 
 I think this bug should be closed or re-targeted at documentation,
 because there is nothing we can do here. Hosted engine assumes that
 all writes are atomic and (immediately) available for all hosts in the
 cluster. Gluster violates those assumptions.

 Until the documentation gets updated, I hope this serves as a useful
 notice, at least to save people some of the headaches I hit, like
 hosted-engine starting up multiple VMs because of the above issue.
 
 Now my question: does this theory prevent a scenario where something like a
 gluster replicated volume is mounted as a glusterfs filesystem and then
 re-exported as a native kernel NFS share for the hosted-engine to consume? It
 could then be possible to chuck ctdb in there to provide a last-resort
 failover solution. I have tried this myself and suggested it to two people who
 are running a similar setup; they are now using the native kernel NFS server
 for hosted-engine and haven't reported as many issues. Curious, could anyone
 validate my theory on this?
 
 Thanks,
 Andrew
 


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Jason Brooks


- Original Message -
 From: Jason Brooks jbro...@redhat.com
 To: Andrew Lau and...@andrewklau.com
 Cc: users users@ovirt.org
 Sent: Tuesday, July 22, 2014 8:29:46 AM
 Subject: Re: [ovirt-users] Can we debug some truths/myths/facts about
 hosted-engine and gluster?
 
 
 
 - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: users users@ovirt.org
  Sent: Friday, July 18, 2014 4:50:31 AM
  Subject: [ovirt-users] Can we debug some truths/myths/facts about
  hosted-engine and gluster?
  
  Hi all,
  
  As most of you have got hints from previous messages, hosted engine won't
  work on gluster. A quote from BZ1097639:

  "Using hosted engine with Gluster backed storage is currently something we
  really warn against."
 
 My current setup is hosted engine, configured w/ gluster storage as described
 in my blog post, but with three hosts and replica 3 volumes.
 
 Only issue I've seen is an errant message about the Hosted Engine being down
 following an engine migration. The engine does migrate successfully, though.
 
 RE your bug, what do you use for a mount point for the nfs storage?

In the log you attached to your bug, it looks like you're using localhost as
the nfs mount point. I use a dns name that resolves to the virtual IP hosted
by ctdb. So, you're only ever talking to one nfs server at a time, and failover
between the nfs hosts is handled by ctdb.
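
As a rough sketch of that arrangement (every name and address below is a
placeholder, not the actual configuration): ctdb keeps one virtual IP alive
across the storage hosts, and a DNS name pointing at that IP is what the
storage domain is mounted with.

  # /etc/ctdb/nodes -- internal addresses of the storage hosts (example values)
  10.0.0.1
  10.0.0.2
  10.0.0.3

  # /etc/ctdb/public_addresses -- the floating IP ctdb moves between hosts
  10.0.0.100/24 eth0

  # DNS: nfs.example.com -> 10.0.0.100, so the hosted-engine storage is
  # mounted as nfs.example.com:/engine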

Anyway, like I said, my main testing rig is now using this configuration;
help me try and break it. :)

 
 Jason
 
 
  
  
  I think this bug should be closed or re-targeted at documentation,
  because there is nothing we can do here. Hosted engine assumes that
  all writes are atomic and (immediately) available for all hosts in the
  cluster. Gluster violates those assumptions.

  Until the documentation gets updated, I hope this serves as a useful
  notice, at least to save people some of the headaches I hit, like
  hosted-engine starting up multiple VMs because of the above issue.
  
  Now my question: does this theory prevent a scenario where something like a
  gluster replicated volume is mounted as a glusterfs filesystem and then
  re-exported as a native kernel NFS share for the hosted-engine to consume? It
  could then be possible to chuck ctdb in there to provide a last-resort
  failover solution. I have tried this myself and suggested it to two people who
  are running a similar setup; they are now using the native kernel NFS server
  for hosted-engine and haven't reported as many issues. Curious, could anyone
  validate my theory on this?
  
  Thanks,
  Andrew
  


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Andrew Lau
On 23/07/2014 1:45 am, Jason Brooks jbro...@redhat.com wrote:



 - Original Message -
  From: Jason Brooks jbro...@redhat.com
  To: Andrew Lau and...@andrewklau.com
  Cc: users users@ovirt.org
  Sent: Tuesday, July 22, 2014 8:29:46 AM
  Subject: Re: [ovirt-users] Can we debug some truths/myths/facts about
  hosted-engine and gluster?
 
 
 
  - Original Message -
   From: Andrew Lau and...@andrewklau.com
   To: users users@ovirt.org
   Sent: Friday, July 18, 2014 4:50:31 AM
   Subject: [ovirt-users] Can we debug some truths/myths/facts about
   hosted-engine and gluster?
  
   Hi all,
  
   As most of you have got hints from previous messages, hosted engine won't
   work on gluster. A quote from BZ1097639:

   "Using hosted engine with Gluster backed storage is currently something we
   really warn against."
 
  My current setup is hosted engine, configured w/ gluster storage as
  described in my blog post, but with three hosts and replica 3 volumes.

  Only issue I've seen is an errant message about the Hosted Engine being down
  following an engine migration. The engine does migrate successfully, though.

That was fixed in 3.4.3, I believe, although when it happened to me my
engine didn't migrate; it just sat there.


 
  RE your bug, what do you use for a mount point for the nfs storage?

 In the log you attached to your bug, it looks like you're using localhost as
 the nfs mount point. I use a dns name that resolves to the virtual IP hosted
 by ctdb. So, you're only ever talking to one nfs server at a time, and
 failover between the nfs hosts is handled by ctdb.

I also tried your setup, but hit other complications. I used localhost in an
old setup, as I was under the assumption that when accessing anything gluster
related, the connection point only provides the volume info and you then
connect to any server in the volume group.


 Anyway, like I said, my main testing rig is now using this configuration;
 help me try and break it. :)

rm -rf /

Jokes aside, are you able to reboot a server without losing the VM?
My experience with ctdb (based on your blog) was that even with the
floating/virtual IP it wasn't fast enough, or something in the gluster
layer delayed the failover. Either way, the VM goes into a paused state and
can't be resumed.
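
One gluster-side setting worth checking when failover seems to stall like this
is the client ping timeout, which defaults to 42 seconds. A hedged sketch of
inspecting and lowering it, assuming a volume named engine (whether this is
actually the culprit here is only a guess):

  gluster volume info engine | grep ping-timeout      # only listed if changed from the default
  gluster volume set engine network.ping-timeout 10   # example value; shortens how long clients wait on a dead brick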


 
  Jason
 
 
  
  
   I think this bug should be closed or re-targeted at documentation,
   because there is nothing we can do here. Hosted engine assumes that
   all writes are atomic and (immediately) available for all hosts in the
   cluster. Gluster violates those assumptions.

   Until the documentation gets updated, I hope this serves as a useful
   notice, at least to save people some of the headaches I hit, like
   hosted-engine starting up multiple VMs because of the above issue.
  
   Now my question: does this theory prevent a scenario where something like a
   gluster replicated volume is mounted as a glusterfs filesystem and then
   re-exported as a native kernel NFS share for the hosted-engine to consume?
   It could then be possible to chuck ctdb in there to provide a last-resort
   failover solution. I have tried this myself and suggested it to two people
   who are running a similar setup; they are now using the native kernel NFS
   server for hosted-engine and haven't reported as many issues. Curious,
   could anyone validate my theory on this?
  
   Thanks,
   Andrew
  


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Jason Brooks


   RE your bug, what do you use for a mount point for the nfs storage?
 
  In the log you attached to your bug, it looks like you're using localhost as
  the nfs mount point. I use a dns name that resolves to the virtual IP hosted
  by ctdb. So, you're only ever talking to one nfs server at a time, and
  failover between the nfs hosts is handled by ctdb.
 
 I also tried your setup, but hit other complications. I used localhost in an
 old setup, as I was under the assumption that when accessing anything gluster
 related, the connection point only provides the volume info and you then
 connect to any server in the volume group.

As I understand it, with Gluster's nfs the server you mount is the
only one you're accessing directly, which is why you need to use something
else, like round robin dns, to distribute the load.
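
For illustration, "round robin dns" here just means several A records behind
one name, so successive mounts land on different servers (placeholder names
and addresses):

  ; example zone entries
  nfs.example.com.    IN  A   10.0.0.1
  nfs.example.com.    IN  A   10.0.0.2
  nfs.example.com.    IN  A   10.0.0.3

  # the storage domain is then mounted by name, e.g.
  mount -t nfs -o vers=3 nfs.example.com:/engine /mnt/engine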

 
 
  Anyway, like I said, my main testing rig is now using this configuration;
  help me try and break it. :)
 
 rm -rf /
 
 Jokes aside, are you able to reboot a server without losing the VM?
 My experience with ctdb (based on your blog) was that even with the
 floating/virtual IP it wasn't fast enough, or something in the gluster
 layer delayed the failover. Either way, the VM goes into a paused state and
 can't be resumed.

I have rebooted my hosts without issue. If I want to reboot the host
that's serving the nfs storage, I stop ctdb first on that host
to make it hand off the nfs -- I've done this out of caution, but I
should try just pulling the plug, too.
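
A rough sketch of that hand-off (service names may differ by distro):

  # on the host about to be rebooted: stop ctdb so the public IP, and the NFS
  # clients behind it, move to a surviving node first
  service ctdb stop
  # on another node, confirm it now holds the public address
  ctdb status
  # then reboot the original host
  reboot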

The main source of VM pausing I've seen is when you have two nodes, one
goes down, and the gluster quorum business goes into effect. With my 
current 3 node, replica 3 setup, gluster stays happy wrt quorum.
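
For reference, the quorum behaviour described above is governed by standard
gluster volume options; a hedged sketch of enabling them on a volume named
engine (values are examples, not necessarily what is set on this rig):

  gluster volume set engine cluster.quorum-type auto            # client-side quorum: writes need a majority of replicas
  gluster volume set engine cluster.server-quorum-type server   # server-side quorum across the trusted pool
  gluster volume info engine                                     # anything set shows up under "Options Reconfigured"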

I'll be sure to post about it if I have problems, but it's been working
well for me.

Jason




[ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Andrew Lau
Hi all,

As most of you have got hints from previous messages, hosted engine won't
work on gluster. A quote from BZ1097639:

"Using hosted engine with Gluster backed storage is currently something we
really warn against."


I think this bug should be closed or re-targeted at documentation,
because there is nothing we can do here. Hosted engine assumes that
all writes are atomic and (immediately) available for all hosts in the
cluster. Gluster violates those assumptions.

Until the documentation gets updated, I hope this serves as a useful
notice, at least to save people some of the headaches I hit, like
hosted-engine starting up multiple VMs because of the above issue.

Now my question: does this theory prevent a scenario where something like a
gluster replicated volume is mounted as a glusterfs filesystem and then
re-exported as a native kernel NFS share for the hosted-engine to consume? It
could then be possible to chuck ctdb in there to provide a last-resort
failover solution. I have tried this myself and suggested it to two people who
are running a similar setup; they are now using the native kernel NFS server
for hosted-engine and haven't reported as many issues. Curious, could anyone
validate my theory on this?
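
A minimal sketch of the arrangement described above, with placeholder names
and paths (mount the replicated volume locally with the gluster FUSE client,
re-export that mount through the kernel NFS server, and let ctdb float the IP
that hosted-engine mounts; service names vary by distro):

  # on each storage host
  gluster volume set engine nfs.disable on        # keep gluster's built-in NFS out of the way of kernel nfsd
  mount -t glusterfs localhost:/engine /gluster/engine

  # /etc/exports -- re-export the FUSE mount via the kernel NFS server
  # (an explicit fsid is needed when exporting a FUSE filesystem)
  /gluster/engine  *(rw,sync,no_root_squash,fsid=1001)

  service nfs start
  service ctdb start

  # hosted-engine then mounts the storage via the name that resolves to ctdb's
  # floating IP, e.g. engine-nfs.example.com:/gluster/engine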

Thanks,
Andrew


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Vijay Bellur

[Adding gluster-devel]

On 07/18/2014 05:20 PM, Andrew Lau wrote:

Hi all,

As most of you have got hints from previous messages, hosted engine
won't work on gluster. A quote from BZ1097639:

"Using hosted engine with Gluster backed storage is currently something
we really warn against."


I think this bug should be closed or re-targeted at documentation, because 
there is nothing we can do here. Hosted engine assumes that all writes are 
atomic and (immediately) available for all hosts in the cluster. Gluster 
violates those assumptions.

I tried going through BZ1097639 but could not find much detail with
respect to gluster there.


A few questions around the problem:

1. Can somebody please explain in detail the scenario that causes the 
problem?


2. Is hosted engine performing synchronous writes to ensure that writes 
are durable?


Also, if there is any documentation that details the hosted engine
architecture, that would help in enhancing our understanding of its
interactions with gluster.




Now my question: does this theory prevent a scenario where something like a
gluster replicated volume is mounted as a glusterfs filesystem and then
re-exported as a native kernel NFS share for the hosted-engine to consume? It
could then be possible to chuck ctdb in there to provide a last-resort
failover solution. I have tried this myself and suggested it to two people who
are running a similar setup; they are now using the native kernel NFS server
for hosted-engine and haven't reported as many issues. Curious, could anyone
validate my theory on this?



If we obtain more details on the use case and obtain gluster logs from 
the failed scenarios, we should be able to understand the problem 
better. That could be the first step in validating your theory or 
evolving further recommendations :).


Thanks,
Vijay


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Andrew Lau

On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur vbel...@redhat.com wrote:

 [Adding gluster-devel]


 On 07/18/2014 05:20 PM, Andrew Lau wrote:

 Hi all,

 As most of you have got hints from previous messages, hosted engine
 won't work on gluster. A quote from BZ1097639:

 "Using hosted engine with Gluster backed storage is currently something
 we really warn against."


 I think this bug should be closed or re-targeted at documentation,
 because there is nothing we can do here. Hosted engine assumes that all
 writes are atomic and (immediately) available for all hosts in the cluster.
 Gluster violates those assumptions.

 I tried going through BZ1097639 but could not find much detail with
 respect to gluster there.

 A few questions around the problem:

 1. Can somebody please explain in detail the scenario that causes the
 problem?

 2. Is hosted engine performing synchronous writes to ensure that writes
 are durable?

 Also, if there is any documentation that details the hosted engine
 architecture that would help in enhancing our understanding of its
 interactions with gluster.



 Now my question: does this theory prevent a scenario where something like a
 gluster replicated volume is mounted as a glusterfs filesystem and then
 re-exported as a native kernel NFS share for the hosted-engine to consume? It
 could then be possible to chuck ctdb in there to provide a last-resort
 failover solution. I have tried this myself and suggested it to two people who
 are running a similar setup; they are now using the native kernel NFS server
 for hosted-engine and haven't reported as many issues. Curious, could anyone
 validate my theory on this?


 If we obtain more details on the use case and obtain gluster logs from the
 failed scenarios, we should be able to understand the problem better. That
 could be the first step in validating your theory or evolving further
 recommendations :).

I'm not sure how useful this is, but Jiri Moskovcak tracked this down in
an off-list message.

Message quote:

==

We were able to track it down to this (thanks Andrew for providing the
testing setup):

-b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'

It's definitely connected to the storage, which leads us to gluster. I'm not
very familiar with gluster, so I need to check this with our gluster gurus.

==



 Thanks,
 Vijay
