Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Itamar Heim

On 07/22/2014 04:28 AM, Vijay Bellur wrote:

On 07/21/2014 05:09 AM, Pranith Kumar Karampuri wrote:


On 07/21/2014 02:08 PM, Jiri Moskovcak wrote:

On 07/19/2014 08:58 AM, Pranith Kumar Karampuri wrote:


On 07/19/2014 11:25 AM, Andrew Lau wrote:



On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri
pkara...@redhat.com wrote:


On 07/18/2014 05:43 PM, Andrew Lau wrote:

On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur
vbel...@redhat.com wrote:

[Adding gluster-devel]


On 07/18/2014 05:20 PM, Andrew Lau wrote:

Hi all,

As most of you have got hints from previous messages,
hosted engine won't work on gluster. A quote from BZ1097639:

"Using hosted engine with Gluster backed storage is
currently something we really warn against."


I think this bug should be closed or re-targeted at
documentation, because there is nothing we can do here.
Hosted engine assumes that all writes are atomic and
(immediately) available to all hosts in the cluster.
Gluster violates those assumptions.
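The assumption stated above can be illustrated with a short sketch. This is a generic illustration only, not hosted-engine's actual metadata protocol (which, per the traceback later in the thread, uses direct I/O on a shared metadata file): a writer that wants every reader to observe either the old or the new content, never a mix, can write a temp file and rename() it into place, since rename() is atomic on POSIX filesystems. The helper name is hypothetical.

```python
import os

def atomic_update(path, payload):
    """Replace `path` so readers see either the old bytes or the new
    bytes, never a mix: write a temp file, flush it to stable storage,
    then rename() it over the target (rename is atomic on POSIX)."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # make the new content durable before it becomes visible
    finally:
        os.close(fd)
    os.rename(tmp, path)
```

Whether a distributed filesystem actually gives every client this old-or-new guarantee, and how quickly the new version becomes visible on the other hosts, is exactly the property being questioned in this thread.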
I tried going through BZ1097639 but could not find much
detail with respect to gluster there.

A few questions around the problem:

1. Can somebody please explain in detail the scenario that
causes the problem?

2. Is hosted engine performing synchronous writes to ensure
that writes are durable?

Also, a pointer to any documentation that details the hosted
engine architecture would help in enhancing our
understanding of its interactions with gluster.
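For question 2 above, a "synchronous write" in the POSIX sense is one where write() returns only after the data has reached stable storage. A minimal sketch of what that looks like from an application, using a hypothetical helper (this is not hosted-engine code):

```python
import os

def durable_write(path, data, offset=0):
    """Write `data` at `offset` so that write() returns only once the
    data (and file metadata) have reached stable storage.  O_SYNC is
    one way to get this without a separate fsync() call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, data)
    finally:
        os.close(fd)
```

If hosted-engine writes this way, a write that has returned should be durable on the server; whether it is also immediately *visible* to other clients is a separate question that depends on the storage layer.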


​

Now my question: does this theory prevent a scenario of
perhaps something like a gluster replicated volume being
mounted as a glusterfs filesystem and then re-exported as
the native kernel NFS share for the hosted-engine to
consume? It could then be possible to chuck ctdb in there
to provide a last-resort failover solution. I have tried it
myself and suggested it to two people who are running a
similar setup. They are now using the native kernel NFS
server for hosted-engine and they haven't reported as many
issues. Curious, could anyone validate my theory on this?


If we obtain more details on the use case and gluster
logs from the failed scenarios, we should be able to
understand the problem better. That could be the first step
in validating your theory or evolving further
recommendations :).


I'm not sure how useful this is, but Jiri Moskovcak tracked
this down in an off-list message.

Message quote:

==

We were able to track it down to this (thanks Andrew for
providing the testing setup):

-b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'



Andrew/Jiri,
Would it be possible to post gluster logs of both the
mount and the bricks on the bz? I can take a look. If I
gather nothing, then I will probably ask for your help in
re-creating the issue.

Pranith


Unfortunately, I don't have the logs for that setup any more. I'll
try to replicate it when I get a chance. If I understand the comment
from the BZ, I don't think it's a gluster bug per se, more just how
gluster does its replication.

Hi Andrew,
Thanks for that. I couldn't come to any conclusions
because no logs were available. It is unlikely that self-heal is
involved, because there were no bricks going down/up according to
the bug description.



Hi,
I've never had such a setup. I guessed a problem with gluster based on
the "OSError: [Errno 116] Stale file handle" error, which happens when
a file opened by an application on the client gets removed on the
server. I'm pretty sure we (hosted-engine) don't remove that file, so
I think it's some gluster magic moving the data around...
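Jiri's explanation matches the usual ESTALE semantics: errno 116 means the client's cached file handle no longer maps to a live inode on the server, and re-opening by path forces a fresh lookup. A hypothetical retry wrapper around the os.open() call from the traceback above (not something hosted-engine actually does, just to illustrate the usual remedy for this failure mode):

```python
import errno
import os

def open_with_estale_retry(path, flags, retries=3):
    """Open `path`, retrying on ESTALE.  Re-opening by path forces the
    client to do a fresh server-side lookup, which typically succeeds
    if the file was replaced (rather than removed) on the server."""
    for attempt in range(retries):
        try:
            return os.open(path, flags)
        except OSError as e:
            # Only swallow ESTALE, and only while retries remain.
            if e.errno != errno.ESTALE or attempt == retries - 1:
                raise
```

If gluster's replication really does replace the metadata file's inode underneath the client, a retry like this would mask the symptom but not the underlying visibility question.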

Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-21 Thread Pranith Kumar Karampuri


On 07/21/2014 02:08 PM, Jiri Moskovcak wrote:

On 07/19/2014 08:58 AM, Pranith Kumar Karampuri wrote:


On 07/19/2014 11:25 AM, Andrew Lau wrote:



On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri
pkara...@redhat.com mailto:pkara...@redhat.com wrote:


On 07/18/2014 05:43 PM, Andrew Lau wrote:

​ ​

On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur
vbel...@redhat.com mailto:vbel...@redhat.com wrote:

[Adding gluster-devel]


On 07/18/2014 05:20 PM, Andrew Lau wrote:

Hi all,

As most of you have got hints from previous messages,
hosted engine
won't work on gluster . A quote from BZ1097639

Using hosted engine with Gluster backed storage is
currently something
we really warn against.


I think this bug should be closed or re-targeted at
documentation, because there is nothing we can do here.
Hosted engine assumes that all writes are atomic and
(immediately) available for all hosts in the cluster.
Gluster violates those assumptions.
​

I tried going through BZ1097639 but could not find much
detail with respect to gluster there.

A few questions around the problem:

1. Can somebody please explain in detail the scenario that
causes the problem?

2. Is hosted engine performing synchronous writes to ensure
that writes are durable?

Also, if there is any documentation that details the hosted
engine architecture that would help in enhancing our
understanding of its interactions with gluster.


​

Now my question, does this theory prevent a scenario of
perhaps
something like a gluster replicated volume being mounted
as a glusterfs
filesystem and then re-exported as the native kernel NFS
share for the
hosted-engine to consume? It could then be possible to
chuck ctdb in
there to provide a last resort failover solution. I have
tried myself
and suggested it to two people who are running a similar
setup. Now
using the native kernel NFS server for hosted-engine and
they haven't
reported as many issues. Curious, could anyone validate
my theory on this?


If we obtain more details on the use case and obtain gluster
logs from the failed scenarios, we should be able to
understand the problem better. That could be the first step
in validating your theory or evolving further 
recommendations :).



​ I'm not sure how useful this is, but ​Jiri Moskovcak tracked
this down in an off list message.

​ Message Quote:​

​ ==​

​We were able to track it down to this (thanks Andrew for
providing the testing setup):

-b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
File
/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py,
line 165, in handle
  response = success  + self._dispatch(data)
File
/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py,
line 261, in _dispatch
  .get_all_stats_for_service_type(**options)
File
/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py,
line 41, in get_all_stats_for_service_type
  d = self.get_raw_stats_for_service_type(storage_dir, 
service_type)

File
/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py,
line 74, in get_raw_stats_for_service_type
  f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 116] Stale file handle:
'/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'

Andrew/Jiri,
Would it be possible to post gluster logs of both the
mount and bricks on the bz? I can take a look at it once. If I
gather nothing then probably I will ask for your help in
re-creating the issue.

Pranith


​Unfortunately, I don't have the logs for that setup any more.. ​I'll
try replicate when I get a chance. If I understand the comment from
the BZ, I don't think it's a gluster bug per-say, more just how
gluster does its replication.

hi Andrew,
  Thanks for that. I couldn't come to any conclusions because no
logs were available. It is unlikely that self-heal is involved because
there were no bricks going down/up according to the bug description.



Hi,
I've never had such setup, I guessed problem with gluster based on 
OSError: [Errno 116] Stale file handle: which happens when the file 
opened by application on client gets removed on the server. I'm pretty 
sure we (hosted-engine) don't remove that file, so I think it's some 
gluster magic moving the data 

Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-21 Thread Jiri Moskovcak

Hi,
I've never had such a setup. I guessed a problem with gluster based on
the "OSError: [Errno 116] Stale file handle" error, which happens when
a file opened by an application on the client gets removed on the
server. I'm pretty sure we (hosted-engine) don't remove that file, so
I think it's some gluster magic moving the data around...

--Jirka

Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-21 Thread Vijay Bellur


Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-19 Thread Pranith Kumar Karampuri



It's definitely connected to the storage, which leads us to gluster.
I'm not very familiar with gluster, so I need to check this with our
gluster gurus.

==




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-19 Thread Andrew Lau


Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Vijay Bellur



Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Andrew Lau
​​
