Re: [Linux-cluster] mixing OS versions?

2014-04-25 Thread Pavel Herrmann
Hi,

On Friday 25 of April 2014 12:42:59 Steven Whitehouse wrote:
> Hi,
> 
> On 24/04/14 17:29, Alan Brown wrote:
> > On 30/03/14 12:34, Steven Whitehouse wrote:
> >> Well that is not entirely true. We have done a great deal of
> >> investigation into this issue. We do test quotas (among many other
> >> things) on each release to ensure that they are working. Our tests have
> >> all passed correctly, and to date you have provided the only report of
> >> this particular issue via our support team. So it is certainly not
> >> something that lots of people are hitting.
> > 
> > Someone else reported it on this list (on centos), so we're not an
> > isolated case.
> > 
> >> We do now have a good idea of where the issue is. However it is clear
> >> that simply exceeding quotas is not enough to trigger it. Instead quotas
> >> need to be exceeded in a particular way.
> > 
> > My suspicion is that it's some kind of interaction between quotas and
> > NFS, but it'd be good if you could provide a fuller explanation.
> 
> Yes, that's what we thought to start with... however that turned out to
> be a bit of a red herring. Or at least the issue has nothing
> specifically to do with NFS. The problem was related to when quota was
> exceeded, and specifically what operation was in progress. You could
> write to files as often as you wanted to, and exceeding quota would be
> handled correctly. The problem was a specific code path within the inode
> creation code; if quota was not exceeded on that one specific code
> path, then everything would work as expected.

Could you please provide a (somewhat reliable) test case to reproduce this
bug? I have looked at the patch and found nothing obviously related to quotas
(it seems the patch only changes the fail path of the posix_acl_create() call,
which doesn't appear to have anything to do with quotas).

I have been facing a possibly quota-related oops in GFS2 for some time, which
I am unable to reproduce without switching my cluster to production use (which
means potentially facing the anger of my users, which I'd rather not do without
at least a chance of the issue being fixed).

Sadly, I don't have a Red Hat support subscription (nor do I use RHEL or
derivatives); my kernel is mostly upstream.

thanks
Pavel Herrmann

> 
> Also, quite often when the problem did appear, it did not actually
> trigger a problem until later, making it difficult to track down.
> 
> You are correct that someone else reported the issue on the list,
> however I'm not aware of any other reports beyond yours and theirs.
> Also, this was specific to certain versions of GFS2, and not something
> that relates to all versions.
> 
> The upstream patch is here:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/gfs2?id=059788039f1e6343f34f46d202f8d9f2158c2783
> 
> It should be available in RHEL shortly - please ping support via the
> ticket for updates,
> 
> Steve.
> 
> >> Returning to the original point however, it is certainly not recommended
> >> to have mixed RHEL or CentOS versions running in the same cluster. It is
> >> much better to keep everything the same, even though the GFS2 on-disk
> >> format has not changed between the versions.
> > 
> > More specifically (for those who are curious): Whilst the on-disk
> > format has not changed between EL5 and EL6, the way that RH cluster
> > members communicate with each other has.
> > 
> > I ran a quick test some time back and the 2 different OS cluster
> > versions didn't see each other for LAN heartbeating.

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] mixing OS versions?

2014-04-25 Thread Steven Whitehouse

Hi,

On 24/04/14 17:29, Alan Brown wrote:
> On 30/03/14 12:34, Steven Whitehouse wrote:
>> Well that is not entirely true. We have done a great deal of
>> investigation into this issue. We do test quotas (among many other
>> things) on each release to ensure that they are working. Our tests have
>> all passed correctly, and to date you have provided the only report of
>> this particular issue via our support team. So it is certainly not
>> something that lots of people are hitting.
>
> Someone else reported it on this list (on centos), so we're not an
> isolated case.
>
>> We do now have a good idea of where the issue is. However it is clear
>> that simply exceeding quotas is not enough to trigger it. Instead quotas
>> need to be exceeded in a particular way.
>
> My suspicion is that it's some kind of interaction between quotas and
> NFS, but it'd be good if you could provide a fuller explanation.

Yes, that's what we thought to start with... however that turned out to
be a bit of a red herring. Or at least the issue has nothing
specifically to do with NFS. The problem was related to when quota was
exceeded, and specifically what operation was in progress. You could
write to files as often as you wanted to, and exceeding quota would be
handled correctly. The problem was a specific code path within the inode
creation code; if quota was not exceeded on that one specific code
path, then everything would work as expected.

Also, quite often when the problem did appear, it did not actually
trigger a problem until later, making it difficult to track down.

You are correct that someone else reported the issue on the list,
however I'm not aware of any other reports beyond yours and theirs.
Also, this was specific to certain versions of GFS2, and not something
that relates to all versions.

The upstream patch is here:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/gfs2?id=059788039f1e6343f34f46d202f8d9f2158c2783

It should be available in RHEL shortly - please ping support via the
ticket for updates,

Steve.

>> Returning to the original point however, it is certainly not recommended
>> to have mixed RHEL or CentOS versions running in the same cluster. It is
>> much better to keep everything the same, even though the GFS2 on-disk
>> format has not changed between the versions.
>
> More specifically (for those who are curious): Whilst the on-disk
> format has not changed between EL5 and EL6, the way that RH cluster
> members communicate with each other has.
>
> I ran a quick test some time back and the 2 different OS cluster
> versions didn't see each other for LAN heartbeating.








Re: [Linux-cluster] mixing OS versions?

2014-03-30 Thread Steven Whitehouse
Hi,

On Fri, 2014-03-28 at 22:07 +, Alan Brown wrote:
> On 28/03/14 19:31, Fabio M. Di Nitto wrote:
> 
> > 
> > > Are there any known issues, guidelines, or recommendations for having
> > > a single RHCS cluster with different OS releases on the nodes?
> > 
> > Only one answer.. don't do it. It's not supported and it's only asking
> > for troubles.
> > 
> > 
> 
> Seconded. There are _substantial_ differences between Centos/RHEL 5
> and 6 clustering.
> 
> You can run one or the other OS, but you can't mix them. The on-disk
> format isn't affected.
> 
> Best path is to set up a cluster in 6, shut down the 5 cluster, attach
> disks to the 6 cluster and bring it all back up. The 5 boxes can be
> converted to version 6 afterwards.
> 
> (I'm going through this at the moment, as I have 2 EL5 clusters and 1
> EL6 cluster.)
> 
> TAKE NOTE:  RHEL/CentOS6 clustering is not quite ready for prime-time
> - if you enable GFS2 quotas and someone busts his quota the machine
> will panic.
> 
Well that is not entirely true. We have done a great deal of
investigation into this issue. We do test quotas (among many other
things) on each release to ensure that they are working. Our tests have
all passed correctly, and to date you have provided the only report of
this particular issue via our support team. So it is certainly not
something that lots of people are hitting.

We do now have a good idea of where the issue is. However it is clear
that simply exceeding quotas is not enough to trigger it. Instead quotas
need to be exceeded in a particular way.

Abhi is working on a fix which should be available very shortly now.

Returning to the original point however, it is certainly not recommended
to have mixed RHEL or CentOS versions running in the same cluster. It is
much better to keep everything the same, even though the GFS2 on-disk
format has not changed between the versions.

I hope that answers a few questions - let us know if you need more info,

Steve.




Re: [Linux-cluster] mixing OS versions?

2014-03-29 Thread Masopust, Christian

> In the message dated: Sat, 29 Mar 2014 09:00:04 -, The pithy ruminations
> from "Masopust, Christian" on
>  were:
> => > =>
> => > => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time -
> => > => if you enable GFS2 quotas and someone busts his quota the machine will
> => > => panic.
> => >
> => > That's an example of why I no longer use GFS2. :)
> => >
> => > Thanks,
> => >
> => > Mark
> =>
> => Hi Mark,
> =>
> => what instead of GFS2 ?
>
> GPFS, as I wrote in the message to which you replied:
>

Sorry, my fault... I didn't notice it, as GPFS has not been on my radar up to
now :)




Re: [Linux-cluster] mixing OS versions?

2014-03-29 Thread bergman


In the message dated: Sat, 29 Mar 2014 09:00:04 -,
The pithy ruminations from "Masopust, Christian" on 
 were:
=> > => 
=> > => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - 
=> > => if you enable GFS2 quotas and someone busts his quota the machine will 
=> > => panic.
=> > 
=> > That's an example of why I no longer use GFS2. :)
=> > 
=> > Thanks,
=> > 
=> > Mark
=> 
=> Hi Mark,
=> 
=> what instead of GFS2 ?

GPFS, as I wrote in the message to which you replied:


---------------
From: berg...@merctech.com
To: linux clustering 
Subject: Re: [Linux-cluster] mixing OS versions?
Date: Fri, 28 Mar 2014 18:35:42 -0400

[SNIP!]

For clarification, we're not using RHCS to manage any shared storage. The
only 'disk' component is the quorum disk.

We're using GPFS as the storage layer.

---

=> 
=> br,
=> christian
=> 
=> 

-- 
Mark Bergman 



Re: [Linux-cluster] mixing OS versions?

2014-03-29 Thread Masopust, Christian
> => 
> => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - 
> => if you enable GFS2 quotas and someone busts his quota the machine will
> => panic.
> 
> That's an example of why I no longer use GFS2. :)
> 
> Thanks,
> 
> Mark

Hi Mark,

what instead of GFS2 ?

br,
christian




Re: [Linux-cluster] mixing OS versions?

2014-03-28 Thread bergman


In the message dated: Fri, 28 Mar 2014 22:07:48 -,
The pithy ruminations from Alan Brown on 
 were:
=> On 28/03/14 19:31, Fabio M. Di Nitto wrote:
=> >
=> > > Are there any known issues, guidelines, or recommendations for having
=> > > a single RHCS cluster with different OS releases on the nodes?
=> >
=> > Only one answer.. don't do it. It's not supported and it's only asking
=> > for troubles.

Thanks for all the warnings...not what I wanted to hear, but it's good
to get a clear, consistent message.

=> >
=> >
=> 
=> Seconded. There are _substantial_ differences between Centos/RHEL 5 and 
=> 6 clustering.
=> 
=> You can run one or the other OS, but you can't mix them. The on-disk 
=> format isn't affected.

For clarification, we're not using RHCS to manage any shared storage. The
only 'disk' component is the quorum disk.

We're using GPFS as the storage layer.

RHCS manages several services, such as:

httpd
mysql
nis
pgsql

=> 
=> Best path is to set up a cluster in 6, shut down the 5 cluster, attach 
=> disks to the 6 cluster and bring it all back up. The 5 boxes can be 
=> converted to version 6 afterwards.

That's what I was expecting, unfortunately.

I'll probably take a more gradual approach...bring up a CentOS6 cluster
with its own quorum disk, and one-by-one add services (httpd, nis,
etc.) to that, bringing them down on the old cluster. Add in some CNAMEs
and coordination with the network group and it should be relatively
transparent to the users.
=> 
=> (I'm going through this at the moment, as I have 2 EL5 clusters and 1 
=> EL6 cluster.)
=> 
=> TAKE NOTE:  RHEL/CentOS6 clustering is not quite ready for prime-time - 
=> if you enable GFS2 quotas and someone busts his quota the machine will 
=> panic.

That's an example of why I no longer use GFS2. :)

Thanks,

Mark



Re: [Linux-cluster] mixing OS versions?

2014-03-28 Thread Alan Brown

On 28/03/14 19:31, Fabio M. Di Nitto wrote:
>> Are there any known issues, guidelines, or recommendations for having
>> a single RHCS cluster with different OS releases on the nodes?
>
> Only one answer.. don't do it. It's not supported and it's only asking
> for troubles.

Seconded. There are _substantial_ differences between CentOS/RHEL 5 and
6 clustering.

You can run one or the other OS, but you can't mix them. The on-disk
format isn't affected.

Best path is to set up a cluster in 6, shut down the 5 cluster, attach
disks to the 6 cluster and bring it all back up. The 5 boxes can be
converted to version 6 afterwards.

(I'm going through this at the moment, as I have 2 EL5 clusters and 1
EL6 cluster.)

TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time -
if you enable GFS2 quotas and someone busts his quota the machine will
panic.





Re: [Linux-cluster] mixing OS versions?

2014-03-28 Thread James Washer
You can get by, for a short time, with a minor revision difference, say 5.7
and 5.8, but mixing 5 and 6 will not work. Period.
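
For anyone who wants to check this before adding nodes, here is a minimal
sketch of that rule of thumb. It assumes you have collected a release string
from each node (in the style of /etc/redhat-release); the function names
parse_release and check_cluster and the three result labels are made up for
illustration and are not part of any RHCS tooling.

```python
import re
from typing import Dict, Tuple


def parse_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as 'CentOS release 5.8 (Final)'."""
    match = re.search(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"no version number found in {release!r}")
    return int(match.group(1)), int(match.group(2))


def check_cluster(releases: Dict[str, str]) -> str:
    """Classify a cluster by the OS releases of its nodes.

    Returns one of three illustrative labels:
      'unsupported' - nodes span more than one major release (e.g. 5 and 6)
      'temporary'   - same major release, differing minor releases
      'ok'          - every node runs the same release
    """
    versions = {node: parse_release(text) for node, text in releases.items()}
    majors = {major for major, _minor in versions.values()}
    if len(majors) > 1:
        return "unsupported"
    if len(set(versions.values())) > 1:
        return "temporary"
    return "ok"
```

With hypothetical node names, check_cluster({"node1": "CentOS release 5.8
(Final)", "node4": "CentOS release 6.5 (Final)"}) returns "unsupported", while
a 5.7/5.8 mix returns "temporary".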


On Fri, Mar 28, 2014 at 12:31 PM, Fabio M. Di Nitto wrote:

> On 03/28/2014 05:37 PM, berg...@merctech.com wrote:
> >
> >
> > I've got a 3-node cluster under CentOS5.
> >
> > I'd like to add 3 additional nodes, running CentOS6.
> >
> > Are there any known issues, guidelines, or recommendations for having
> > a single RHCS cluster with different OS releases on the nodes?
>
> Only one answer.. don't do it. It's not supported and it's only asking
> for troubles.
>
> Fabio
>



-- 


 - jim

Re: [Linux-cluster] mixing OS versions?

2014-03-28 Thread Fabio M. Di Nitto
On 03/28/2014 05:37 PM, berg...@merctech.com wrote:
> 
> 
> I've got a 3-node cluster under CentOS5.
> 
> I'd like to add 3 additional nodes, running CentOS6.
> 
> Are there any known issues, guidelines, or recommendations for having
> a single RHCS cluster with different OS releases on the nodes?

Only one answer.. don't do it. It's not supported and it's only asking
for troubles.

Fabio



[Linux-cluster] mixing OS versions?

2014-03-28 Thread bergman


I've got a 3-node cluster under CentOS5.

I'd like to add 3 additional nodes, running CentOS6.

Are there any known issues, guidelines, or recommendations for having
a single RHCS cluster with different OS releases on the nodes?

Thanks,

Mark
