Re: [Linux-cluster] mixing OS versions?

2014-04-25 Thread Steven Whitehouse

Hi,

On 24/04/14 17:29, Alan Brown wrote:

On 30/03/14 12:34, Steven Whitehouse wrote:


Well that is not entirely true. We have done a great deal of
investigation into this issue. We do test quotas (among many other
things) on each release to ensure that they are working. Our tests have
all passed correctly, and to date you have provided the only report of
this particular issue via our support team. So it is certainly not
something that lots of people are hitting.


Someone else reported it on this list (on centos), so we're not an 
isolated case.



We do now have a good idea of where the issue is. However it is clear
that simply exceeding quotas is not enough to trigger it. Instead quotas
need to be exceeded in a particular way.


My suspicion is that it's some kind of interaction between quotas and 
NFS, but it'd be good if you could provide a fuller explanation.


Yes, that's what we thought to start with... however, that turned out to 
be a bit of a red herring. Or at least the issue has nothing 
specifically to do with NFS. The problem was related to when quota was 
exceeded, and specifically what operation was in progress. You could 
write to files as often as you wanted, and exceeding quota would be 
handled correctly. The problem was in one specific code path within the 
inode creation code: if quota was not exceeded on that particular path, 
everything worked as expected.
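
For anyone who wants to experiment, the general shape of a sequence that 
exercises that path looks something like the sketch below (the mount point, 
user and limits are only placeholders, it assumes the filesystem is mounted 
with quota=on and that the standard quota tools work against it, and it is 
not a guaranteed reproducer):

  # mount point, user and limits are placeholders
  mkdir /mnt/gfs2/quotatest && chown quotatest /mnt/gfs2/quotatest
  setquota -u quotatest 10240 10240 0 0 /mnt/gfs2

  # fill most of the 10MB block quota first...
  su -s /bin/sh quotatest -c 'dd if=/dev/zero of=/mnt/gfs2/quotatest/filler bs=1M count=9'

  # ...then keep creating new inodes as that user, so the limit is crossed
  # during inode creation rather than in an ordinary write
  su -s /bin/sh quotatest -c 'for i in $(seq 1 1000); do touch /mnt/gfs2/quotatest/f$i || break; done'

  # any damage may only surface later, so check the logs afterwards
  dmesg | grep -i gfs2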


Also, quite often when the problem did occur, it did not actually 
manifest until later, making it difficult to track down.


You are correct that someone else reported the issue on the list; 
however, I'm not aware of any other reports beyond yours and theirs. 
Also, this was specific to certain versions of GFS2, and not something 
that relates to all versions.


The upstream patch is here:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/gfs2?id=059788039f1e6343f34f46d202f8d9f2158c2783

It should be available in RHEL shortly - please ping support via the 
ticket for updates,


Steve.


Returning to the original point however, it is certainly not recommended
to have mixed RHEL or CentOS versions running in the same cluster. It is
much better to keep everything the same, even though the GFS2 on-disk
format has not changed between the versions.


More specifically (for those who are curious): Whilst the on-disk 
format has not changed between EL5 and EL6, the way that RH cluster 
members communicate with each other has.


I ran a quick test some time back and the two different OS cluster 
versions didn't see each other via LAN heartbeating.
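
If anyone wants to repeat that check, assuming the cman-based stack on both 
the EL5 and EL6 sides, comparing what each node reports as the cluster 
membership makes the split obvious:

  # run on one node of each OS version; in a healthy single-version
  # cluster every node lists all members, in the mixed test each side
  # only listed its own
  cman_tool status    # overall cluster and membership state
  cman_tool nodes     # the members this node can actually see
  clustat             # rgmanager's view of members and services (if running)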








[Linux-cluster] iSCSI GFS2 CMIRRORD

2014-04-25 Thread Schaefer, Micah
Hello All,
I have been successfully running a cluster for about a year. I have a question 
about best practice for my storage setup.

Currently, I have two front end nodes and two back end nodes. The front end nodes 
are part of the cluster, run all the services, etc. The back end nodes are only 
exporting raw block devices via iSCSI and are not cluster aware. The front end nodes 
import the raw block devices and use GFS2 with LVM for storage. At this time, I am 
only using the block devices from one of the back end nodes.
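
For reference, the import on the front end side is just plain open-iscsi, 
roughly like this (the portal address and IQN below are placeholders):

  iscsiadm -m discovery -t sendtargets -p 192.0.2.11:3260
  iscsiadm -m node -T iqn.2014-04.com.example:backend1 -p 192.0.2.11:3260 --login
  # the exported LUN then appears as an ordinary /dev/sdX block device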

I would like the logical volumes to be mirrored across the two iSCSI devices, 
creating redundancy at the block level. The last time I tried this, when creating 
the mirrored LV, it sat for two days making no progress. I now have 10Gb network 
connections at my front end and back end nodes (previously 1Gb only).

Also, on topology: these four nodes are split across two buildings, one front end 
and one back end in each building. The switches in each building have layer 2 
connectivity (10Gb) to each other. I also have two 10Gb connections per node, and 
multiple 1Gb connections per node.

I have come up with the following scenarios, and am looking for advice on which 
of these methods to use (or none).

1:

 *   Connect all nodes to the 10Gb switches.
 *   Use one 10Gb link for iSCSI only and one for other IP traffic

2:

 *   Connect each back end node to each front end node via 10Gb
 *   Use 1Gb for other IP traffic

3:

 *   Connect the front end nodes to each other via 10Gb
 *   Connect front end and back end nodes to a 10Gb switch for IP traffic

I am also willing to use device-mapper multipath if needed.

Thanks in advance for any assistance.

Regards,
---
Micah Schaefer
JHU/ APL

Re: [Linux-cluster] iSCSI GFS2 CMIRRORD

2014-04-25 Thread emmanuel segura
You can use multipath when the system sees the same LUN via more than one path, 
but in your case you are importing two different devices from your back end 
servers into your front end servers, so you can use LVM mirroring with cmirror 
in your front end cluster.
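
A minimal sketch of that, assuming the two iSCSI-backed disks show up as 
/dev/sdb and /dev/sdc on the front end nodes and that clvmd and cmirrord are 
running on both of them (device names, sizes and the cluster name are 
placeholders):

  # clustered volume group over the two imported disks
  pvcreate /dev/sdb /dev/sdc
  vgcreate -cy vg_mirror /dev/sdb /dev/sdc

  # one mirror leg per back end device; with only two PVs the mirror log
  # has to share a device with a leg, hence --alloc anywhere
  lvcreate --type mirror -m 1 --mirrorlog disk --alloc anywhere \
           -L 500G -n lv_gfs2 vg_mirror

  # GFS2 on top, one journal per front end node
  mkfs.gfs2 -p lock_dlm -t mycluster:gfs2vol -j 2 /dev/vg_mirror/lv_gfs2

The initial mirror sync still has to copy the whole LV across the 
inter-building link, so expect it to take a while even at 10Gb.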


2014-04-25 16:05 GMT+02:00 Schaefer, Micah micah.schae...@jhuapl.edu:

 Hello All,
 I have been successfully running a cluster for about a year. I have a
 question about best practice for my storage setup.

 Currently, I have two front end nodes and two back end nodes. The front end
 nodes are part of the cluster, run all the services, etc. The back end
 nodes are only exporting raw block devices via iSCSI and are not cluster
 aware. The front end nodes import the raw block devices and use GFS2 with
 LVM for storage. At this time, I am only using the block devices from one
 of the back end nodes.

 I would like the logical volumes to be mirrored across the two iSCSI
 devices, creating redundancy at the block level. The last time I tried
 this, when creating the mirrored LV, it sat for two days making no
 progress. I now have 10Gb network connections at my front end and back end
 nodes (previously 1Gb only).

 Also, on topology: these four nodes are split across two buildings, one
 front end and one back end in each building. The switches in each building
 have layer 2 connectivity (10Gb) to each other. I also have two 10Gb
 connections per node, and multiple 1Gb connections per node.

 I have come up with the following scenarios, and am looking for advice on
 which of these methods to use (or none).

 1:

   - Connect all nodes to the 10Gb switches.
   - Use one 10Gb link for iSCSI only and one for other IP traffic

 2:

   - Connect each back end node to each front end node via 10Gb
   - Use 1Gb for other IP traffic

 3:

   - Connect the front end nodes to each other via 10Gb
   - Connect front end and back end nodes to a 10Gb switch for IP traffic

 I am also willing to use device-mapper multipath if needed.

 Thanks in advance for any assistance.

 Regards,
 ---
 Micah Schaefer
 JHU/ APL





-- 
this is my life and I live it for as long as God wills

[Linux-cluster] luci question

2014-04-25 Thread Neale Ferguson
Hi,
 One of the guys created a simple configuration and was attempting to use luci 
to administer the cluster. It comes up fine, but the Admin ... Logout links that 
usually appear at the top left of the window are not appearing. Looking at the 
code in the header HTML, I see the following:

<span py:if="tg.auth_stack_enabled" py:strip="True">
  <py:if test="request.identity">
    <li class="loginlogout"><a href="${tg.url('/admin')}" class="${('', 'active')[defined('page') and page == 'admin']}">Admin</a></li>
    <li class="loginlogout"><a href="${tg.url('/prefs')}" class="${('', 'active')[defined('page') and page == 'prefs']}">Preferences</a></li>
    <li id="login" class="loginlogout"><a href="${tg.url('/logout_handler')}">Logout</a></li>
  </py:if>
  <li py:if="not request.identity" id="login" class="loginlogout"><a href="${tg.url('/login')}">Login</a></li>
</span>

What affects (or effects) the tg.auth_stack_enabled value? I assume it's some 
browser setting but really have no clue.

Neale




Re: [Linux-cluster] luci question

2014-04-25 Thread Cao, Vinh
What type of browser are you using?
I have the same issue with IE, but if I use Firefox it's there for me. 
I hope that is what you are looking for.

Vinh
-Original Message-
From: linux-cluster-boun...@redhat.com
[mailto:linux-cluster-boun...@redhat.com] On Behalf Of Neale Ferguson
Sent: Friday, April 25, 2014 3:14 PM
To: linux clustering
Subject: [Linux-cluster] luci question

Hi,
 One of the guys created a simple configuration and was attempting to use
luci to administer the cluster. It comes up fine, but the Admin ... Logout
links that usually appear at the top left of the window are not appearing.
Looking at the code in the header HTML, I see the following:

<span py:if="tg.auth_stack_enabled" py:strip="True">
  <py:if test="request.identity">
    <li class="loginlogout"><a href="${tg.url('/admin')}" class="${('', 'active')[defined('page') and page == 'admin']}">Admin</a></li>
    <li class="loginlogout"><a href="${tg.url('/prefs')}" class="${('', 'active')[defined('page') and page == 'prefs']}">Preferences</a></li>
    <li id="login" class="loginlogout"><a href="${tg.url('/logout_handler')}">Logout</a></li>
  </py:if>
  <li py:if="not request.identity" id="login" class="loginlogout"><a href="${tg.url('/login')}">Login</a></li>
</span>

What affects (or effects) the tg.auth_stack_enabled value? I assume it's some
browser setting but really have no clue.

Neale





Re: [Linux-cluster] mixing OS versions?

2014-04-25 Thread Pavel Herrmann
Hi,

On Friday 25 of April 2014 12:42:59 Steven Whitehouse wrote:
 Hi,
 
 On 24/04/14 17:29, Alan Brown wrote:
  On 30/03/14 12:34, Steven Whitehouse wrote:
  Well that is not entirely true. We have done a great deal of
  investigation into this issue. We do test quotas (among many other
  things) on each release to ensure that they are working. Our tests have
  all passed correctly, and to date you have provided the only report of
  this particular issue via our support team. So it is certainly not
  something that lots of people are hitting.
  
  Someone else reported it on this list (on centos), so we're not an
  isolated case.
  
  We do now have a good idea of where the issue is. However it is clear
  that simply exceeding quotas is not enough to trigger it. Instead quotas
  need to be exceeded in a particular way.
  
  My suspicion is that it's some kind of interaction between quotas and
  NFS, but it'd be good if you could provide a fuller explanation.
 
 Yes, that's what we thought to start with... however, that turned out to
 be a bit of a red herring. Or at least the issue has nothing
 specifically to do with NFS. The problem was related to when quota was
 exceeded, and specifically what operation was in progress. You could
 write to files as often as you wanted, and exceeding quota would be
 handled correctly. The problem was in one specific code path within the
 inode creation code: if quota was not exceeded on that particular path,
 everything worked as expected.

Could you please provide a (somewhat reliable) test case to reproduce this 
bug? I have looked at the patch and found nothing obviously related to quotas 
(it seems the patch only changes the failure path of the posix_acl_create() call, 
which doesn't appear to have anything to do with quotas).

I have been facing a possibly quota-related oops in GFS2 for some time, which 
I am unable to reproduce without switching my cluster to production use (which 
means potentially facing the anger of my users, which I'd rather not do without 
at least a chance of the issue being fixed).

Sadly, I don't have a Red Hat support subscription (nor do I use RHEL or 
derivatives); my kernel is mostly upstream.

thanks
Pavel Herrmann

 
 Also, quite often when the problem did occur, it did not actually
 manifest until later, making it difficult to track down.
 
 You are correct that someone else reported the issue on the list,
 however I'm not aware of any other reports beyond yours and theirs.
 Also, this was specific to certain versions of GFS2, and not something
 that relates to all versions.
 
 The upstream patch is here:
 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/gfs
 2?id=059788039f1e6343f34f46d202f8d9f2158c2783
 
 It should be available in RHEL shortly - please ping support via the
 ticket for updates,
 
 Steve.
 
  Returning to the original point however, it is certainly not recommended
  to have mixed RHEL or CentOS versions running in the same cluster. It is
  much better to keep everything the same, even though the GFS2 on-disk
  format has not changed between the versions.
  
  More specifically (for those who are curious): Whilst the on-disk
  format has not changed between EL5 and EL6, the way that RH cluster
  members communicate with each other has.
  
  I ran a quick test some time back and the two different OS cluster
  versions didn't see each other via LAN heartbeating.

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster