Tim, I replied inline below so I could address each of your comments.

On Wed, Mar 24, 2010 at 11:55 PM, Tim Serong <tser...@novell.com> wrote:
> On 3/25/2010 at 05:59 AM, Ben Timby <bti...@gmail.com> wrote:
>> Attached is a resource agent that I call exportfs.
>>
>> Rather than starting/stopping NFS, it uses exportfs to add/remove
>> individual exports.
>
> Awesome, as Florian said :)
>
> A couple of random comments...
>
>> It also takes care to use cluster-wide unique fsid parameters for each
>> export. It ensures that this fsid is migrated with the resource.
>
> An alternative to automating this would be to just push the burden of
> fsid assignment to the sysadmin (have the RA return $OCF_ERR_CONFIGURED
> if no fsid was explicitly specified). Makes the code slightly simpler
> at the expense of some small administrative effort :)
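In case it helps anyone following along, the explicit-fsid approach
amounts to a validate-time check roughly like the one below. This is
just a sketch, not the actual patch; the parameter names
(OCF_RESKEY_fsid, OCF_RESKEY_directory) are illustrative.

    exportfs_validate() {
        # Without automatic fsid assignment, an explicit fsid is mandatory.
        if [ -z "$OCF_RESKEY_fsid" ]; then
            ocf_log err "fsid is a required parameter"
            return $OCF_ERR_CONFIGURED
        fi
        # The directory to be exported must exist on this node.
        if [ ! -d "$OCF_RESKEY_directory" ]; then
            ocf_log err "directory $OCF_RESKEY_directory does not exist"
            return $OCF_ERR_CONFIGURED
        fi
        return $OCF_SUCCESS
    }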
I just provided a patch for this change. I was on the fence about
whether to do it this way or not, but ultimately decided to make the
script as turnkey as possible. I am happy either way :-).

> Now for a little potential nastiness... I did some work in this area
> a year or two ago, and at the time, we ran into some curious edge cases.
> Hopefully things have moved on a little since then in NFS-land (I was
> using SLES 10 SP2, from memory), but for reference, have a look at:
>
> http://marc.info/?l=linux-nfs&m=123175640421702&w=2
>
> This describes an edge case where (depending on what the clients are
> doing), it's possible that running "exportfs -i" to export one directory
> will result in an interruption of service to an unrelated exported
> directory on the same node.

I think you are advocating additional testing; I address that below.

> There's also a problem whereby you almost certainly can't rely on the
> return code from exportfs actually telling you the directory was exported
> successfully. The only reason exportfs will fail is if you pass invalid
> options, and it's possible that exportfs will return before the export
> has actually appeared in /var/lib/nfs/etab (exportfs says "please kernel,
> export this when you get a chance, kthxbye").

Are you suggesting that we poll the etab file to make sure our export
appears before calling the operation a success? If so, a rough sketch of
what I have in mind is at the end of this message.

> We ran into these issues because we were doing failover testing while
> the system was under heavy load (continuous write of several GB, followed
> by reading the same data back for verification), while failing over
> multiple NFS exports from node to node. You probably won't ever hit them
> unless the system is being severely hammered... But I'd still recommend
> further testing along these lines, out of sheer paranoia.

I will definitely do some testing. If I understand your statement
correctly, the following scenarios will help determine whether this is a
problem for me or not.

1. Bring up the cluster in active-active mode, both nodes online.
2. On the client, start:
   $ dd if=/dev/zero of=/path/to/fs0/bigfile bs=1GB count=10
3. Fail over resource fs1.
4. Make sure the addition of fs1 to the node handling fs0 does not
   cause disruption.

... and then ...

1. Bring up the cluster in active-active mode, both nodes online.
2. On the client, start:
   $ dd if=/path/to/fs0/bigfile of=/dev/null bs=1GB count=10
3. Fail over resource fs1.
4. Make sure the addition of fs1 to the node handling fs0 does not
   cause disruption.
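As for polling etab: here is roughly what I am picturing for the start
operation. After calling exportfs, wait for the directory to show up in
/var/lib/nfs/etab (or give up after a timeout) before reporting success.
This is only a sketch under the assumption that etab is the right place
to look; the timeout, the clientspec/directory/fsid parameter names, and
the option string are placeholders, not the actual RA code.

    # Wait for an export to appear in etab; etab lists one export per
    # line, beginning with the exported directory.
    wait_for_etab() {
        dir="$1"
        tries=0
        while [ $tries -lt 10 ]; do
            if grep -q "^${dir}[[:space:]]" /var/lib/nfs/etab; then
                return 0
            fi
            sleep 1
            tries=$((tries + 1))
        done
        return 1
    }

    # In the start action, something like:
    exportfs -i -o "rw,fsid=${OCF_RESKEY_fsid}" \
        "${OCF_RESKEY_clientspec}:${OCF_RESKEY_directory}"
    if ! wait_for_etab "${OCF_RESKEY_directory}"; then
        ocf_log err "${OCF_RESKEY_directory} never appeared in etab"
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS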