Re: [Pacemaker] drbd + lvm

2014-06-12 Thread Lars Ellenberg
On Thu, Mar 13, 2014 at 03:57:28PM -0400, David Vossel wrote:
> 
> 
> 
> 
> - Original Message -
> > From: "Infoomatic" 
> > To: pacemaker@oss.clusterlabs.org
> > Sent: Thursday, March 13, 2014 2:26:00 PM
> > Subject: [Pacemaker] drbd + lvm
> > 
> > Hi list,
> > 
> > I am having troubles with pacemaker and lvm and stacked drbd resources.
> > The system consists of 2 Ubuntu 12 LTS servers, each having two partitions 
> > of
> > an underlying raid 1+0 as volume group with one LV each as a drbd backing
> > device. The purpose is usage with VMs and adjusting the needed disk space
> > flexibly, so on top of the drbd resources there are LVs for each VM.
> > I created a stack with LCMC, which is like:
> > 
> > DRBD->LV->libvirt and
> > DRBD->LV->Filesystem->lxc
> > 
> > The problem now: the system has "hiccups" - when VM01 runs on HOST01 (being
> > primary DRBD) and HOST02 is restarting, lvm is reloaded (at boot time) and
> > the LVs are being activated. This of course results in an error, the log
> > entry:
> > 
> > Mar 13 17:58:42 host01 pengine: [27563]: ERROR: native_create_actions:
> > Resource res_LVM_1 (ocf::LVM) is active on 2 nodes attempting recovery
> > 
> > Therefore, as configured, the resource is stopped and started again (on only
> > one node). Thus, all VMs and containers relying on this are also restarted.
> > 
> > When I disable the LVs that use the DRBD resource at boot (lvm.conf:
> > volume_list only containing the VG from the partitions of the raidsystem) a
> > reboot of the secondary does not restart the VMs running on the primary.
> > However, if the primary goes down (e.g. power interruption), the secondary
> > cannot activate the LVs of the VMs because they are not in the list of
> > lvm.conf to be activated.
> > 
> > Has anyone had this issue and resolved it? Any ideas? Thanks in advance!
> 
> Yep, i've hit this as well. Use the latest LVM agent. I already fixed all of 
> this.

If you exclude the DRBD lower-level devices in your lvm.conf filter
(and update your initramfs so it carries a proper copy of that lvm.conf),
and only allow them to be accessed via DRBD,
LVM cannot possibly activate them "on boot",
but only after DRBD was promoted,
which supposedly happens via pacemaker only.
And unless some udev rule auto-activates any VG it finds immediately,
the VG should only be activated via pacemaker as well.

So something like that should be in your lvm.conf:
  filter = [ "a|^/dev/your/system/PVs|", "a|^/dev/drbd|", "r|.|" ]
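
Spelled out a bit more, and purely as a sketch (the system PV path is an
assumption, adjust it to your actual layout): the filter lives in the
devices section of /etc/lvm/lvm.conf, and on Ubuntu the initramfs copy is
refreshed with update-initramfs:

  # /etc/lvm/lvm.conf
  devices {
  filter = [ "a|^/dev/sda2$|", "a|^/dev/drbd|", "r|.|" ]
  }

  # pick up the changed lvm.conf in the initramfs as well:
  update-initramfs -u -k all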

> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM
> 
> Keep your volume_list the way it is and use the 'exclusive=true' LVM
> option.   This will allow the LVM agent to activate volumes that don't
> exist in the volume_list.

That is a nice feature, but if I'm correct, it is unrelated here.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd + lvm

2014-03-14 Thread David Vossel




- Original Message -
> From: "Infoomatic" 
> To: "The Pacemaker cluster resource manager" 
> Sent: Thursday, March 13, 2014 5:28:19 PM
> Subject: Re: [Pacemaker] drbd + lvm
> 
> > > Has anyone had this issue and resolved it? Any ideas? Thanks in advance!
> > 
> > Yep, i've hit this as well. Use the latest LVM agent. I already fixed all
> > of this.
> > 
> > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM
> > 
> > Keep your volume_list the way it is and use the 'exclusive=true' LVM
> > option.   This will allow the LVM agent to activate volumes that don't
> > exist in the volume_list.
> > 
> > Hope that helps
> 
> Thanks for the fast response. I upgraded LVM to the backports
> (2.02.95-4ubuntu1.1~precise1) and used this script, but I am getting errors
> when one of the nodes tries to activate the VG.
> 
> The log:
> Mar 13 23:21:03 lxc02 LVM[7235]: INFO: 0 logical volume(s) in volume group
> "replicated" now active
> Mar 13 23:21:03 lxc02 LVM[7235]: INFO: LVM Volume replicated is not available
> (stopped)
> 
> "exclusive" is true and the tag is "pacemaker". Someone got hints? tia!

Yeah, those aren't errors. It's just telling you that the LVM agent stopped 
successfully. I would expect to see these after you did a failover or resource 
recovery.

Is the resource not starting and stopping correctly for you? If not, I'll need 
more logs.

-- Vossel

> infoomatic
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd + lvm

2014-03-13 Thread Infoomatic
> > Has anyone had this issue and resolved it? Any ideas? Thanks in advance!
> 
> Yep, i've hit this as well. Use the latest LVM agent. I already fixed all of 
> this.
> 
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM
> 
> Keep your volume_list the way it is and use the 'exclusive=true' LVM option.  
>  This will allow the LVM agent to activate volumes that don't exist in the 
> volume_list.
> 
> Hope that helps

Thanks for the fast response. I upgraded LVM to the backports 
(2.02.95-4ubuntu1.1~precise1) and used this script, but I am getting errors 
when one of the nodes tries to activate the VG.

The log:
Mar 13 23:21:03 lxc02 LVM[7235]: INFO: 0 logical volume(s) in volume group 
"replicated" now active
Mar 13 23:21:03 lxc02 LVM[7235]: INFO: LVM Volume replicated is not available 
(stopped)

"exclusive" is true and the tag is "pacemaker". Someone got hints? tia!

infoomatic


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd + lvm

2014-03-13 Thread David Vossel




- Original Message -
> From: "Infoomatic" 
> To: pacemaker@oss.clusterlabs.org
> Sent: Thursday, March 13, 2014 2:26:00 PM
> Subject: [Pacemaker] drbd + lvm
> 
> Hi list,
> 
> I am having troubles with pacemaker and lvm and stacked drbd resources.
> The system consists of 2 Ubuntu 12 LTS servers, each having two partitions of
> an underlying raid 1+0 as volume group with one LV each as a drbd backing
> device. The purpose is usage with VMs and adjusting the needed disk space
> flexibly, so on top of the drbd resources there are LVs for each VM.
> I created a stack with LCMC, which is like:
> 
> DRBD->LV->libvirt and
> DRBD->LV->Filesystem->lxc
> 
> The problem now: the system has "hiccups" - when VM01 runs on HOST01 (being
> primary DRBD) and HOST02 is restarting, lvm is reloaded (at boot time) and
> the LVs are being activated. This of course results in an error, the log
> entry:
> 
> Mar 13 17:58:42 host01 pengine: [27563]: ERROR: native_create_actions:
> Resource res_LVM_1 (ocf::LVM) is active on 2 nodes attempting recovery
> 
> Therefore, as configured, the resource is stopped and started again (on only
> one node). Thus, all VMs and containers relying on this are also restarted.
> 
> When I disable the LVs that use the DRBD resource at boot (lvm.conf:
> volume_list only containing the VG from the partitions of the raidsystem) a
> reboot of the secondary does not restart the VMs running on the primary.
> However, if the primary goes down (e.g. power interruption), the secondary
> cannot activate the LVs of the VMs because they are not in the list of
> lvm.conf to be activated.
> 
> Has anyone had this issue and resolved it? Any ideas? Thanks in advance!

Yep, i've hit this as well. Use the latest LVM agent. I already fixed all of 
this.

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM

Keep your volume_list the way it is and use the 'exclusive=true' LVM option.   
This will allow the LVM agent to activate volumes that don't exist in the 
volume_list.
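
A minimal sketch of what the corresponding crm primitive could look like
(the resource and VG names here are made up, not taken from your config):

  primitive p_lvm_replicated ocf:heartbeat:LVM \
  params volgrpname="replicated" exclusive="true" tag="pacemaker" \
  op monitor interval="30s" timeout="30s"

ordered and colocated with the DRBD master as usual.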

Hope that helps

-- Vossel





> 
> infoomatic
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd + lvm

2014-03-13 Thread Digimer

On 13/03/14 03:26 PM, Infoomatic wrote:

Hi list,

I am having troubles with pacemaker and lvm and stacked drbd resources.
The system consists of 2 Ubuntu 12 LTS servers, each having two partitions of 
an underlying raid 1+0 as volume group with one LV each as a drbd backing 
device. The purpose is usage with VMs and adjusting the needed disk space
flexibly, so on top of the drbd resources there are LVs for each VM.
I created a stack with LCMC, which is like:

DRBD->LV->libvirt and
DRBD->LV->Filesystem->lxc

The problem now: the system has "hiccups" - when VM01 runs on HOST01 (being
primary DRBD) and HOST02 is restarting, lvm is reloaded (at boot time) and the LVs are 
being activated. This of course results in an error, the log entry:

Mar 13 17:58:42 host01 pengine: [27563]: ERROR: native_create_actions: Resource 
res_LVM_1 (ocf::LVM) is active on 2 nodes attempting recovery

Therefore, as configured, the resource is stopped and started again (on only 
one node). Thus, all VMs and containers relying on this are also restarted.

When I disable the LVs that use the DRBD resource at boot (lvm.conf: 
volume_list only containing the VG from the partitions of the raidsystem) a 
reboot of the secondary does not restart the VMs running on the primary. 
However, if the primary goes down (e.g. power interruption), the secondary 
cannot activate the LVs of the VMs because they are not in the list of lvm.conf 
to be activated.

Has anyone had this issue and resolved it? Any ideas? Thanks in advance!

infoomatic


Are you using clustered LVM?

I don't know much about ubuntu, but I do it this way:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Configuring_DRBD (and the 
section after it on LVM setup). You can obviously ignore the EL6 
specific stuff and the concepts will map. I've used this setup (DRBD + 
clvmd + LV-per-VM) for years now without issue.


I'm in the early stages of testing this on Pacemaker, and so far, so 
good. So I see no reason why the concepts from that tutorial can't be 
well ported to pacemaker in your environment.


Cheers

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] drbd + lvm

2014-03-13 Thread Infoomatic
Hi list,

I am having troubles with pacemaker and lvm and stacked drbd resources.
The system consists of 2 Ubuntu 12 LTS servers, each having two partitions of 
an underlying raid 1+0 as volume group with one LV each as a drbd backing 
device. The purpose is usage with VMs and adjusting the needed disk space
flexibly, so on top of the drbd resources there are LVs for each VM.
I created a stack with LCMC, which is like:

DRBD->LV->libvirt and
DRBD->LV->Filesystem->lxc

The problem now: the system has "hiccups" - when VM01 runs on HOST01 (being
primary DRBD) and HOST02 is restarting, lvm is reloaded (at boot time) and the 
LVs are being activated. This of course results in an error, the log entry:

Mar 13 17:58:42 host01 pengine: [27563]: ERROR: native_create_actions: Resource 
res_LVM_1 (ocf::LVM) is active on 2 nodes attempting recovery

Therefore, as configured, the resource is stopped and started again (on only 
one node). Thus, all VMs and containers relying on this are also restarted.

When I disable the LVs that use the DRBD resource at boot (lvm.conf: 
volume_list only containing the VG from the partitions of the raidsystem) a 
reboot of the secondary does not restart the VMs running on the primary. 
However, if the primary goes down (e.g. power interruption), the secondary 
cannot activate the LVs of the VMs because they are not in the list of lvm.conf 
to be activated.

Has anyone had this issue and resolved it? Any ideas? Thanks in advance!

infoomatic

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-26 Thread Vladislav Bogdanov
Dennis Jacobfeuerborn  wrote:

>On 26.03.2013 06:14, Vladislav Bogdanov wrote:
>> 26.03.2013 04:23, Dennis Jacobfeuerborn wrote:
>>> I have now reduced the configuration further and removed LVM from
>the
>>> picture. Still the cluster fails when I set the master node to
>standby.
>>> What's interesting is that things get fixed when I issue a simple
>>> "cleanup" for the filesystem resource.
>>>
>>> This is what my current config looks like:
>>>
>>> node nfs1 \
>>>  attributes standby="off"
>>> node nfs2
>>> primitive p_drbd_web1 ocf:linbit:drbd \
>>>  params drbd_resource="web1" \
>>>  op monitor interval="15" role="Master" \
>>>  op monitor interval="30" role="Slave"
>>> primitive p_fs_web1 ocf:heartbeat:Filesystem \
>>>  params device="/dev/drbd0" \
>>>  directory="/srv/nfs/web1" fstype="ext4" \
>>>  op monitor interval="10s"
>>> ms ms_drbd_web1 p_drbd_web1 \
>>>  meta master-max="1" master-node-max="1" \
>>>  clone-max="2" clone-node-max="1" notify="true"
>>> colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1
>>
>> Above means: "colocate ms_drbd_web1:Master with p_fs_web1", or
>"promote
>> ms_drbd_web1 where p_fs_web1 is (or "is about to be")".
>>
>> Probably that is not exactly what you want (although that is also
>valid,
>> but uses different logic internally). I usually place resources in a
>> different order in colocation and order constraints, and that works.
>
>Indeed I had the colocation semantics backwards. With that change the
>failover works correctly, thanks!
>
>I still have problems when using LVM although these don't seem to be 
>pacemaker related. I have defined /dev/drbd0 with a backing device 
>/dev/vdb. The problem is that when I create a physical volume on 
>/dev/drbd0 and do a pvs the output shows the physical volume on
>/dev/vdb 
>instead. I already disabled caching in /etc/lvm/lvm.conf, prepended a 
>filter "r|/dev/vdb.*|" and recreated the initramfs and reboot but LVM 
>still sees the backing device as the physical volume and not the 
>actually replicated /dev/drbd0.
>
>Any idea why LVM is still scanning /dev/vdb for physical volumes
>despite 
>the filter?
>

That is because /dev has a lot of symlinks which point to that vdb device file.
You'd better explicitly allow only what should be allowed and reject everything else.
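
For example, a whitelist-style filter for that layout might look like this
(assuming the system disk is /dev/vda; adjust to your setup), again followed
by rebuilding the initramfs so early boot uses the same rule:

  filter = [ "a|^/dev/drbd[0-9]+$|", "a|^/dev/vda.*|", "r|.*|" ]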

>Regards,
>   Dennis
>
>___
>Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>Project Home: http://www.clusterlabs.org
>Getting started:
>http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>Bugs: http://bugs.clusterlabs.org



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-26 Thread Dennis Jacobfeuerborn

On 26.03.2013 06:14, Vladislav Bogdanov wrote:

26.03.2013 04:23, Dennis Jacobfeuerborn wrote:

I have now reduced the configuration further and removed LVM from the
picture. Still the cluster fails when I set the master node to standby.
What's interesting is that things get fixed when I issue a simple
"cleanup" for the filesystem resource.

This is what my current config looks like:

node nfs1 \
 attributes standby="off"
node nfs2
primitive p_drbd_web1 ocf:linbit:drbd \
 params drbd_resource="web1" \
 op monitor interval="15" role="Master" \
 op monitor interval="30" role="Slave"
primitive p_fs_web1 ocf:heartbeat:Filesystem \
 params device="/dev/drbd0" \
 directory="/srv/nfs/web1" fstype="ext4" \
 op monitor interval="10s"
ms ms_drbd_web1 p_drbd_web1 \
 meta master-max="1" master-node-max="1" \
 clone-max="2" clone-node-max="1" notify="true"
colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1


Above means: "colocate ms_drbd_web1:Master with p_fs_web1", or "promote
ms_drbd_web1 where p_fs_web1 is (or "is about to be")".

Probably that is not exactly what you want (although that is also valid,
but uses different logic internally). I usually place resources in a
different order in colocation and order constraints, and that works.


Indeed I had the colocation semantics backwards. With that change the
failover works correctly, thanks!


I still have problems when using LVM although these don't seem to be 
pacemaker related. I have defined /dev/drbd0 with a backing device 
/dev/vdb. The problem is that when I create a physical volume on 
/dev/drbd0 and do a pvs the output shows the physical volume on /dev/vdb 
instead. I already disabled caching in /etc/lvm/lvm.conf, prepended a 
filter "r|/dev/vdb.*|" and recreated the initramfs and reboot but LVM 
still sees the backing device as the physical volume and not the 
actually replicated /dev/drbd0.


Any idea why LVM is still scanning /dev/vdb for physical volumes despite 
the filter?


Regards,
  Dennis

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-26 Thread emmanuel segura
Hello Dennis

This constraint is wrong

colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1

it should be

colocation c_web1_on_drbd inf: p_fs_web1 ms_drbd_web1:Master

Thanks

2013/3/26 Dennis Jacobfeuerborn 

> I have now reduced the configuration further and removed LVM from the
> picture. Still the cluster fails when I set the master node to standby.
> What's interesting is that things get fixed when I issue a simple
> "cleanup" for the filesystem resource.
>
> This is what my current config looks like:
>
> node nfs1 \
> attributes standby="off"
> node nfs2
> primitive p_drbd_web1 ocf:linbit:drbd \
> params drbd_resource="web1" \
> op monitor interval="15" role="Master" \
> op monitor interval="30" role="Slave"
> primitive p_fs_web1 ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" \
> directory="/srv/nfs/web1" fstype="ext4" \
> op monitor interval="10s"
> ms ms_drbd_web1 p_drbd_web1 \
> meta master-max="1" master-node-max="1" \
> clone-max="2" clone-node-max="1" notify="true"
> colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1
> order o_drbd_before_web1 inf: ms_drbd_web1:promote p_fs_web1
> property $id="cib-bootstrap-options" \
> dc-version="1.1.8-7.el6-394e906" \
> cluster-infrastructure="classic openais (with plugin)" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1364259713" \
> maintenance-mode="false"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
>
> I cannot figure out what is wrong with this configuration.
>
> Regards,
>   Dennis
>
> On 25.03.2013 13:09, Dennis Jacobfeuerborn wrote:
>
>> I just found the following in the dmesg output which might or might not
>> add to understanding the problem:
>>
>> device-mapper: table: 253:2: linear: dm-linear: Device lookup failed
>> device-mapper: ioctl: error adding target to table
>>
>> Regards,
>>Dennis
>>
>> On 25.03.2013 13:04, Dennis Jacobfeuerborn wrote:
>>
>>> Hi,
>>> I'm currently trying to create a two node redundant NFS setup on CentOS 6.4
>>> using pacemaker and crmsh.
>>>
>>> I use this document as a starting point:
>>> https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html
>>>
>>>
>>>
>>> The first issue is that using these instructions I get the cluster up
>>> and running but the moment I try to stop the pacemaker service on the
>>> current master node several resources just fail and everything goes
>>> pear-shaped.
>>>
>>> Since the problem seems to relate to the nfs bits in the configuration I
>>> removed these in order to get to a minimal working setup and then add
>>> things piece by piece in order to find the source of the problem.
>>>
>>> Now I am at a point where I basically have only
>>> DRBD+LVM+Filesystems+IPAddr2 configured and now LVM seems to act up.
>>>
>>> I can start the cluster and everything is fine but the moment I stop
>>> pacemaker on the master i end up with this as a status:
>>>
>>> ===
>>> Node nfs2: standby
>>> Online: [ nfs1 ]
>>>
>>>   Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
>>>   Masters: [ nfs1 ]
>>>   Stopped: [ p_drbd_nfs:1 ]
>>>
>>> Failed actions:
>>>  p_lvm_nfs_start_0 (node=nfs1, call=505, rc=1, status=complete):
>>> unknown error
>>> ===
>>>
>>> and in the log on nfs1 I see:
>>> LVM(p_lvm_nfs)[7515]:2013/03/25_12:34:21 ERROR: device-mapper:
>>> reload ioctl on failed: Invalid argument device-mapper: reload ioctl on
>>> failed: Invalid argument 2 logical volume(s) in volume group "nfs" now
>>> active
>>>
>>> However a lvs in this state shows:
>>> [root@nfs1 ~]# lvs
>>>LV  VGAttr  LSize   Pool Origin Data%  Move Log
>>>web1nfs   -wi--   2,00g
>>>web2nfs   -wi--   2,00g
>>>lv_root vg_nfs1.local -wi-ao---   2,45g
>>>lv_swap vg_nfs1.local -wi-ao--- 256,00m
>>>
>>> So the volume group is present.
>>>
>>> My current configuration looks like this:
>>>
>>> node nfs1 \
>>>  attributes standby="off"
>>> node nfs2 \
>>>  attributes standby="on"
>>> primitive p_drbd_nfs ocf:linbit:drbd \
>>>  params drbd_resource="nfs" \
>>>  op monitor interval="15" role="Master" \
>>>  op monitor interval="30" role="Slave"
>>> primitive p_fs_web1 ocf:heartbeat:Filesystem \
>>>  params device="/dev/nfs/web1" \
>>>directory="/srv/nfs/web1" \
>>>fstype="ext4" \
>>>  op monitor interval="10s"
>>> primitive p_fs_web2 ocf:heartbeat:Filesystem \
>>>  params device="/dev/nfs/web2" \
>>>directory="/srv/nfs/web2" \
>>>fstype="ext4" \
>>>  op monitor interval="10s"
>>> primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
>>>  params ip="10.99.0.142" cid

Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-25 Thread Vladislav Bogdanov
26.03.2013 04:23, Dennis Jacobfeuerborn wrote:
> I have now reduced the configuration further and removed LVM from the
> picture. Still the cluster fails when I set the master node to standby.
> What's interesting is that things get fixed when I issue a simple
> "cleanup" for the filesystem resource.
> 
> This is what my current config looks like:
> 
> node nfs1 \
> attributes standby="off"
> node nfs2
> primitive p_drbd_web1 ocf:linbit:drbd \
> params drbd_resource="web1" \
> op monitor interval="15" role="Master" \
> op monitor interval="30" role="Slave"
> primitive p_fs_web1 ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" \
> directory="/srv/nfs/web1" fstype="ext4" \
> op monitor interval="10s"
> ms ms_drbd_web1 p_drbd_web1 \
> meta master-max="1" master-node-max="1" \
> clone-max="2" clone-node-max="1" notify="true"
> colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1

Above means: "colocate ms_drbd_web1:Master with p_fs_web1", or "promote
ms_drbd_web1 where p_fs_web1 is (or "is about to be")".

Probably that is not exactly what you want (although that is also valid,
but uses different logic internally). I usually place resources in a
different order in colocation and order constraints, and that works.
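
In other words, if the intent is "run the filesystem where DRBD is master",
the colocation would read something like:

  colocation c_web1_on_drbd inf: p_fs_web1 ms_drbd_web1:Master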

> order o_drbd_before_web1 inf: ms_drbd_web1:promote p_fs_web1
> property $id="cib-bootstrap-options" \
> dc-version="1.1.8-7.el6-394e906" \
> cluster-infrastructure="classic openais (with plugin)" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1364259713" \
> maintenance-mode="false"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
> 
> I cannot figure out what is wrong with this configuration.
> 
> Regards,
>   Dennis
> 
> On 25.03.2013 13:09, Dennis Jacobfeuerborn wrote:
>> I just found the following in the dmesg output which might or might not
>> add to understanding the problem:
>>
>> device-mapper: table: 253:2: linear: dm-linear: Device lookup failed
>> device-mapper: ioctl: error adding target to table
>>
>> Regards,
>>Dennis
>>
>> On 25.03.2013 13:04, Dennis Jacobfeuerborn wrote:
>>> Hi,
>>> I'm currently trying to create a two node redundant NFS setup on CentOS 6.4
>>> using pacemaker and crmsh.
>>>
>>> I use this document as a starting point:
>>> https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html
>>>
>>>
>>>
>>>
>>> The first issue is that using these instructions I get the cluster up
>>> and running but the moment I try to stop the pacemaker service on the
>>> current master node several resources just fail and everything goes
>>> pear-shaped.
>>>
>>> Since the problem seems to relate to the nfs bits in the configuration I
>>> removed these in order to get to a minimal working setup and then add
>>> things piece by piece in order to find the source of the problem.
>>>
>>> Now I am at a point where I basically have only
>>> DRBD+LVM+Filesystems+IPAddr2 configured and now LVM seems to act up.
>>>
>>> I can start the cluster and everything is fine but the moment I stop
>>> pacemaker on the master i end up with this as a status:
>>>
>>> ===
>>> Node nfs2: standby
>>> Online: [ nfs1 ]
>>>
>>>   Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
>>>   Masters: [ nfs1 ]
>>>   Stopped: [ p_drbd_nfs:1 ]
>>>
>>> Failed actions:
>>>  p_lvm_nfs_start_0 (node=nfs1, call=505, rc=1, status=complete):
>>> unknown error
>>> ===
>>>
>>> and in the log on nfs1 I see:
>>> LVM(p_lvm_nfs)[7515]:2013/03/25_12:34:21 ERROR: device-mapper:
>>> reload ioctl on failed: Invalid argument device-mapper: reload ioctl on
>>> failed: Invalid argument 2 logical volume(s) in volume group "nfs" now
>>> active
>>>
>>> However a lvs in this state shows:
>>> [root@nfs1 ~]# lvs
>>>LV  VGAttr  LSize   Pool Origin Data%  Move Log
>>>web1nfs   -wi--   2,00g
>>>web2nfs   -wi--   2,00g
>>>lv_root vg_nfs1.local -wi-ao---   2,45g
>>>lv_swap vg_nfs1.local -wi-ao--- 256,00m
>>>
>>> So the volume group is present.
>>>
>>> My current configuration looks like this:
>>>
>>> node nfs1 \
>>>  attributes standby="off"
>>> node nfs2 \
>>>  attributes standby="on"
>>> primitive p_drbd_nfs ocf:linbit:drbd \
>>>  params drbd_resource="nfs" \
>>>  op monitor interval="15" role="Master" \
>>>  op monitor interval="30" role="Slave"
>>> primitive p_fs_web1 ocf:heartbeat:Filesystem \
>>>  params device="/dev/nfs/web1" \
>>>directory="/srv/nfs/web1" \
>>>fstype="ext4" \
>>>  op monitor interval="10s"
>>> primitive p_fs_web2 ocf:heartbeat:Filesystem \
>>>  params device="/dev/nfs/web2" \
>>>directory="/srv/nfs/web2" \
>>>fstype="ext4" \
>>>  op monitor interval="10s"
>>> primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
>>>  params ip="10.99.0.142" cidr_netmask="24"

Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-25 Thread Dennis Jacobfeuerborn
I have now reduced the configuration further and removed LVM from the 
picture. Still the cluster fails when I set the master node to standby.
What's interesting is that things get fixed when I issue a simple 
"cleanup" for the filesystem resource.


This is what my current config looks like:

node nfs1 \
attributes standby="off"
node nfs2
primitive p_drbd_web1 ocf:linbit:drbd \
params drbd_resource="web1" \
op monitor interval="15" role="Master" \
op monitor interval="30" role="Slave"
primitive p_fs_web1 ocf:heartbeat:Filesystem \
params device="/dev/drbd0" \
directory="/srv/nfs/web1" fstype="ext4" \
op monitor interval="10s"
ms ms_drbd_web1 p_drbd_web1 \
meta master-max="1" master-node-max="1" \
clone-max="2" clone-node-max="1" notify="true"
colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1
order o_drbd_before_web1 inf: ms_drbd_web1:promote p_fs_web1
property $id="cib-bootstrap-options" \
dc-version="1.1.8-7.el6-394e906" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1364259713" \
maintenance-mode="false"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

I cannot figure out what is wrong with this configuration.

Regards,
  Dennis

On 25.03.2013 13:09, Dennis Jacobfeuerborn wrote:

I just found the following in the dmesg output which might or might not
add to understanding the problem:

device-mapper: table: 253:2: linear: dm-linear: Device lookup failed
device-mapper: ioctl: error adding target to table

Regards,
   Dennis

On 25.03.2013 13:04, Dennis Jacobfeuerborn wrote:

Hi,
I'm currently trying to create a two node redundant NFS setup on CentOS 6.4
using pacemaker and crmsh.

I use this document as a starting point:
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html



The first issue is that using these instructions I get the cluster up
and running but the moment I try to stop the pacemaker service on the
current master node several resources just fail and everything goes
pear-shaped.

Since the problem seems to relate to the nfs bits in the configuration I
removed these in order to get to a minimal working setup and then add
things piece by piece in order to find the source of the problem.

Now I am at a point where I basically have only
DRBD+LVM+Filesystems+IPAddr2 configured and now LVM seems to act up.

I can start the cluster and everything is fine but the moment I stop
pacemaker on the master i end up with this as a status:

===
Node nfs2: standby
Online: [ nfs1 ]

  Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
  Masters: [ nfs1 ]
  Stopped: [ p_drbd_nfs:1 ]

Failed actions:
 p_lvm_nfs_start_0 (node=nfs1, call=505, rc=1, status=complete):
unknown error
===

and in the log on nfs1 I see:
LVM(p_lvm_nfs)[7515]:2013/03/25_12:34:21 ERROR: device-mapper:
reload ioctl on failed: Invalid argument device-mapper: reload ioctl on
failed: Invalid argument 2 logical volume(s) in volume group "nfs" now
active

However a lvs in this state shows:
[root@nfs1 ~]# lvs
   LV  VGAttr  LSize   Pool Origin Data%  Move Log
   web1nfs   -wi--   2,00g
   web2nfs   -wi--   2,00g
   lv_root vg_nfs1.local -wi-ao---   2,45g
   lv_swap vg_nfs1.local -wi-ao--- 256,00m

So the volume group is present.

My current configuration looks like this:

node nfs1 \
 attributes standby="off"
node nfs2 \
 attributes standby="on"
primitive p_drbd_nfs ocf:linbit:drbd \
 params drbd_resource="nfs" \
 op monitor interval="15" role="Master" \
 op monitor interval="30" role="Slave"
primitive p_fs_web1 ocf:heartbeat:Filesystem \
 params device="/dev/nfs/web1" \
   directory="/srv/nfs/web1" \
   fstype="ext4" \
 op monitor interval="10s"
primitive p_fs_web2 ocf:heartbeat:Filesystem \
 params device="/dev/nfs/web2" \
   directory="/srv/nfs/web2" \
   fstype="ext4" \
 op monitor interval="10s"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
 params ip="10.99.0.142" cidr_netmask="24" \
 op monitor interval="30s"
primitive p_lvm_nfs ocf:heartbeat:LVM \
 params volgrpname="nfs" \
 op monitor interval="30s"
group g_nfs p_lvm_nfs p_fs_web1 p_fs_web2 p_ip_nfs
ms ms_drbd_nfs p_drbd_nfs \
 meta master-max="1" \
   master-node-max="1" \
   clone-max="2" \
   clone-node-max="1" \
   notify="true"
colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
property $id="cib-bootstrap-options" \
 dc-version="1.1.8-7.el6-394e906" \
 cluster-infrastructure="classic openais (with plugin)" \
 expected-quorum-votes="2" \
 stonith-enabled="false" \
 no-quorum-policy="ignore" \
 last-lrm-refresh="1364212090" \

Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-25 Thread Dennis Jacobfeuerborn
I just found the following in the dmesg output which might or might not 
add to understanding the problem:


device-mapper: table: 253:2: linear: dm-linear: Device lookup failed
device-mapper: ioctl: error adding target to table

Regards,
  Dennis

On 25.03.2013 13:04, Dennis Jacobfeuerborn wrote:

Hi,
I'm currently trying to create a two node redundant NFS setup on CentOS 6.4
using pacemaker and crmsh.

I use this document as a starting point:
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html


The first issue is that using these instructions I get the cluster up
and running but the moment I try to stop the pacemaker service on the
current master node several resources just fail and everything goes
pear-shaped.

Since the problem seems to relate to the nfs bits in the configuration I
removed these in order to get to a minimal working setup and then add
things piece by piece in order to find the source of the problem.

Now I am at a point where I basically have only
DRBD+LVM+Filesystems+IPAddr2 configured and now LVM seems to act up.

I can start the cluster and everything is fine but the moment I stop
pacemaker on the master i end up with this as a status:

===
Node nfs2: standby
Online: [ nfs1 ]

  Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
  Masters: [ nfs1 ]
  Stopped: [ p_drbd_nfs:1 ]

Failed actions:
 p_lvm_nfs_start_0 (node=nfs1, call=505, rc=1, status=complete):
unknown error
===

and in the log on nfs1 I see:
LVM(p_lvm_nfs)[7515]:2013/03/25_12:34:21 ERROR: device-mapper:
reload ioctl on failed: Invalid argument device-mapper: reload ioctl on
failed: Invalid argument 2 logical volume(s) in volume group "nfs" now
active

However a lvs in this state shows:
[root@nfs1 ~]# lvs
   LV  VGAttr  LSize   Pool Origin Data%  Move Log
   web1nfs   -wi--   2,00g
   web2nfs   -wi--   2,00g
   lv_root vg_nfs1.local -wi-ao---   2,45g
   lv_swap vg_nfs1.local -wi-ao--- 256,00m

So the volume group is present.

My current configuration looks like this:

node nfs1 \
 attributes standby="off"
node nfs2 \
 attributes standby="on"
primitive p_drbd_nfs ocf:linbit:drbd \
 params drbd_resource="nfs" \
 op monitor interval="15" role="Master" \
 op monitor interval="30" role="Slave"
primitive p_fs_web1 ocf:heartbeat:Filesystem \
 params device="/dev/nfs/web1" \
   directory="/srv/nfs/web1" \
   fstype="ext4" \
 op monitor interval="10s"
primitive p_fs_web2 ocf:heartbeat:Filesystem \
 params device="/dev/nfs/web2" \
   directory="/srv/nfs/web2" \
   fstype="ext4" \
 op monitor interval="10s"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
 params ip="10.99.0.142" cidr_netmask="24" \
 op monitor interval="30s"
primitive p_lvm_nfs ocf:heartbeat:LVM \
 params volgrpname="nfs" \
 op monitor interval="30s"
group g_nfs p_lvm_nfs p_fs_web1 p_fs_web2 p_ip_nfs
ms ms_drbd_nfs p_drbd_nfs \
 meta master-max="1" \
   master-node-max="1" \
   clone-max="2" \
   clone-node-max="1" \
   notify="true"
colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
property $id="cib-bootstrap-options" \
 dc-version="1.1.8-7.el6-394e906" \
 cluster-infrastructure="classic openais (with plugin)" \
 expected-quorum-votes="2" \
 stonith-enabled="false" \
 no-quorum-policy="ignore" \
 last-lrm-refresh="1364212090" \
 maintenance-mode="false"
rsc_defaults $id="rsc_defaults-options" \
 resource-stickiness="100"

Any ideas why this isn't working?

Regards,
   Dennis

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] DRBD+LVM+NFS problems

2013-03-25 Thread Dennis Jacobfeuerborn

Hi,
I'm currently trying to create a two node redundant NFS setup on CentOS 6.4
using pacemaker and crmsh.


I use this document as a starting point:
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html

The first issue is that using these instructions I get the cluster up 
and running but the moment I try to stop the pacemaker service on the 
current master node several resources just fail and everything goes 
pear-shaped.


Since the problem seems to relate to the nfs bits in the configuration I 
removed these in order to get to a minimal working setup and then add 
things piece by piece in order to find the source of the problem.


Now I am at a point where I basically have only 
DRBD+LVM+Filesystems+IPAddr2 configured and now LVM seems to act up.


I can start the cluster and everything is fine but the moment I stop 
pacemaker on the master i end up with this as a status:


===
Node nfs2: standby
Online: [ nfs1 ]

 Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
 Masters: [ nfs1 ]
 Stopped: [ p_drbd_nfs:1 ]

Failed actions:
p_lvm_nfs_start_0 (node=nfs1, call=505, rc=1, status=complete): 
unknown error

===

and in the log on nfs1 I see:
LVM(p_lvm_nfs)[7515]:	2013/03/25_12:34:21 ERROR: device-mapper: reload 
ioctl on failed: Invalid argument device-mapper: reload ioctl on failed: 
Invalid argument 2 logical volume(s) in volume group "nfs" now active


However a lvs in this state shows:
[root@nfs1 ~]# lvs
  LV  VGAttr  LSize   Pool Origin Data%  Move Log
  web1nfs   -wi--   2,00g
  web2nfs   -wi--   2,00g
  lv_root vg_nfs1.local -wi-ao---   2,45g
  lv_swap vg_nfs1.local -wi-ao--- 256,00m

So the volume group is present.

My current configuration looks like this:

node nfs1 \
attributes standby="off"
node nfs2 \
attributes standby="on"
primitive p_drbd_nfs ocf:linbit:drbd \
params drbd_resource="nfs" \
op monitor interval="15" role="Master" \
op monitor interval="30" role="Slave"
primitive p_fs_web1 ocf:heartbeat:Filesystem \
params device="/dev/nfs/web1" \
  directory="/srv/nfs/web1" \
  fstype="ext4" \
op monitor interval="10s"
primitive p_fs_web2 ocf:heartbeat:Filesystem \
params device="/dev/nfs/web2" \
  directory="/srv/nfs/web2" \
  fstype="ext4" \
op monitor interval="10s"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
params ip="10.99.0.142" cidr_netmask="24" \
op monitor interval="30s"
primitive p_lvm_nfs ocf:heartbeat:LVM \
params volgrpname="nfs" \
op monitor interval="30s"
group g_nfs p_lvm_nfs p_fs_web1 p_fs_web2 p_ip_nfs
ms ms_drbd_nfs p_drbd_nfs \
meta master-max="1" \
  master-node-max="1" \
  clone-max="2" \
  clone-node-max="1" \
  notify="true"
colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
property $id="cib-bootstrap-options" \
dc-version="1.1.8-7.el6-394e906" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1364212090" \
maintenance-mode="false"
rsc_defaults $id="rsc_defaults-options" \
resource-stickiness="100"

Any ideas why this isn't working?

Regards,
  Dennis

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-06-10 Thread Florian Haas
On Sun, Jun 10, 2012 at 4:13 PM, Jake Smith  wrote:
> I ran into that (scheduler change) also after upgrading. I only accidentally
> stumbled onto that fact. I wish Ubuntu had made it a little clearer that not
> having a separate server kernel had more implications than just kernel!

It's correct that the default I/O scheduler was one of the things
where an Ubuntu "server" kernel would differ from a "desktop" kernel.
However, it's prudent in a system with a reasonably high-performance
I/O subsystem (whether that box is being used for DRBD or not) to
always set the scheduler to deadline either in your /etc/sysfs.conf
(as Ubuntu has been shipping with sysfsutils for a while, albeit in
universe) or in your bootloader configuration.
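
For example (assuming the relevant device shows up as /dev/sda; purely
illustrative):

  # /etc/sysfs.conf (sysfsutils)
  block/sda/queue/scheduler = deadline

  # or at runtime:
  echo deadline > /sys/block/sda/queue/scheduler

  # or for all devices via the kernel command line in your bootloader:
  elevator=deadline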

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-06-10 Thread Jake Smith

I ran into that (scheduler change) also after upgrading. I only accidentally 
stumbled onto that fact. I wish Ubuntu had made it a little clearer that not 
having a separate server kernel had more implications than just kernel!

Jake

- Reply message -
From: "Christoph Bartoschek" 
To: 
Subject: [Pacemaker] DRBD < LVM < EXT4 < NFS performance
Date: Sun, Jun 10, 2012 8:59 am




Hi,

We have not solved the performance issue yet. However, we could improve the
responsiveness of the system. We no longer get timeouts and have reenabled
pacemaker.

The problem that led to an unresponsive system was that Ubuntu 12.04 LTS
uses the cfq I/O scheduler by default. Ubuntu 10.04 LTS used the deadline
scheduler.

After changing the scheduler to deadline on 12.04 the load went down
drastically and we no longer get timeouts.

Christoph


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-06-10 Thread Christoph Bartoschek
Hi,

We have not solved the performance issue yet. However, we could improve the
responsiveness of the system. We no longer get timeouts and have reenabled
pacemaker.

The problem that led to an unresponsive system was that Ubuntu 12.04 LTS
uses the cfq I/O scheduler by default. Ubuntu 10.04 LTS used the deadline
scheduler.

After changing the scheduler to deadline on 12.04 the load went down
drastically and we no longer get timeouts.

Christoph


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-06-02 Thread Christoph Bartoschek

> Dedicated replication link?
> 
> Maybe the additional latency is all that kills you.
> Do you have non-volatile write cache on your IO backend?
> Did you post your drbd configuration settings already?

There is a dedicated 10GB Ethernet replication link between both nodes.

There is also a cache on the IO backend. I have started some additional 
measurements with dd and oflag=direct.

On a remote host I get:

- With  enabled drbd link:   3 MBytes/s
- With disabled drbd link:   9 MBytes/s

On one of the machines locally:

- With  enabled drbd link:  24 MBytes/s
- With disabled drbd link:  74 MBytes/s

Same machine, but a partition without DRBD and LVM:

- 90 MBytes/s

This is our current drbd.conf:

global {
usage-count yes;
}
common {
  syncer {
rate 500M;
  }
}
resource lfs {
  protocol C;

  startup {
wfc-timeout 0;
degr-wfc-timeout  120;
  }
  disk {
on-io-error detach;
fencing resource-only;
  }
  handlers {
fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  net {
max-buffers8000;
max-epoch-size 8000;
  }
  on d1106i06 {
device /dev/drbd0;
disk   /dev/sda4;
address192.168.2.1:7788;
meta-disk  internal;
  }
  on d1106i07 {
device /dev/drbd0;
disk   /dev/sda4;
address192.168.2.2:7788;
meta-disk  internal;
  }
}

Thanks
Christoph


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-24 Thread Lars Ellenberg
On Thu, May 24, 2012 at 03:34:51PM +0300, Dan Frincu wrote:
> Hi,
> 
> On Mon, May 21, 2012 at 4:24 PM, Christoph Bartoschek  
> wrote:
> > Florian Haas wrote:
> >
> >>> Thus I would expect to have a write performance of about 100 MByte/s. But
> >>> dd gives me only 20 MByte/s.
> >>>
> >>> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
> >>> 1310720+0 records in
> >>> 1310720+0 records out
> >>> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
> >>
> >> If you used that same dd invocation for your local test that allegedly
> >> produced 450 MB/s, you've probably been testing only your page cache.
> >> Add oflag=dsync or oflag=direct (the latter will only work locally, as
> >> NFS doesn't support O_DIRECT).
> >>
> >> If your RAID is one of reasonably contemporary SAS or SATA drives,
> >> then a sustained to-disk throughput of 450 MB/s would require about
> >> 7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
> >> got? Or are you writing to SSDs?
> >
> > I used the same invocation with different filenames each time. To which page
> > cache do you refer? To the one on the client or on the server side?
> >
> > We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10
> > times with different files in a row:
> >
> > for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192
> > count=1310720; done
> >
> > The resulting values on a system that is also used by other programs as
> > reported by dd are:
> >
> > 515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s, 320
> > MB/s, 242 MB/s,  289 MB/s
> >
> > So I think that the system is capable of more than 200 MB/s which is way
> > more than what can arrive over the network.
> 
> A bit off-topic maybe.
> 
> Whenever you do these kinds of tests regarding performance on disk
> (locally) to test actual speed and not some caching, as Florian said,
> you should use oflag=direct option to dd and also echo 3 >
> /proc/sys/vm/drop_caches and sync.
> 

You should sync before you drop caches,
or you won't drop those caches that have been dirty at that time.
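
That is, roughly (file name and sizes purely illustrative):

  sync
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/zero of=testfile bs=1M count=10240 oflag=direct
  sync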

> I usually use echo 3 > /proc/sys/vm/drop_caches && sync && date &&
> time dd if=/dev/zero of=whatever bs=1G count=x oflag=direct && sync &&
> date
> 
> You can assess if there is data being flushed if the results given by
> dd differ from those obtained by calculating the amount of data
> written between the two date calls. It also helps to push more data
> than the controller can store.

Also, dd is doing one bs-sized chunk at a time.

fio with appropriate options can be more useful,
once you have learned all those options and how to interpret the results...
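
A rough sketch of the kind of fio run meant here (all parameters are
illustrative, not tuning advice):

  fio --name=seqwrite --filename=/srv/nfs/fio.test --rw=write \
  --bs=8k --size=10G --ioengine=libaio --iodepth=32 --direct=1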

> Regards,
> Dan
> 
> >
> > I've done the measurements on the filesystem that sits on top of LVM and
> > DRBD. Thus I think that DRBD is not a problem.
> >
> > However the strange thing is that I get 108 MB/s on the clients as soon as I
> > disable the secondary node for DRBD. Maybe there is strange interaction
> > between DRBD and NFS.

Dedicated replication link?

Maybe the additional latency is all that kills you.
Do you have non-volatile write cache on your IO backend?
Did you post your drbd configuration settings already?

> >
> > After reenabling the secondary node the DRBD synchronization is quite slow.
> >
> >
> >>>
> >>> Has anyone an idea what could cause such problems? I have no idea for
> >>> further analysis.
> >>
> >> As a knee-jerk response, that might be the classic issue of NFS
> >> filling up the page cache until it hits the vm.dirty_ratio and then
> >> having a ton of stuff to write to disk, which the local I/O subsystem
> >> can't cope with.
> >
> > Sounds reasonable, but shouldn't the I/O subsystem be capable of writing
> > away anything that arrives?
> >
> > Christoph

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-24 Thread Dan Frincu
Hi,

On Mon, May 21, 2012 at 4:24 PM, Christoph Bartoschek  wrote:
> Florian Haas wrote:
>
>>> Thus I would expect to have a write performance of about 100 MByte/s. But
>>> dd gives me only 20 MByte/s.
>>>
>>> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
>>> 1310720+0 records in
>>> 1310720+0 records out
>>> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
>>
>> If you used that same dd invocation for your local test that allegedly
>> produced 450 MB/s, you've probably been testing only your page cache.
>> Add oflag=dsync or oflag=direct (the latter will only work locally, as
>> NFS doesn't support O_DIRECT).
>>
>> If your RAID is one of reasonably contemporary SAS or SATA drives,
>> then a sustained to-disk throughput of 450 MB/s would require about
>> 7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
>> got? Or are you writing to SSDs?
>
> I used the same invocation with different filenames each time. To which page
> cache do you refer? To the one on the client or on the server side?
>
> We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10
> times with different files in a row:
>
> for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192
> count=1310720; done
>
> The resulting values on a system that is also used by other programs as
> reported by dd are:
>
> 515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s, 320
> MB/s, 242 MB/s,  289 MB/s
>
> So I think that the system is capable of more than 200 MB/s which is way
> more than what can arrive over the network.

A bit off-topic maybe.

Whenever you do these kinds of tests regarding performance on disk
(locally) to test actual speed and not some caching, as Florian said,
you should use oflag=direct option to dd and also echo 3 >
/proc/sys/vm/drop_caches and sync.

I usually use echo 3 > /proc/sys/vm/drop_caches && sync && date &&
time dd if=/dev/zero of=whatever bs=1G count=x oflag=direct && sync &&
date

You can assess if there is data being flushed if the results given by
dd differ from those obtained by calculating the amount of data
written between the two date calls. It also helps to push more data
than the controller can store.

Regards,
Dan

>
> I've done the measurements on the filesystem that sits on top of LVM and
> DRBD. Thus I think that DRBD is not a problem.
>
> However the strange thing is that I get 108 MB/s on the clients as soon as I
> disable the secondary node for DRBD. Maybe there is strange interaction
> between DRBD and NFS.
>
> After reenabling the secondary node the DRBD synchronization is quite slow.
>
>
>>>
>>> Has anyone an idea what could cause such problems? I have no idea for
>>> further analysis.
>>
>> As a knee-jerk response, that might be the classic issue of NFS
>> filling up the page cache until it hits the vm.dirty_ratio and then
>> having a ton of stuff to write to disk, which the local I/O subsystem
>> can't cope with.
>
> Sounds reasonable, but shouldn't the I/O subsystem be capable of writing
> away anything that arrives?
>
> Christoph
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-21 Thread Christoph Bartoschek
Florian Haas wrote:

>> Thus I would expect to have a write performance of about 100 MByte/s. But
>> dd gives me only 20 MByte/s.
>>
>> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
>> 1310720+0 records in
>> 1310720+0 records out
>> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
> 
> If you used that same dd invocation for your local test that allegedly
> produced 450 MB/s, you've probably been testing only your page cache.
> Add oflag=dsync or oflag=direct (the latter will only work locally, as
> NFS doesn't support O_DIRECT).
> 
> If your RAID is one of reasonably contemporary SAS or SATA drives,
> then a sustained to-disk throughput of 450 MB/s would require about
> 7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
> got? Or are you writing to SSDs?

I used the same invocation with different filenames each time. To which page 
cache do you refer? To the one on the client or on the server side?

We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10 
times with different files in a row:

for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192  
count=1310720; done

The resulting values on a system that is also used by other programs as 
reported by dd are:

515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s, 320 
MB/s, 242 MB/s,  289 MB/s

So I think that the system is capable of more than 200 MB/s which is way 
more than what can arrive over the network.

I've done the measurements on the filesystem that sits on top of LVM and 
DRBD. Thus I think that DRBD is not a problem.

However the strange thing is that I get 108 MB/s on the clients as soon as I 
disable the secondary node for DRBD. Maybe there is strange interaction 
between DRBD and NFS.

After reenabling the secondary node the DRBD synchronization is quite slow.


>>
>> Has anyone an idea what could cause such problems? I have no idea for
>> further analysis.
> 
> As a knee-jerk response, that might be the classic issue of NFS
> filling up the page cache until it hits the vm.dirty_ratio and then
> having a ton of stuff to write to disk, which the local I/O subsystem
> can't cope with.

Sounds reasonable, but shouldn't the I/O subsystem be capable of writing
away anything that arrives?

Christoph


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-21 Thread Florian Haas
On Sun, May 20, 2012 at 12:05 PM, Christoph Bartoschek
 wrote:
> Hi,
>
> we have a two node setup with drbd below LVM and an Ext4 filesystem that is
> shared via NFS. The system shows low performance and lots of timeouts
> resulting in unnecessary failovers from pacemaker.
>
> The connection between both nodes is capable of 1 GByte/s as shown by iperf.
> The network between the clients and the nodes is capable of 110 MByte/s. The
> RAID can be filled with 450 MByte/s.

No it can't (most likely); see below.

> Thus I would expect to have a write performance of about 100 MByte/s. But dd
> gives me only 20 MByte/s.
>
> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
> 1310720+0 records in
> 1310720+0 records out
> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s

If you used that same dd invocation for your local test that allegedly
produced 450 MB/s, you've probably been testing only your page cache.
Add oflag=dsync or oflag=direct (the latter will only work locally, as
NFS doesn't support O_DIRECT).
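
For example, roughly (paths here are placeholders; oflag=direct only works on 
the local filesystem, while oflag=dsync also works over NFS):

  # local test, bypassing the page cache
  dd if=/dev/zero of=/srv/test/ddtest.10G bs=1M count=10240 oflag=direct
  # test through the NFS mount, forcing synchronous writes
  dd if=/dev/zero of=/mnt/nfs/ddtest.10G bs=1M count=10240 oflag=dsync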

If your RAID is one of reasonably contemporary SAS or SATA drives,
then a sustained to-disk throughput of 450 MB/s would require about
7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
got? Or are you writing to SSDs?
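
(At an assumed 50-65 MB/s of sustained sequential writes per contemporary 
7.2k SATA drive, 450 MB/s works out to roughly 450 / 60 ~ 7-9 data stripes.)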

> While the slow dd runs there are timeouts on the server resulting in a
> restart of some resources. In the logfile I also see:
>
> [329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
> [329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [329014.593273] nfsd            D 0007     0  2252      2
> 0x
> [329014.593278]  88060a847c40 0046 88060a847bf8
> 00030001
> [329014.593284]  88060a847fd8 88060a847fd8 88060a847fd8
> 00013780
> [329014.593290]  8806091416f0 8806085bc4d0 88060a847c50
> 88061870c800
> [329014.593295] Call Trace:
> [329014.593303]  [] schedule+0x3f/0x60
> [329014.593309]  [] jbd2_log_wait_commit+0xb5/0x130
> [329014.593315]  [] ? add_wait_queue+0x60/0x60
> [329014.593321]  [] ext4_sync_file+0x208/0x2d0
> [329014.593328]  [] vfs_fsync_range+0x1d/0x40
> [329014.593339]  [] nfsd_commit+0xb1/0xd0 [nfsd]
> [329014.593349]  [] nfsd3_proc_commit+0x9d/0x100 [nfsd]
> [329014.593356]  [] nfsd_dispatch+0xeb/0x230 [nfsd]
> [329014.593373]  [] svc_process_common+0x345/0x690
> [sunrpc]
> [329014.593379]  [] ? try_to_wake_up+0x200/0x200
> [329014.593391]  [] svc_process+0x102/0x150 [sunrpc]
> [329014.593397]  [] nfsd+0xbd/0x160 [nfsd]
> [329014.593403]  [] ? nfsd_startup+0xf0/0xf0 [nfsd]
> [329014.593407]  [] kthread+0x8c/0xa0
> [329014.593412]  [] kernel_thread_helper+0x4/0x10
> [329014.593416]  [] ? flush_kthread_worker+0xa0/0xa0
> [329014.593420]  [] ? gs_change+0x13/0x13
>
>
> Has anyone an idea what could cause such problems? I have no idea for
> further analysis.

As a knee-jerk response, that might be the classic issue of NFS
filling up the page cache until it hits the vm.dirty_ratio and then
having a ton of stuff to write to disk, which the local I/O subsystem
can't cope with.
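
Roughly, capping the dirty page cache so writeback starts earlier and in 
smaller chunks might look like this (illustrative values only; they need 
tuning for the actual hardware and workload):

  sysctl -w vm.dirty_background_bytes=67108864   # start background writeback at 64 MiB
  sysctl -w vm.dirty_bytes=268435456             # block writers at 256 MiB of dirty data
  # (or the older percentage-based vm.dirty_background_ratio / vm.dirty_ratio)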

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-20 Thread Christoph Bartoschek
Raoul Bhatia [IPAX] wrote:

> I haven't seen such an issue during my current tests.
> 
>> Is ext4 unsuitable for such a setup? Or is the linux nfs3 implementation
>> broken? Are buffers too large such that one has to wait too long for a
>> flush?
> 
> Maybe I'll have the time to switch from xfs to ext4 and retest
> during the next couple of days. But I cannot guarantee anything.
> 
> Maybe you could try switching to XFS instead?

Thanks for the numbers. I just see that a local dd on the server also 
achieves up to 450 MByte/s. So it seems to be NFS-related.
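
One way to narrow that down might be to re-export temporarily with async and 
see whether the client-side dd jumps (async risks data loss on failover, so 
this is for testing only; the subnet and path below are just placeholders):

  exportfs -o rw,async,no_subtree_check 192.168.100.0/24:/data/export
  # rerun the dd from a client, then restore the original sync export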

Christoph


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-20 Thread Raoul Bhatia [IPAX]

On 2012-05-20 12:05, Christoph Bartoschek wrote:

Hi,

we have a two node setup with drbd below LVM and an Ext4 filesystem that is
shared via NFS. The system shows low performance and lots of timeouts
resulting in unnecessary failovers from pacemaker.

The connection between both nodes is capable of 1 GByte/s as shown by iperf.
The network between the clients and the nodes is capable of 110 MByte/s. The
RAID can be filled with 450 MByte/s.

Thus I would expect to have a write performance of about 100 MByte/s. But dd
gives me only 20 MByte/s.

dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s


to give you some numbers to compare:

I've got a small XFS file system, which i'm currently testing with.
Using a single thread and NFS4 only:

my configuration:
nfsserver:
# exportfs -v
/data/export 
192.168.100.0/24(rw,wdelay,no_root_squash,no_subtree_check,fsid=1000)



nfsclient mount
192.168.100.200:/data/export on /mnt type nfs 
(rw,nosuid,nodev,nodiratime,relatime,vers=4,addr=192.168.100.200,clientaddr=192.168.100.107)


via network (1gbit connection for both drbd sync and nfs)
  # dd if=/dev/zero of=bigfile.10G bs=6192  count=1310720
  1310720+0 records in
  1310720+0 records out
  8115978240 bytes (8.1 GB) copied, 140.279 s, 57.9 MB/s

on the same machine so that 1gbit is for drbd only:
  # dd if=/dev/zero of=bigfile.10G bs=6192  count=1310720
  1310720+0 records in
  1310720+0 records out
  8115978240 bytes (8.1 GB) copied, 70.9297 s, 114 MB/s

Maybe these numbers and this configuration help?

Cheers,
Raoul


While the slow dd runs there are timeouts on the server resulting in a
restart of some resources. In the logfile I also see:

[329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
[329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[329014.593273] nfsd            D 0007     0  2252      2
0x

...

Has anyone an idea what could cause such problems? I have no idea for
further analysis.


I haven't seen such an issue during my current tests.


Is ext4 unsuitable for such a setup? Or is the linux nfs3 implementation
broken? Are buffers too large such that one has to wait too long for a 
flush?


Maybe I'll have the time to switch from xfs to ext4 and retest
during the next couple of days. But I cannot guarantee anything.

Maybe you could try switching to XFS instead?

Cheers;
Raoul
--

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-20 Thread emmanuel segura
It's normal that a setup without LVM, EXT4 and NFS works fine: you don't have
those three extra layers in the stack.

2012/5/20 Christoph Bartoschek 

> emmanuel segura wrote:
>
> > Hello Christoph
> >
> > To do some tuning on drbd you can look at this link:
> >
> > http://www.drbd.org/users-guide/s-latency-tuning.html
> >
>
> Hi,
>
> I do not have the impression that drbd is the problem here because a
> similar
> setup without LVM, EXT4 and NFS above it works fine.
>
> However I tried the suggestions without success.
>
> Christoph
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
this is my life and I live it for as long as God wills
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-20 Thread Christoph Bartoschek
emmanuel segura wrote:

> Hello Christoph
> 
> To do some tuning on drbd you can look at this link:
> 
> http://www.drbd.org/users-guide/s-latency-tuning.html
> 

Hi,

I do not have the impression that drbd is the problem here because a similar 
setup without LVM, EXT4 and NFS above it works fine.

However I tried the suggestions without success.

Christoph


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-20 Thread emmanuel segura
Hello Christoph

To do some tuning on drbd you can look at this link:

http://www.drbd.org/users-guide/s-latency-tuning.html
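
For what it's worth, the kinds of knobs involved look roughly like this 
(DRBD 8.3-style syntax with a hypothetical resource name; the barrier/flush 
options are only safe with a battery-backed write cache, and option names 
differ in 8.4):

  # use the deadline I/O scheduler on the backing device (sdX is a placeholder)
  echo deadline > /sys/block/sdX/queue/scheduler

  # drbd.conf fragment:
  resource r0 {
    syncer { al-extents 3389; }
    disk {
      no-disk-barrier;    # only with battery-backed / non-volatile write cache
      no-disk-flushes;
    }
  }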

2012/5/20 Christoph Bartoschek 

> Hi,
>
> we have a two node setup with drbd below LVM and an Ext4 filesystem that is
> shared via NFS. The system shows low performance and lots of timeouts
> resulting in unnecessary failovers from pacemaker.
>
> The connection between both nodes is capable of 1 GByte/s as shown by
> iperf.
> The network between the clients and the nodes is capable of 110 MByte/s.
> The
> RAID can be filled with 450 MByte/s.
>
> Thus I would expect to have a write performance of about 100 MByte/s. But
> dd
> gives me only 20 MByte/s.
>
> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
> 1310720+0 records in
> 1310720+0 records out
> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
>
> While the slow dd runs there are timeouts on the server resulting in a
> restart of some resources. In the logfile I also see:
>
> [329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
> [329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [329014.593273] nfsd            D 0007     0  2252      2
> 0x
> [329014.593278]  88060a847c40 0046 88060a847bf8
> 00030001
> [329014.593284]  88060a847fd8 88060a847fd8 88060a847fd8
> 00013780
> [329014.593290]  8806091416f0 8806085bc4d0 88060a847c50
> 88061870c800
> [329014.593295] Call Trace:
> [329014.593303]  [] schedule+0x3f/0x60
> [329014.593309]  [] jbd2_log_wait_commit+0xb5/0x130
> [329014.593315]  [] ? add_wait_queue+0x60/0x60
> [329014.593321]  [] ext4_sync_file+0x208/0x2d0
> [329014.593328]  [] vfs_fsync_range+0x1d/0x40
> [329014.593339]  [] nfsd_commit+0xb1/0xd0 [nfsd]
> [329014.593349]  [] nfsd3_proc_commit+0x9d/0x100 [nfsd]
> [329014.593356]  [] nfsd_dispatch+0xeb/0x230 [nfsd]
> [329014.593373]  [] svc_process_common+0x345/0x690
> [sunrpc]
> [329014.593379]  [] ? try_to_wake_up+0x200/0x200
> [329014.593391]  [] svc_process+0x102/0x150 [sunrpc]
> [329014.593397]  [] nfsd+0xbd/0x160 [nfsd]
> [329014.593403]  [] ? nfsd_startup+0xf0/0xf0 [nfsd]
> [329014.593407]  [] kthread+0x8c/0xa0
> [329014.593412]  [] kernel_thread_helper+0x4/0x10
> [329014.593416]  [] ? flush_kthread_worker+0xa0/0xa0
> [329014.593420]  [] ? gs_change+0x13/0x13
>
>
> Has anyone an idea what could cause such problems? I have no idea for
> further analysis.
>
> Is ext4 unsuitable for such a setup? Or is the linux nfs3 implementation
> broken? Are buffers too large such that one has to wait too long for a
> flush?
>
> Thanks
> Christoph Bartoschek
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
this is my life and I live it for as long as God wills
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-20 Thread Christoph Bartoschek
Hi,

we have a two node setup with drbd below LVM and an Ext4 filesystem that is 
shared via NFS. The system shows low performance and lots of timeouts 
resulting in unnecessary failovers from pacemaker.

The connection between both nodes is capable of 1 GByte/s as shown by iperf. 
The network between the clients and the nodes is capable of 110 MByte/s. The 
RAID can be filled with 450 MByte/s.

Thus I would expect to have a write performance of about 100 MByte/s. But dd 
gives me only 20 MByte/s.

dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s

While the slow dd runs there are timeouts on the server resulting in a 
restart of some resources. In the logfile I also see:

[329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
[329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[329014.593273] nfsd            D 0007     0  2252      2
0x
[329014.593278]  88060a847c40 0046 88060a847bf8 
00030001
[329014.593284]  88060a847fd8 88060a847fd8 88060a847fd8 
00013780
[329014.593290]  8806091416f0 8806085bc4d0 88060a847c50 
88061870c800
[329014.593295] Call Trace:
[329014.593303]  [] schedule+0x3f/0x60
[329014.593309]  [] jbd2_log_wait_commit+0xb5/0x130
[329014.593315]  [] ? add_wait_queue+0x60/0x60
[329014.593321]  [] ext4_sync_file+0x208/0x2d0
[329014.593328]  [] vfs_fsync_range+0x1d/0x40
[329014.593339]  [] nfsd_commit+0xb1/0xd0 [nfsd]
[329014.593349]  [] nfsd3_proc_commit+0x9d/0x100 [nfsd]
[329014.593356]  [] nfsd_dispatch+0xeb/0x230 [nfsd]
[329014.593373]  [] svc_process_common+0x345/0x690 
[sunrpc]
[329014.593379]  [] ? try_to_wake_up+0x200/0x200
[329014.593391]  [] svc_process+0x102/0x150 [sunrpc]
[329014.593397]  [] nfsd+0xbd/0x160 [nfsd]
[329014.593403]  [] ? nfsd_startup+0xf0/0xf0 [nfsd]
[329014.593407]  [] kthread+0x8c/0xa0
[329014.593412]  [] kernel_thread_helper+0x4/0x10
[329014.593416]  [] ? flush_kthread_worker+0xa0/0xa0
[329014.593420]  [] ? gs_change+0x13/0x13


Has anyone an idea what could cause such problems? I have no idea for 
further analysis.

Is ext4 unsuitable for such a setup? Or is the linux nfs3 implementation 
broken? Are buffers too large such that one has to wait too long for a 
flush?

Thanks
Christoph Bartoschek



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org