Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-23 Thread Pranith Kumar Karampuri
Raised the following bug to track the issue: 
https://bugzilla.redhat.com/show_bug.cgi?id=842254

Thanks!!
Pranith
- Original Message -
From: Anand Avati 
To: Jake Grimmett 
Cc: Pranith Kumar Karampuri , gluster-users@gluster.org
Sent: Thu, 19 Jul 2012 11:13:36 -0400 (EDT)
Subject: Re: [Gluster-users] "Granular locking" - does this need to be enabled 
in 3.3.0 ?

On Thu, Jul 19, 2012 at 2:14 AM, Jake Grimmett wrote:

> Dear Pranith /Anand ,
>
> Update on our progress with using KVM & Gluster:
>
> We built a two server (Dell R710) cluster, each box has...
>  5 x 500 GB SATA RAID5 array (software raid)
>  an Intel 10GB ethernet HBA.
>  One box has 8GB RAM, the other 48GB
>  both have 2 x E5520 Xeon
>  Centos 6.3 installed
>  Gluster 3.3 installed from the rpm files on the gluster site
>
>
> 1) create a replicated gluster volume (on top of xfs)
> 2) setup qemu/kvm with a gluster volume (mounts localhost:/gluster-vol)
> 3) sanlock configured (this is evil!)
> 4) build a virtual machine with a 30GB qcow2 image, 1GB RAM
> 5) clone this VM into 4 machines
> 6) check that live migration works (OK)
>
> Start basic test cycle:
> a) migrate all machines to host #1, then reboot host #2
> b) watch logs for self-heal to complete
> c) migrate VM's to host #2, reboot host #1
> d) check logs for self heal
>
> The above cycle can be repeated numerous times, and completes without
> error, provided that no (or little) load is on the VM.
>
>
> If I give the VM's a workload, such as by running "bonnie++" on each VM,
> things start to break.
> 1) it becomes almost impossible to log in to each VM
> 2) the kernel on each VM starts giving timeout errors
> i.e. "echo 0 > /proc/sys/kernel/hung_task_**timeout_secs"
> 3) top / uptime on the hosts shows load average of up to 24
> 4) dd write speed (block size 1K) to gluster is around 3MB/s on the host
>
>
> While I agree that running bonnie++ on four VM's is possibly unfair, there
> are load spikes on quiet machines (yum updates etc). I suspect that the I/O
> of one VM starts blocking that of another VM, and the pressure builds up
> rapidly on gluster - which does not seem to cope well under pressure.
> Possibly this is the access pattern / block size of qcow2 disks?
>
> I'm (slightly) disappointed.
>
> Though it doesn't corrupt data, the I/O performance is < 1% of my
> hardware's capability. Hopefully work on buffering and other tuning will fix
> this? Or maybe the work mentioned on getting qemu talking directly to gluster
> will fix this?
>
>
Do you mean that the I/O is bad when you are performing the migration? Or
bad in general? If it is bad in general, the qemu driver should help. Also
try presenting each VM with a FUSE mount point of its own (we have seen that
help improve overall system IOPS).
If it is slow performance only during failover/failback, we probably need
to do some more internal QoS tuning to de-prioritize self-heal traffic from
preempting VM traffic for resources.

Avati

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-23 Thread Pranith Kumar Karampuri
Samuli,
>>As a side note I have to say that I have seen similar problems with
>>RAID-5 systems even when using them as a non-replicated iSCSI target. In
>>my experience it's definitely not good for hosting VM images.
I think the performance problems he mentioned (I/O performance etc.) occur
when a self-heal is triggered. If the replicate xlator is not loaded, self-heal
is never triggered. Could you raise bugs for the problems you are facing, so
that they can be improved in future releases?

Pranith
- Original Message -
From: Samuli Heinonen 
To: gluster-users@gluster.org
Sent: Fri, 20 Jul 2012 11:48:28 -0400 (EDT)
Subject: Re: [Gluster-users] "Granular locking" - does this need to be enabled 
in 3.3.0 ?

> 3) sanlock configured (this is evil!)

Just out of curiosity, can you please tell us more about why it is evil? I only
found out about it after your first post and want to know if there are any
gotchas :)

> Though it doesn't corrupt data, the I/O performance is < 1% of my
> hardware's capability. Hopefully work on buffering and other tuning
> will fix this? Or maybe the work mentioned on getting qemu talking
> directly to gluster will fix this?

Have you tried turning performance.client-io-threads on to see if it makes any
difference?

As a side note I have to say that I have seen similar problems with 
RAID-5 systems even when using them as a non-replicated iSCSI target. In
my experience it's definitely not good for hosting VM images.

-samuli
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-20 Thread Samuli Heinonen

3) sanlock configured (this is evil!)


Just out of curiosity, can you please tell us more about why it is evil? I only
found out about it after your first post and want to know if there are any
gotchas :)
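
For context, the sanlock in question is libvirt's sanlock lock manager. A rough
sketch of how it gets enabled on an EL6 host (stock file names with example
values, not Jake's actual configuration):

  # /etc/libvirt/qemu.conf
  lock_manager = "sanlock"

  # /etc/libvirt/qemu-sanlock.conf
  auto_disk_leases = 1
  disk_lease_dir = "/var/lib/libvirt/sanlock"
  host_id = 1    # must be unique on each host

  service wdmd start && service sanlock start && service libvirtd restart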



Though it doesn't corrupt data, the I/O performance is < 1% of my
hardware's capability. Hopefully work on buffering and other tuning
will fix this? Or maybe the work mentioned on getting qemu talking
directly to gluster will fix this?


Have you tried turning performance.client-io-threads on to see if it makes any
difference?
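
For anyone who wants to try it, the option is set per volume; a minimal sketch,
assuming a volume named gluster-vol:

  gluster volume set gluster-vol performance.client-io-threads on
  gluster volume info gluster-vol    # the option should now appear under "Options Reconfigured"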


As a side note I have to say that I have seen similar problems with 
RAID-5 systems even when using them as a non-replicated iSCSI target. In
my experience it's definitely not good for hosting VM images.


-samuli
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-19 Thread Anand Avati
On Thu, Jul 19, 2012 at 2:14 AM, Jake Grimmett wrote:

> Dear Pranith /Anand ,
>
> Update on our progress with using KVM & Gluster:
>
> We built a two server (Dell R710) cluster, each box has...
>  5 x 500 GB SATA RAID5 array (software raid)
>  an Intel 10GB ethernet HBA.
>  One box has 8GB RAM, the other 48GB
>  both have 2 x E5520 Xeon
>  Centos 6.3 installed
>  Gluster 3.3 installed from the rpm files on the gluster site
>
>
> 1) create a replicated gluster volume (on top of xfs)
> 2) setup qemu/kvm with a gluster volume (mounts localhost:/gluster-vol)
> 3) sanlock configured (this is evil!)
> 4) build a virtual machine with a 30GB qcow2 image, 1GB RAM
> 5) clone this VM into 4 machines
> 6) check that live migration works (OK)
>
> Start basic test cycle:
> a) migrate all machines to host #1, then reboot host #2
> b) watch logs for self-heal to complete
> c) migrate VM's to host #2, reboot host #1
> d) check logs for self heal
>
> The above cycle can be repeated numerous times, and completes without
> error, provided that no (or little) load is on the VM.
>
>
> If I give the VM's a workload, such as by running "bonnie++" on each VM,
> things start to break.
> 1) it becomes almost impossible to log in to each VM
> 2) the kernel on each VM starts giving timeout errors
> i.e. "echo 0 > /proc/sys/kernel/hung_task_**timeout_secs"
> 3) top / uptime on the hosts shows load average of up to 24
> 4) dd write speed (block size 1K) to gluster is around 3MB/s on the host
>
>
> While I agree that running bonnie++ on four VM's is possibly unfair, there
> are load spikes on quiet machines (yum updates etc). I suspect that the I/O
> of one VM starts blocking that of another VM, and the pressure builds up
> rapidly on gluster - which does not seem to cope well under pressure.
> Possibly this is the access pattern / block size of qcow2 disks?
>
> I'm (slightly) disappointed.
>
> Though it doesn't corrupt data, the I/O performance is < 1% of my
> hardware's capability. Hopefully work on buffering and other tuning will fix
> this? Or maybe the work mentioned on getting qemu talking directly to gluster
> will fix this?
>
>
Do you mean that the I/O is bad when you are performing the migration? Or
bad in general? If it is bad in general, the qemu driver should help. Also
try presenting each VM with a FUSE mount point of its own (we have seen that
help improve overall system IOPS).
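
A per-VM mount point here simply means mounting the same volume several times
and pointing each guest's disk image at its own mount; an illustrative sketch,
with the volume name and paths as placeholders:

  mount -t glusterfs localhost:/gluster-vol /mnt/vm1
  mount -t glusterfs localhost:/gluster-vol /mnt/vm2
  # one mount per VM; each libvirt domain then references the image under its own mount
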
If it is slow performance only during failover/failback, we probably need
to do some more internal QoS tuning to de-prioritize self-heal traffic from
preempting VM traffic for resources.

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-19 Thread Jake Grimmett

Dear Pranith /Anand ,

Update on our progress with using KVM & Gluster:

We built a two server (Dell R710) cluster, each box has...
 5 x 500 GB SATA RAID5 array (software raid)
 an Intel 10GB ethernet HBA.
 One box has 8GB RAM, the other 48GB
 both have 2 x E5520 Xeon
 Centos 6.3 installed
 Gluster 3.3 installed from the rpm files on the gluster site


1) create a replicated gluster volume (on top of xfs)
2) setup qemu/kvm with a gluster volume (mounts localhost:/gluster-vol)
3) sanlock configured (this is evil!)
4) build a virtual machine with a 30GB qcow2 image, 1GB RAM
5) clone this VM into 4 machines
6) check that live migration works (OK)
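
For reference, steps 1 and 2 amount to something like the following; the volume
name, brick paths and mount point are placeholders rather than the values
actually used:

  gluster volume create gluster-vol replica 2 host1:/data/brick host2:/data/brick
  gluster volume start gluster-vol
  mount -t glusterfs localhost:/gluster-vol /var/lib/libvirt/images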

Start basic test cycle:
a) migrate all machines to host #1, then reboot host #2
b) watch logs for self-heal to complete
c) migrate VM's to host #2, reboot host #1
d) check logs for self heal
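
Steps a) and c) are ordinary libvirt live migrations; with virsh that is
roughly the following, with the domain and destination host as placeholders:

  virsh migrate --live vm1 qemu+ssh://host2/system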

The above cycle can be repeated numerous times, and completes without 
error, provided that no (or little) load is on the VM.



If I give the VM's a workload, such as by running "bonnie++" on each VM,
things start to break.

1) it becomes almost impossible to log in to each VM
2) the kernel on each VM starts giving timeout errors
i.e. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
3) top / uptime on the hosts shows load average of up to 24
4) dd write speed (block size 1K) to gluster is around 3MB/s on the host
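
The figure in 4) presumably comes from a plain sequential write on the FUSE
mount, something along the lines of (path is a placeholder):

  dd if=/dev/zero of=/mnt/gluster-vol/ddtest bs=1K count=100000 conv=fdatasync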


While I agree that running bonnie++ on four VM's is possibly unfair, 
there are load spikes on quiet machines (yum updates etc). I suspect 
that the I/O of one VM starts blocking that of another VM, and the 
pressure builds up rapidly on gluster - which does not seem to cope well 
under pressure. Possibly this is the access pattern / block size of 
qcow2 disks?


I'm (slightly) disappointed.

Though it doesn't corrupt data, the I/O performance is < 1% of my 
hardware's capability. Hopefully work on buffering and other tuning will
fix this? Or maybe the work mentioned on getting qemu talking directly to
gluster will fix this?


best wishes

Jake

--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Hills Road, Cambridge, CB2 0QH, UK.
Phone 01223 402219
Mobile 0776 9886539
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-10 Thread Pranith Kumar Karampuri
hi Jake,
  Thanks for the info! I will also try to recreate the problem with the info
so far.

Pranith
- Original Message -
From: "Jake Grimmett" 
To: "Pranith Kumar Karampuri" 
Cc: gluster-users@gluster.org, "Anand Avati" 
Sent: Tuesday, July 10, 2012 5:01:37 PM
Subject: Re: [Gluster-users] "Granular locking" - does this need to be enabled 
in 3.3.0 ?

Dear Pranith,

I've reduced the number of VM's on the cluster to 16; most have qcow2
format image files of between 2GB and 8GB. The heaviest load comes from 
three bigger VM's:

1) 8.5G - a lightly loaded ldap server
2) 24G - a lightly loaded confluence server
3) 30G - a gridengine master server

Most I/O is read, but there are database writes going on here.

Typical CPU usage on the host server (Dell R720XD, 2 x E5-2643) is 5%
Memory use is 20GB / 47GB

I'm keen to help work the bugs out, but rather than risk upsetting 16 
live machines (...and their owners), I'll build a new VM cluster on our 
dev Dell R710's. Centos 6.3 is out, and this is a good opportunity to 
see how the latest RHEL / KVM / sanlock interacts with gluster 3.3.0.

I'll update the thread in a couple of days when the test servers are 
working...

regards,

Jake


On 07/10/2012 04:44 AM, Pranith Kumar Karampuri wrote:
> Jake,
>  Granular locking is the only way data-self-heal is performed at the 
> moment. Could you give us the steps to re-create this issue, so that we can
> test this scenario locally? I will raise a bug with the info you provide.
> This is roughly the info I am looking for:
> 1) What is the size of each VM? (Number of VMs: 30 as per your mail.)
> 2) What is the kind of load in the VM? You said small web servers with low
> traffic; what kind of traffic is it? Writes (uploads of files), reads, etc.
> 3) Steps leading to the hang.
> 4) If you think you can re-create the issue, can you post the statedumps of
> the brick processes and the mount process when the hangs appear?
>
> Pranith.
> - Original Message -
> From: "Jake Grimmett"
> To: "Anand Avati"
> Cc: "Jake Grimmett", gluster-users@gluster.org
> Sent: Monday, July 9, 2012 11:51:19 PM
> Subject: Re: [Gluster-users] "Granular locking" - does this need to be 
> enabled in 3.3.0 ?
>
> Hi Anand,
>
> This is one entry (of many) in the client log when bringing my second node
> of the cluster back up, the glustershd.log is completely silent at this
> point.
>
> If you're interested in seeing the nodes split & reconnect, the relevant
> glustershd.log section is at http://pastebin.com/0Va3RxDD
>
> many thanks!
>
> Jake
>
>> Was this the client log or the glustershd log?
>>
>> Thanks,
>> Avati
>>
>> On Mon, Jul 9, 2012 at 8:23 AM, Jake Grimmett
>> wrote:
>>
>>> Hi Fernando / Christian,
>>>
>>> Many thanks for getting back to me.
>>>
>>> Slow writes are acceptable; most of our VM's are small web servers with
>>> low traffic. My aim is to have a fully self-contained two server KVM
>>> cluster with live migration, no external storage and the ability to
>>> reboot
>>> either node with zero VM downtime.  We seem to be "almost there", bar a
>>> hiccup when the self-heal is in progress and some minor grumbles from
>>> sanlock (which might be fixed by the new sanlock in RHEL 6.3)
>>>
>>> Incidentally, the logs show a "diff" self-heal on a node reboot:
>>>
>>> [2012-07-09 16:04:06.743512] I
>>> [afr-self-heal-algorithm.c:122:sh_loop_driver_done]
>>> 0-gluster-rep-replicate-0: diff self-heal on /box1-clone2.img:
>>> completed.
>>> (16 blocks of 16974 were different (0.09%))
>>>
>>> So, does this log show "Granular locking" occurring, or does it just
>>> happen transparently when a file exceeds a certain size?
>>>
>>> many thanks
>>>
>>> Jake
>>>
>>>
>>>
>>> On 07/09/2012 04:01 PM, Fernando Frediani (Qube) wrote:
>>>
>>>> Jake,
>>>>
>>>> I haven’t had a chance to test with my KVM cluster yet, but it should
>>>> be
>>>> a default thing from 3.3.
>>>>
>>>> Just bear in mind that running virtual machines is NOT a supported thing
>>>> for Red Hat Storage Server, according to Red Hat sales people. They said
>>>> it would be supported towards the end of the year. As you might have
>>>> observed, performance, especially for writes, isn’t anywhere near fantastic.
>>>>
>>>>
>>>> Fernando
>>>>
>>>> *Fro

Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-10 Thread Jake Grimmett

Dear Pranith,

I've reduced the number of VM's on the cluster to 16; most have qcow2
format image files of between 2GB and 8GB. The heaviest load comes from 
three bigger VM's:


1) 8.5G - a lightly loaded ldap server
2) 24G - a lightly loaded confluence server
3) 30G - a gridengine master server

Most I/O is read, but there are database writes going on here.

Typical CPU usage on the host server (Dell R720XD, 2 x E5-2643) is 5%
Memory use is 20GB / 47GB

I'm keen to help work the bugs out, but rather than risk upsetting 16 
live machines (...and their owners), I'll build a new VM cluster on our 
dev Dell R710's. Centos 6.3 is out, and this is a good opportunity to 
see how the latest RHEL / KVM / sanlock interacts with gluster 3.3.0.


I'll update the thread in a couple of days when the test servers are 
working...


regards,

Jake


On 07/10/2012 04:44 AM, Pranith Kumar Karampuri wrote:

Jake,
 Granular locking is the only way data-self-heal is performed at the 
moment. Could you give us the steps to re-create this issue, so that we can
test this scenario locally? I will raise a bug with the info you provide.
This is roughly the info I am looking for:
1) What is the size of each VM? (Number of VMs: 30 as per your mail.)
2) What is the kind of load in the VM? You said small web servers with low
traffic; what kind of traffic is it? Writes (uploads of files), reads, etc.
3) Steps leading to the hang.
4) If you think you can re-create the issue, can you post the statedumps of the
brick processes and the mount process when the hangs appear?

Pranith.
- Original Message -
From: "Jake Grimmett"
To: "Anand Avati"
Cc: "Jake Grimmett", gluster-users@gluster.org
Sent: Monday, July 9, 2012 11:51:19 PM
Subject: Re: [Gluster-users] "Granular locking" - does this need to be enabled 
in 3.3.0 ?

Hi Anand,

This is one entry (of many) in the client log when bringing my second node
of the cluster back up, the glustershd.log is completely silent at this
point.

If you're interested in seeing the nodes split & reconnect, the relevant
glustershd.log section is at http://pastebin.com/0Va3RxDD

many thanks!

Jake


Was this the client log or the glustershd log?

Thanks,
Avati

On Mon, Jul 9, 2012 at 8:23 AM, Jake Grimmett
wrote:


Hi Fernando / Christian,

Many thanks for getting back to me.

Slow writes are acceptable; most of our VM's are small web servers with
low traffic. My aim is to have a fully self-contained two server KVM
cluster with live migration, no external storage and the ability to
reboot
either node with zero VM downtime.  We seem to be "almost there", bar a
hiccup when the self-heal is in progress and some minor grumbles from
sanlock (which might be fixed by the new sanlock in RHEL 6.3)

Incidentally, the logs show a "diff" self-heal on a node reboot:

[2012-07-09 16:04:06.743512] I
[afr-self-heal-algorithm.c:122:sh_loop_driver_done]
0-gluster-rep-replicate-0: diff self-heal on /box1-clone2.img:
completed.
(16 blocks of 16974 were different (0.09%))

So, does this log show "Granular locking" occurring, or does it just
happen transparently when a file exceeds a certain size?

many thanks

Jake



On 07/09/2012 04:01 PM, Fernando Frediani (Qube) wrote:


Jake,

I haven’t had a chance to test with my KVM cluster yet, but it should be
a default thing from 3.3.

Just bear in mind that running virtual machines is NOT a supported thing
for Red Hat Storage Server, according to Red Hat sales people. They said
it would be supported towards the end of the year. As you might have
observed, performance, especially for writes, isn’t anywhere near fantastic.


Fernando

*From:*gluster-users-bounces@gluster.org
[mailto:gluster-users-bounces@gluster.org]
*On Behalf Of *Christian Wittwer
*Sent:* 09 July 2012 15:51
*To:* Jake Grimmett
*Cc:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] "Granular locking" - does this need to
be

enabled in 3.3.0 ?

Hi Jake

I can confirm exactly the same behaviour with gluster 3.3.0 on Ubuntu
12.04. During the self-heal process the VM gets 100% I/O wait and is
locked.

After the self-heal the root filesystem was read-only, which forced me
to do a reboot and fsck.

Cheers,

Christian

2012/7/9 Jake Grimmett <j...@mrc-lmb.cam.ac.uk>


Dear All,

I have a pair of Scientific Linux 6.2 servers, acting as KVM
virtualisation hosts for ~30 VM's. The VM images are stored in a
replicated gluster volume shared between the two servers. Live
migration
works fine, and the sanlock prevents me from (stupidly) starting the
same VM on both machines. Each server has 10GB ethernet and a 10 disk
RAID5 array.

If I migrate all the VM's to server #1 and shutdown server #2, all
works
perfectly with no interruption. When I restart server #2, the VM's
freeze while the self-heal process is running - and this healing can
take a long time.

I'm not sure if "Granular 

Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-09 Thread Pranith Kumar Karampuri
Jake,
Granular locking is the only way data-self-heal is performed at the moment. 
Could you give us the steps to re-create this issue, so that we can test this
scenario locally? I will raise a bug with the info you provide.
This is roughly the info I am looking for:
1) What is the size of each VM? (Number of VMs: 30 as per your mail.)
2) What is the kind of load in the VM? You said small web servers with low
traffic; what kind of traffic is it? Writes (uploads of files), reads, etc.
3) Steps leading to the hang.
4) If you think you can re-create the issue, can you post the statedumps of the
brick processes and the mount process when the hangs appear?
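
With 3.3, brick statedumps can be taken from the CLI and a client (mount)
statedump by signalling the glusterfs process; a rough sketch, with the volume
name as a placeholder:

  gluster volume statedump gluster-vol
  kill -USR1 <pid of the glusterfs mount process>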

Pranith.
- Original Message -
From: "Jake Grimmett" 
To: "Anand Avati" 
Cc: "Jake Grimmett" , gluster-users@gluster.org
Sent: Monday, July 9, 2012 11:51:19 PM
Subject: Re: [Gluster-users] "Granular locking" - does this need to be enabled 
in 3.3.0 ?

Hi Anand,

This is one entry (of many) in the client log when bringing my second node
of the cluster back up, the glustershd.log is completely silent at this
point.

If you're interested in seeing the nodes split & reconnect, the relevant
glustershd.log section is at http://pastebin.com/0Va3RxDD

many thanks!

Jake

> Was this the client log or the glustershd log?
>
> Thanks,
> Avati
>
> On Mon, Jul 9, 2012 at 8:23 AM, Jake Grimmett 
> wrote:
>
>> Hi Fernando / Christian,
>>
>> Many thanks for getting back to me.
>>
>> Slow writes are acceptable; most of our VM's are small web servers with
>> low traffic. My aim is to have a fully self-contained two server KVM
>> cluster with live migration, no external storage and the ability to
>> reboot
>> either node with zero VM downtime.  We seem to be "almost there", bar a
>> hiccup when the self-heal is in progress and some minor grumbles from
>> sanlock (which might be fixed by the new sanlock in RHEL 6.3)
>>
>> Incidentally, the logs show a "diff" self-heal on a node reboot:
>>
>> [2012-07-09 16:04:06.743512] I
>> [afr-self-heal-algorithm.c:122:sh_loop_driver_done]
>> 0-gluster-rep-replicate-0: diff self-heal on /box1-clone2.img:
>> completed.
>> (16 blocks of 16974 were different (0.09%))
>>
>> So, does this log show "Granular locking" occurring, or does it just
>> happen transparently when a file exceeds a certain size?
>>
>> many thanks
>>
>> Jake
>>
>>
>>
>> On 07/09/2012 04:01 PM, Fernando Frediani (Qube) wrote:
>>
>>> Jake,
>>>
>>> I haven’t had a chance to test with my KVM cluster yet, but it should
>>> be
>>> a default thing from 3.3.
>>>
>>> Just bear in mind that running virtual machines is NOT a supported thing
>>> for Red Hat Storage Server, according to Red Hat sales people. They said
>>> it would be supported towards the end of the year. As you might have
>>> observed, performance, especially for writes, isn’t anywhere near fantastic.
>>>
>>>
>>> Fernando
>>>
>>> *From:*gluster-users-bounces@gluster.org
>>> [mailto:gluster-users-bounces@gluster.org]
>>> *On Behalf Of *Christian Wittwer
>>> *Sent:* 09 July 2012 15:51
>>> *To:* Jake Grimmett
>>> *Cc:* gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] "Granular locking" - does this need to
>>> be
>>>
>>> enabled in 3.3.0 ?
>>>
>>> Hi Jake
>>>
>>> I can confirm exactly the same behaviour with gluster 3.3.0 on Ubuntu
>>> 12.04. During the self-heal process the VM gets 100% I/O wait and is
>>> locked.
>>>
>>> After the self-heal the root filesystem was read-only which forced me
>>> to
>>> do a reboot and fsck.
>>>
>>> Cheers,
>>>
>>> Christian
>>>
>>> 2012/7/9 Jake Grimmett <j...@mrc-lmb.cam.ac.uk>
>>>
>>>
>>> Dear All,
>>>
>>> I have a pair of Scientific Linux 6.2 servers, acting as KVM
>>> virtualisation hosts for ~30 VM's. The VM images are stored in a
>>> replicated gluster volume shared between the two servers. Live
>>> migration
>>> works fine, and the sanlock prevents me from (stupidly) starting the
>>> same VM on both machines. Each server has 10GB ethernet and a 10 disk
>>> RAID5 array.
>>>
>>> If I migrate all the VM's to server #1 and shutdown server #2, all
>>> works
>>> perfectly with no interruption. When I restart server #2, the VM's
>>> freeze while the self-heal process is

Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-09 Thread Jake Grimmett
Hi Anand,

This is one entry (of many) in the client log when bringing my second node
of the cluster back up, the glustershd.log is completely silent at this
point.

If you're interested in seeing the nodes split & reconnect, the relevant
glustershd.log section is at http://pastebin.com/0Va3RxDD

many thanks!

Jake

> Was this the client log or the glustershd log?
>
> Thanks,
> Avati
>
> On Mon, Jul 9, 2012 at 8:23 AM, Jake Grimmett 
> wrote:
>
>> Hi Fernando / Christian,
>>
>> Many thanks for getting back to me.
>>
>> Slow writes are acceptable; most of our VM's are small web servers with
>> low traffic. My aim is to have a fully self-contained two server KVM
>> cluster with live migration, no external storage and the ability to
>> reboot
>> either node with zero VM downtime.  We seem to be "almost there", bar a
>> hiccup when the self-heal is in progress and some minor grumbles from
>> sanlock (which might be fixed by the new sanlock in RHEL 6.3)
>>
>> Incidentally, the logs show a "diff" self-heal on a node reboot:
>>
>> [2012-07-09 16:04:06.743512] I
>> [afr-self-heal-algorithm.c:122:sh_loop_driver_done]
>> 0-gluster-rep-replicate-0: diff self-heal on /box1-clone2.img:
>> completed.
>> (16 blocks of 16974 were different (0.09%))
>>
>> So, does this log show "Granular locking" occurring, or does it just
>> happen transparently when a file exceeds a certain size?
>>
>> many thanks
>>
>> Jake
>>
>>
>>
>> On 07/09/2012 04:01 PM, Fernando Frediani (Qube) wrote:
>>
>>> Jake,
>>>
>>> I haven’t had a chance to test with my KVM cluster yet, but it should
>>> be
>>> a default thing from 3.3.
>>>
>>> Just bear in mind that running virtual machines is NOT a supported thing
>>> for Red Hat Storage Server, according to Red Hat sales people. They said
>>> it would be supported towards the end of the year. As you might have
>>> observed, performance, especially for writes, isn’t anywhere near fantastic.
>>>
>>>
>>> Fernando
>>>
>>> *From:*gluster-users-bounces@gluster.org
>>> [mailto:gluster-users-bounces@gluster.org]
>>> *On Behalf Of *Christian Wittwer
>>> *Sent:* 09 July 2012 15:51
>>> *To:* Jake Grimmett
>>> *Cc:* gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] "Granular locking" - does this need to
>>> be
>>>
>>> enabled in 3.3.0 ?
>>>
>>> Hi Jake
>>>
>>> I can confirm exactly the same behaviour with gluster 3.3.0 on Ubuntu
>>> 12.04. During the self-heal process the VM gets 100% I/O wait and is
>>> locked.
>>>
>>> After the self-heal the root filesystem was read-only which forced me
>>> to
>>> do a reboot and fsck.
>>>
>>> Cheers,
>>>
>>> Christian
>>>
>>> 2012/7/9 Jake Grimmett <j...@mrc-lmb.cam.ac.uk>
>>>
>>>
>>> Dear All,
>>>
>>> I have a pair of Scientific Linux 6.2 servers, acting as KVM
>>> virtualisation hosts for ~30 VM's. The VM images are stored in a
>>> replicated gluster volume shared between the two servers. Live
>>> migration
>>> works fine, and the sanlock prevents me from (stupidly) starting the
>>> same VM on both machines. Each server has 10GB ethernet and a 10 disk
>>> RAID5 array.
>>>
>>> If I migrate all the VM's to server #1 and shutdown server #2, all
>>> works
>>> perfectly with no interruption. When I restart server #2, the VM's
>>> freeze while the self-heal process is running - and this healing can
>>> take a long time.
>>>
>>> I'm not sure if "Granular Locking" is on. It's listed as a "technology
>>> preview" in the Redhat Storage server 2 notes - do I need to do
>>> anything
>>> to enable it?
>>>
>>> i.e. set "cluster.data-self-heal-**algorithm" to diff ?
>>> or edit "cluster.self-heal-window-**size" ?
>>>
>>> any tips from other people doing similar much appreciated!
>>>
>>> Many thanks,
>>>
>>> Jake
>>>
>>> jog <---at---> mrc-lmb.cam.ac.uk
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>
>>>
>>
>> --
>> Dr Jake Grimmett
>> Head Of Scientific Computing
>> MRC Laboratory of Molecular Biology
>> Hills Road, Cambridge, CB2 0QH, UK.
>> Phone 01223 402219
>> Mobile 0776 9886539
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>


-- 
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Hills Road, Cambridge, CB2 0QH, UK.
Phone 01223 402219
Mobile 0776 9886539


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-09 Thread Anand Avati
Was this the client log or the glustershd log?

Thanks,
Avati

On Mon, Jul 9, 2012 at 8:23 AM, Jake Grimmett  wrote:

> Hi Fernando / Christian,
>
> Many thanks for getting back to me.
>
> Slow writes are acceptable; most of our VM's are small web servers with
> low traffic. My aim is to have a fully self-contained two server KVM
> cluster with live migration, no external storage and the ability to reboot
> either node with zero VM downtime.  We seem to be "almost there", bar a
> hiccup when the self-heal is in progress and some minor grumbles from
> sanlock (which might be fixed by the new sanlock in RHEL 6.3)
>
> Incidentally, the logs show a "diff" self-heal on a node reboot:
>
> [2012-07-09 16:04:06.743512] I 
> [afr-self-heal-algorithm.c:122:sh_loop_driver_done]
> 0-gluster-rep-replicate-0: diff self-heal on /box1-clone2.img: completed.
> (16 blocks of 16974 were different (0.09%))
>
> So, does this log show "Granular locking" occurring, or does it just
> happen transparently when a file exceeds a certain size?
>
> many thanks
>
> Jake
>
>
>
> On 07/09/2012 04:01 PM, Fernando Frediani (Qube) wrote:
>
>> Jake,
>>
>> I haven’t had a chance to test with my KVM cluster yet, but it should be
>> a default thing from 3.3.
>>
>> Just bear in mind that running virtual machines is NOT a supported thing
>> for Red Hat Storage Server, according to Red Hat sales people. They said
>> it would be supported towards the end of the year. As you might have
>> observed, performance, especially for writes, isn’t anywhere near fantastic.
>>
>>
>> Fernando
>>
>> *From:*gluster-users-bounces@gluster.org
>> [mailto:gluster-users-bounces@gluster.org]
>> *On Behalf Of *Christian Wittwer
>> *Sent:* 09 July 2012 15:51
>> *To:* Jake Grimmett
>> *Cc:* gluster-users@gluster.org
>> *Subject:* Re: [Gluster-users] "Granular locking" - does this need to be
>>
>> enabled in 3.3.0 ?
>>
>> Hi Jake
>>
>> I can confirm exactly the same behaviour with gluster 3.3.0 on Ubuntu
>> 12.04. During the self-heal process the VM gets 100% I/O wait and is
>> locked.
>>
>> After the self-heal the root filesystem was read-only which forced me to
>> do a reboot and fsck.
>>
>> Cheers,
>>
>> Christian
>>
>> 2012/7/9 Jake Grimmett <j...@mrc-lmb.cam.ac.uk>
>>
>>
>> Dear All,
>>
>> I have a pair of Scientific Linux 6.2 servers, acting as KVM
>> virtualisation hosts for ~30 VM's. The VM images are stored in a
>> replicated gluster volume shared between the two servers. Live migration
>> works fine, and the sanlock prevents me from (stupidly) starting the
>> same VM on both machines. Each server has 10GB ethernet and a 10 disk
>> RAID5 array.
>>
>> If I migrate all the VM's to server #1 and shutdown server #2, all works
>> perfectly with no interruption. When I restart server #2, the VM's
>> freeze while the self-heal process is running - and this healing can
>> take a long time.
>>
>> I'm not sure if "Granular Locking" is on. It's listed as a "technology
>> preview" in the Redhat Storage server 2 notes - do I need to do anything
>> to enable it?
>>
>> i.e. set "cluster.data-self-heal-**algorithm" to diff ?
>> or edit "cluster.self-heal-window-**size" ?
>>
>> any tips from other people doing similar much appreciated!
>>
>> Many thanks,
>>
>> Jake
>>
>> jog <---at---> mrc-lmb.cam.ac.uk
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>>
>
> --
> Dr Jake Grimmett
> Head Of Scientific Computing
> MRC Laboratory of Molecular Biology
> Hills Road, Cambridge, CB2 0QH, UK.
> Phone 01223 402219
> Mobile 0776 9886539
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-09 Thread Jake Grimmett

Hi Fernando / Christian,

Many thanks for getting back to me.

Slow writes are acceptable; most of our VM's are small web servers with 
low traffic. My aim is to have a fully self-contained two server KVM 
cluster with live migration, no external storage and the ability to 
reboot either node with zero VM downtime.  We seem to be "almost there", 
bar a hiccup when the self-heal is in progress and some minor grumbles 
from sanlock (which might be fixed by the new sanlock in RHEL 6.3)


Incidentally, the logs show a "diff" self-heal on a node reboot:

[2012-07-09 16:04:06.743512] I 
[afr-self-heal-algorithm.c:122:sh_loop_driver_done] 
0-gluster-rep-replicate-0: diff self-heal on /box1-clone2.img: 
completed. (16 blocks of 16974 were different (0.09%))
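
Self-heal activity of this kind can also be checked from the CLI in 3.3; a
sketch, with the volume name guessed from the log prefix above:

  gluster volume heal gluster-rep info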


So, does this log show "Granular locking" occurring, or does it just 
happen transparently when a file exceeds a certain size?


many thanks

Jake


On 07/09/2012 04:01 PM, Fernando Frediani (Qube) wrote:

Jake,

I haven’t had a chance to test with my KVM cluster yet, but it should be
a default thing from 3.3.

Just bear in mind that running virtual machines is NOT a supported thing
for Red Hat Storage Server, according to Red Hat sales people. They said
it would be supported towards the end of the year. As you might have
observed, performance, especially for writes, isn’t anywhere near fantastic.


Fernando

*From:*gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] *On Behalf Of *Christian Wittwer
*Sent:* 09 July 2012 15:51
*To:* Jake Grimmett
*Cc:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] "Granular locking" - does this need to be
enabled in 3.3.0 ?

Hi Jake

I can confirm exactly the same behaviour with gluster 3.3.0 on Ubuntu
12.04. During the self-heal process the VM gets 100% I/O wait and is locked.

After the self-heal the root filesystem was read-only, which forced me to
do a reboot and fsck.

Cheers,

Christian

2012/7/9 Jake Grimmett <j...@mrc-lmb.cam.ac.uk>

Dear All,

I have a pair of Scientific Linux 6.2 servers, acting as KVM
virtualisation hosts for ~30 VM's. The VM images are stored in a
replicated gluster volume shared between the two servers. Live migration
works fine, and the sanlock prevents me from (stupidly) starting the
same VM on both machines. Each server has 10GB ethernet and a 10 disk
RAID5 array.

If I migrate all the VM's to server #1 and shutdown server #2, all works
perfectly with no interruption. When I restart server #2, the VM's
freeze while the self-heal process is running - and this healing can
take a long time.

I'm not sure if "Granular Locking" is on. It's listed as a "technology
preview" in the Redhat Storage server 2 notes - do I need to do anything
to enable it?

i.e. set "cluster.data-self-heal-algorithm" to diff ?
or edit "cluster.self-heal-window-size" ?

any tips from other people doing similar much appreciated!

Many thanks,

Jake

jog <---at---> mrc-lmb.cam.ac.uk
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Hills Road, Cambridge, CB2 0QH, UK.
Phone 01223 402219
Mobile 0776 9886539
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-09 Thread Fernando Frediani (Qube)
Jake,

I haven't had a chance to test with my KVM cluster yet, but it should be a
default thing from 3.3.
Just bear in mind that running virtual machines is NOT a supported thing for
Red Hat Storage Server, according to Red Hat sales people. They said it would
be supported towards the end of the year. As you might have observed,
performance, especially for writes, isn't anywhere near fantastic.

Fernando

From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Christian Wittwer
Sent: 09 July 2012 15:51
To: Jake Grimmett
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] "Granular locking" - does this need to be enabled 
in 3.3.0 ?

Hi Jake
I can confirm exactly the same behaviour with gluster 3.3.0 on Ubuntu 12.04.
During the self-heal process the VM gets 100% I/O wait and is locked.
After the self-heal the root filesystem was read-only, which forced me to do a
reboot and fsck.

Cheers,
Christian
2012/7/9 Jake Grimmett <j...@mrc-lmb.cam.ac.uk>
Dear All,

I have a pair of Scientific Linux 6.2 servers, acting as KVM virtualisation 
hosts for ~30 VM's. The VM images are stored in a replicated gluster volume 
shared between the two servers. Live migration works fine, and the sanlock 
prevents me from (stupidly) starting the same VM on both machines. Each server 
has 10GB ethernet and a 10 disk RAID5 array.

If I migrate all the VM's to server #1 and shutdown server #2, all works 
perfectly with no interruption. When I restart server #2, the VM's freeze while 
the self-heal process is running - and this healing can take a long time.

I'm not sure if "Granular Locking" is on. It's listed as a "technology preview" 
in the Redhat Storage server 2 notes - do I need to do anything to enable it?

i.e. set "cluster.data-self-heal-algorithm" to diff ?
or edit "cluster.self-heal-window-size" ?

any tips from other people doing similar much appreciated!

Many thanks,

Jake

jog <---at---> mrc-lmb.cam.ac.uk
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?

2012-07-09 Thread Christian Wittwer
Hi Jake
I can confirm exactly the same behaviour with gluster 3.3.0 on Ubuntu 12.04.
During the self-heal process the VM gets 100% I/O wait and is locked.
After the self-heal the root filesystem was read-only, which forced me to do
a reboot and fsck.

Cheers,
Christian

2012/7/9 Jake Grimmett 

> Dear All,
>
> I have a pair of Scientific Linux 6.2 servers, acting as KVM
> virtualisation hosts for ~30 VM's. The VM images are stored in a replicated
> gluster volume shared between the two servers. Live migration works fine,
> and the sanlock prevents me from (stupidly) starting the same VM on both
> machines. Each server has 10GB ethernet and a 10 disk RAID5 array.
>
> If I migrate all the VM's to server #1 and shutdown server #2, all works
> perfectly with no interruption. When I restart server #2, the VM's freeze
> while the self-heal process is running - and this healing can take a long
> time.
>
> I'm not sure if "Granular Locking" is on. It's listed as a "technology
> preview" in the Redhat Storage server 2 notes - do I need to do anything to
> enable it?
>
> i.e. set "cluster.data-self-heal-**algorithm" to diff ?
> or edit "cluster.self-heal-window-**size" ?
>
> any tips from other people doing similar much appreciated!
>
> Many thanks,
>
> Jake
>
> jog <---at---> mrc-lmb.cam.ac.uk
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users