Re: [Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-25 Thread Krist van Besien
On 25 August 2017 at 04:47, Vijay Bellur  wrote:

>
>
> On Thu, Aug 24, 2017 at 9:01 AM, Krist van Besien 
> wrote:
>
> Would it be possible to obtain a statedump of the native client when the
> application becomes completely unresponsive? A statedump can help in
> understanding operations within the gluster stack. Log file of the native
> client might also offer some clues.
>

I've increased logging to debug on both the client and the bricks, but didn't
see anything that hinted at problems.
Maybe we have to go for Ganesha after all.

But currently we are stuck because the customer is having trouble actually
generating enough load to test the server with...

When I try to simulate the workload with a script that writes and renames
files at the same rate the video recorders do, I can run it without any
issue, and can ramp up to the point where I am hitting the network ceiling.
So the gluster cluster is up to the task.
But the recorder software itself is running into issues, which makes me
suspect that it may have to do with the way some aspects of it are coded.
And it is there I am looking for answers. Any hints, like "if you call
fopen() you should give these flags and not these flags or you get into
trouble"...

Krist

-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Switzerland S.A.

kr...@redhat.com   M: +41-79-5936260

TRIED. TESTED. TRUSTED. 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-24 Thread Vijay Bellur
On Thu, Aug 24, 2017 at 9:01 AM, Krist van Besien  wrote:

> Hi
> This is gluster 3.8.4. Volume options are out of the box. Sharding is off
> (and I don't think enabling it would matter)
>
> I haven't done much performance tuning. For one thing, using a simple
> script that just creates files I can easily flood the network, so I don't
> expect a performance issue.
>
> The problem we see is that after a certain time the fuse clients
> completely stop accepting writes. Something is preventing the application
> from writing after a while.
> We see this on the fuse client, but not when we use nfs. So the question I
> am interested in seeing answered is: in what way is nfs different from
> fuse that could cause this?
>
> My suspicion is that it is locking related.
>
>
Would it be possible to obtain a statedump of the native client when the
application becomes completely unresponsive? A statedump can help in
understanding operations within the gluster stack. The log file of the native
client might also offer some clues.
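
For reference, a minimal sketch of how such a statedump could be requested from a script. It relies on the glusterfs behaviour of writing a statedump when the process receives SIGUSR1 (by default under /var/run/gluster). The helper names, the way the client process is located through /proc, and the mount point are assumptions made for illustration, not something taken from this thread.

```python
import os
import signal
from pathlib import Path


def find_fuse_client_pids(mount_point, proc_root="/proc"):
    """Return PIDs of glusterfs processes whose command line mentions mount_point.

    Hypothetical helper: it assumes the native client shows up as a
    'glusterfs' process with the mount point among its arguments.
    """
    root = Path(proc_root)
    if not root.is_dir():
        return []
    pids = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if args and os.path.basename(args[0]) == "glusterfs" and mount_point in args:
            pids.append(int(entry.name))
    return sorted(pids)


def request_statedump(pid):
    """Ask a glusterfs process to write a statedump by sending SIGUSR1."""
    os.kill(pid, signal.SIGUSR1)


if __name__ == "__main__":
    # "/mnt/recordings" is a placeholder mount point, not taken from the thread.
    for pid in find_fuse_client_pids("/mnt/recordings"):
        request_statedump(pid)
        print(f"statedump requested from glusterfs pid {pid}")
```

Taking the dump while the recorder is actually stuck, rather than afterwards, is what makes it useful, since it should then show any locks or calls that are still pending.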

Regards,
Vijay

Re: [Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-24 Thread Everton Brogliatto
Hi Krist,

In my setup, if I mount the Gluster storage using NFS, I get a 3x
improvement in write speed.

I believe the answer to your question is here:
http://lists.gluster.org/pipermail/gluster-users/2015-July/022703.html
https://joejulian.name/blog/nfs-mount-for-glusterfs-gives-better-read-performance-for-small-files/

In my case, as I run VMs and have shard enabled, changing the shard block
size made a significant difference.

Does changing the number of Gluster threads make any difference in your
setup, as you have multiple clients accessing it simultaneously?

Best regards,
Everton Brogliatto




Re: [Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-24 Thread Krist van Besien
Hi
This is gluster 3.8.4. Volume options are out of the box. Sharding is off
(and I don't think enabling it would matter)

I haven't done much performance tuning. For one thing, using a simple
script that just creates files I can easily flood the network, so I don't
expect a performance issue.

The problem we see is that after a certain time the fuse clients completely
stop accepting writes. Something is preventing the application from writing
after a while.
We see this on the fuse client, but not when we use nfs. So the question I
am interested in seeing answered is: in what way is nfs different from
fuse that could cause this?

My suspicion is that it is locking related.

Krist





Re: [Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-24 Thread Everton Brogliatto
Hi Krist,

What are your volume options on that setup? Have you tried tuning it for
the kind of workload and file sizes you have?

I would definitely do some tests with features.shard=on/off first. If shard
is on, try playing with features.shard-block-size.
Do you have jumbo frames (MTU=9000) enabled across the switch and nodes? If
you have concurrent clients writing/reading, it could be beneficial to
increase the number of client and server threads as well. Try setting
higher values for client.event-threads and server.event-threads.
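
As a rough illustration of the thread tuning mentioned above, the sketch below only builds the `gluster volume set` command lines so they can be reviewed before anything is run. The volume name and the values are placeholders chosen for the example, not recommendations; suitable settings depend on the workload.

```python
import shlex


def volume_set_commands(volume, options):
    """Build 'gluster volume set' argv lists for the given options."""
    return [["gluster", "volume", "set", volume, key, str(value)]
            for key, value in options.items()]


# Placeholder values for illustration only; change one option at a time
# and re-test so the effect of each setting can be told apart.
candidate_options = {
    "client.event-threads": 4,
    "server.event-threads": 4,
}

if __name__ == "__main__":
    # "recordings" is a hypothetical volume name.
    for argv in volume_set_commands("recordings", candidate_options):
        print(shlex.join(argv))
```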

Best regards,
Everton Brogliatto




[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-24 Thread Krist van Besien
Hi all,

I usually advise clients to use the native client if at all possible, as it
is very robust. But I am running into problems here.

In this case the gluster system is used to store video streams. Basically
the setup is the following:
- A gluster cluster of 3 nodes, with ample storage. They export several
volumes.
- The network is 10GbE, switched.
- A "recording server" which subscribes to multicast video streams, and
records them to disk. The recorder writes the streams in 10s blocks, so
when it is for example recording 50 streams it is creating 5 files a
second, each about 5M. It uses a write-then-rename process.

I simulated that with a small script that wrote 5M files and renamed them
as fast as it could, and could easily create around 100 files/s (which
about saturates the network). So I think the cluster is up to the task.
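
To make the write-then-rename pattern concrete, here is a minimal sketch of such a load generator. It is not the script used for the test above; the function names, the ".tmp" suffix and the fsync() call are assumptions. Note that it is single-threaded and takes no locks at all, so a script like this may simply not exercise whatever the recorder does differently.

```python
import os
import tempfile
import time


def write_then_rename(directory, name, payload):
    """Write payload to a temporary file, then rename it to its final name."""
    final_path = os.path.join(directory, name)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # push the data out before renaming
    os.rename(tmp_path, final_path)  # same directory, so a plain rename
    return final_path


def run_load(directory, file_count, file_size):
    """Create file_count files of file_size bytes and return files per second."""
    payload = os.urandom(file_size)
    start = time.monotonic()
    for index in range(file_count):
        write_then_rename(directory, f"block-{index:06d}.bin", payload)
    elapsed = time.monotonic() - start
    return file_count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # A temporary local directory stands in for the gluster mount here.
    with tempfile.TemporaryDirectory() as target:
        rate = run_load(target, file_count=5, file_size=5 * 1024 * 1024)
        print(f"{rate:.1f} files/s")
```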

However, if we try the actual workload we run into trouble. Running the
recorder software we can gradually ramp up the number of streams it records
(and thus the number of files it creates), and at around 50 streams the
recorder eventually stops writing files. According to the programmers who
wrote it, it appears that it can no longer get the needed locks, and as a
result just stops writing.

We decided to test using the NFS client as well, and there the problem does
not exist. But again, I (and the customer) would prefer not to use NFS, but
use the native client instead.

So if the problem is file locking, and the problem exists with the native
client but not with NFS, what could be the cause?

In what way does locking differ between the two file systems,
NFS and Fuse, and how can the programmers work around any issues
the fuse client might be causing?
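
One way to narrow this down, sketched here under the assumption that the recorder uses POSIX (fcntl-style) record locks, which the thread does not confirm: take the lock non-blocking with a timeout and report the errno, instead of blocking silently. The function names and the timeout values are made up for the example.

```python
import errno
import fcntl
import tempfile
import time


def try_lock(fd, timeout=5.0, poll_interval=0.1):
    """Try to take an exclusive POSIX lock on fd without blocking forever.

    Returns (True, None) on success, or (False, errno_name) if the lock is
    still held elsewhere when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True, None
        except OSError as exc:
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                raise  # unexpected error: let the caller see it
            if time.monotonic() >= deadline:
                return False, errno.errorcode[exc.errno]
            time.sleep(poll_interval)


def unlock(fd):
    """Release the lock taken by try_lock."""
    fcntl.lockf(fd, fcntl.LOCK_UN)


if __name__ == "__main__":
    # A local temporary file stands in for a file on the gluster mount.
    with tempfile.NamedTemporaryFile() as handle:
        ok, reason = try_lock(handle.fileno(), timeout=1.0)
        print("locked" if ok else f"gave up: {reason}")
        if ok:
            unlock(handle.fileno())
```

Logging which file and which errno the recorder gives up on would at least show whether it is waiting on a lock held by another process or failing for a different reason.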

This video stream software is a bespoke solution, developed in house, and
it is thus possible to change the way it handles files so it works with the
native client, but the programmers are looking to me for guidance.

Any suggestions?

Krist



