Re: [ceph-users] CephFS: concurrent access to the same file from multiple nodes

2017-08-07 Thread Andras Pataki

I've filed a tracker bug for this: http://tracker.ceph.com/issues/20938

Andras


On 08/01/2017 10:26 AM, Andras Pataki wrote:

Hi John,

Sorry for the delay; it took a bit of work to set up a luminous test 
environment.  I'm sorry to have to report that the 12.1.1 RC version 
also suffers from this problem - when two nodes open the same file for 
read/write, and read from it, the performance is awful (under 1 
operation/second).  The behavior is exactly the same as with the 
latest Jewel.


I'm running an all luminous setup (12.1.1 mon/mds/osds and fuse 
client).  My original mail has a small test program that easily 
reproduces the issue.  Let me know if there is anything I can help 
with for tracking the issue down further.


Andras


On 07/21/2017 05:41 AM, John Spray wrote:

On Thu, Jul 20, 2017 at 9:19 PM, Andras Pataki
 wrote:
We are having some difficulties with cephfs access to the same file from
multiple nodes concurrently.  After debugging some large-ish applications
with noticeable performance problems using CephFS (with the fuse client), I
have a small test program to reproduce the problem.

The core of the problem boils down to the following operation being run on
the same file on multiple nodes (in a loop in the test program):

 int fd = open(filename, mode);
 read(fd, buffer, 100);
 close(fd);

Here are some results on our cluster:

One node, mode=read-only: 7000 opens/second
One node, mode=read-write: 7000 opens/second
Two nodes, mode=read-only: 7000 opens/second/node
Two nodes, mode=read-write: around 0.5 opens/second/node (!!!)
Two nodes, one read-only, one read-write: around 0.5 opens/second/node (!!!)
Two nodes, mode=read-write, but remove the 'read(fd, buffer,100)' line from
the code: 500 opens/second/node


So there seem to be some problems with opening the same file read/write and
reading from the file on multiple nodes.  That operation seems to be 3
orders of magnitude slower than other parallel access patterns to the same
file.  The 1 second time to open files almost seems like some timeout is
happening somewhere.  I have some suspicion that this has to do with
capability management between the fuse client and the MDS, but I don't know
enough about that protocol to make an educated assessment.

You're pretty much spot on.  Things happening at 0.5 per second are
characteristic of a particular class of bug where we are not flushing
the journal soon enough, and instead waiting for the next periodic
(every five second) flush.  Hence there is an average 2.5 second delay,
and operations happen at approximately half an operation per
second.


[And an aside - how does this become a problem?  I.e. why open a file
read/write and read from it?  Well, it turns out gfortran compiled code does
this by default if the user doesn't explicitly say otherwise].

All the nodes in this test are very lightly loaded, so there does not seem
to be any noticeable performance bottleneck (network, CPU, etc.).  The code
to reproduce the problem is attached.  Simply compile it, create a test file
with a few bytes of data in it, and run the test code on two separate nodes
on the same file.

We are running ceph 10.2.9 on the servers, and we use the 10.2.9 fuse
client on the client nodes.

Any input/help would be greatly appreciated.

If you have a test/staging environment, it would be great if you could
re-test this on the 12.1.1 release candidate.  There have been MDS
fixes for similar slowdowns that showed up in multi-mds testing,
so it's possible that the issue you're seeing here was fixed along the
way.

John


Andras


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com







Re: [ceph-users] CephFS: concurrent access to the same file from multiple nodes

2017-08-01 Thread Andras Pataki

Hi John,

Sorry for the delay; it took a bit of work to set up a luminous test 
environment.  I'm sorry to have to report that the 12.1.1 RC version 
also suffers from this problem - when two nodes open the same file for 
read/write, and read from it, the performance is awful (under 1 
operation/second).  The behavior is exactly the same as with the latest 
Jewel.


I'm running an all luminous setup (12.1.1 mon/mds/osds and fuse 
client).  My original mail has a small test program that easily 
reproduces the issue.  Let me know if there is anything I can help with 
for tracking the issue down further.


Andras


On 07/21/2017 05:41 AM, John Spray wrote:

On Thu, Jul 20, 2017 at 9:19 PM, Andras Pataki
 wrote:

We are having some difficulties with cephfs access to the same file from
multiple nodes concurrently.  After debugging some large-ish applications
with noticeable performance problems using CephFS (with the fuse client), I
have a small test program to reproduce the problem.

The core of the problem boils down to the following operation being run on
the same file on multiple nodes (in a loop in the test program):

 int fd = open(filename, mode);
 read(fd, buffer, 100);
 close(fd);

Here are some results on our cluster:

One node, mode=read-only: 7000 opens/second
One node, mode=read-write: 7000 opens/second
Two nodes, mode=read-only: 7000 opens/second/node
Two nodes, mode=read-write: around 0.5 opens/second/node (!!!)
Two nodes, one read-only, one read-write: around 0.5 opens/second/node (!!!)
Two nodes, mode=read-write, but remove the 'read(fd, buffer,100)' line from
the code: 500 opens/second/node


So there seem to be some problems with opening the same file read/write and
reading from the file on multiple nodes.  That operation seems to be 3
orders of magnitude slower than other parallel access patterns to the same
file.  The 1 second time to open files almost seems like some timeout is
happening somewhere.  I have some suspicion that this has to do with
capability management between the fuse client and the MDS, but I don't know
enough about that protocol to make an educated assessment.

You're pretty much spot on.  Things happening at 0.5 per second are
characteristic of a particular class of bug where we are not flushing
the journal soon enough, and instead waiting for the next periodic
(every five second) flush.  Hence there is an average 2.5 second delay,
and operations happen at approximately half an operation per
second.


[And an aside - how does this become a problem?  I.e. why open a file
read/write and read from it?  Well, it turns out gfortran compiled code does
this by default if the user doesn't explicitly say otherwise].

All the nodes in this test are very lightly loaded, so there does not seem
to be any noticeable performance bottleneck (network, CPU, etc.).  The code
to reproduce the problem is attached.  Simply compile it, create a test file
with a few bytes of data in it, and run the test code on two separate nodes
on the same file.

We are running ceph 10.2.9 on the servers, and we use the 10.2.9 fuse
client on the client nodes.

Any input/help would be greatly appreciated.

If you have a test/staging environment, it would be great if you could
re-test this on the 12.1.1 release candidate.  There have been MDS
fixes for similar slowdowns that showed up in multi-mds testing,
so it's possible that the issue you're seeing here was fixed along the
way.

John


Andras







Re: [ceph-users] CephFS: concurrent access to the same file from multiple nodes

2017-07-21 Thread John Spray
On Thu, Jul 20, 2017 at 9:19 PM, Andras Pataki
 wrote:
> We are having some difficulties with cephfs access to the same file from
> multiple nodes concurrently.  After debugging some large-ish applications
> with noticeable performance problems using CephFS (with the fuse client), I
> have a small test program to reproduce the problem.
>
> The core of the problem boils down to the following operation being run on
> the same file on multiple nodes (in a loop in the test program):
>
> int fd = open(filename, mode);
> read(fd, buffer, 100);
> close(fd);
>
> Here are some results on our cluster:
>
> One node, mode=read-only: 7000 opens/second
> One node, mode=read-write: 7000 opens/second
> Two nodes, mode=read-only: 7000 opens/second/node
> Two nodes, mode=read-write: around 0.5 opens/second/node (!!!)
> Two nodes, one read-only, one read-write: around 0.5 opens/second/node (!!!)
> Two nodes, mode=read-write, but remove the 'read(fd, buffer,100)' line from
> the code: 500 opens/second/node
>
>
> So there seem to be some problems with opening the same file read/write and
> reading from the file on multiple nodes.  That operation seems to be 3
> orders of magnitude slower than other parallel access patterns to the same
> file.  The 1 second time to open files almost seems like some timeout is
> happening somewhere.  I have some suspicion that this has to do with
> capability management between the fuse client and the MDS, but I don't know
> enough about that protocol to make an educated assessment.

You're pretty much spot on.  Things happening at 0.5 per second are
characteristic of a particular class of bug where we are not flushing
the journal soon enough, and instead waiting for the next periodic
(every five second) flush.  Hence there is an average 2.5 second delay,
and operations happen at approximately half an operation per
second.

> [And an aside - how does this become a problem?  I.e. why open a file
> read/write and read from it?  Well, it turns out gfortran compiled code does
> this by default if the user doesn't explicitly say otherwise].
>
> All the nodes in this test are very lightly loaded, so there does not seem
> to be any noticeable performance bottleneck (network, CPU, etc.).  The code
> to reproduce the problem is attached.  Simply compile it, create a test file
> with a few bytes of data in it, and run the test code on two separate nodes
> on the same file.
>
> We are running ceph 10.2.9 on the servers, and we use the 10.2.9 fuse
> client on the client nodes.
>
> Any input/help would be greatly appreciated.

If you have a test/staging environment, it would be great if you could
re-test this on the 12.1.1 release candidate.  There have been MDS
fixes for similar slowdowns that showed up in multi-mds testing,
so it's possible that the issue you're seeing here was fixed along the
way.

John

>
> Andras
>
>
>


[ceph-users] CephFS: concurrent access to the same file from multiple nodes

2017-07-20 Thread Andras Pataki
We are having some difficulties with cephfs access to the same file from 
multiple nodes concurrently.  After debugging some large-ish 
applications with noticeable performance problems using CephFS (with the 
fuse client), I have a small test program to reproduce the problem.


The core of the problem boils down to the following operation being run 
on the same file on multiple nodes (in a loop in the test program):


int fd = open(filename, mode);
read(fd, buffer, 100);
close(fd);

Here are some results on our cluster:

 * One node, mode=read-only: 7000 opens/second
 * One node, mode=read-write: 7000 opens/second
 * Two nodes, mode=read-only: 7000 opens/second/node
 * Two nodes, mode=read-write: around *0.5 opens/second/node* (!!!)
 * Two nodes, one read-only, one read-write: around *0.5
   opens/second/node* (!!!)
 * Two nodes, mode=read-write, but remove the 'read(fd, buffer,100)'
   line from the code: 500 opens/second/node


So there seem to be some problems with opening the same file read/write 
and reading from the file on multiple nodes.  That operation seems to be 
3 orders of magnitude slower than other parallel access patterns to the 
same file.  The 1 second time to open files almost seems like some 
timeout is happening somewhere.  I have some suspicion that this has to 
do with capability management between the fuse client and the MDS, but I 
don't know enough about that protocol to make an educated assessment.


[And an aside - how does this become a problem?  I.e. why open a file 
read/write and read from it?  Well, it turns out gfortran compiled code 
does this by default if the user doesn't explicitly say otherwise].


All the nodes in this test are very lightly loaded, so there does not 
seem to be any noticeable performance bottleneck (network, CPU, etc.).  
The code to reproduce the problem is attached.  Simply compile it, 
create a test file with a few bytes of data in it, and run the test code 
on two separate nodes on the same file.


We are running ceph 10.2.9 on the servers, and we use the 10.2.9 
fuse client on the client nodes.


Any input/help would be greatly appreciated.

Andras

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define INTERVAL 2


/* Current time in seconds, with microsecond resolution. */
double now()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);

    return tv.tv_sec + tv.tv_usec / 1e6;
}


int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <filename> r|rw\n", argv[0]);
        exit(1);
    }

    const char *filename = argv[1];

    int mode = 0;
    if (strcmp(argv[2], "r") == 0) {
        mode = O_RDONLY;
    } else if (strcmp(argv[2], "rw") == 0) {
        mode = O_RDWR;
    } else {
        fprintf(stderr, "Second argument must be 'r' or 'rw'\n");
        exit(1);
    }

    while (1) {

        char buffer[100];
        double t0 = now();
        double dt;
        int count = 0;

        /* Open, read, and close the file in a tight loop for
           INTERVAL seconds, counting the opens. */
        while (1) {
            dt = now() - t0;
            if (dt > INTERVAL) {
                break;
            }

            int fd = open(filename, mode);
            if (fd < 0) {
                printf("Could not open file '%s' for read/write\n", filename);
                exit(1);
            }

            read(fd, buffer, 100);

            close(fd);
            count++;
        }

        printf("File open rate: %8.2f\n", count / dt);

    }

    return 0;
}