On 04/11/2013 07:25 PM, Ziemowit Pierzycki wrote:
No, I'm not using RDMA in this configuration since this will eventually
get deployed to production with 10G Ethernet (yes, RDMA is faster). I
would prefer Ceph because it has a storage driver built into OpenNebula,
which my company is using, and because, as you mentioned, it does better
with individual drives.
I'm not sure what the problem is, but it appears to me that one of the
hosts may be holding up the rest. With Ceph, if one of the hosts performs
much worse than the others, could that potentially slow the whole cluster
down to this level?
Definitely! Even one slow OSD can cause dramatic slowdowns. This is
because we (by default) try to distribute data evenly to every OSD in
the cluster. If even one OSD is really slow, it will accumulate more and
more outstanding operations while all of the other OSDs complete their
requests. Eventually all of your outstanding operations will be waiting
on that slow OSD, and all of the other OSDs will sit idle waiting for
new requests.
If you know that some OSDs are permanently slower than others, you can
re-weight them so that they receive fewer requests than the others, which
can mitigate this, but that isn't always an optimal solution. Sometimes
a slow OSD can be a sign of other hardware problems too.
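For example, if osd.3 were the slow one, something along these lines
should shift load away from it (osd.3 and the weight of 0.7 are just
illustrative placeholders; the weight is a value between 0 and 1):

ceph osd reweight 3 0.7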
Mark
On Thu, Apr 11, 2013 at 7:42 AM, Mark Nelson <mark.nel...@inktank.com> wrote:
With GlusterFS, are you using the native RDMA support?
Ceph and Gluster tend to prefer pretty different disk setups too.
AFAIK RH still recommends RAID6 behind each brick, while we do
better with individual disks behind each OSD. You might want to
watch the OSD admin socket and see if operations are backing up on
any specific OSDs.
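For example, assuming the default admin socket path and an OSD numbered
0, something like this on each node should show the operations each OSD
has in flight (the dump_ops_in_flight command may not exist on older
builds, so check what your version supports):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight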
Mark
On 04/09/2013 12:54 PM, Ziemowit Pierzycki wrote:
Neither made a difference. I also have a GlusterFS cluster with two
nodes in replicating mode residing on 1TB drives:

[root@triton speed]# dd conv=fdatasync if=/dev/zero of=/mnt/speed/test.out bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 43.573 s, 120 MB/s
... and Ceph:

[root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 366.911 s, 14.3 MB/s
On Mon, Apr 8, 2013 at 4:29 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
On 04/08/2013 04:12 PM, Ziemowit Pierzycki wrote:
There is one SSD in each node. IPoIB performance is about 7 Gbps
between each host. CephFS is mounted via the kernel client. The Ceph
version is ceph-0.56.3-1. I have a 1GB journal on the same drive as
the OSD, but on a separate file system split off via LVM.
Here is the output of another test with fdatasync:

[root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 359.307 s, 14.6 MB/s
[root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 14.0521 s, 373 MB/s
Definitely seems off! How many SSDs are involved, and how fast is each
one? The MTU idea might have merit, but I honestly don't know enough
about how well IPoIB handles giant MTUs like that. One thing I have
noticed on other IPoIB setups is that TCP autotuning can cause a ton of
problems. You may want to try disabling it on all of the hosts involved:

echo 0 | tee /proc/sys/net/ipv4/tcp_moderate_rcvbuf

If that doesn't work, maybe try setting the MTU to 9000 or 1500 if possible.
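For example, assuming the IPoIB interface is named ib0 (adjust for your
setup):

ip link set dev ib0 mtu 9000

Note that IPoIB generally only supports MTUs this large in connected
mode; datagram mode caps out around 2044 bytes.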
Mark
The network traffic appears to match the transfer speeds shown here too.
Writing is very slow.
On Mon, Apr 8, 2013 at 3:04 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
Hi,

How many drives? Have you tested your IPoIB performance with iperf? Is
this CephFS with the kernel client? What version of Ceph? How are your
journals configured? etc. It's tough to make any recommendations without
knowing more about what you are doing.

Also, please use conv=fdatasync when doing buffered IO writes with dd.
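For example, to measure raw TCP throughput over the IPoIB link
(hostnames/IPs below are placeholders), run this on one host:

iperf -s

and this on the other:

iperf -c <first-host-ip>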
Thanks,
Mark
On 04/08/2013 03:00 PM, Ziemowit Pierzycki wrote:
Hi,

The first test was writing a 500 MB file and was clocked at 1.2 GB/s.
The second test was writing a 5000 MB file at 17 MB/s. The third test
was reading the file back at ~400 MB/s.
On Mon, Apr 8, 2013 at 2:56 PM, Gregory Farnum <g...@inktank.com> wrote:
More details, please. You ran the same test twice and performance went
up from 17.5MB/s to 394MB/s? How many drives in each node, and of what
kind?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Apr 8, 2013 at 12:38 PM, Ziemowit Pierzycki <ziemo...@pierzycki.com> wrote:
> Hi,
>
> I have a 3 node SSD-backed cluster connected over InfiniBand (16K MTU)
> and here is the performance I am seeing:
>
> [root@triton temp]# !dd
> dd if=/dev/zero of=/mnt/temp/test.out bs=512k count=1000
> 1000+0 records in
> 1000+0 records out
> 524288000 bytes (524 MB) copied, 0.436249 s, 1.2 GB/s
> [root@triton temp]# dd if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
> 10000+0 records in
> 10000+0 records out
> 5242880000 bytes (5.2 GB) copied, 299.077 s, 17.5 MB/s
> [root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k count=10000
> 10000+0 records in
> 10000+0 records out
> 5242880000 bytes (5.2 GB) copied, 13.3015 s, 394 MB/s
>
> Does that look right? How do I check that this is not a network
> problem? I remember seeing a kernel issue related to large MTUs.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com