Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

2016-04-27 Thread Mike Miller

Nick, all,

fantastic, that did it!

I installed kernel 4.5.2 on the client; single-threaded read performance
over the krbd mount is now up to about 370 MB/s with the default readahead
of 256 sectors (128 KB), and touches 400 MB/s with larger readahead sizes.

All single threaded.
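
For anyone reproducing this, a minimal sketch of the check and of the
single-threaded read test (the mount point and file name are placeholders,
not the exact ones used here):

# verify the client runs a kernel with the readahead fix (>= 4.4)
uname -r

# current readahead on the mapped device, in 512-byte sectors
sudo blockdev --getra /dev/rbd0

# single-threaded read with a cold client page cache
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
dd if=/mnt/rbd/bigfile of=/dev/null bs=4M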

Multi-threaded krbd read on the same mount also improved, a 10 GBit/s 
network connection is easily saturated.


Thank you all so much for the discussion and the hints.

Regards,

Mike

On 4/23/16 9:51 AM, n...@fisk.me.uk wrote:

I've just looked through GitHub for the Linux kernel and it looks like that
readahead fix was introduced in 4.4, so it might be worth trying a slightly
newer kernel?

Sent from Nine

*From:* Mike Miller <millermike...@gmail.com>
*Sent:* 21 Apr 2016 2:20 pm
*To:* ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

Hi Udo,

thanks, just to make sure, further increased the readahead:

$ sudo blockdev --getra /dev/rbd0
1048576

$ cat /sys/block/rbd0/queue/read_ahead_kb
524288

No difference here. First one is sectors (512 bytes), second one KB.
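
In other words, the two values describe the same readahead window; a quick
sketch of the conversion and of setting it through either interface (device
name assumed):

# blockdev works in 512-byte sectors, the sysfs file in KB:
# 1048576 sectors * 512 B = 512 MiB = 524288 KB, so the two values match

# set readahead via either interface, e.g. to 32 MiB:
sudo blockdev --setra 65536 /dev/rbd0                      # 65536 * 512 B = 32 MiB
echo 32768 | sudo tee /sys/block/rbd0/queue/read_ahead_kb  # 32768 KB = 32 MiB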

The second read (after drop cache) is somewhat faster (10%-20%) but not
much.

I also found this info
http://tracker.ceph.com/issues/9192

Maybe Ilya can help us; he probably knows best how this can be improved.

Thanks and cheers,

Mike


On 4/21/16 4:32 PM, Udo Lembke wrote:

Hi Mike,

On 21.04.2016 at 09:07, Mike Miller wrote:

Hi Nick and Udo,

thanks, very helpful, I tweaked some of the config parameters along
the lines Udo suggests, but still only some 80 MB/s or so.

this means you have reached a factor of 3 (which is roughly the value I
see with a single thread on RBD too). Better than nothing.



Kernel 4.3.4 running on the client machine and comfortable readahead
configured

$ sudo blockdev --getra /dev/rbd0
262144

Still not more than about 80-90 MB/s.

there are two possibilities for read-ahead.
Take a look here (and change with echo)
cat /sys/block/rbd0/queue/read_ahead_kb

Perhaps there are slight differences?



For writing the parallelization is amazing and I see very impressive
speeds, but why is reading performance so much behind? Why is it not
parallelized the same way writing is? Is this something coming up in
the jewel release? Or is it planned further down the road?

If you read a big file and clear your cache ("echo 3 >
/proc/sys/vm/drop_caches") on the client, is the second read very fast?
I assume yes.
In this case the read data is in the cache on the OSD nodes... so
tuning must be possible there (and I'm very interested in improvements).

Udo



Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

2016-04-25 Thread Udo Lembke

Hi Mike,

On 21.04.2016 at 15:20, Mike Miller wrote:

Hi Udo,

thanks, just to make sure, further increased the readahead:

$ sudo blockdev --getra /dev/rbd0
1048576

$ cat /sys/block/rbd0/queue/read_ahead_kb
524288

No difference here. First one is sectors (512 bytes), second one KB.

oops, sorry! My fault. The sector/KB distinction makes sense...


The second read (after drop cache) is somewhat faster (10%-20%) but
not much.

That's very strange! It looks like there are tuning possibilities. Do your
OSD nodes have enough RAM? Are they very busy?
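
A quick, rough way to answer those questions on each OSD node (iostat needs
the sysstat package; ceph osd perf can be run from any node with an admin
keyring):

free -m          # RAM and page-cache usage
uptime           # load average
iostat -x 5 3    # per-disk utilization and wait times
ceph osd perf    # per-OSD commit/apply latency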


If I do single-threaded reading on a test VM, I get the following results (very
small test cluster - 2 nodes with a 10Gb NIC and one node with a 1Gb NIC):

support@upgrade-test:~/fio$ dd if=fiojo.0.0 of=/dev/null bs=1M
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 62.0267 s, 69.2 MB/s

### as root "echo 3 > /proc/sys/vm/drop_caches" and the same on the VM-host

support@upgrade-test:~/fio$ dd if=fiojo.0.0 of=/dev/null bs=1M
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 30.0987 s, 143 MB/s

# this is due to cached data on the OSD nodes
# with cleared caches on all nodes (VM, VM host, OSD nodes)
# I get the same value as on the first run:

support@upgrade-test:~/fio$ dd if=fiojo.0.0 of=/dev/null bs=1M
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 61.8995 s, 69.4 MB/s

I don't know why this should not be the same with krbd.
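
The same comparison could be repeated on the krbd mount; a sketch, with the
mount point and file name as placeholders (the O_DIRECT run additionally
bypasses the client page cache and readahead, which helps separate caching
effects from raw read speed):

# cold client cache, buffered read (uses readahead)
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
dd if=/mnt/rbd/fiojo.0.0 of=/dev/null bs=1M

# direct read, no client page cache or readahead involved
dd if=/mnt/rbd/fiojo.0.0 of=/dev/null bs=4M iflag=direct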


Udo



Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

2016-04-23 Thread nick
I've just looked through GitHub for the Linux kernel and it looks like that
readahead fix was introduced in 4.4, so it might be worth trying a slightly
newer kernel?

Sent from Nine

From: Mike Miller <millermike...@gmail.com>
Sent: 21 Apr 2016 2:20 pm
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

Hi Udo, 

thanks, just to make sure, further increased the readahead: 

$ sudo blockdev --getra /dev/rbd0 
1048576 

$ cat /sys/block/rbd0/queue/read_ahead_kb 
524288 

No difference here. First one is sectors (512 bytes), second one KB. 

The second read (after drop cache) is somewhat faster (10%-20%) but not 
much. 

I also found this info 
http://tracker.ceph.com/issues/9192 

Maybe Ilya can help us; he probably knows best how this can be improved.

Thanks and cheers, 

Mike 


On 4/21/16 4:32 PM, Udo Lembke wrote: 
> Hi Mike, 
> 
> On 21.04.2016 at 09:07, Mike Miller wrote:
>> Hi Nick and Udo, 
>> 
>> thanks, very helpful, I tweaked some of the config parameters along
>> the lines Udo suggests, but still only some 80 MB/s or so.
> this means you have reached a factor of 3 (which is roughly the value I
> see with a single thread on RBD too). Better than nothing.
> 
>> 
>> Kernel 4.3.4 running on the client machine and comfortable readahead 
>> configured 
>> 
>> $ sudo blockdev --getra /dev/rbd0 
>> 262144 
>> 
>> Still not more than about 80-90 MB/s. 
> there are two possibilities for read-ahead.
> Take a look here (and change with echo) 
> cat /sys/block/rbd0/queue/read_ahead_kb 
> 
> Perhaps there are slight differences?
> 
>> 
>> For writing the parallelization is amazing and I see very impressive 
>> speeds, but why is reading performance so much behind? Why is it not 
>> parallelized the same way writing is? Is this something coming up in 
>> the jewel release? Or is it planned further down the road? 
> If you read a big file and clear your cache ("echo 3 >
> /proc/sys/vm/drop_caches") on the client, is the second read very fast?
> I assume yes.
> In this case the read data is in the cache on the OSD nodes... so
> tuning must be possible there (and I'm very interested in improvements).
> 
> Udo 


Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

2016-04-21 Thread Mike Miller

Hi Udo,

thanks, just to make sure, further increased the readahead:

$ sudo blockdev --getra /dev/rbd0
1048576

$ cat /sys/block/rbd0/queue/read_ahead_kb
524288

No difference here. First one is sectors (512 bytes), second one KB.

The second read (after drop cache) is somewhat faster (10%-20%) but not 
much.


I also found this info
http://tracker.ceph.com/issues/9192

Maybe Ilya can help us; he probably knows best how this can be improved.

Thanks and cheers,

Mike


On 4/21/16 4:32 PM, Udo Lembke wrote:

Hi Mike,

On 21.04.2016 at 09:07, Mike Miller wrote:

Hi Nick and Udo,

thanks, very helpful, I tweaked some of the config parameters along
the lines Udo suggests, but still only some 80 MB/s or so.

this means you have reached a factor of 3 (which is roughly the value I
see with a single thread on RBD too). Better than nothing.



Kernel 4.3.4 running on the client machine and comfortable readahead
configured

$ sudo blockdev --getra /dev/rbd0
262144

Still not more than about 80-90 MB/s.

there are two possibilities for read-ahead.
Take a look here (and change with echo)
cat /sys/block/rbd0/queue/read_ahead_kb

Perhaps there are slight differences?



For writing the parallelization is amazing and I see very impressive
speeds, but why is reading performance so much behind? Why is it not
parallelized the same way writing is? Is this something coming up in
the jewel release? Or is it planned further down the road?

If you read a big file and clear your cache ("echo 3 >
/proc/sys/vm/drop_caches") on the client, is the second read very fast?
I assume yes.
In this case the read data is in the cache on the OSD nodes... so
tuning must be possible there (and I'm very interested in improvements).

Udo



Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

2016-04-21 Thread Udo Lembke

Hi Mike,

On 21.04.2016 at 09:07, Mike Miller wrote:

Hi Nick and Udo,

thanks, very helpful, I tweaked some of the config parameters along
the lines Udo suggests, but still only some 80 MB/s or so.

this means you have reached a factor of 3 (which is roughly the value I
see with a single thread on RBD too). Better than nothing.




Kernel 4.3.4 running on the client machine and comfortable readahead 
configured


$ sudo blockdev --getra /dev/rbd0
262144

Still not more than about 80-90 MB/s.

there are two possibilities for read-ahead.
Take a look here (and change with echo)
cat /sys/block/rbd0/queue/read_ahead_kb

Perhaps there are slight differences?



For writing the parallelization is amazing and I see very impressive 
speeds, but why is reading performance so much behind? Why is it not 
parallelized the same way writing is? Is this something coming up in 
the jewel release? Or is it planned further down the road?

If you read a big file and clear your cache ("echo 3 >
/proc/sys/vm/drop_caches") on the client, is the second read very fast?
I assume yes.
In this case the read data is in the cache on the OSD nodes... so
tuning must be possible there (and I'm very interested in improvements).


Udo


Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

2016-04-21 Thread Mike Miller

Hi Nick and Udo,

thanks, very helpful, I tweaked some of the config parameters along the
lines Udo suggests, but still only some 80 MB/s or so.


Kernel 4.3.4 running on the client machine and comfortable readahead 
configured


$ sudo blockdev --getra /dev/rbd0
262144

Still not more than about 80-90 MB/s.

For writing the parallelization is amazing and I see very impressive 
speeds, but why is reading performance so much behind? Why is it not 
parallelized the same way writing is? Is this something coming up in the 
jewel release? Or is it planned further down the road?


Please let me know if there is a way to get better single-threaded read
performance on clients for large files.
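
As a crude cross-check that reads do scale with parallelism (i.e. the limit
is per stream rather than in the cluster), several readers can be pointed at
different offsets of one large file - the path and sizes here are made up
for illustration:

# four parallel readers, 8 GiB apart in the same file (skip is counted in bs-sized blocks)
for i in 0 1 2 3; do
  dd if=/mnt/rbd/bigfile of=/dev/null bs=4M skip=$((i*2048)) count=2048 &
done
wait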


Thanks and regards,

Mike

On 4/20/16 10:43 PM, Nick Fisk wrote:




-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Udo Lembke
Sent: 20 April 2016 07:21
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

Hi Mike,
I don't have experience with RBD mounts, but I see the same effect with
RBD.

You can do some tuning to get better results (disable debug and so on).

As a hint, some values from a ceph.conf:
[osd]
 debug asok = 0/0
 debug auth = 0/0
 debug buffer = 0/0
 debug client = 0/0
 debug context = 0/0
 debug crush = 0/0
 debug filer = 0/0
 debug filestore = 0/0
 debug finisher = 0/0
 debug heartbeatmap = 0/0
 debug journal = 0/0
 debug journaler = 0/0
 debug lockdep = 0/0
 debug mds = 0/0
 debug mds balancer = 0/0
 debug mds locker = 0/0
 debug mds log = 0/0
 debug mds log expire = 0/0
 debug mds migrator = 0/0
 debug mon = 0/0
 debug monc = 0/0
 debug ms = 0/0
 debug objclass = 0/0
 debug objectcacher = 0/0
 debug objecter = 0/0
 debug optracker = 0/0
 debug osd = 0/0
 debug paxos = 0/0
 debug perfcounter = 0/0
 debug rados = 0/0
 debug rbd = 0/0
 debug rgw = 0/0
 debug throttle = 0/0
 debug timer = 0/0
 debug tp = 0/0
 filestore_op_threads = 4
 osd max backfills = 1
 osd mount options xfs =
"rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
 osd mkfs options xfs = "-f -i size=2048"
 osd recovery max active = 1
 osd_disk_thread_ioprio_class = idle
 osd_disk_thread_ioprio_priority = 7
 osd_disk_threads = 1
 osd_enable_op_tracker = false
 osd_op_num_shards = 10
 osd_op_num_threads_per_shard = 1
 osd_op_threads = 4

Udo

On 19.04.2016 11:21, Mike Miller wrote:

Hi,

RBD mount
ceph v0.94.5
6 OSD with 9 HDD each
10 GBit/s public and private networks
3 MON nodes 1Gbit/s network

An rbd mounted with a btrfs filesystem performs really badly when
reading. I tried readahead in all combinations but that does not help in
any way.

Write rates are very good, in excess of 600 MB/s up to 1200 MB/s,
average 800 MB/s. Read rates on the same mounted rbd are about 10-30
MB/s!?


What kernel are you running? Older kernels had an issue where readahead was
capped at 2MB. In order to get good read speeds you need readahead set to
about 32MB+.




Of course, both writes and reads are from a single client machine with
a single write/read command. So I am looking at single threaded
performance.
Actually, I was hoping to see at least 200-300 MB/s when reading, but
I am seeing 10% of that at best.

Thanks for your help.

Mike


Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

2016-04-20 Thread Nick Fisk


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Udo Lembke
> Sent: 20 April 2016 07:21
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
> 
> Hi Mike,
> I don't have experience with RBD mounts, but I see the same effect with
> RBD.
> 
> You can do some tuning to get better results (disable debug and so on).
> 
> As a hint, some values from a ceph.conf:
> [osd]
>  debug asok = 0/0
>  debug auth = 0/0
>  debug buffer = 0/0
>  debug client = 0/0
>  debug context = 0/0
>  debug crush = 0/0
>  debug filer = 0/0
>  debug filestore = 0/0
>  debug finisher = 0/0
>  debug heartbeatmap = 0/0
>  debug journal = 0/0
>  debug journaler = 0/0
>  debug lockdep = 0/0
>  debug mds = 0/0
>  debug mds balancer = 0/0
>  debug mds locker = 0/0
>  debug mds log = 0/0
>  debug mds log expire = 0/0
>  debug mds migrator = 0/0
>  debug mon = 0/0
>  debug monc = 0/0
>  debug ms = 0/0
>  debug objclass = 0/0
>  debug objectcacher = 0/0
>  debug objecter = 0/0
>  debug optracker = 0/0
>  debug osd = 0/0
>  debug paxos = 0/0
>  debug perfcounter = 0/0
>  debug rados = 0/0
>  debug rbd = 0/0
>  debug rgw = 0/0
>  debug throttle = 0/0
>  debug timer = 0/0
>  debug tp = 0/0
>  filestore_op_threads = 4
>  osd max backfills = 1
>  osd mount options xfs =
> "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
>  osd mkfs options xfs = "-f -i size=2048"
>  osd recovery max active = 1
>  osd_disk_thread_ioprio_class = idle
>  osd_disk_thread_ioprio_priority = 7
>  osd_disk_threads = 1
>  osd_enable_op_tracker = false
>  osd_op_num_shards = 10
>  osd_op_num_threads_per_shard = 1
>  osd_op_threads = 4
> 
> Udo
> 
> On 19.04.2016 11:21, Mike Miller wrote:
> > Hi,
> >
> > RBD mount
> > ceph v0.94.5
> > 6 OSD with 9 HDD each
> > 10 GBit/s public and private networks
> > 3 MON nodes 1Gbit/s network
> >
> > An rbd mounted with a btrfs filesystem performs really badly when
> > reading. I tried readahead in all combinations but that does not help in
> > any way.
> >
> > Write rates are very good, in excess of 600 MB/s up to 1200 MB/s,
> > average 800 MB/s. Read rates on the same mounted rbd are about 10-30
> > MB/s!?

What kernel are you running? Older kernels had an issue where readahead was
capped at 2MB. In order to get good read speeds you need readahead set to
about 32MB+.
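
On a 4.4+ kernel that larger value can then actually take effect; one way to
make a 32MB readahead persistent across re-mapping is a udev rule - the file
name and value below are an illustrative sketch, not a tested recipe:

# /etc/udev/rules.d/80-rbd-readahead.rules
SUBSYSTEM=="block", KERNEL=="rbd*", ACTION=="add|change", ATTR{queue/read_ahead_kb}="32768"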


> >
> > Of course, both writes and reads are from a single client machine with
> > a single write/read command. So I am looking at single threaded
> > performance.
> > Actually, I was hoping to see at least 200-300 MB/s when reading, but
> > I am seeing 10% of that at best.
> >
> > Thanks for your help.
> >
> > Mike


Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

2016-04-20 Thread Udo Lembke
Hi Mike,
I don't have experience with RBD mounts, but I see the same effect with RBD.

You can do some tuning to get better results (disable debug and so on).

As a hint, some values from a ceph.conf:
[osd]
 debug asok = 0/0
 debug auth = 0/0
 debug buffer = 0/0
 debug client = 0/0
 debug context = 0/0
 debug crush = 0/0
 debug filer = 0/0
 debug filestore = 0/0
 debug finisher = 0/0
 debug heartbeatmap = 0/0
 debug journal = 0/0
 debug journaler = 0/0
 debug lockdep = 0/0
 debug mds = 0/0
 debug mds balancer = 0/0
 debug mds locker = 0/0
 debug mds log = 0/0
 debug mds log expire = 0/0
 debug mds migrator = 0/0
 debug mon = 0/0
 debug monc = 0/0
 debug ms = 0/0
 debug objclass = 0/0
 debug objectcacher = 0/0
 debug objecter = 0/0
 debug optracker = 0/0
 debug osd = 0/0
 debug paxos = 0/0
 debug perfcounter = 0/0
 debug rados = 0/0
 debug rbd = 0/0
 debug rgw = 0/0
 debug throttle = 0/0
 debug timer = 0/0
 debug tp = 0/0
 filestore_op_threads = 4
 osd max backfills = 1
 osd mount options xfs =
"rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
 osd mkfs options xfs = "-f -i size=2048"
 osd recovery max active = 1
 osd_disk_thread_ioprio_class = idle
 osd_disk_thread_ioprio_priority = 7
 osd_disk_threads = 1
 osd_enable_op_tracker = false
 osd_op_num_shards = 10
 osd_op_num_threads_per_shard = 1
 osd_op_threads = 4
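
Most of the debug settings above can also be applied to running OSDs without
a restart via injectargs (only a few are shown here as a sketch; the
thread/shard options generally only take effect at OSD start):

ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0 --debug_filestore 0/0 --debug_journal 0/0'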

Udo

On 19.04.2016 11:21, Mike Miller wrote:
> Hi,
>
> RBD mount
> ceph v0.94.5
> 6 OSD with 9 HDD each
> 10 GBit/s public and private networks
> 3 MON nodes 1Gbit/s network
>
> An rbd mounted with a btrfs filesystem performs really badly when
> reading. I tried readahead in all combinations but that does not help in
> any way.
>
> Write rates are very good, in excess of 600 MB/s up to 1200 MB/s,
> average 800 MB/s.
> Read rates on the same mounted rbd are about 10-30 MB/s!?
>
> Of course, both writes and reads are from a single client machine with
> a single write/read command. So I am looking at single threaded
> performance.
> Actually, I was hoping to see at least 200-300 MB/s when reading, but
> I am seeing 10% of that at best.
>
> Thanks for your help.
>
> Mike