Re: [ceph-users] write performance per disk
Hi,

Yes, I ran a test now with 16 instances at 16 and 32 threads each. The absolute maximum was 1100 MB/s, but the network was still not saturated. All disks carried the same load, about 110 MB/s each - and the maximum I got out of these disks with direct access was 170 MB/s writes, so this is not too bad a value. I will run more tests with 10 and 20 virtual machines at the same time.

Do you think 110 MB/s per disk is the Ceph maximum (against 170 MB/s theoretical per disk)? The 110 MB/s per disk includes the journal writes as well.

Thanks,
Philipp

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Mark Nelson [mark.nel...@inktank.com]
Sent: Friday, 4 July 2014 16:10
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] write performance per disk

> In addition to the advice Wido is providing (which I wholeheartedly
> agree with!), you might want to check your controller/disk
> configuration. [remainder of the quote trimmed; see Mark's message below]
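For what it's worth, those numbers are roughly self-consistent. A back-of-the-envelope check, assuming replica 2 and journals on the data disks:

    1100 MB/s  client writes
     x 2       (replication)       = 2200 MB/s
     x 2       (journal write)     = 4400 MB/s raw to the spindles
     / 45      disks               ~=  98 MB/s per disk

which lands in the same ballpark as the ~110 MB/s per disk seen in atop, i.e. roughly two thirds of the 170 MB/s these disks manage with direct sequential access.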
Re: [ceph-users] write performance per disk
On 07/03/2014 08:11 AM, VELARTIS Philipp Dürhammer wrote:
> Hi,
>
> I have a ceph cluster setup (with 45 sata disks, journals on the same
> disks) and get only 450 MB/s sequential writes (maximum, playing around
> with threads in rados bench) with a replica count of 2.
> [remainder of the quote trimmed; see the original post below]

In addition to the advice Wido is providing (which I wholeheartedly agree with!), you might want to check your controller/disk configuration. If you have journals on the same disks as the data, sometimes putting the disks into single-disk RAID0 LUNs with writeback cache enabled can help keep journal and data writes from causing seek contention. This only works if you have a controller with cache and a battery, though.
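On LSI MegaRAID controllers, for example, the single-disk RAID0 arrangement Mark describes might be set up along these lines - a sketch only: the enclosure:slot pair and adapter number are placeholders for your hardware, and the exact binary name (MegaCli/MegaCli64/storcli) varies by distribution:

    # create a single-drive RAID0 LUN with writeback cache and read-ahead;
    # NoCachedBadBBU falls back to write-through if the battery fails
    MegaCli64 -CfgLdAdd -r0 [32:4] WB RA Direct NoCachedBadBBU -a0
    # verify the cache policy of the new logical drive
    MegaCli64 -LDInfo -Lall -a0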
Re: [ceph-users] write performance per disk
On 04/07/14 02:32, VELARTIS Philipp Dürhammer wrote:
> Ceph.conf:
>         rbd cache = true
>         rbd cache size = 2147483648
>         rbd cache max dirty = 1073741824

Just an FYI - I posted a setting very like this in another thread and remarked that it was "aggressive" - probably too much so for anything but single-purpose benchmarks using a handful of VMs, as it will add 2G to the memory footprint of *each* one [1]. If that was your intention, no worries - but I thought I should clarify it before I'm responsible for everyone eating all their RAM :-)

Cheers

Mark

[1] E.g. your qemu-system-x86_64 process will expand its memory consumption by approximately this amount.
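For comparison, a more conservative client-side section might look like the sketch below; the option names are the standard rbd cache settings, but the sizes shown are illustrative placeholders, not tuned recommendations:

    [client]
        rbd cache = true
        rbd cache size = 134217728        # 128 MB per VM instead of 2 GB
        rbd cache max dirty = 100663296   # 96 MB
        rbd cache max dirty age = 1       # flush dirty data after 1 second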
Re: [ceph-users] write performance per disk
On 07/04/2014 11:40 AM, VELARTIS Philipp Dürhammer wrote:
> I use between 1 and 128 threads in different steps... but 500 MB/s
> write is the maximum, playing around.

I just mentioned it in a different thread: make sure you do parallel I/O! That's where Ceph really makes the difference. Run rados bench from multiple clients.

> Uff, it is so hard to tune Ceph... so many people have problems... ;-)

No, Ceph is simply different from any other storage. Distributed storage is a lot different in terms of performance from existing storage projects/products.

Wido

-----Original Message-----
From: Wido den Hollander [mailto:w...@42on.com]
Sent: Friday, 4 July 2014 10:55
To: VELARTIS Philipp Dürhammer; ceph-users@lists.ceph.com
Subject: Re: AW: [ceph-users] write performance per disk

> [quoted exchange trimmed; see the messages below]
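A sketch of what running rados bench from multiple clients can look like in practice - the hostnames, pool name, and thread count are placeholders, and --run-name keeps the concurrent instances from clashing over their benchmark metadata objects:

    # start the same 60-second write benchmark on three client hosts at once
    for host in client1 client2 client3; do
        ssh $host "rados bench -p bench 60 write -t 32 --run-name $host --no-cleanup" &
    done
    wait
    # sum the per-client bandwidth figures for the aggregate number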
Re: [ceph-users] write performance per disk
I use between 1 and 128 threads in different steps... but 500 MB/s write is the maximum, playing around.

Uff, it is so hard to tune Ceph... so many people have problems... ;-)

-----Original Message-----
From: Wido den Hollander [mailto:w...@42on.com]
Sent: Friday, 4 July 2014 10:55
To: VELARTIS Philipp Dürhammer; ceph-users@lists.ceph.com
Subject: Re: AW: [ceph-users] write performance per disk

> [quoted message trimmed; see Wido's reply below]
Re: [ceph-users] write performance per disk
On 07/03/2014 04:32 PM, VELARTIS Philipp Dürhammer wrote:
> Hi,
>
> Ceph.conf:
>         osd journal size = 15360
>         rbd cache = true
>         [configuration and iostat listing trimmed; see Philipp's message below]
>
> so it should be 8 threads?

How many threads are you using with rados bench? Don't touch the op threads from the start; usually the default is just fine.

> All 3 machines have more or less the same disk load at the same time.
>
> The question is: is it poor performance to get max 500 MB/s writes with
> 45 disks and replica 2, or should I expect this?

You should be able to get more as long as the I/O is done in parallel.

Wido

--
Wido den Hollander
Ceph trainer and consultant
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
Re: [ceph-users] write performance per disk
Hi,

Ceph.conf:
        osd journal size = 15360
        rbd cache = true
        rbd cache size = 2147483648
        rbd cache max dirty = 1073741824
        rbd cache max dirty age = 100
        osd recovery max active = 1
        osd max backfills = 1
        osd mkfs options xfs = "-f -i size=2048"
        osd mount options xfs = "rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M"
        osd op threads = 8

So it should be 8 threads?

All 3 machines have more or less the same disk load at the same time. Also the disks (iostat):

    sdb   35.56    87.10   6849.09   617310   48540806
    sdc   26.75    72.62   5148.58   514701   36488992
    sdd   35.15    53.48   6802.57   378993   48211141
    sde   31.04    79.04   6208.48   560141   44000710
    sdf   32.79    38.35   6238.28   271805   44211891
    sdg   31.67    77.84   5987.45   551680   42434167
    sdh   32.95    51.29   6315.76   363533   44761001
    sdi   31.67    56.93   5956.29   403478   42213336
    sdj   35.83    77.82   6929.31   551501   49109354
    sdk   36.86    73.84   7291.00   523345   51672704
    sdl   36.02   112.90   7040.47   800177   49897132
    sdm   33.25    38.02   6455.05   269446   45748178
    sdn   33.52    39.10   6645.19   277101   47095696
    sdo   33.26    46.22   6388.20   327541   45274394
    sdp   33.38    74.12   6480.62   525325   45929369

The question is: is it poor performance to get max 500 MB/s writes with 45 disks and replica 2, or should I expect this?

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido den Hollander
Sent: Thursday, 3 July 2014 15:22
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] write performance per disk

> [quoted message trimmed; see Wido's reply below]
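As an aside, the listing above looks like iostat's cumulative since-boot averages, which wash out what happens during a benchmark. A sketch of capturing per-interval extended statistics instead (standard sysstat iostat flags; the device glob is a placeholder for the data disks):

    # extended per-device stats in MB, 5-second intervals, during the run;
    # %util near 100 with modest MB/s written means the disks are seek-bound
    iostat -xmd 5 /dev/sd[b-p]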
Re: [ceph-users] write performance per disk
On 07/03/2014 03:11 PM, VELARTIS Philipp Dürhammer wrote:
> Hi,
>
> I have a ceph cluster setup (with 45 sata disks, journals on the same
> disks) and get only 450 MB/s sequential writes (maximum, playing around
> with threads in rados bench) with a replica count of 2.

How many threads?

> Which is about ~20 MB/s of writes per disk (what I see in atop as well).
> Theoretically, with replica 2 and journals on the data disks, it should
> be 45 x 100 MB/s (sata) / 2 (replica) / 2 (journal writes), which makes
> 1125 MB/s - and satas in reality do 120 MB/s, so the theoretical number
> should be even higher.
>
> I would expect between 40-50 MB/s for each sata disk.
>
> Can somebody confirm that he can reach this speed with a setup with
> journals on the satas (with journals on SSD the speed should be
> 100 MB/s per disk)? Or does ceph only give about ¼ of the speed of a
> disk (and not the ½ expected because of journals)?

Did you verify how much each machine is doing? It could be that the data is not distributed evenly and that on a certain machine the drives are doing 50MB/sec.

> My setup is 3 servers with: 2 x 2.6 GHz Xeons, 128 GB RAM, 15 satas for
> ceph (and SSDs for the system), 1 x 10GbE for external traffic, 1 x
> 10GbE for OSD traffic. With reads I can saturate the network, but
> writes are far away. And I would expect to at least saturate the 10GbE
> with sequential writes as well.

Should be possible, but with 3 servers the data distribution might not be optimal, causing lower write performance.

I've seen 10Gbit write performance on multiple clusters without any problems.

--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
[ceph-users] write performance per disk
Hi,

I have a ceph cluster setup (with 45 sata disks, journals on the same disks) and get only 450 MB/s sequential writes (maximum, playing around with threads in rados bench) with a replica count of 2.

Which is about ~20 MB/s of writes per disk (what I see in atop as well). Theoretically, with replica 2 and journals on the data disks, it should be 45 x 100 MB/s (sata) / 2 (replica) / 2 (journal writes), which makes 1125 MB/s - and satas in reality do 120 MB/s, so the theoretical number should be even higher.

I would expect between 40-50 MB/s for each sata disk.

Can somebody confirm that he can reach this speed with a setup with journals on the satas (with journals on SSD the speed should be 100 MB/s per disk)? Or does ceph only give about ¼ of the speed of a disk (and not the ½ expected because of journals)?

My setup is 3 servers with: 2 x 2.6 GHz Xeons, 128 GB RAM, 15 satas for ceph (and SSDs for the system), 1 x 10GbE for external traffic, 1 x 10GbE for OSD traffic. With reads I can saturate the network, but writes are far away. And I would expect to at least saturate the 10GbE with sequential writes as well.

Thank you
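For reference, a minimal rados bench run of the kind discussed in this thread might look like this - a sketch, assuming a throwaway test pool named "bench"; pool name and thread count are placeholders:

    # 60-second sequential write test with 16 concurrent ops, keeping the
    # objects around so they can be read back afterwards
    rados bench -p bench 60 write -t 16 --no-cleanup
    # sequential read test over the objects written above
    rados bench -p bench 60 seq -t 16
    # remove the benchmark objects when done
    rados -p bench cleanup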