Re: [ceph-users] Extremely slow small files rewrite performance
Any update?

On Tue, Oct 21, 2014 at 3:32 PM, Sergey Nazarov wrote:
> Ouch, I think the client log is missing.
> Here it goes:
> https://www.dropbox.com/s/650mjim2ldusr66/ceph-client.admin.log.gz?dl=0
>
> On Tue, Oct 21, 2014 at 3:22 PM, Sergey Nazarov wrote:
>> I enabled logging and performed the same tests.
>> Here is the link to an archive with the logs; they are only from one node
>> (the node where the active MDS was sitting):
>> https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0
>>
>> [...]
Re: [ceph-users] Extremely slow small files rewrite performance
Ouch, I think the client log is missing.
Here it goes:
https://www.dropbox.com/s/650mjim2ldusr66/ceph-client.admin.log.gz?dl=0

On Tue, Oct 21, 2014 at 3:22 PM, Sergey Nazarov wrote:
> I enabled logging and performed the same tests.
> Here is the link to an archive with the logs; they are only from one node
> (the node where the active MDS was sitting):
> https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0
>
> [...]
>
> On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum wrote:
>> Can you enable debugging on the client ("debug ms = 1", "debug client
>> = 20") and mds ("debug ms = 1", "debug mds = 20"), run this test
>> again, and post them somewhere for me to look at?
>>
>> While you're at it, can you try rados bench and see what sort of
>> results you get?
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
Re: [ceph-users] Extremely slow small files rewrite performance
I enabled logging and performed the same tests.
Here is the link to an archive with the logs; they are only from one node
(the node where the active MDS was sitting):
https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0

Rados bench results:

# rados bench -p test 10 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
Object prefix: benchmark_data_atl-fs11_4630
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        46        30   119.967       120  0.201327  0.348463
    2      16        88        72   143.969       168  0.132983  0.353677
    3      16       124       108   143.972       144  0.930837  0.383018
    4      16       155       139   138.976       124  0.899468  0.426396
    5      16       203       187   149.575       192  0.236534  0.400806
    6      16       243       227   151.309       160  0.835213  0.397673
    7      16       276       260   148.549       132  0.905989  0.406849
    8      16       306       290   144.978       120  0.353279  0.422106
    9      16       335       319   141.757       116   1.12114  0.428268
   10      16       376       360    143.98       164  0.418921   0.43351
   11      16       377       361   131.254         4  0.499769  0.433693
Total time run:         11.206306
Total writes made:      377
Write size:             4194304
Bandwidth (MB/sec):     134.567

Stddev Bandwidth:       60.0232
Max bandwidth (MB/sec): 192
Min bandwidth (MB/sec): 0
Average Latency:        0.474923
Stddev Latency:         0.376038
Max latency:            1.82171
Min latency:            0.060877


# rados bench -p test 10 seq
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        61        45   179.957       180  0.010405   0.25243
    2      16       109        93   185.962       192  0.908263  0.284303
    3      16       151       135   179.965       168  0.255312  0.297283
    4      16       191       175    174.97       160  0.836727  0.330659
    5      16       236       220   175.971       180  0.009995  0.330832
    6      16       275       259   172.639       156   1.06855  0.345418
    7      16       311       295   168.545       144  0.907648  0.361689
    8      16       351       335   167.474       160  0.947688  0.363552
    9      16       390       374   166.196       156  0.140539  0.369057
Total time run:        9.755367
Total reads made:      401
Read size:             4194304
Bandwidth (MB/sec):    164.422

Average Latency:       0.387705
Max latency:           1.33852
Min latency:           0.008064

# rados bench -p test 10 rand
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        55        39   155.938       156  0.773716  0.257267
    2      16        93        77   153.957       152  0.006573  0.339199
    3      16       135       119   158.629       168  0.009851  0.359675
    4      16       171       155   154.967       144  0.892027  0.359015
    5      16       209       193   154.369       152   1.13945  0.378618
    6      16       256       240    159.97       188  0.009965  0.368439
    7      16       295       279     159.4       156  0.195812  0.371259
    8      16       343       327   163.472       192  0.880587  0.370759
    9      16       380       364    161.75       148  0.113111  0.377983
   10      16       424       408   163.173       176  0.772274  0.379497
Total time run:        10.518482
Total reads made:      425
Read size:             4194304
Bandwidth (MB/sec):    161.620

Average Latency:       0.393978
Max latency:           1.36572
Min latency:           0.006448

On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum wrote:
> Can you enable debugging on the client ("debug ms = 1", "debug client
> = 20") and mds ("debug ms = 1", "debug mds = 20"), run this test
> again, and post them somewhere for me to look at?
>
> While you're at it, can you try rados bench and see what sort of
> results you get?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Oct 21, 2014 at 10:57 AM, Sergey Nazarov wrote:
>> It is CephFS mounted via ceph-fuse.
>> I am getting the same results regardless of how many other clients
>> have this fs mounted and their activity.
>> The cluster is running on Debian Wheezy, kernel 3.2.0-4-amd64.
>>
>> On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum wrote:
>>> Are these tests conducted using a local fs on RBD, or using CephFS?
>>> If CephFS, do you have multiple clients mounting the FS, and what are
>>> they doing? What client (kernel or ceph-fuse)?
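For reference, a minimal sketch of how the requested debug levels could be
applied. It assumes the ceph-fuse client reads them from the [client] section
of /etc/ceph/ceph.conf when it starts, and that the running MDS can be reached
as mds.atl-fs12 (the active MDS shown later in ceph -w); the log path and
daemon name are illustrative, not taken from this cluster's actual config.

    # /etc/ceph/ceph.conf on the client node (read when ceph-fuse starts)
    [client]
        debug ms = 1
        debug client = 20
        log file = /var/log/ceph/ceph-client.admin.log

    # raise the debug levels on the running MDS without restarting it
    ceph tell mds.atl-fs12 injectargs '--debug-ms 1 --debug-mds 20'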
Re: [ceph-users] Extremely slow small files rewrite performance
It is CephFS mounted via ceph-fuse.
I am getting the same results regardless of how many other clients
have this fs mounted and their activity.
The cluster is running on Debian Wheezy, kernel 3.2.0-4-amd64.

On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum wrote:
> Are these tests conducted using a local fs on RBD, or using CephFS?
> If CephFS, do you have multiple clients mounting the FS, and what are
> they doing? What client (kernel or ceph-fuse)?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Oct 21, 2014 at 9:05 AM, Sergey Nazarov wrote:
>> [...]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
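For context, the two client types Greg is asking about are typically mounted
along these lines. This is only a sketch: the monitor address is one of this
cluster's mons (atl-fs9), while the mount point and secretfile path are made
up for illustration.

    # FUSE client (what is in use in this thread)
    ceph-fuse -m 10.44.101.69:6789 /mnt/cephfs

    # kernel client, for comparison
    mount -t ceph 10.44.101.69:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret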
[ceph-users] Extremely slow small files rewrite performance
Hi

I just built a new cluster using these quickstart instructions:
http://ceph.com/docs/master/start/

And here is what I am seeing:

# time for i in {1..10}; do echo $i > $i.txt ; done
real    0m0.081s
user    0m0.000s
sys     0m0.004s

And if I repeat the same command (when the files are already created):

# time for i in {1..10}; do echo $i > $i.txt ; done
real    0m48.894s
user    0m0.000s
sys     0m0.004s

I was very surprised and then just tried to rewrite a single file:

# time echo 1 > 1.txt
real    0m3.133s
user    0m0.000s
sys     0m0.000s

BTW, I don't think it is a problem with OSD speed or the network:

# time sysbench --num-threads=1 --test=fileio --file-total-size=1G
--file-test-mode=rndrw prepare
1073741824 bytes written in 23.52 seconds (43.54 MB/sec).

Here is my ceph cluster status and version:

# ceph -w
    cluster d3dcacc3-89fb-4db0-9fa9-f1f6217280cb
     health HEALTH_OK
     monmap e4: 4 mons at
{atl-fs10=10.44.101.70:6789/0,atl-fs11=10.44.101.91:6789/0,atl-fs12=10.44.101.92:6789/0,atl-fs9=10.44.101.69:6789/0},
election epoch 40, quorum 0,1,2,3 atl-fs9,atl-fs10,atl-fs11,atl-fs12
     mdsmap e33: 1/1/1 up {0=atl-fs12=up:active}, 3 up:standby
     osdmap e92: 4 osds: 4 up, 4 in
      pgmap v8091: 192 pgs, 3 pools, 123 MB data, 1658 objects
            881 GB used, 1683 GB / 2564 GB avail
                 192 active+clean
  client io 1820 B/s wr, 1 op/s

# ceph -v
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

All nodes are connected with a gigabit network.
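A rough per-file cost follows from the numbers above: 48.894 s / 10 rewrites
is about 4.9 s per overwrite, in line with the ~3.1 s single-file case. A small
variation of the same loop (a sketch, assuming a bash shell) would confirm
whether the delay is paid on every rewrite rather than being a one-off stall:

    # same rewrite loop as above, but timing each overwrite separately
    for i in {1..10}; do
        echo "rewriting $i.txt"
        time ( echo $i > $i.txt )
    done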