Re: [ceph-users] Extremely slow small files rewrite performance

2014-10-24 Thread Sergey Nazarov
Any update?


Re: [ceph-users] Extremely slow small files rewrite performance

2014-10-21 Thread Sergey Nazarov
Ouch, I think the client log is missing.
Here it is:
https://www.dropbox.com/s/650mjim2ldusr66/ceph-client.admin.log.gz?dl=0


Re: [ceph-users] Extremely slow small files rewrite performance

2014-10-21 Thread Sergey Nazarov
I enabled logging and performed the same tests.
Here is a link to an archive with the logs; they are only from one node
(the node where the active MDS was running):
https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0

Rados bench results:

# rados bench -p test 10 write
 Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
 Object prefix: benchmark_data_atl-fs11_4630
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        46        30   119.967       120  0.201327  0.348463
     2      16        88        72   143.969       168  0.132983  0.353677
     3      16       124       108   143.972       144  0.930837  0.383018
     4      16       155       139   138.976       124  0.899468  0.426396
     5      16       203       187   149.575       192  0.236534  0.400806
     6      16       243       227   151.309       160  0.835213  0.397673
     7      16       276       260   148.549       132  0.905989  0.406849
     8      16       306       290   144.978       120  0.353279  0.422106
     9      16       335       319   141.757       116   1.12114  0.428268
    10      16       376       360    143.98       164  0.418921   0.43351
    11      16       377       361   131.254         4  0.499769  0.433693
 Total time run:         11.206306
Total writes made:      377
Write size:             4194304
Bandwidth (MB/sec):     134.567

Stddev Bandwidth:       60.0232
Max bandwidth (MB/sec): 192
Min bandwidth (MB/sec): 0
Average Latency:        0.474923
Stddev Latency:         0.376038
Max latency:            1.82171
Min latency:            0.060877
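
A quick sanity check on that bandwidth figure (not part of the bench
output; assumes GNU bc is available): it is simply total data moved over
elapsed time, i.e. 377 writes of 4 MB (4194304 bytes) each in 11.206306
seconds:

# echo "scale=3; 377 * 4 / 11.206306" | bc
134.567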


# rados bench -p test 10 seq
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        61        45   179.957       180  0.010405   0.25243
     2      16       109        93   185.962       192  0.908263  0.284303
     3      16       151       135   179.965       168  0.255312  0.297283
     4      16       191       175    174.97       160  0.836727  0.330659
     5      16       236       220   175.971       180  0.009995  0.330832
     6      16       275       259   172.639       156   1.06855  0.345418
     7      16       311       295   168.545       144  0.907648  0.361689
     8      16       351       335   167.474       160  0.947688  0.363552
     9      16       390       374   166.196       156  0.140539  0.369057
 Total time run:        9.755367
Total reads made:       401
Read size:              4194304
Bandwidth (MB/sec):     164.422

Average Latency:        0.387705
Max latency:            1.33852
Min latency:            0.008064

# rados bench -p test 10 rand
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        55        39   155.938       156  0.773716  0.257267
     2      16        93        77   153.957       152  0.006573  0.339199
     3      16       135       119   158.629       168  0.009851  0.359675
     4      16       171       155   154.967       144  0.892027  0.359015
     5      16       209       193   154.369       152   1.13945  0.378618
     6      16       256       240    159.97       188  0.009965  0.368439
     7      16       295       279     159.4       156  0.195812  0.371259
     8      16       343       327   163.472       192  0.880587  0.370759
     9      16       380       364    161.75       148  0.113111  0.377983
    10      16       424       408   163.173       176  0.772274  0.379497
 Total time run:        10.518482
Total reads made:       425
Read size:              4194304
Bandwidth (MB/sec):     161.620

Average Latency:        0.393978
Max latency:            1.36572
Min latency:            0.006448
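
One note on the read tests (from memory of the rados bench CLI of this
vintage, so treat the exact flag as an assumption): seq and rand read
back objects left behind by a previous write run, and a write run only
leaves them behind if told not to clean up, e.g.:

# rados bench -p test 10 write --no-cleanup
# rados bench -p test 10 seq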

On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum  wrote:
> Can you enable debugging on the client ("debug ms = 1", "debug client
> = 20") and mds ("debug ms = 1", "debug mds = 20"), run this test
> again, and post them somewhere for me to look at?
>
> While you're at it, can you try rados bench and see what sort of
> results you get?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
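
For anyone reproducing this, one way to apply the settings Greg lists is
via ceph.conf on the MDS and client nodes, followed by a daemon restart
and remount (a sketch; section placement assumed):

[mds]
    debug ms = 1
    debug mds = 20

[client]
    debug ms = 1
    debug client = 20

The MDS side can also be bumped live with injectargs (daemon name taken
from the mdsmap in this thread):

# ceph tell mds.atl-fs12 injectargs '--debug-ms 1 --debug-mds 20'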

Re: [ceph-users] Extremely slow small files rewrite performance

2014-10-21 Thread Sergey Nazarov
It is CephFS, mounted via ceph-fuse.
I am getting the same results regardless of how many other clients
have this fs mounted or what they are doing.
The cluster runs Debian Wheezy, kernel 3.2.0-4-amd64.
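
For reference, such a mount looks roughly like this (illustrative
invocation; the monitor address is one of the four in the ceph -w output
further down the thread, and /mnt/cephfs is a placeholder):

# ceph-fuse -m 10.44.101.69:6789 /mnt/cephfs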

On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum  wrote:
> Are these tests conducted using a local fs on RBD, or using CephFS?
> If CephFS, do you have multiple clients mounting the FS, and what are
> they doing? What client (kernel or ceph-fuse)?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com


[ceph-users] Extremely slow small files rewrite performance

2014-10-21 Thread Sergey Nazarov
Hi

I just built a new cluster following these quickstart instructions:
http://ceph.com/docs/master/start/

And here is what I am seeing:

# time for i in {1..10}; do echo $i > $i.txt ; done
real 0m0.081s
user 0m0.000s
sys 0m0.004s

And if I repeat the same command (now that the files already exist):

# time for i in {1..10}; do echo $i > $i.txt ; done
real 0m48.894s
user 0m0.000s
sys 0m0.004s

I was very surprised, so I then tried rewriting a single file:

# time echo 1 > 1.txt
real 0m3.133s
user 0m0.000s
sys 0m0.000s
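
A narrowing test that could help here (assumes GNU coreutils truncate):
time the truncate of an existing file separately from an append that
skips truncation, to see which part of the rewrite path stalls:

# time truncate -s 0 1.txt
# time sh -c 'echo 1 >> 1.txt'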

BTW, I don't think it is a problem with OSD speed or the network:

# time sysbench --num-threads=1 --test=fileio --file-total-size=1G \
  --file-test-mode=rndrw prepare
1073741824 bytes written in 23.52 seconds (43.54 MB/sec).
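
Note that prepare only lays the test files out sequentially; the random
read/write pass itself is the run phase (same flags assumed, per the
sysbench fileio prepare/run/cleanup convention):

# sysbench --num-threads=1 --test=fileio --file-total-size=1G \
  --file-test-mode=rndrw run
# sysbench --test=fileio --file-total-size=1G cleanup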

Here is my ceph cluster status and version:

# ceph -w
cluster d3dcacc3-89fb-4db0-9fa9-f1f6217280cb
 health HEALTH_OK
 monmap e4: 4 mons at
{atl-fs10=10.44.101.70:6789/0,atl-fs11=10.44.101.91:6789/0,atl-fs12=10.44.101.92:6789/0,atl-fs9=10.44.101.69:6789/0},
election epoch 40, quorum 0,1,2,3 atl-fs9,atl-fs10,atl-fs11,atl-fs12
 mdsmap e33: 1/1/1 up {0=atl-fs12=up:active}, 3 up:standby
 osdmap e92: 4 osds: 4 up, 4 in
  pgmap v8091: 192 pgs, 3 pools, 123 MB data, 1658 objects
881 GB used, 1683 GB / 2564 GB avail
 192 active+clean
  client io 1820 B/s wr, 1 op/s

# ceph -v
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

All nodes are connected via a gigabit network.
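
For scale: a gigabit link tops out at 1000 Mbit/s / 8 = 125 MB/s before
protocol overhead, so the roughly 130-165 MB/s rados bench figures
elsewhere in this thread are close to line rate for this hardware, which
supports the point that raw throughput is not what is slowing the
small-file rewrites.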
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com