Re: read and write speed.

2011-05-31 Thread Henry C Chang
2011/5/31 djlee064 djlee...@gmail.com:
 From my (admittedly shaky) memory, I've tried fsync and the result was
 still the same in the end, i.e., write 100GB+ and everything converges
 to the same rate. I can't test it now as the cluster is already running
 other tasks.

 I think a small bs (4-8k) with fsync drops to the KB/s level, and a
 large bs (4M+) gets much higher, but at that large bs the result
 eventually matches the no-fsync case once you continuously write a
 stretch of 100GB+.
 Is this too obvious for anyone? (Other than the short-stroking effect,
 which should be at most a 20% difference.)

The caching effect diminishes as the total write size increases. Sure, when
you write 100GB+ of data, the difference may be small. (It still depends on
your RAM size.)
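
For reference, a quick way to see both numbers side by side on a Ceph mount
(the path and size here are only examples):

# Cached figure: dd reports before the data has been flushed to the cluster.
dd if=/dev/zero of=/mnt/testfile bs=4M count=2560

# Flushed figure: conv=fdatasync makes dd wait for the data to be flushed
# before reporting, so this is much closer to sustained throughput.
dd if=/dev/zero of=/mnt/testfile bs=4M count=2560 conv=fdatasync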


 How, then, does the journal size set in Ceph (e.g., 1GB, 2GB, etc.)
 actually affect performance (other than reliability, latency, etc.)?
 If no reliability is ever needed and, say, the journal is turned off,
 what is the effect on performance? From my bad memory again, no-journal
 either worsened performance or had no effect, but I think this again
 was not tested long enough, e.g., beyond the journal flushing to disk
 (which shouldn't be a major bottleneck since, as Collin said, data in
 the journal gets continuously flushed out to disk).

 So in other words: set the journal to 10GB and dd just 10GB, and I
 should get unbelievable performance; then change the dd to 100GB, and
 how much does it drop? I haven't tried this, but from my very long
 write/read tests, covering many ranges up to 5TB of actual content,
 the disk does about 12.5MB/s at most. For small sets, yes, there I do
 see high performance.

I am not sure what you mean here. Perhaps somebody can answer it. Sorry.
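
(For reference, the journal size under discussion is a per-OSD setting in
ceph.conf; a rough sketch, where the 1 GB size and the path are only
illustrative, not recommendations:)

[osd]
        ; journal size is given in MB; 1024 here means a 1 GB journal per OSD
        osd journal size = 1024
        ; the journal can live in a file or on a separate block device
        osd journal = /srv/osd$id/journal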

--Henry


Re: read and write speed.

2011-05-31 Thread Fyodor Ustinov

On 05/31/2011 01:50 AM, Fyodor Ustinov wrote:

Hi!

Fresh 0.28.2 cluster.

Why is reading two times slower than writing with dd, while rados
shows the opposite?

(Second question: why does rados bench crash on the read test?)


root@gate0:/mnt# dd if=/dev/zero of=aaa bs=1024000 count=10000
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 64.5598 s, 159 MB/s

root@gate0:/mnt# dd if=aaa of=/dev/null bs=1024000
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 122.513 s, 83.6 MB/s
root@gate0:/mnt#

Additionally: fuse vs. kernel mount.

fuse:

root@gate0:/mnt# dd if=/dev/zero of=bbb bs=1024000 count=10000 conv=fdatasync
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 75.7654 s, 135 MB/s

root@gate0:/mnt# dd if=bbb of=/dev/null bs=1024000
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 101.613 s, 101 MB/s


kernel:

root@gate0:/mnt# rm bbb
root@gate0:/mnt# dd if=/dev/zero of=bbb bs=1024000 count=10000 conv=fdatasync
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 73.143 s, 140 MB/s

root@gate0:/mnt# dd if=bbb of=/dev/null bs=1024000
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 150.84 s, 67.9 MB/s
root@gate0:/mnt#

What? Ok. reboot. After reboot:

root@gate0:/mnt# rm bbb
root@gate0:/mnt# dd if=/dev/zero of=bbb bs=1024000 count=10000 conv=fdatasync
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 68.9618 s, 148 MB/s

root@gate0:/mnt# dd if=bbb of=/dev/null bs=1024000
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 165.564 s, 61.8 MB/s
root@gate0:/mnt#


Hmm... unmount the kernel client and mount with fuse...

root@gate0:/mnt# cd
root@gate0:~# umount /mnt
root@gate0:~# cfuse -m 10.5.51.230:/ /mnt
 ** WARNING: Ceph is still under development.  Any feedback can be directed **
 ** at ceph-devel@vger.kernel.org or http://ceph.newdream.net/.             **

cfuse[1102]: starting ceph client
cfuse[1102]: starting fuse
root@gate0:~# cd /mnt/
root@gate0:/mnt# dd if=bbb of=/dev/null bs=1024000
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 93.076 s, 110 MB/s
root@gate0:/mnt#


What the hell?



WBR,
Fyodor.



read and write speed.

2011-05-30 Thread Fyodor Ustinov

Hi!

Fresh 0.28.2 cluster.

Why is reading two times slower than writing with dd, while rados
shows the opposite?

(Second question: why does rados bench crash on the read test?)


root@gate0:/mnt# dd if=/dev/zero of=aaa bs=1024000 count=10000
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 64.5598 s, 159 MB/s

root@gate0:/mnt# dd if=aaa of=/dev/null bs=1024000
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 122.513 s, 83.6 MB/s
root@gate0:/mnt#

root@gate0:/etc/ceph# rados -p test bench 20 write
Maintaining 16 concurrent writes of 4194304 bytes for at least 20 seconds.
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0   0     0     0         0         0         -         0
    1  16    47    31   123.966   124  0.445371  0.360663
    2  16    82    66   131.967   140  0.160564   0.32065
    3  16   111    95   126.637   116  0.116967  0.349943
    4  16   142   126    125.97   124  0.176179  0.369969
    5  15   169   154    123.17   112   0.12449  0.411138
    6  16   202   186   123.971   128  0.175033  0.442003
    7  16   241   225   128.541   156  0.163481  0.421575
    8  16   271   255    127.47   120  0.162152  0.444525
    9  16   305   289   128.415   136  0.100893  0.456108
   10  16   337   321   128.371   128  0.107163  0.467081
   11  16   370   354   128.698   132  0.147602  0.455438
   12  16   400   384   127.971   120  0.163287  0.454927
   13  16   433   417   128.279   132  0.176282  0.451909
   14  16   459   443   126.544   104   3.02971  0.465092
   15  16   492   476   126.906   132  0.183307  0.473582
   16  16   523   507   126.722   124  0.170459  0.465038
   17  16   544   528   124.208    84  0.160463  0.462053
   18  16   574   558   123.973   120   0.10411  0.478344
   19  16   607   591   124.395   132  0.126514   0.48624
min lat: 0.095185 max lat: 3.9695 avg lat: 0.488688
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   20  16   638   622   124.372   124   2.85047  0.488688
Total time run:        20.547165
Total writes made:     639
Write size:            4194304
Bandwidth (MB/sec):    124.397

Average Latency:       0.513493
Max latency:           3.9695
Min latency:           0.095185
root@gate0:/etc/ceph#

root@gate0:/etc/ceph# rados -p test bench 20 seq
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0   0     0     0         0         0         -         0
    1  16    58    42   167.966   168  0.085676  0.279929
    2  16   101    85    169.97   172  0.785072  0.323728
    3  16   145   129   171.969   176  0.141833  0.331852
    4  16   193   177   176.969   192   0.75847  0.335484
    5  15   240   225    179.97   192  0.114137  0.332022
    6  16   288   272   181.303   188   0.54563  0.339292
    7  16   335   319   182.256   188  0.531714  0.341969
    8  16   380   364    181.97   180  0.101676  0.339337
    9  16   427   411   182.634   188  0.216583  0.339264
   10  16   471   455   181.968   176  0.803917  0.341281
   11  16   515   499   181.422   176  0.112194  0.343552
   12  16   559   543   180.968   176  0.241835  0.345668
   13  16   600   584    179.66   164      0.03  0.347034
read got -2
error during benchmark: -5
error 5: Input/output error
root@gate0:/etc/ceph#


WBR,
Fyodor.


Re: read and write speed.

2011-05-30 Thread Henry C Chang
2011/5/31 djlee064 djlee...@gmail.com:
 A bit off-topic, but can you, Fyodor, and all of the devs run

 dd if=/dev/zero of=/cephmount/file bs=1M count=10000    (10gb)
 dd if=/dev/zero of=/cephmount/file bs=1M count=50000    (50gb)
 dd if=/dev/zero of=/cephmount/file bs=1M count=100000   (100gb)

 and continue to 200gb, ... 500gb,

 and see the MB/s difference? I'm expecting an enormous difference,
 e.g., it starts at ~200MB/s and drops down to well below 50MB/s or
 even less (depending on the number of disks, but about 12MB/s per
 disk is what I have seen in my analysis). I feel that Fyodor and the
 rest of you are testing only a very small part; the high rate at the
 start is likely due to the journal size.
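
(A small loop along these lines would automate that sweep; the mount point
is a placeholder, and conv=fdatasync is added so the client cache doesn't
inflate the numbers:)

#!/bin/sh
# Write progressively larger files and record dd's reported throughput.
# bs=1M, so count=10000 is ~10GB, count=100000 is ~100GB, and so on.
for count in 10000 50000 100000 200000 500000; do
    echo "=== bs=1M count=$count ==="
    dd if=/dev/zero of=/cephmount/file bs=1M count=$count conv=fdatasync 2>&1 | tail -n 1
    rm -f /cephmount/file
done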

It's a client-side caching effect. Without the option conv=fdatasync (or
conv=fsync), dd reports the throughput before the data has actually been
flushed.
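
The read direction has the same caveat: a file that was just written (or
just read) may still be in the client's page cache, so dropping the cache on
the client before the read pass gives a more honest figure, e.g. (path
illustrative):

sync                                  # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches     # drop page cache, dentries and inodes
dd if=/mnt/testfile of=/dev/null bs=4M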

--Henry