Re: [ceph-users] using ssds with ceph
On 03/17/2013 06:14 PM, Gregory Farnum wrote:
> On Sunday, March 17, 2013 at 4:03 PM, Mark Nelson wrote:
>> On 03/17/2013 05:40 PM, Matthieu Patou wrote:
>>> Hello all,
>>>
>>> Our dev environments are quite I/O intensive but don't require much
>>> space (~20G per dev environment). At the moment our dev machines are
>>> served by VMware and the storage is done on NFS appliances with SAS or
>>> SATA drives. After some testing with consumer-grade SSDs we discovered
>>> that build speed could be greatly improved by using SSDs, but putting
>>> SSDs in NFS appliances is very costly. So I'm thinking of using
>>> consumer-grade or even Intel S3700 SSDs and Ceph as the backend
>>> storage. Are there any cons to using SSDs for data storage (apart from
>>> the price per GB, which is higher than for SAS drives)?
>>
>> Were you thinking of using CephFS? If so, be aware that it's not really
>> recommended for production use yet. If you were thinking RBD, that's
>> fine, but you should be aware that you may need to do some tweaking and
>> have a lot of concurrency to get high IOPS. I'd highly recommend
>> testing out your use case on a small scale (maybe 1 or 2 nodes with a
>> couple of SSDs) before diving in head first.
>
> There are users doing this who seem quite happy with it. Not sure how
> many or if they want their names on the list…

I'm also wondering how ceph will play with trim if a given rbd device is
used at 50% of its capacity but several blocks (from the rbd client's
point of view) are removed and then reallocated. Basically, what I'm not
sure about is whether new objects will be created (thus allowing the
space used by the old ones to be trimmed) or whether it will just update
the existing objects that form the rbd device.

Matthieu.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
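Whether trimmed space is ever reclaimed on the cluster side also depends on the client stack passing discards through to RBD at all (e.g. qemu with discard enabled; the kernel rbd driver of this era may not support discard). A minimal sketch for checking from the guest side, assuming a device that does honour discard and a hypothetical image name "devimg":

```shell
# Map the image (names here are hypothetical) and mount with online discard
rbd map rbd/devimg
mkfs.ext4 /dev/rbd0
mount -o discard /dev/rbd0 /mnt/dev

# Alternatively, trim in one batch instead of on every delete
fstrim -v /mnt/dev

# Compare what the cluster actually consumes before and after deleting data
rados df
```

If `rados df` shows usage dropping after an `fstrim`, discards are making it through and deleted objects are being reclaimed.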
Re: [ceph-users] how to get latest (non-point release) debs
Okay, ignore this. I just can't read. The path is clearly supposed to be:

deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/bobtail precise main

My apologies. Not sure why I got that so wrong.

 - Travis

On Mon, Mar 18, 2013 at 2:49 PM, Travis Rhoden wrote:
> Hey folks,
>
> There are some changes in Bobtail queued up for 0.56.4 that I am really
> anxious to get, but that build hasn't been released yet. Is there an apt
> repo I can point at that will get the latest build off of the bobtail
> branch?
>
> Based on the docs [1] I tried this:
>
> deb http://gitbuilder.ceph.com/ceph-deb-main-x86_64/ref/bobtail precise main
>
> But that was not found.
>
> - Travis
>
> [1] http://ceph.com/docs/master/install/debian/#development-testing-packages
Re: [ceph-users] how to get latest (non-point release) debs
On Mon, 18 Mar 2013, Travis Rhoden wrote:
> deb http://gitbuilder.ceph.com/ceph-deb-main-x86_64/ref/bobtail precise main

deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64/ref/bobtail precise main

sage
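Putting that together, wiring a gitbuilder branch repo into apt looks something like the sketch below. The key URL is an assumption based on the development-packages docs of this era; adjust to whatever the docs currently state:

```shell
# Add the bobtail branch gitbuilder repo (precise, x86_64)
echo deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64/ref/bobtail precise main \
    | sudo tee /etc/apt/sources.list.d/ceph-bobtail-branch.list

# Trust the autobuild signing key (URL per the development/testing packages docs)
wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc' \
    | sudo apt-key add -

sudo apt-get update && sudo apt-get install ceph
```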
[ceph-users] how to get latest (non-point release) debs
Hey folks,

There are some changes in Bobtail queued up for 0.56.4 that I am really
anxious to get, but that build hasn't been released yet. Is there an apt
repo I can point at that will get the latest build off of the bobtail
branch?

Based on the docs [1] I tried this:

deb http://gitbuilder.ceph.com/ceph-deb-main-x86_64/ref/bobtail precise main

But that was not found.

 - Travis

[1] http://ceph.com/docs/master/install/debian/#development-testing-packages
Re: [ceph-users] Ceph availability test & recovering question
Hello,

I'm experiencing the same long-standing problem: during recovery ops,
some percentage of read I/O remains in flight for seconds, rendering the
upper-level filesystem on the qemu client very slow and almost unusable.
Different striping has almost no effect on the visible delays, and the
reads may not be intensive at all yet are still very slow. Here are some
fio results for randread with small blocks, so unlike a linear read it
is not affected by readahead.

Intensive reads during recovery:

  lat (msec) : 2=0.01%, 4=0.08%, 10=1.87%, 20=4.17%, 50=8.34%
  lat (msec) : 100=13.93%, 250=2.77%, 500=1.19%, 750=25.13%, 1000=0.41%
  lat (msec) : 2000=15.45%, >=2000=26.66%

The same on a healthy cluster:

  lat (msec) : 20=0.33%, 50=9.17%, 100=23.35%, 250=25.47%, 750=6.53%
  lat (msec) : 1000=0.42%, 2000=34.17%, >=2000=0.56%

On Sun, Mar 17, 2013 at 8:18 AM, wrote:
> Hi, all
>
> I have some problems after an availability test.
>
> Setup:
> Linux kernel: 3.2.0
> OS: Ubuntu 12.04
> Storage server: 11 HDDs (each storage server has 11 OSDs, 7200 rpm, 1T) + 10GbE NIC
> RAID card: LSI MegaRAID SAS 9260-4i
> For every HDD: RAID0, Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct
> Storage server number: 2
>
> Ceph version: 0.48.2
> Replicas: 2
> Monitor number: 3
>
> We have two storage servers as a cluster, then use a ceph client to
> create a 1T RBD image for testing. The client also has a 10GbE NIC,
> Linux kernel 3.2.0, Ubuntu 12.04.
>
> We also use fio to produce the workload.
>
> fio commands:
>
> [Sequential Read]
> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=read \
>     --ioengine=libaio --group_reporting --direct=1 --eta=always \
>     --ramp_time=10 --thinktime=10
>
> [Sequential Write]
> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=write \
>     --ioengine=libaio --group_reporting --direct=1 --eta=always \
>     --ramp_time=10 --thinktime=10
>
> Now I want to observe the ceph state when one storage server crashes,
> so I turned off one storage server's networking.
>
> We expected that data write and read operations could resume quickly,
> or even not be suspended at all, during ceph recovery, but the
> experimental results show that write and read operations pause for
> about 20~30 seconds while ceph is recovering.
>
> My questions are:
> 1. Is the I/O pause normal while ceph is recovering?
> 2. Is the I/O pause unavoidable while ceph is recovering?
> 3. How can the I/O pause time be reduced?
>
> Thanks!!
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
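The pause usually can't be eliminated entirely, but it can often be shortened by letting the cluster fail the dead OSDs sooner and by throttling recovery so client I/O keeps being serviced. A sketch, assuming option names available around argonaut/bobtail (exact names, defaults, and safe values vary by version, so treat these as illustrative):

```shell
# Fail unresponsive OSDs faster so client I/O is redirected sooner
# (lower heartbeat grace means quicker down-marking, but more false alarms)
ceph osd tell \* injectargs '--osd-heartbeat-grace 10'

# Throttle recovery so it competes less with client I/O
ceph osd tell \* injectargs '--osd-recovery-max-active 1 --osd-max-backfills 1'

# Watch recovery progress and cluster state while testing
ceph -w
```

Injected values are not persistent across daemon restarts; to keep them, also set the same options in the [osd] section of ceph.conf.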
Re: [ceph-users] I/O Speed Comparisons
On 03/13/2013 06:38 PM, Josh Durgin wrote:
> Anyone seeing this problem, could you try the wip-rbd-cache-aio branch?

Hi,

just compiled and tested it out; unfortunately there's no big change:

ceph --version
ceph version 0.58-375-ga4a6075 (a4a60758b7a10d51419c1961bb12f26b87672fd5)

libvirt-config:

Within the VM:

cat /dev/zero > /bigfile

again trashes the virtual machine. Reading seems quick(er), though (I
can't go back to bobtail without some hassle on this machine (leveldb on
monitor & such), so this is just a guess):

dd if=/dev/vda of=/dev/null
...
82.0 MB/s

One thing that I've noticed is that the virsh start command returns very
quickly; this used to take longer on bobtail.

> Thanks,
> Josh

You're welcome. I'm really eager to help you out now; I've got a
testbed, so if you apply a patch in git I can probably test it within
24 hours.

Wolfgang

--
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
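For anyone reproducing this, the client-side RBD cache being exercised here is configured in the [client] section of ceph.conf. A sketch with the common knobs (defaults and available options vary by version; the sizes below are illustrative, not recommendations):

```ini
[client]
    rbd cache = true
    ; total cache size in bytes (32 MB here)
    rbd cache size = 33554432
    ; dirty-data limit before writeback kicks in (24 MB here)
    rbd cache max dirty = 25165824
```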
[ceph-users] Cephfs - feature set mismatch - socket error on read
Hi,

I have a new, empty cluster, v0.56.3 - Ubuntu 12.04 with kernel 3.4.36.
I have just run mkcephfs and injected the crushmap
(http://pastebin.com/F1nYKZZR). I have modified the rulesets, since I
want to distribute the data on a per-room basis.

This is the osd tree: http://pastebin.com/JsEsWjTx

I have another machine to mount the CephFS and re-export it via Samba.

If I run the command "ceph osd crush tunables optimal" or "ceph osd
crush tunables bobtail" on my cluster, the mount fails with "mount error
5 = Input/output error", and in syslog I have:

libceph: mon0 192.168.21.11:6789 feature set mismatch, my 8a < server's 204008a, missing 204
libceph: mon0 192.168.21.11:6789 socket error on read

Otherwise, if I run "ceph osd crush tunables default", the mount succeeds.

Is it advisable to run "ceph osd crush tunables bobtail" on a Bobtail
cluster?

Thanks

--
Marco Aroldi
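The mismatch above is the bobtail tunables profile setting CRUSH feature bits (the 204008a vs 8a in the log) that the 3.4 kernel client doesn't speak yet, so either the client needs a newer kernel or the cluster has to stay on legacy tunables. A sketch for inspecting and reverting, assuming crushtool is installed on an admin node:

```shell
# Dump and decompile the current crushmap to see which tunables are set
ceph osd getcrushmap -o /tmp/crushmap
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
grep tunable /tmp/crushmap.txt

# Revert to legacy tunables so older kernel clients can connect again
ceph osd crush tunables default
```

The trade-off: the newer tunables profiles generally give better data distribution, so reverting is only worthwhile while old kernel clients must mount the filesystem.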
[ceph-users] Help fixing clobbered OSD's
Hi all,

I haven't had much sleep and have accidentally started an OSD on a mount
point mapped to two disks containing OSD data. I think this was the
case; I'm unable to explain how it happened or whether this was even the
cause. Yeh.. that tired... What I think happened was that OSD.9's disk
was mounted over OSD.15's disk. OSD.15 may or may not have been running
at the time.

OSD.15 now has the error:

   -33> 2013-03-18 21:42:38.610114 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) test_mount basedir /srv/ceph/osd/osd.15 journal /dev/sdd4
   -32> 2013-03-18 21:42:38.610153 7f8048773760  1 -- 0.0.0.0:6860/22287 messenger.start
   -31> 2013-03-18 21:42:38.610181 7f8048773760  1 -- :/0 messenger.start
   -30> 2013-03-18 21:42:38.610196 7f8048773760  1 -- 0.0.0.0:6862/22287 messenger.start
   -29> 2013-03-18 21:42:38.610207 7f8048773760  1 -- 0.0.0.0:6861/22287 messenger.start
   -28> 2013-03-18 21:42:38.610299 7f8048773760  2 osd.15 0 mounting /srv/ceph/osd/osd.15 /dev/sdd4
   -27> 2013-03-18 21:42:38.610309 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) basedir /srv/ceph/osd/osd.15 journal /dev/sdd4
   -26> 2013-03-18 21:42:38.610325 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) mount fsid is 71dbf00f-ae22-4366-b610-064107e26697
   -25> 2013-03-18 21:42:38.727408 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount FIEMAP ioctl is supported and appears to work
   -24> 2013-03-18 21:42:38.727423 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -23> 2013-03-18 21:42:38.727829 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount did NOT detect btrfs
   -22> 2013-03-18 21:42:38.852287 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount syscall(SYS_syncfs, fd) fully supported
   -21> 2013-03-18 21:42:38.852379 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount found snaps <>
   -20> 2013-03-18 21:42:38.852401 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) mount op_seq is 25638742
   -19> 2013-03-18 21:42:38.986099 7f8048773760 20 filestore (init)dbobjectmap: seq is 1
   -18> 2013-03-18 21:42:38.986123 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) open_journal at /dev/sdd4
   -17> 2013-03-18 21:42:38.986150 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount: enabling WRITEAHEAD journal mode: btrfs not detected
   -16> 2013-03-18 21:42:38.986154 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) list_collections
   -15> 2013-03-18 21:42:38.989878 7f8044d1a700 20 filestore(/srv/ceph/osd/osd.15) sync_entry waiting for max_interval 5.00
   -14> 2013-03-18 21:42:38.993422 7f8048773760  0 journal kernel version is 3.6.9
   -13> 2013-03-18 21:42:39.012659 7f8048773760  0 journal kernel version is 3.6.9
   -12> 2013-03-18 21:42:39.060070 7f80277fe700 20 filestore(/srv/ceph/osd/osd.15) flusher_entry start
   -11> 2013-03-18 21:42:39.060091 7f80277fe700 20 filestore(/srv/ceph/osd/osd.15) flusher_entry sleeping
   -10> 2013-03-18 21:42:39.060091 7f8048773760  2 osd.15 0 boot
    -9> 2013-03-18 21:42:39.060104 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) read meta/23c2fcde/osd_superblock/0//-1 0~0
    -8> 2013-03-18 21:42:39.060503 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~332/332
    -7> 2013-03-18 21:42:39.060587 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) stat meta/16ef7597/infos/head//-1 = 0 (size 0)
    -6> 2013-03-18 21:42:39.060622 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) read meta/4edc6dd9/osdmap.33122/0//-1 0~0
    -5> 2013-03-18 21:42:39.061131 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) FileStore::read meta/4edc6dd9/osdmap.33122/0//-1 0~117079/117079
    -4> 2013-03-18 21:42:39.063115 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) list_collections
    -3> 2013-03-18 21:42:39.064753 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) collection_getattr /srv/ceph/osd/osd.15/current/0.39_head 'info'
    -2> 2013-03-18 21:42:39.064780 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) collection_getattr /srv/ceph/osd/osd.15/current/0.39_head 'info' = 1
    -1> 2013-03-18 21:42:39.064798 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) omap_get_values meta/16ef7597/infos/head//-1
     0> 2013-03-18 21:42:39.066873 7f8048773760 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' t$
osd/PG.cc: 2393: FAILED assert(values.size() == 1)

 ceph version 0.58 (ba3f91e7504867a52a83399d60917e3414e8c3e2)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x469) [0x680199]
 2: (OSD::load_pgs()+0x1909) [0x6219d9]
 3: (OSD::init()+0xd07) [0x634e27]
 4: (main()+0x2deb) [0x5640cb]
 5: (__libc_start_main()+0xfd) [0x308921ecdd]
 6: ceph-osd() [0x560f29]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

OSD.9's error is now:

2013-03-18 21:44:08.652017 7f3547fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry start
2013-03-18 21:44:08.652147 7f3547fff700 20 filestore(/s
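Not a fix for the already-damaged store, but a cheap guard against repeating the mix-up: the OSD data directory records its own identity, so a quick check before starting the daemon catches a wrong or missing mount. A sketch using the paths from the log above (the whoami and fsid files are written into the data dir when the OSD is created):

```shell
# Confirm the expected filesystem is actually mounted at the OSD path
mount | grep /srv/ceph/osd/osd.15

# The data dir records its own OSD id and cluster fsid; verify both
cat /srv/ceph/osd/osd.15/whoami   # should print 15
cat /srv/ceph/osd/osd.15/fsid

# Only start the daemon once both match expectations
service ceph start osd.15
```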