Re: [ceph-users] using ssds with ceph

2013-03-18 Thread Matthieu Patou

On 03/17/2013 06:14 PM, Gregory Farnum wrote:

On Sunday, March 17, 2013 at 4:03 PM, Mark Nelson wrote:

On 03/17/2013 05:40 PM, Matthieu Patou wrote:

Hello all,
  
Our dev environments are quite I/O intensive but don't require much
space (~20G per dev environment). At the moment our dev machines are
served by VMware and the storage is done on NFS appliances with SAS or
SATA drives.
After some testing with consumer-grade SSDs we discovered that build
speed could be greatly improved by using SSDs, but having SSDs in NFS
appliances is very costly.
So I'm thinking of using consumer-grade or even Intel S3700 SSDs and ceph
as the backend storage. Are there any cons to using SSDs for data storage
(apart from the price per GB being higher than SAS drives)?
  
  
  
Were you thinking of using CephFS? If so, be aware that it's not really
recommended for production use yet. If you were thinking RBD, that's
fine, but you should be aware that you may need to do some tweaking and
have a lot of concurrency to get high IOPS. I'd highly recommend
testing out your use case on a small scale (maybe 1 or 2 nodes with a
couple of SSDs) before diving in head first.
  

There are users doing this who seem quite happy with it. Not sure how many or 
if they want their names on the list…
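For the kind of small-scale trial suggested above, a fio job file is a convenient way to measure random-read IOPS with enough concurrency. A minimal sketch, assuming an RBD-backed block device; the device path and job name are placeholders, not from the original thread:

```ini
; Sketch: small-scale random-read IOPS test against an RBD-backed device.
; /dev/rbd0 and the job name are illustrative placeholders.
[global]
ioengine=libaio
direct=1
runtime=120
ramp_time=10
group_reporting

[randread-4k]
filename=/dev/rbd0
rw=randread
bs=4k
iodepth=32
numjobs=4
```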

I'm also wondering how ceph will play with trim if a given rbd device is
used at 50% of its capacity but several blocks (from the rbd client's
point of view) are removed and then reallocated. Basically, what I'm not
sure about is whether new objects will be created (thus allowing the space
used by the old ones to be trimmed) or whether it will just update the
existing objects that form the rbd device.
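For reference, RBD stripes an image across RADOS objects of a fixed size (4 MB by default), so whether a rewritten block creates a new object or updates an existing one comes down to which object its offset maps to. A minimal sketch of that mapping; the helper name is an illustration, and the 4 MB default order is assumed:

```python
# Sketch: map an RBD image byte offset to its backing RADOS object index.
# Assumes the default object size of 4 MB (order = 22); images can be
# created with a different order.
DEFAULT_ORDER = 22  # object size = 2**22 bytes = 4 MB

def rbd_object_index(offset, order=DEFAULT_ORDER):
    """Return the index of the RADOS object covering this byte offset."""
    return offset >> order

# A block rewritten at the same offset maps to the same object index,
# i.e. it updates an existing object rather than creating a new one.
print(rbd_object_index(0))           # 0
print(rbd_object_index(4 * 2**20))   # 1
```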


Matthieu.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to get latest (non-point release) debs

2013-03-18 Thread Travis Rhoden
Okay, ignore this.  I just can't read.  The path is clearly supposed to be:

deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/bobtail precise main

My apologies.  Not sure why I got that so wrong.

 - Travis

On Mon, Mar 18, 2013 at 2:49 PM, Travis Rhoden  wrote:

> Hey folks,
>
> There are some changes in Bobtail queued up for 0.56.4 that I am really
> anxious to get, but that build hasn't been released yet.  Is there an apt
> repo I can point at that will get the latest build off of the bobtail
> branch?
>
> Based on the docs [1] I tried this:
>
> deb http://gitbuilder.ceph.com/ceph-deb-main-x86_64/ref/bobtail precise
> main
>
> But that was not found.
>
>  - Travis
>
> [1]
> http://ceph.com/docs/master/install/debian/#development-testing-packages
>


Re: [ceph-users] how to get latest (non-point release) debs

2013-03-18 Thread Sage Weil
On Mon, 18 Mar 2013, Travis Rhoden wrote:
> deb http://gitbuilder.ceph.com/ceph-deb-main-x86_64/ref/bobtail precise main

deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64/ref/bobtail precise main

sage
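Following the corrected line above, the gitbuilder repository path encodes the target distro, architecture, and branch ref. A quick sketch of the pattern; the values are examples, not a list of available builds:

```python
# Sketch: assemble a gitbuilder apt line from distro, arch, and branch ref,
# following the corrected line above. Values are illustrative.
distro, arch, ref = "precise", "x86_64", "bobtail"
line = f"deb http://gitbuilder.ceph.com/ceph-deb-{distro}-{arch}/ref/{ref} {distro} main"
print(line)
```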


[ceph-users] how to get latest (non-point release) debs

2013-03-18 Thread Travis Rhoden
Hey folks,

There are some changes in Bobtail queued up for 0.56.4 that I am really
anxious to get, but that build hasn't been released yet.  Is there an apt
repo I can point at that will get the latest build off of the bobtail
branch?

Based on the docs [1] I tried this:

deb http://gitbuilder.ceph.com/ceph-deb-main-x86_64/ref/bobtail precise main

But that was not found.

 - Travis

[1] http://ceph.com/docs/master/install/debian/#development-testing-packages


Re: [ceph-users] Ceph availability test & recovering question

2013-03-18 Thread Andrey Korolyov
Hello,

I'm experiencing the same long-lasting problem - during recovery ops, some
percentage of read I/O remains in-flight for seconds, rendering the
upper-level filesystem on the qemu client very slow and almost
unusable. Different striping has almost no effect on the visible delays,
and even non-intensive reads are still very slow.

Here are some fio results for randread with small blocks, so unlike a
linear read it is not affected by readahead:

Intensive reads during recovery:
lat (msec) : 2=0.01%, 4=0.08%, 10=1.87%, 20=4.17%, 50=8.34%
lat (msec) : 100=13.93%, 250=2.77%, 500=1.19%, 750=25.13%, 1000=0.41%
lat (msec) : 2000=15.45%, >=2000=26.66%

same on healthy cluster:
lat (msec) : 20=0.33%, 50=9.17%, 100=23.35%, 250=25.47%, 750=6.53%
lat (msec) : 1000=0.42%, 2000=34.17%, >=2000=0.56%
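Summing the buckets at or above 750 ms puts the two distributions side by side. A quick back-of-the-envelope script over the numbers quoted above (not part of the original post; the 4000 key is a stand-in for fio's ">=2000" bucket):

```python
# Back-of-the-envelope comparison of the fio latency buckets quoted above:
# what share of reads completed at >= 750 ms in each run?
# The key 4000 is a stand-in for fio's ">=2000" bucket.
recovery = {2: 0.01, 4: 0.08, 10: 1.87, 20: 4.17, 50: 8.34,
            100: 13.93, 250: 2.77, 500: 1.19, 750: 25.13,
            1000: 0.41, 2000: 15.45, 4000: 26.66}
healthy = {20: 0.33, 50: 9.17, 100: 23.35, 250: 25.47, 750: 6.53,
           1000: 0.42, 2000: 34.17, 4000: 0.56}

def share_at_or_above(buckets, threshold_ms):
    """Percentage of I/Os whose latency bucket is >= threshold_ms."""
    return round(sum(pct for ms, pct in buckets.items() if ms >= threshold_ms), 2)

print(share_at_or_above(recovery, 750))  # 67.65
print(share_at_or_above(healthy, 750))   # 41.68
```

Roughly two thirds of reads take 750 ms or longer during recovery, versus about 42% on the healthy cluster.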


On Sun, Mar 17, 2013 at 8:18 AM,   wrote:
> Hi, all
>
> I have some problem after availability test
>
> Setup:
> Linux kernel: 3.2.0
> OS: Ubuntu 12.04
> Storage server : 11 HDD (each storage server has 11 osd, 7200 rpm, 1T) + 
> 10GbE NIC
> RAID card: LSI MegaRAID SAS 9260-4i  For every HDD: RAID0, Write Policy: 
> Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct
> Storage server number : 2
>
> Ceph version : 0.48.2
> Replicas : 2
> Monitor number:3
>
>
> We have two storage servers as a cluster, then use a ceph client to create
> a 1T RBD image for testing; the client also has a 10GbE NIC, Linux kernel
> 3.2.0, Ubuntu 12.04
>
> We also use FIO to produce workload
>
> fio command:
> [Sequential Read]
> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=read
> --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10
> --thinktime=10
>
> [Sequential Write]
> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=write
> --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10
> --thinktime=10
>
>
> Now I want to observe the ceph state when one storage server crashes, so I
> turned off one storage server's networking.
> We expected that data write and read operations could quickly resume, or
> even not be suspended at all, during ceph recovery, but the experimental
> results show that write and read operations pause for about 20~30 seconds
> during ceph recovery.
>
> My questions are:
> 1. Is an I/O pause normal while ceph is recovering?
> 2. Is the I/O pause time unavoidable while ceph is recovering?
> 3. How can the I/O pause time be reduced?
>
>
> Thanks!!
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] I/O Speed Comparisons

2013-03-18 Thread Wolfgang Hennerbichler


On 03/13/2013 06:38 PM, Josh Durgin wrote:
> Anyone seeing this problem, could you try the wip-rbd-cache-aio branch?

Hi,

just compiled and tested it out; unfortunately there's no big change:

 ceph --version
ceph version 0.58-375-ga4a6075 (a4a60758b7a10d51419c1961bb12f26b87672fd5)

libvirt-config:

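The disk XML itself did not survive the list archive. A typical virtio + rbd stanza for this kind of test would look something like the following; the pool/image names, monitor address, and cache mode are assumptions, not the original configuration:

```xml
<!-- Illustrative only: the original XML was stripped by the archive.
     Pool/image names, monitor address, and cache mode are assumptions. -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <source protocol='rbd' name='rbd/vm-disk'>
    <host name='192.168.0.1' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```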

within the VM:
cat /dev/zero > /bigfile

again trashes the virtual machine.

reading seems quick(er) though (I can't go back to bobtail without some
hassles on this machine (leveldb on the monitor & such), so this is just a
guess):
dd if=/dev/vda of=/dev/null
... 82.0 MB/s

One thing that I've noticed is that the virsh start command returns very
quickly; this used to take longer on bobtail.

> Thanks,
> Josh

you're welcome. I'm really eager to help you out now; I've got a
testbed, so if you apply a patch in git I can probably test it within 24 hours.

Wolfgang



-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at


[ceph-users] Cephfs - feature set mismatch - socket error on read

2013-03-18 Thread Marco Aroldi

Hi,
I have a new empty cluster, v0.56.3 - Ubuntu 12.04 with kernel 3.4.36
I have just run mkcephfs and injected the crushmap
(http://pastebin.com/F1nYKZZR).
I have modified the rulesets, since I want to distribute the data on a
per-room basis.

This is the osd tree: http://pastebin.com/JsEsWjTx
I have another machine to mount the Cephfs and re-export it via Samba.

If I run the command "ceph osd crush tunables optimal" or "ceph osd
crush tunables bobtail" on my cluster, the mount fails with "mount error
5 = Input/output error" and in syslog I have

libceph: mon0 192.168.21.11:6789 feature set mismatch, my 8a < server's 204008a, missing 204

libceph: mon0 192.168.21.11:6789 socket error on read

Otherwise, if I run "ceph osd crush tunables default", the mount succeeds.
Is it advisable to run "ceph osd crush tunables bobtail" on a Bobtail cluster?
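For what it's worth, the mismatch line can be decoded directly: the missing features are the bits the server advertises that the client kernel lacks. A quick sketch using the hex values from the syslog line above (the mapping of bits to feature names is not covered here):

```python
# Decode the "feature set mismatch" syslog line: which feature bits does
# the monitor advertise that this client kernel does not support?
client = 0x8a        # "my 8a"
server = 0x204008a   # "server's 204008a"

missing = server & ~client
print(hex(missing))  # 0x2040000
```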

Thanks
--
Marco Aroldi


[ceph-users] Help fixing clobbered OSD's

2013-03-18 Thread Matthew Anderson
Hi all,

I haven't had much sleep and have accidentally started an OSD on a mount
point mapped to two disks containing OSD data. I think this was the case;
I'm unable to explain how it happened or if this was even the cause. Yeh..
that tired...

What I think happened was that OSD.9's disk was mounted over OSD.15's disk.
OSD.15 may or may not have been running at the time. OSD.15 now has the error -

   -33> 2013-03-18 21:42:38.610114 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) test_mount basedir /srv/ceph/osd/osd.15 journal /dev/sdd4
   -32> 2013-03-18 21:42:38.610153 7f8048773760  1 -- 0.0.0.0:6860/22287 messenger.start
   -31> 2013-03-18 21:42:38.610181 7f8048773760  1 -- :/0 messenger.start
   -30> 2013-03-18 21:42:38.610196 7f8048773760  1 -- 0.0.0.0:6862/22287 messenger.start
   -29> 2013-03-18 21:42:38.610207 7f8048773760  1 -- 0.0.0.0:6861/22287 messenger.start
   -28> 2013-03-18 21:42:38.610299 7f8048773760  2 osd.15 0 mounting /srv/ceph/osd/osd.15 /dev/sdd4
   -27> 2013-03-18 21:42:38.610309 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) basedir /srv/ceph/osd/osd.15 journal /dev/sdd4
   -26> 2013-03-18 21:42:38.610325 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) mount fsid is 71dbf00f-ae22-4366-b610-064107e26697
   -25> 2013-03-18 21:42:38.727408 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount FIEMAP ioctl is supported and appears to work
   -24> 2013-03-18 21:42:38.727423 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -23> 2013-03-18 21:42:38.727829 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount did NOT detect btrfs
   -22> 2013-03-18 21:42:38.852287 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount syscall(SYS_syncfs, fd) fully supported
   -21> 2013-03-18 21:42:38.852379 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount found snaps <>
   -20> 2013-03-18 21:42:38.852401 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) mount op_seq is 25638742
   -19> 2013-03-18 21:42:38.986099 7f8048773760 20 filestore (init)dbobjectmap: seq is 1
   -18> 2013-03-18 21:42:38.986123 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) open_journal at /dev/sdd4
   -17> 2013-03-18 21:42:38.986150 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount: enabling WRITEAHEAD journal mode: btrfs not detected
   -16> 2013-03-18 21:42:38.986154 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) list_collections
   -15> 2013-03-18 21:42:38.989878 7f8044d1a700 20 filestore(/srv/ceph/osd/osd.15) sync_entry waiting for max_interval 5.00
   -14> 2013-03-18 21:42:38.993422 7f8048773760  0 journal  kernel version is 3.6.9
   -13> 2013-03-18 21:42:39.012659 7f8048773760  0 journal  kernel version is 3.6.9
   -12> 2013-03-18 21:42:39.060070 7f80277fe700 20 filestore(/srv/ceph/osd/osd.15) flusher_entry start
   -11> 2013-03-18 21:42:39.060091 7f80277fe700 20 filestore(/srv/ceph/osd/osd.15) flusher_entry sleeping
   -10> 2013-03-18 21:42:39.060091 7f8048773760  2 osd.15 0 boot
    -9> 2013-03-18 21:42:39.060104 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) read meta/23c2fcde/osd_superblock/0//-1 0~0
    -8> 2013-03-18 21:42:39.060503 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~332/332
    -7> 2013-03-18 21:42:39.060587 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) stat meta/16ef7597/infos/head//-1 = 0 (size 0)
    -6> 2013-03-18 21:42:39.060622 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) read meta/4edc6dd9/osdmap.33122/0//-1 0~0
    -5> 2013-03-18 21:42:39.061131 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) FileStore::read meta/4edc6dd9/osdmap.33122/0//-1 0~117079/117079
    -4> 2013-03-18 21:42:39.063115 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) list_collections
    -3> 2013-03-18 21:42:39.064753 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) collection_getattr /srv/ceph/osd/osd.15/current/0.39_head 'info'
    -2> 2013-03-18 21:42:39.064780 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) collection_getattr /srv/ceph/osd/osd.15/current/0.39_head 'info' = 1
    -1> 2013-03-18 21:42:39.064798 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) omap_get_values meta/16ef7597/infos/head//-1
     0> 2013-03-18 21:42:39.066873 7f8048773760 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' t$
osd/PG.cc: 2393: FAILED assert(values.size() == 1)

 ceph version 0.58 (ba3f91e7504867a52a83399d60917e3414e8c3e2)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::buffer::list*)+0x469) [0x680199]
 2: (OSD::load_pgs()+0x1909) [0x6219d9]
 3: (OSD::init()+0xd07) [0x634e27]
 4: (main()+0x2deb) [0x5640cb]
 5: (__libc_start_main()+0xfd) [0x308921ecdd]
 6: ceph-osd() [0x560f29]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.


OSD.9's error is now -

2013-03-18 21:44:08.652017 7f3547fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry start
2013-03-18 21:44:08.652147 7f3547fff700 20 filestore(/s