[ceph-users] RGW: Get object ops performance problem
hi, everyone! I am testing RGW GET object ops. When I use 100 threads to fetch one and the same object, performance is very good: the mean response time is 0.1s. But when I use 150 threads to fetch the same object, performance is very bad: the mean response time is 1s. I looked at the OSD log and the RGW log.
rgw log:
2014-07-15 10:36:42.999719 7f45596fb700 1 -- 10.0.1.61:0/1022376 -- 10.0.0.21:6835/24201 -- osd_op(client.6167.0:22721 default.5632.8_ws1411.jpg [getxattrs,stat,read 0~524288] 4.5210f70b ack+read e657)
2014-07-15 10:36:44.064720 7f467efdd700 1 -- 10.0.1.61:0/1022376 == osd.7 10.0.0.21:6835/24201 22210 osd_op_reply(22721
osd log:
10:36:43.001895 7f6cdb24c700 1 -- 10.0.0.21:6835/24201 == client.6167 10.0.1.61:0/1022376 22436 osd_op(client.6167.0:22721 default.5632.8_ws1411.jpg
2014-07-15 10:36:43.031762 7f6cbf01f700 1 -- 10.0.0.21:6835/24201 -- 10.0.1.61:0/1022376 -- osd_op_reply(22721 default.5632.8_ws1411.jpg
So I think the problem does not happen in the OSD. Why does the OSD send the op reply at 10:36:43.031762, but RGW only receives it at 10:36:44.064720? baijia...@126.com___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
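One knob that may be worth checking for a cliff between 100 and 150 concurrent clients is the RGW request thread pool, which defaulted to 100 threads in this era; with more in-flight requests than threads, the extra requests queue inside the gateway. This is only a hedged suggestion, not a confirmed diagnosis of the log timings above, and the section name below is an assumption — use whatever your radosgw instance is called in ceph.conf:

[client.radosgw.gateway]
rgw thread pool size = 256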
Re: [ceph-users] basic questions about pool
Hi Pragya Let me try to answer these. 1# The decision is based on your use case (performance, reliability). If you need high performance out of your cluster, the deployer will create a pool on SSDs and assign this pool to applications which require higher I/O. For example, if you integrate OpenStack with Ceph, you can instruct the OpenStack configuration files to write data to a specific Ceph pool (http://ceph.com/docs/master/rbd/rbd-openstack/#configuring-glance); similarly you can tell CephFS and RadosGW which pool to use for data storage. 2# Usually the end user (the client of the Ceph cluster) does not care about where the data is getting stored, which pool it is using, or what the real physical location of the data is. The end user will demand specific performance, reliability and availability. It is the job of the Ceph admin to fulfil their storage requirements using Ceph features such as SSDs, erasure codes, replication level, etc. Block Device :- The end user will tell the application (Qemu/KVM, OpenStack, etc.) which pool it should use for data storage. rbd is the default pool for block devices. CephFS :- The end user will mount this pool as a filesystem and use it from there. The default pools are data and metadata. RadosGW :- The end user will store objects using the S3 or Swift API. - Karan Singh - On 15 Jul 2014, at 07:42, pragya jain prag_2...@yahoo.co.in wrote: thank you very much, Craig, for your clear explanation of my questions. Now I am very clear about the concept of pools in ceph. But I have two small questions: 1. How does the deployer decide that a particular type of information will be stored in a particular pool? Are there any settings at the time of creation of a pool that a deployer should make to ensure which type of data will be stored in which pool? 2. How does an end-user specify which pool his/her data will be stored in? How can an end-user find out which pools are stored on SSDs or on HDDs, and what the properties of a particular pool are? Thanks again, Please help to clear these confusions also. Regards Pragya Jain On Sunday, 13 July 2014 5:04 AM, Craig Lewis cle...@centraldesktop.com wrote: I'll answer out of order. #2: rbd is used for RBD images. data and metadata are used by CephFS. RadosGW's default pools will be created the first time radosgw starts up. If you aren't using RBD or CephFS, you can ignore those pools. #1: RadosGW will use several pools to segregate its data. There are a couple of pools for storing user/subuser information, as well as pools for storing the actual data. I'm using federation, and I have a total of 18 pools that RadosGW is using in some form. Pools are a way to logically separate your data, and pools can also have different replication/storage settings. For example, I could say that the .rgw.buckets.index pool needs 4x replication and is only stored on SSDs, while .rgw.buckets is 3x replication on HDDs. #3: In addition to #1, you can set up different pools to actually store user data in RadosGW. For example, an end user may have some very important data that you want replicated 4 times, and some other data that needs to be stored on SSDs for low latency. Using CRUSH, you would create some RADOS pools with those specs. Then you'd set up some placement targets in RadosGW that use those pools. A user that cares will specify a placement target when they create a bucket. That way they can decide what the storage requirements are. If they don't care, then they can just use the default. Does that help?
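For illustration of Karan's first point, steering an OpenStack service to a specific Ceph pool is just a configuration setting. A minimal sketch for Glance, assuming a pool named images and a cephx user named glance (both names are assumptions, not from this thread; the linked rbd-openstack doc has the authoritative options):

# /etc/glance/glance-api.conf (assumed names, for illustration only)
default_store = rbd
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_user = glance
rbd_store_pool = images

Cinder and Nova expose equivalent options (e.g. rbd_pool / rbd_user in cinder.conf), so each service can be pointed at its own pool, including one whose CRUSH rule places it on SSDs.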
On Thu, Jul 10, 2014 at 11:34 PM, pragya jain prag_2...@yahoo.co.in wrote: hi all, I have some very basic questions about pools in ceph. According to the ceph documentation, when we deploy a ceph cluster with a radosgw instance over it, ceph creates pools by default to store the data, or the deployer can also create pools according to the requirements. Now, my questions are: 1. what is the relevance of multiple pools in a cluster? i.e. why should a deployer create multiple pools in a cluster? what are the benefits of creating multiple pools? 2. according to the docs, the default pools are data, metadata, and rbd. what is the difference among these three types of pools? 3. when a system deployer has deployed a ceph cluster with a radosgw interface and starts providing services to end-users, such that an end-user can create an account on the ceph cluster and can store/retrieve their data to/from the cluster, then does the end user have any concern about the pools created in the cluster? Please somebody help me to clear these confusions. regards Pragya Jain ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
Hi Sage, since this problem is tunables-related, do we need to expect the same behavior or not when we do a regular data rebalance caused by adding or removing an OSD? I guess not, but I would like your confirmation. I'm already on optimal tunables, but I'm afraid to test this by e.g. shutting down 1 OSD. Thanks, Andrija On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote: I've added some additional notes/warnings to the upgrade and release notes: https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451 If there is somewhere else where you think a warning flag would be useful, let me know! Generally speaking, we want to be able to cope with huge data rebalances without interrupting service. It's an ongoing process of improving the recovery vs client prioritization, though, and removing sources of overhead related to rebalancing... and it's clearly not perfect yet. :/ sage On Sun, 13 Jul 2014, Andrija Panic wrote: Hi, after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush tunables optimal, and after only a few minutes I added 2 more OSDs to the CEPH cluster... So these 2 changes were more or less done at the same time - rebalancing because of tunables optimal, and rebalancing because of adding new OSDs... Result - all VMs living on CEPH storage went mad, with effectively no disk access; blocked, so to speak. Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long... Did I do wrong by causing 2 rebalances to happen at the same time? Is this behaviour normal, to cause great load on all VMs because they are unable to access CEPH storage effectively? Thanks for any input... -- Andrija Panić -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph with Multipath ISCSI
Hello guys, I was wondering if there has been any progress on getting multipath iSCSI to play nicely with Ceph? I've followed the how-to and created a single-path iSCSI target over a Ceph RBD with XenServer. However, it would be nice to have built-in failover using iSCSI multipath to another Ceph mon or osd server. Cheers Andrei -- Andrei Mikhailovsky Director Arhont Information Security Web: http://www.arhont.com http://www.wi-foo.com Tel: +44 (0)870 4431337 Fax: +44 (0)208 429 3111 PGP: Key ID - 0x2B3438DE PGP: Server - keyserver.pgp.com DISCLAIMER The information contained in this email is intended only for the use of the person(s) to whom it is addressed and may be confidential or contain legally privileged information. If you are not the intended recipient you are hereby notified that any perusal, use, distribution, copying or disclosure is strictly prohibited. If you have received this email in error please immediately advise us by return email at and...@arhont.com and delete and purge the email and any attachments without making a copy. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] qemu image create failed
Can you connect to your Ceph cluster? You can pass options on the command line like this: $ qemu-img create -f rbd rbd:instances/vmdisk01:id=leseb:conf=/etc/ceph/ceph-leseb.conf 2G Cheers. Sébastien Han Cloud Engineer Always give 100%. Unless you're giving blood. Phone: +33 (0)1 49 70 99 72 Mail: sebastien@enovance.com Address : 11 bis, rue Roquépine - 75008 Paris Web : www.enovance.com - Twitter : @enovance On 12 Jul 2014, at 03:06, Yonghua Peng sys...@mail2000.us wrote: Does anybody know about this issue? Thanks. Fri, 11 Jul 2014 10:26:47 +0800 from Yonghua Peng sys...@mail2000.us: Hi, I tried to create a qemu image, but it failed. ceph@ceph:~/my-cluster$ qemu-img create -f rbd rbd:rbd/qemu 2G Formatting 'rbd:rbd/qemu', fmt=rbd size=2147483648 cluster_size=0 qemu-img: error connecting qemu-img: rbd:rbd/qemu: error while creating rbd: Input/output error Can you tell me what the problem is? Thanks. -- We are hiring cloud Dev/Ops, more details please see: YY Cloud Jobs signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
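The "error connecting" usually means librbd could not reach the monitors at all. A quick way to check before retrying qemu-img is to run the same kind of access with an explicit id and conf — a sketch only, where the client name admin and the paths are assumptions for illustration:

$ ceph -s --id admin --conf /etc/ceph/ceph.conf          # should print the cluster status
$ rbd ls rbd --id admin --conf /etc/ceph/ceph.conf       # should list images in the 'rbd' pool
$ qemu-img create -f rbd rbd:rbd/qemu:id=admin:conf=/etc/ceph/ceph.conf 2G

If the first two commands fail, the problem is the keyring/ceph.conf on the client rather than qemu-img itself.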
Re: [ceph-users] Placing different pools on different OSDs in the same physical servers
Hi, to avoid confusion I would name the host entries in the crush map differently. Make sure these host names can be resolved to the correct boxes though (/etc/hosts on all the nodes). You're also missing a new rule entry (also shown in the link you mentioned). Lastly, and this is *extremely* important: You need to set

[global]
osd crush update on start = false

in your ceph.conf, because there is currently no logic for OSDs to detect their location when different roots are present, as documented here: http://tracker.ceph.com/issues/6227 If you don't set this, whenever you start an OSD belonging to your SSD root, it will move the OSD over to the default root. Side note: this is really unfortunate since with cache pools it is now common to have platters and SSDs on the same physical hosts and also multiple parallel roots. On 10/07/2014 17:04, Nikola Pajtic wrote: Hello to all, I was wondering if it is possible to place different pools on different OSDs, but using only two physical servers? I was thinking about this: http://tinypic.com/r/30tgt8l/8 I would like to use osd.0 and osd.1 for the Cinder/RBD pool, and osd.2 and osd.3 for Nova instances. I was following the howto from the ceph documentation: http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds , but it assumed that there are 4 physical servers: 2 for the platter pool and 2 for the SSD pool. What I am concerned about is how the CRUSH map should be written and how CRUSH will decide where to send the data, because of the same hostnames in the cinder and nova pools. For example, is it possible to do something like this:

# buckets
host cephosd1 {
    id -2    # do not change unnecessarily
    # weight 0.010
    alg straw
    hash 0   # rjenkins1
    item osd.0 weight 0.000
}
host cephosd1 {
    id -3    # do not change unnecessarily
    # weight 0.010
    alg straw
    hash 0   # rjenkins1
    item osd.2 weight 0.010
}
host cephosd2 {
    id -4    # do not change unnecessarily
    # weight 0.010
    alg straw
    hash 0   # rjenkins1
    item osd.1 weight 0.000
}
host cephosd2 {
    id -5    # do not change unnecessarily
    # weight 0.010
    alg straw
    hash 0   # rjenkins1
    item osd.3 weight 0.010
}
root cinder {
    id -1    # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0   # rjenkins1
    item cephosd1 weight 0.000
    item cephosd2 weight 0.000
}
root nova {
    id -6    # do not change unnecessarily
    # weight 0.020
    alg straw
    hash 0   # rjenkins1
    item cephosd1 weight 0.010
    item cephosd2 weight 0.010
}

If not, could you share an idea how this scenario could be achieved? Thanks in advance!! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
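As a rough illustration of the missing rule entries mentioned above — a sketch only, reusing the root names from the quoted map; the ruleset numbers and pool names are assumptions, not from this thread:

rule cinder {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take cinder
    step chooseleaf firstn 0 type host
    step emit
}
rule nova {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take nova
    step chooseleaf firstn 0 type host
    step emit
}

# after recompiling and injecting the edited map, point each pool at its rule:
ceph osd pool set volumes crush_ruleset 1
ceph osd pool set nova crush_ruleset 2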
[ceph-users] Working ISCSI target guide
Does anyone have a guide or reproducible method of getting multipath ISCSI working in front of ceph? Even if it just means having two front-end ISCSI targets each with access to the same underlying Ceph volume? This seems like a super popular topic. Thanks, -Drew ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Working ISCSI target guide
One other question: if you are going to be using Ceph as a storage system for KVM virtual machines, does it even matter whether you use ISCSI or not? Meaning that if you are just going to use LVM and have several hypervisors sharing that same VG, then using ISCSI isn't really a requirement unless you are using a hypervisor like ESXi which only works with ISCSI/NFS, correct? Thanks, -Drew From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Drew Weaver Sent: Tuesday, July 15, 2014 9:03 AM To: 'ceph-users@lists.ceph.com' Subject: [ceph-users] Working ISCSI target guide Does anyone have a guide or reproducible method of getting multipath ISCSI working in front of ceph? Even if it just means having two front-end ISCSI targets each with access to the same underlying Ceph volume? This seems like a super popular topic. Thanks, -Drew ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] create a image that stores information in erasure-pool failed
hi,all: I created an erasure-coded pool and then tried to create a 1GB image named foo that stores its data in that erasure-coded pool. However, the operation failed with the following error:
root@mon1:~# rbd create foo --size 1024 --pool ecpool
rbd: create error: (95) Operation not supported
2014-07-13 10:32:55.311330 7f1b6563f780 -1 librbd: error adding image to directory: (95) Operation not supported
When I created another image in a regular replicated pool, it succeeded, so I wonder whether this is my mistake or a problem with erasure-coded pools. Could you help me solve this puzzling problem? Thank you very much! My cluster is deployed as follows: one monitor, six OSDs. EC pool: ecpool with 100 PGs, profile: jerasure, k=4, m=2, reed_sol_van yours sincerely, ifstillfly qixiaof...@chinacloud.com.cn___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
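For context, RBD image headers rely on object-class and omap operations that erasure-coded pools do not support, which is why the create fails with EOPNOTSUPP. One common workaround in firefly is to put a replicated cache tier in front of the EC pool and let RBD go through that. A rough sketch, where the pool name cachepool, the PG count and the sizes are assumptions, not from this thread (a real cache tier also needs hit_set and sizing parameters tuned):

# create a small replicated pool to act as the cache tier
ceph osd pool create cachepool 100 100
# attach it as a writeback cache in front of the EC pool
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool
# the image create against ecpool should now succeed; I/O flows via the cache tier
rbd create foo --size 1024 --pool ecpool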
Re: [ceph-users] Working ISCSI target guide
Drew, I would not use iSCSI with KVM; instead, I would use the built-in RBD support. However, you would use something like NFS/iSCSI if you were to connect other hypervisors to the Ceph backend. Having failover capabilities is important here )) Andrei -- Andrei Mikhailovsky Director Arhont Information Security Web: http://www.arhont.com http://www.wi-foo.com Tel: +44 (0)870 4431337 Fax: +44 (0)208 429 3111 PGP: Key ID - 0x2B3438DE PGP: Server - keyserver.pgp.com DISCLAIMER The information contained in this email is intended only for the use of the person(s) to whom it is addressed and may be confidential or contain legally privileged information. If you are not the intended recipient you are hereby notified that any perusal, use, distribution, copying or disclosure is strictly prohibited. If you have received this email in error please immediately advise us by return email at and...@arhont.com and delete and purge the email and any attachments without making a copy. - Original Message - From: Drew Weaver drew.wea...@thenap.com To: ceph-users@lists.ceph.com ceph-users@lists.ceph.com Sent: Tuesday, 15 July, 2014 2:18:53 PM Subject: Re: [ceph-users] Working ISCSI target guide One other question, if you are going to be using Ceph as a storage system for KVM virtual machines does it even matter if you use ISCSI or not? Meaning that if you are just going to use LVM and have several hypervisors sharing that same VG then using ISCSI isn’t really a requirement unless you are using a Hypervisor like ESXi which only works with ISCSI/NFS correct? Thanks, -Drew From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Drew Weaver Sent: Tuesday, July 15, 2014 9:03 AM To: 'ceph-users@lists.ceph.com' Subject: [ceph-users] Working ISCSI target guide Does anyone have a guide or reproducible method of getting multipath ISCSI working in front of ceph? Even if it just means having two front-end ISCSI targets each with access to the same underlying Ceph volume? This seems like a super popular topic. Thanks, -Drew ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph-fuse couldn't be connect.
Hi All, I am using ceph 0.80.1 on Ubuntu 14.04 on KVM. However, I cannot connect to the MON from a client using ceph-fuse. On the client, I installed ceph-fuse 0.80.1 and loaded the fuse module, but I think something is wrong. The result is:
# modprobe fuse
(no output)
# lsmod | grep fuse
(no output)
# ceph-fuse -m 192.168.122.106:6789 /mnt
ceph-fuse[1905]: starting ceph client
(at this point, ceph-fuse appeared to hang in an infinite loop)
^C
#
What is the problem? My cluster looks like the following:
Host OS (Ubuntu 14.04)
--- VM-1 (Ubuntu 14.04) -- MON-0 -- MDS-0
--- VM-2 (Ubuntu 14.04) -- OSD-0
--- VM-3 (Ubuntu 14.04) -- OSD-1 -- OSD-2 -- OSD-3
--- VM-4 (Ubuntu 14.04) -- used as the client
The result of # ceph -s on VM-1, which is the MON, is:
# ceph -s
cluster 1ae5585d-03c6-4a57-ba79-c65f4ed9e69f
health HEALTH_OK
monmap e1: 1 mons at {csA=192.168.122.106:6789/0}, election epoch 1, quorum 0 csA
osdmap e37: 4 osds: 4 up, 4 in
pgmap v678: 192 pgs, 3 pools, 0 bytes data, 0 objects
20623 MB used, 352 GB / 372 GB avail
192 active+clean
#
Regards, Jae -- 이재면 Jaemyoun Lee E-mail : jaemy...@gmail.com Homepage : http://jaemyoun.com Facebook : https://www.facebook.com/jaemyoun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
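A couple of checks that often explain a hanging ceph-fuse mount — a troubleshooting sketch, not an answer from the thread, and the keyring path is an assumption: verify the client has the cluster's ceph.conf and a keyring, and that it can reach the monitor at all, then run ceph-fuse in the foreground with debugging to see where it blocks:

# on the client (VM-4)
ceph -s -m 192.168.122.106:6789                 # should print the same status as on the MON
ls /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
ceph-fuse -m 192.168.122.106:6789 /mnt -d --debug-client 10 --debug-ms 1

It is also worth noting that the ceph -s output quoted above shows no mdsmap line; ceph-fuse needs an active MDS, so checking `ceph mds stat` on the monitor node may be informative as well.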
Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
On Tue, 15 Jul 2014, Andrija Panic wrote: Hi Sage, since this problem is tunables-related, do we need to expect the same behavior or not when we do a regular data rebalance caused by adding or removing an OSD? I guess not, but I would like your confirmation. I'm already on optimal tunables, but I'm afraid to test this by e.g. shutting down 1 OSD. When you shut down a single OSD it is a relatively small amount of data that needs to move to do the recovery. The issue with the tunables is just that a huge fraction of the data stored needs to move, and the performance impact is much higher. sage Thanks, Andrija On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote: I've added some additional notes/warnings to the upgrade and release notes: https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451 If there is somewhere else where you think a warning flag would be useful, let me know! Generally speaking, we want to be able to cope with huge data rebalances without interrupting service. It's an ongoing process of improving the recovery vs client prioritization, though, and removing sources of overhead related to rebalancing... and it's clearly not perfect yet. :/ sage On Sun, 13 Jul 2014, Andrija Panic wrote: Hi, after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush tunables optimal, and after only a few minutes I added 2 more OSDs to the CEPH cluster... So these 2 changes were more or less done at the same time - rebalancing because of tunables optimal, and rebalancing because of adding new OSDs... Result - all VMs living on CEPH storage went mad, with effectively no disk access; blocked, so to speak. Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long... Did I do wrong by causing 2 rebalances to happen at the same time? Is this behaviour normal, to cause great load on all VMs because they are unable to access CEPH storage effectively? Thanks for any input... -- Andrija Panić -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
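Independent of the tunables question, the impact of any large rebalance on client I/O can usually be softened by throttling recovery and backfill before triggering the change. A hedged sketch — the values are illustrative only, not recommendations from this thread:

# limit concurrent backfills and recovery ops per OSD for the duration of the move
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

# or make it persistent in ceph.conf:
[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1

This trades longer rebalance time for less interference with client traffic.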
Re: [ceph-users] Working ISCSI target guide
Hi, there may be 2 ways, but note that CephFS is not production-ready. 1. You can use a file stored in CephFS as the target's backing store. 2. There is rbd.ko, which maps an RBD image as a block device that you can assign to a target. I have not tested it yet. Good luck. At 2014-07-15 09:18:53, Drew Weaver drew.wea...@thenap.com wrote: One other question, if you are going to be using Ceph as a storage system for KVM virtual machines does it even matter if you use ISCSI or not? Meaning that if you are just going to use LVM and have several hypervisors sharing that same VG then using ISCSI isn’t really a requirement unless you are using a Hypervisor like ESXi which only works with ISCSI/NFS correct? Thanks, -Drew From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Drew Weaver Sent: Tuesday, July 15, 2014 9:03 AM To: 'ceph-users@lists.ceph.com' Subject: [ceph-users] Working ISCSI target guide Does anyone have a guide or reproducible method of getting multipath ISCSI working in front of ceph? Even if it just means having two front-end ISCSI targets each with access to the same underlying Ceph volume? This seems like a super popular topic. Thanks, -Drew ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
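Option 2 above is what most of the existing how-tos do: map the image with the kernel RBD client on a gateway box and export the resulting block device through an ordinary iSCSI target. A minimal sketch — the image name, size, IQN and the choice of tgt are assumptions for illustration; LIO/targetcli works along the same lines:

# on the iSCSI gateway host
rbd create iscsi-vol --size 102400 --pool rbd      # 100 GB image (assumed name/size)
rbd map rbd/iscsi-vol                              # appears as e.g. /dev/rbd0
# export the mapped device with tgt
tgtadm --lld iscsi --mode target --op new --tid 1 --targetname iqn.2014-07.com.example:rbd.iscsi-vol
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 --backing-store /dev/rbd0
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL

For the multipath case asked about earlier in the thread, two gateways can each map the same image and export it, but initiator-side multipath and write caching need care; that part is beyond this sketch.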
[ceph-users] the differences between snap and clone in terms of implement
hi,all I took a glance at the Ceph code in cls_rbd.cc. It seems that snapshot and clone both do not read or write any data; they just add some keys and values, even for RBD images in different pools. Am I missing something? Or could you explain the implementation of snapshot and clone in more depth? Thanks very much. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
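That observation matches how RBD layering is generally described: taking a snapshot and creating a clone are metadata-only operations (cls_rbd records snapshot ids and parent information in the image headers), and object data is only copied later, on write to the clone (copy-on-write / copy-up). The CLI workflow below illustrates the layering; the image and pool names are assumptions, not from this thread:

rbd create rbd/base --size 1024 --image-format 2   # format 2 is required for cloning
rbd snap create rbd/base@snap1                      # metadata only: records the snapshot in the header
rbd snap protect rbd/base@snap1
rbd clone rbd/base@snap1 rbd/child                  # metadata only: the child just points at its parent
rbd info rbd/child                                  # shows parent: rbd/base@snap1; no data has been copied yet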
Re: [ceph-users] how to plan the ceph storage architecture when i reuse old PC Server
It's generally recommended that you use disks in JBOD mode rather than involving RAID. -Greg On Monday, July 14, 2014, 不坏阿峰 onlydeb...@gmail.com wrote: I have installed and tested Ceph on VMs before, so I know a bit about configuration and installation. Now I want to use physical PC servers to install Ceph and do some tests; I think the performance will be better than on VMs. I have some questions about how to plan the Ceph storage architecture. What I have is as below: 1. only one Intel SSD 520 120G, planned for the Ceph journal, to speed up Ceph performance. 2. some old PC servers with an array controller, each including 4 x 300G SAS HDDs 3. a Gigabit switch What I plan: 1. a Ceph cluster such as: A: osd+mon, B: osd+mon - a two PC server Ceph cluster 2. the only SSD installed on A, used for the Ceph journal 3. NIC bonding (LACP) on the servers, with the switch ports configured for LACP 4. I plan to use Debian 7 for testing What I want to ask: 1. shall I use the array controller to configure RAID or not? Or how can I use the HDDs better - use the HDDs directly, with no RAID? 2. I have only one SSD for the journal, so what should I change in my plan, and how do I configure the journal on the SSD? Hope someone can give me some guidance. Many thanks in advance. -- Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
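On question 2, pointing OSD journals at partitions of the single SSD is just a per-OSD setting, or an extra argument to ceph-deploy. A sketch assuming the SSD appears as /dev/sdf and the data disks as /dev/sdb and /dev/sdc (all device and host names are assumptions):

# with ceph-deploy, the journal device/partition follows the data disk:
ceph-deploy osd prepare serverA:/dev/sdb:/dev/sdf1 serverA:/dev/sdc:/dev/sdf2
# or equivalently in ceph.conf for manually created OSDs:
[osd.0]
osd journal = /dev/sdf1
[osd.1]
osd journal = /dev/sdf2

Note that a single SSD only helps the OSDs on host A, and losing that SSD takes all of host A's journals (and therefore its OSDs) with it.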
[ceph-users] 0.80.1 to 0.80.3: strange osd log messages
After upgrading 0.80.1 to 0.80.3 I regularly see many messages like this in every OSD log: 2014-07-15 19:44:48.292839 7fa5a659f700 0 osd.5 62377 crush map has features 2199057072128, adjusting msgr requires for mons (the constant part is: crush map has features 2199057072128, adjusting msgr requires for mons) HEALTH_OK, tunables optimal. What does this mean? -- WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] scrub error on firefly
Hi Randy, This is the same kernel we reproduced the issue on as well. Sam traced this down to the XFS allocation hint ioctl we recently started using for RBD. We've just pushed out a v0.80.4 firefly release that disables the hint by default. It should stop the inconsistencies from popping up, although you will need to use ceph pg repair pgid to fix the existing inconsistencies. sage On Mon, 14 Jul 2014, Randy Smith wrote: $ lsb_release -a LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch Distributor ID: Ubuntu Description: Ubuntu 12.04.4 LTS Release: 12.04 Codename: precise $ uname -a Linux droopy 3.2.0-64-generic #97-Ubuntu SMP Wed Jun 4 22:04:21 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux On Sat, Jul 12, 2014 at 3:21 PM, Samuel Just sam.j...@inktank.com wrote: Also, what distribution and kernel version are you using? -Sam On Jul 12, 2014 10:46 AM, Samuel Just sam.j...@inktank.com wrote: When you see another one, can you include the xattrs on the files as well (you can use the attr(1) utility)? -Sam On Sat, Jul 12, 2014 at 9:51 AM, Randy Smith rbsm...@adams.edu wrote: That image is the root file system for a linux ldap server. -- Randall Smith Adams State University www.adams.edu 719-587-7741 On Jul 12, 2014 10:34 AM, Samuel Just sam.j...@inktank.com wrote: Here's a diff of the two files. One of the two files appears to contain ceph leveldb keys? Randy, do you have an idea of what this rbd image is being used for (rb.0.b0ce3.238e1f29, that is)? -Sam On Fri, Jul 11, 2014 at 7:25 PM, Randy Smith rbsm...@adams.edu wrote: Greetings, Well it happened again with two pgs this time, still in the same rbd image. They are at http://people.adams.edu/~rbsmith/osd.tar. I think I grabbed the files correctly. If not, let me know and I'll try again on the next failure. It certainly is happening often enough. On Fri, Jul 11, 2014 at 3:39 PM, Samuel Just sam.j...@inktank.com wrote: And grab the xattrs as well. -Sam On Fri, Jul 11, 2014 at 2:39 PM, Samuel Just sam.j...@inktank.com wrote: Right. -Sam On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith rbsm...@adams.edu wrote: Greetings, I'm using xfs. Also, when, in a previous email, you asked if I could send the object, do you mean the files from each server named something like this: ./3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.000b__head_34DC35C6__3 ? On Fri, Jul 11, 2014 at 2:00 PM, Samuel Just sam.j...@inktank.com wrote: Also, what filesystem are you using? -Sam On Fri, Jul 11, 2014 at 10:37 AM, Sage Weil sw...@redhat.com wrote: One other thing we might also try is catching this earlier (on first read of corrupt data) instead of waiting for scrub. If you are not super performance sensitive, you can add
filestore sloppy crc = true
filestore sloppy crc block size = 524288
That will track and verify CRCs on any large (512k) writes. Smaller block sizes will give more precision and more checks, but will generate larger xattrs and have a bigger impact on performance... sage On Fri, 11 Jul 2014, Samuel Just wrote: When you get the next inconsistency, can you copy the actual
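For anyone else hitting this, finding and repairing the affected placement groups uses the standard commands; a short sketch (the pg id 3.c6 is taken from the paths quoted above, everything else is a placeholder):

ceph health detail | grep inconsistent     # lists PGs flagged inconsistent by scrub
ceph pg repair 3.c6                        # ask the primary OSD to repair that PG
ceph -w                                    # watch the cluster log until the repair completes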
Re: [ceph-users] basic questions about pool
thank you very much, Karan, for your explanation. Regards Pragya Jain On Tuesday, 15 July 2014 1:53 PM, Karan Singh karan.si...@csc.fi wrote: Hi Pragya Let me try to answer these. 1# The decision is based on your use case (performance, reliability). If you need high performance out of your cluster, the deployer will create a pool on SSDs and assign this pool to applications which require higher I/O. For example, if you integrate OpenStack with Ceph, you can instruct the OpenStack configuration files to write data to a specific Ceph pool (http://ceph.com/docs/master/rbd/rbd-openstack/#configuring-glance); similarly you can tell CephFS and RadosGW which pool to use for data storage. 2# Usually the end user (the client of the Ceph cluster) does not care about where the data is getting stored, which pool it is using, or what the real physical location of the data is. The end user will demand specific performance, reliability and availability. It is the job of the Ceph admin to fulfil their storage requirements using Ceph features such as SSDs, erasure codes, replication level, etc. Block Device :- The end user will tell the application (Qemu/KVM, OpenStack, etc.) which pool it should use for data storage. rbd is the default pool for block devices. CephFS :- The end user will mount this pool as a filesystem and use it from there. The default pools are data and metadata. RadosGW :- The end user will store objects using the S3 or Swift API. - Karan Singh - On 15 Jul 2014, at 07:42, pragya jain prag_2...@yahoo.co.in wrote: thank you very much, Craig, for your clear explanation of my questions. Now I am very clear about the concept of pools in ceph. But I have two small questions: 1. How does the deployer decide that a particular type of information will be stored in a particular pool? Are there any settings at the time of creation of a pool that a deployer should make to ensure which type of data will be stored in which pool? 2. How does an end-user specify which pool his/her data will be stored in? How can an end-user find out which pools are stored on SSDs or on HDDs, and what the properties of a particular pool are? Thanks again, Please help to clear these confusions also. Regards Pragya Jain On Sunday, 13 July 2014 5:04 AM, Craig Lewis cle...@centraldesktop.com wrote: I'll answer out of order. #2: rbd is used for RBD images. data and metadata are used by CephFS. RadosGW's default pools will be created the first time radosgw starts up. If you aren't using RBD or CephFS, you can ignore those pools. #1: RadosGW will use several pools to segregate its data. There are a couple of pools for storing user/subuser information, as well as pools for storing the actual data. I'm using federation, and I have a total of 18 pools that RadosGW is using in some form. Pools are a way to logically separate your data, and pools can also have different replication/storage settings. For example, I could say that the .rgw.buckets.index pool needs 4x replication and is only stored on SSDs, while .rgw.buckets is 3x replication on HDDs. #3: In addition to #1, you can set up different pools to actually store user data in RadosGW. For example, an end user may have some very important data that you want replicated 4 times, and some other data that needs to be stored on SSDs for low latency. Using CRUSH, you would create some RADOS pools with those specs. Then you'd set up some placement targets in RadosGW that use those pools. A user that cares will specify a placement target when they create a bucket. 
That way they can decide what the storage requirements are. If they don't care, then they can just use the default. Does that help? On Thu, Jul 10, 2014 at 11:34 PM, pragya jain prag_2...@yahoo.co.in wrote: hi all, I have some very basic questions about pools in ceph. According to the ceph documentation, when we deploy a ceph cluster with a radosgw instance over it, ceph creates pools by default to store the data, or the deployer can also create pools according to the requirements. Now, my questions are: 1. what is the relevance of multiple pools in a cluster? i.e. why should a deployer create multiple pools in a cluster? what are the benefits of creating multiple pools? 2. according to the docs, the default pools are data, metadata, and rbd. what is the difference among these three types of pools? 3. when a system deployer has deployed a ceph cluster with a radosgw interface and starts providing services to end-users, such that an end-user can create an account on the ceph cluster and can store/retrieve their data to/from the cluster, then does the end user have any concern about the pools created in the cluster? Please somebody help me to clear these confusions. regards Pragya Jain ___ ceph-users mailing list ceph-users@lists.ceph.com