Re: [ceph-users] Intel 520/530 SSD for ceph
Hi Marcus,

Am 18.11.2013 23:51, schrieb m...@linuxbox.com:
> On Mon, Nov 18, 2013 at 02:38:42PM +0100, Stefan Priebe - Profihost AG wrote:
> You may actually be doing O_SYNC - recent kernels implement O_DSYNC, but glibc maps O_DSYNC into O_SYNC. But since you're writing to the block device this won't matter much.

No difference regarding O_DSYNC or O_SYNC - the values are the same. I'm also using 3.10.19 as a kernel, so it is recent enough.

> I believe the effect of O_DIRECT by itself is just to bypass the buffer cache, which is not going to make much difference for your dd case. (It will mainly affect other applications that are also using the buffer cache...) O_SYNC should cause the writes to block until a response is received from the disk. Without O_SYNC, the writes will just queue operations and return - potentially very fast. Your dd is probably writing enough data that there is some throttling by the system as it runs out of disk buffers and has to wait for some previous data to be written to the drive, but the delay for any individual block is not likely to matter. With O_SYNC, you are measuring the delay for each block directly, and you have absolutely removed the ability of the disk to perform any sort of parallelism.

That's correct, but ceph uses O_DSYNC for its journal (and maybe other things), so it is important to have devices that perform well with O_DSYNC.

> Sounds like the Intel 530 has a much larger block write latency, but can make up for it by performing more overlapped operations. You might be able to vary this behavior by experimenting with sdparm, smartctl or other tools, or possibly with different microcode in the drive.

Which values or settings do you have in mind?

Greets,
Stefan

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
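The latency gap being discussed can be reproduced without dd. A minimal sketch, assuming a scratch file in /tmp (point `path` at the actual journal device, e.g. /dev/sdX, for a real drive test - which is destructive):

```python
import os
import time

def avg_write_latency(path, extra_flags, block_size=4096, count=256):
    """Time small sequential writes, the pattern ceph's journal produces."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o600)
    buf = b"\0" * block_size
    start = time.monotonic()
    for _ in range(count):
        os.write(fd, buf)
    elapsed = time.monotonic() - start
    os.close(fd)
    return elapsed / count

# Buffered writes just queue in the page cache and return quickly;
# with O_DSYNC each write blocks until the device acknowledges it.
buffered = avg_write_latency("/tmp/journal-test", 0)
dsync = avg_write_latency("/tmp/journal-test", os.O_DSYNC)
os.unlink("/tmp/journal-test")
print(f"buffered: {buffered * 1e6:.0f} us/write, O_DSYNC: {dsync * 1e6:.0f} us/write")
```

On a drive with slow synchronous writes the O_DSYNC figure will be dramatically higher, mirroring the dd numbers in the thread.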
Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping
Hello David and Chris,

Thank you for your replies in this thread.

> The automatic repair should handle getting an EIO during read of the object replica.

I think when the osd tries to read an object from the primary disk, in an area with bad sectors, the controller does not respond with EIO but with something else. If you can tell me how to debug the response code, I will try to find out.

Thank you,
Mihaly

2013/11/19 David Zafman david.zaf...@inktank.com:
I looked at the code. The automatic repair should handle getting an EIO during read of the object replica. It does NOT require removing the object, as I said before, so it doesn't matter which copy has bad sectors. It will copy from a good replica to the primary, if necessary. By default a deep-scrub, which would catch this case, is performed weekly. A repair must be initiated by administrative action.

When replicas differ based on a comparison of checksums, we currently don't have a way to determine which copy(s) are corrupt. This is where manual intervention may be necessary, if the administrator can determine which copy(s) are bad.

David Zafman, Senior Developer, http://www.inktank.com

On Nov 18, 2013, at 1:11 PM, Chris Dunlop ch...@onthe.net.au wrote:
OK, that's good (as far as it goes, being a manual process). So then, back to what I think was Mihály's original issue:

> pg repair or deep-scrub can not fix this issue. But if I understand correctly, the osd has to know it can not retrieve the object from osd.0, and the object needs to be replicated to another osd because there are no longer 3 working replicas.

Given that a bad checksum and/or read error tells ceph an object is corrupt, it would seem a natural step to then have ceph automatically use another good-checksum copy, and even rewrite the corrupt object, either in normal operation or under a scrub or repair. Is there a reason this isn't done, apart from lack of round tuits?
Cheers, Chris

On Mon, Nov 18, 2013 at 11:43:46AM -0800, David Zafman wrote:
No, you wouldn't need to re-replicate the whole disk for a single bad sector. The way to deal with that, if the object is on the primary, is to remove the file manually from the OSD's filesystem and perform a repair of the PG that holds that object. This will copy the object back from one of the replicas.

David

On Nov 17, 2013, at 10:46 PM, Chris Dunlop ch...@onthe.net.au wrote:
Hi David,

On Fri, Nov 15, 2013 at 10:00:37AM -0800, David Zafman wrote:
Replication does not occur until the OSD is "out". This creates a new mapping in the cluster of where the PGs should be, and thus data begins to move and/or create sufficient copies. This scheme lets you control how and when you want the replication to occur.

If you have plenty of space and you aren't going to replace the drive immediately, just mark the OSD "down" AND "out". If you are going to replace the drive immediately, set the "noout" flag, take the OSD "down", and replace the drive. Assuming the new drive is mounted in the same place as the bad one, bring the OSD back up. This will replicate exactly the same PGs the bad drive held back to the replacement drive. As was stated before, don't forget to "ceph osd unset noout".

Keep in mind that in the case of a machine that has a hardware failure and takes OSD(s) down, there is an automatic timeout which will mark them "out" for unattended operation. Unless you are monitoring the cluster 24/7, you should have enough disk space available to handle failures.

Related info in: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

David Zafman, Senior Developer, http://www.inktank.com

Are you saying that if a disk suffers a bad sector in an object for which it's primary, and for which good data exists on other replica PGs, there's no way for ceph to recover other than by (re-)replicating the whole disk?
I.e., even if the disk is able to remap the bad sector using a spare, so the disk is OK (albeit missing a sector's worth of object data), the only way to recover is to basically blow away all the data on that disk and start again, replicating everything back to the disk (or to other disks)?

Cheers, Chris.

--
Best regards,
Mihály Árva-Tóth
System Engineer, Virtual Call Center GmbH
Address: 23-33 Csalogány Street, Budapest 1027, Hungary
Tel: +36 1 999 7400 | Mobile: +36 30 473 9256 | Fax: +36 1 999 7401
E-mail: mihaly.arva-t...@virtual-call-center.eu
Web: www.virtual-call-center.eu | http://www.virtual-call-center.hu/
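A sketch of the two replacement workflows David describes above (OSD id 0 and pg id 2.30 are placeholders; this is an outline of the procedure, not an exact command sequence for any particular release):

```shell
# Case 1: not replacing the drive immediately -- let ceph re-replicate
ceph osd down 0
ceph osd out 0

# Case 2: replacing the drive right away -- suppress re-replication
ceph osd set noout
# stop the osd, swap the drive, mount the new filesystem at the same
# path as before, then start the osd again
ceph osd unset noout

# Single bad object on the primary: remove the object's file from the
# OSD's filesystem by hand, then repair the PG that holds it
ceph pg repair 2.30
```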
Re: [ceph-users] alternative approaches to CEPH-FS
> 2) Can't grow once you reach the hard limit of 14TB, and if you have multiple such machines, then fragmentation becomes a problem
> 3) might have the risk of a 14TB partition corruption wiping out all your shares

Is the 14TB limit due to an EXT(3/4) recommendation (or implementation limit)?
Re: [ceph-users] about start a osd daemon
On Tue, Nov 19, 2013 at 1:29 AM, Dnsbed Ops o...@dnsbed.com wrote:
> Hi, when an osd node server restarted, I found the osd daemon doesn't get started. I must run these two commands from the deploy node to restart them:
> ceph-deploy osd prepare ceph3.anycast.net:/tmp/osd2
> ceph-deploy osd activate ceph3.anycast.net:/tmp/osd2

That doesn't really restart them, it creates them. I think ceph-deploy is not destroying anything here, so it appears to restart when in fact it is re-doing the whole process again.

> My questions are:
> #1, can it be set up to start automatically on system reboot?

I would think they should come up on boot. Could you share some log output?

> #2, each time we activate it, must we prepare it first?

This is only required for paths, not for devices. For example, if you had your OSD on the sdb device you could just call `create` directly.

Thanks.
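The device-vs-path distinction above, sketched with hypothetical device and directory names (`create` wraps the prepare and activate steps for a device):

```shell
# device-backed OSD: one step is enough
ceph-deploy osd create ceph3.anycast.net:/dev/sdb

# directory/path-backed OSD: the two-step form is required
ceph-deploy osd prepare ceph3.anycast.net:/var/lib/ceph/osd-data
ceph-deploy osd activate ceph3.anycast.net:/var/lib/ceph/osd-data
```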
[ceph-users] Librados Error Codes
Hi, when using the librados C library, the documentation of the different functions just says that they return a negative error code on failure, e.g. the rados_read function (http://ceph.com/docs/master/rados/api/librados/#rados_read). Is there any further documentation of which error code is returned under which condition, and how to find out _why_ an operation has failed? Thanks! Regards, Behar
[ceph-users] Size of RBD images
Hi everyone,

In the course of playing with RBD, I noticed a few things:

* The RBD images are so thin-provisioned that you can create arbitrarily large ones. On my 0.72.1 freshly-installed, empty 200TB cluster, I was able to create a 1PB image:

$ rbd create --image-format 2 --size 1073741824 test_img

This command is successful, and I can check the image status:

$ rbd info test
rbd image 'test':
  size 1024 TB in 268435456 objects
  order 22 (4096 kB objects)
  block_name_prefix: rbd_data.19f76b8b4567
  format: 2
  features: layering

* Such an oversized image seems unmountable on my 3.2.46 kernel. However, the error message is not very explicit:

$ rbd map test_img
rbd: add failed: (2) No such file or directory

There is no error or explanation to be seen anywhere in the logs. dmesg reports the connection to the cluster through RBD as usual, and that's it. Using the exact same commands with an image size of 32GB will successfully map the device.

* Such an oversized image takes an awfully long time to shrink or remove, even though the image has just been created and is empty. In RADOS, I only see the corresponding rbd_id and rbd_header, but no data objects at all. Still, removing the 1PB image takes roughly 8 hours.

Cluster config: 3 mons, 8 nodes * 72 osds, about 4800 pgs (2400 pgs in pool rbd); cluster and public networks are 10GbE; each node has 8 cores and 64GB mem.

So, my questions:
- Why is it possible to create an image five times the size of the cluster without warning?
- Where could this "No such file" error come from?
- Why does it take so long to shrink/delete a large-but-empty-and-thin-provisioned image?

I know that 1PB is oversized ("No such file" when trying to map), and 32GB is not, so I am currently looking for the oversize threshold. More info coming soon.

Best regards,
Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)
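The numbers above are internally consistent: `rbd create --size` takes megabytes, so 1073741824 MB is 1 PiB, which at the default order 22 (4 MiB objects) gives exactly the object count that `rbd info` reports. A quick check:

```python
size_mb = 1073741824          # the --size argument, interpreted as MiB
object_order = 22             # default rbd object order: 2^22 = 4 MiB objects

size_bytes = size_mb * 2**20
objects = size_bytes // 2**object_order

print(size_bytes // 2**40, "TB in", objects, "objects")
# → 1024 TB in 268435456 objects, matching the rbd info output above
```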
Re: [ceph-users] Size of RBD images
Hi Nicolas,

Just FYI: rbd format 2 is not supported yet by the linux kernel (module); it can only be used as a target for virtual machines using librbd. See: man rbd (--image-format).

Shrinking time: the same happened to me. An rbd (v1) device took about a week to shrink from 1PB to 10TB. The good news: I already had about 5TB of data on it and ongoing processes using the device, and there was neither any data loss nor a significant performance issue. (3 mons + 4 machines with different numbers of OSDs each.)

Bernhard

EDIT: sorry about the "No such file" error. It now seems this is a separate issue: the system I was using was apparently unable to map devices to images in format 2. I will investigate that further before mentioning it again. I would still appreciate answers about the 1PB image and the time it takes to shrink.

Best regards,
Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)

On 11/19/2013 03:20 PM, nicolasc wrote:
> [snip]

--
Bernhard Glomm
IT Administration
Phone: +49 (30) 86880 134 | Fax: +49 (30) 86880 100 | Skype: bernhard.glomm.ecologic
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH
Re: [ceph-users] about start a osd daemon
On 11/19/13, 1:29 AM, Dnsbed Ops o...@dnsbed.com wrote:
> Hi, when an osd node server restarted, I found the osd daemon doesn't get started. I must run these two commands from the deploy node to restart them:
> ceph-deploy osd prepare ceph3.anycast.net:/tmp/osd2
> ceph-deploy osd activate ceph3.anycast.net:/tmp/osd2
> My questions are:
> #1, can it be set up to start automatically on system reboot?
> #2, each time we activate it, must we prepare it first?

Most *nix boxes wipe /tmp on boot. If you're just doing a quick-and-dirty POC and want to test cluster behavior across reboots, place your osd in /var/tmp or anywhere else, really.

JL
Re: [ceph-users] Is Ceph a provider of block device too ?
Thank you! I am studying and testing Ceph. I think it will be very good for my needs.

On 18-11-2013 20:31, Timofey root wrote:
On 09 Nov 2013, at 1:46, Gregory Farnum g...@inktank.com wrote:
On Fri, Nov 8, 2013 at 8:49 AM, Listas lis...@adminlinux.com.br wrote:
> Hi! I have clusters (IMAP service) with 2 members configured with Ubuntu + DRBD + ext4. I intend to migrate to Ceph and begin to allow distributed access to the data. Does Ceph provide both a distributed filesystem and a block device?

Ceph's RBD is a distributed block device and works very well; you could use it to replace DRBD. The CephFS distributed filesystem is in more of a preview mode and is not supported for general use at this time.

> Does Ceph work fine in clusters of two members?

It should work fine in a cluster of that size, but you're not getting as many advantages over other solutions at such small scales as you do at larger ones.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Re: [ceph-users] ceph (deploy?) and drive paths / mounting / best practice.
On 19.11.2013 06:56, Robert van Leeuwen wrote:
> Hi, ... It looks like it is just using /dev/sdX for this instead of the /dev/disk/by-id /by-path given by ceph-deploy. ...

Hi Robert,

I'm using the disk label instead. fstab:

LABEL=osd.0 /var/lib/ceph/osd/ceph-0 xfs noatime,nodiratime 0 0

Created with: mkfs.xfs -L osd.0 /dev/sdX1

With this config the device path can change without breaking the mount. Works well for me.

Udo
Re: [ceph-users] Size of RBD images
On 11/19/13, 2:10 PM, Wolfgang Hennerbichler wo...@wogri.com wrote:
> On Nov 19, 2013, at 3:47 PM, Bernhard Glomm bernhard.gl...@ecologic.eu wrote:
>> Hi Nicolas, just fyi: rbd format 2 is not supported yet by the linux kernel (module)
>
> I believe this is wrong. I think linux has supported rbd format 2 images since 3.10.

One more reason to cross our fingers for official Saucy Salamander support soon…
Re: [ceph-users] Size of RBD images
So is there any size limit on RBD images? I had a failure this morning mounting a 1TB RBD. Deleting now (why does it take so long to delete, if it was never even mapped, much less written to?) and will retry with smaller images. See output below. This is 0.72 on Ubuntu 13.04 with a 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
  size 1024 GB in 262144 objects
  order 22 (4096 kB objects)
  block_name_prefix: rb.0.1770.6b8b4567
  format: 1
ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied
ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory
ceph@joceph-client01:/etc/ceph$ rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
data - 0 0 0 0 0 0 0 0 0
metadata - 0 0 0 0 0 0 0 0 0
rbd - 120 0 0 10788
testpool01 - 0 0 0 0 0 0 0 0 0
testpool02 - 0 0 0 0 0 0 0 0 0
testpool03 - 0 0 0 0 0 0 0 0 0
testpool04 - 0 0 0 0 0 0 0 0 0
total used 23287851602
total avail 9218978040
total space 11547763200
ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd
ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory
ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
  size 1024 GB in 262144 objects
  order 22 (4096 kB objects)
  block_name_prefix: rb.0.1770.6b8b4567
  format: 1
Re: [ceph-users] Size of RBD images
-----Original Message-----
From: Gruher, Joseph R
Sent: Tuesday, November 19, 2013 12:24 PM
To: 'Wolfgang Hennerbichler'; Bernhard Glomm
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Size of RBD images

> So is there any size limit on RBD images? I had a failure this morning mounting a 1TB RBD. Deleting now (why does it take so long to delete, if it was never even mapped, much less written to?) and will retry with smaller images. This is 0.72 on Ubuntu 13.04 with a 3.12 kernel.
> [snip]

I think I figured out where I went wrong here. I had thought that if you didn't specify the pool on the 'rbd create' command line, you could then later map the image in any pool. In retrospect that doesn't make much sense, and it appears that if you don't specify the pool at the create step the image just goes into the default rbd pool. See the example below.
ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage5 --pool testpool01
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage5 --pool testpool01
ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage6
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool testpool01
rbd: add failed: (2) No such file or directory
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool rbd
ceph@joceph-client01:/etc/ceph$
Re: [ceph-users] help v.72configure federate gateway failed
Sorry for the delay, I'm still catching up since the OpenStack conference.

Does the system user for the destination zone exist with the same access and secret keys in the source zone? If you enable "debug rgw = 30" on the destination, you can see why the copy_obj from the source zone is failing.

Josh

On 11/11/2013 12:52 AM, maoqi1982 wrote:
Hi list,

The ceph version is the latest, v0.72 emperor. Following the http://ceph.com/docs/master/radosgw/federated-config/ doc, I deployed two ceph clusters (one per data site) to form a region (a master zone and a slave zone). The metadata seems to sync OK, but object sync fails. The errors are as follows:

INFO:radosgw_agent.worker:6053 is processing shard number 47
INFO:radosgw_agent.worker:finished processing shard 47
INFO:radosgw_agent.sync:48/128 items processed
INFO:radosgw_agent.worker:6053 is processing shard number 48
INFO:radosgw_agent.worker:bucket instance east-bucket:us-east.4139.1 has 5 entries after 002.2.3
INFO:radosgw_agent.worker:syncing bucket east-bucket
ERROR:radosgw_agent.worker:failed to sync object east-bucket/驽??docx:
ERROR:radosgw_agent.worker:failed to sync object east-bucket/sss.py: state is error
ERROR:radosgw_agent.worker:failed to sync object east-bucket/Nfg.docx: state is error
INFO:radosgw_agent.worker:finished processing shard 48
INFO:radosgw_agent.worker:6053 is processing shard number 49
INFO:radosgw_agent.sync:49/128 items processed
INFO:radosgw_agent.sync:50/128 items processed
INFO:radosgw_agent.worker:finished processing shard 49
INFO:radosgw_agent.worker:6053 is processing shard number 50
INFO:radosgw_agent.worker:finished processing shard 50
INFO:radosgw_agent.sync:51/128 items processed
INFO:radosgw_agent.worker:6053 is processing shard number 51
INFO:radosgw_agent.worker:finished processing shard 51
INFO:radosgw_agent.sync:52/128 items processed
INFO:radosgw_agent.worker:6053 is processing shard number 52
INFO:radosgw_agent.sync:53/128 items processed
INFO:radosgw_agent.worker:finished processing shard 52

thanks
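Josh's debugging suggestion corresponds to a ceph.conf fragment on the destination zone's gateway host; the section name below is a placeholder for your actual rgw instance name:

```
[client.radosgw.us-west-1]
debug rgw = 30
```

Restart the gateway afterwards (or inject the setting at runtime via the admin socket) and watch the rgw log for the failing copy_obj requests.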
Re: [ceph-users] radosgw-agent AccessDenied 403
On 11/13/2013 09:06 PM, lixuehui wrote:
> And on the slave zone gateway instance, the info is like this:
>
> 2013-11-14 12:54:24.516840 7f51e7fef700 1 == starting new request req=0xb1e3b0 =
> 2013-11-14 12:54:24.526640 7f51e7fef700 1 == req done req=0xb1e3b0 http_status=200 ==
> 2013-11-14 12:54:24.545440 7f51e4fe9700 1 == starting new request req=0xb1c690 =
> 2013-11-14 12:54:24.551696 7f51e4fe9700 0 WARNING: couldn't find acl header for bucket, generating default
> 2013-11-14 12:54:24.566005 7f51e4fe9700 0 HTTP_DATE - Thu Nov 14 04:54:24 2013
> 2013-11-14 12:54:24.566046 7f51e4fe9700 0 HTTP_X_AMZ_COPY_SOURCE - sss%2Frgwconf
> 2013-11-14 12:54:24.607998 7f51e4fe9700 1 == req done req=0xb1c690 http_status=403 ==
> 2013-11-14 12:54:24.626466 7f51e27e4700 1 == starting new request req=0xb24260 =
>
> Can anyone help find the problem? Does it mean we should set an acl for the bucket? In fact, the information stays the same as before, even after setting an acl for the bucket. (bucket-name: sss, object-name: rgwconf) Or is there something wrong with either the HTTP_DATE or HTTP_X_AMZ_COPY_SOURCE?

Those headers are fine, and it's unrelated to acls, since the gateway is using a system user for cross-zone copies, which has full access. Does the system user for the destination zone exist with the same access and secret keys in the source zone?

Josh

From: lixuehui mailto:lixue...@chinacloud.com.cn
Sent: 2013-11-13 16:16
To: ceph-users mailto:ceph-users@lists.ceph.com
Subject: radosgw-agent AccessDenied 403

Hi, list,

We've reported before that radosgw-agent failed to sync data all the time. We paste the relevant log here to seek help now.
application/json; charset=UTF-8
Wed, 13 Nov 2013 07:24:45 GMT
x-amz-copy-source:sss%2Frgwconf
/sss/rgwconf
2013-11-13T15:24:45.510 11171:DEBUG:boto:Signature: AWS CQHH7O4XULLINBNQQSPB:9ktSGas0/iuekklDmHRuU+OItek=
2013-11-13T15:24:45.511 11171:DEBUG:boto:url = 'http://ceph-rgw41.com/sss/rgwconf' params={'rgwx-op-id': 'ceph-rgw41:11160:2', 'rgwx-source-zone': u'us-east', 'rgwx-client-id': 'radosgw-agent'} headers={'Content-Length': '0', 'User-Agent': 'Boto/2.16.0 Python/2.7.3 Linux/3.5.0-23-generic', 'x-amz-copy-source': 'sss%2Frgwconf', 'Date': 'Wed, 13 Nov 2013 07:24:45 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'AWS CQHH7O4XULLINBNQQSPB:9ktSGas0/iuekklDmHRuU+OItek='} data=None
2013-11-13T15:24:45.519 11171:INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): ceph-rgw41.com
2013-11-13T15:24:45.580 11171:DEBUG:requests.packages.urllib3.connectionpool:PUT /sss/rgwconf?rgwx-op-id=ceph-rgw41%3A11160%3A2&rgwx-source-zone=us-east&rgwx-client-id=radosgw-agent HTTP/1.1 403 78
2013-11-13T15:24:45.584 11171:DEBUG:radosgw_agent.worker:exception during sync: Http error code 403 content <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code></Error>
2013-11-13T15:24:45.587 11171:DEBUG:boto:StringToSign:
GET
Wed, 13 Nov 2013 07:24:45 GMT
/admin/opstate
2013-11-13T15:24:45.589 11171:DEBUG:boto:Signature: AWS CQHH7O4XULLINBNQQSPB:JZwaFKhZEsQUj50jLxjNzni8n5Q=
2013-11-13T15:24:45.590 11171:DEBUG:boto:url = 'http://ceph-rgw41.com/admin/opstate' params={'client-id': 'radosgw-agent', 'object': 'sss/rgwconf', 'op-id': 'ceph-rgw41:11160:2'} headers={'Date': 'Wed, 13 Nov 2013 07:24:45 GMT', 'Content-Length': '0', 'Authorization': 'AWS CQHH7O4XULLINBNQQSPB:JZwaFKhZEsQUj50jLxjNzni8n5Q=', 'User-Agent': 'Boto/2.16.0 Python/2.7.3 Linux/3.5.0-23-generic'} data=None
2013-11-13T15:24:45.594 11171:INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): ceph-rgw41.com
2013-11-13T15:24:45.607 11171:DEBUG:requests.packages.urllib3.connectionpool:GET /admin/opstate?client-id=radosgw-agent&object=sss%2Frgwconf&op-id=ceph-rgw41%3A11160%3A2 HTTP/1.1 200 None
2013-11-13T15:24:45.612 11171:DEBUG:radosgw_agent.worker:op state is [{u'timestamp': u'2013-11-13 07:24:45.561401Z', u'op_id': u'ceph-rgw41:11160:2', u'object': u'sss/rgwconf', u'state': u'error', u'client_id': u'radosgw-agent'}]
2013-11-13T15:24:45.614 11171:ERROR:radosgw_agent.worker:failed to
Re: [ceph-users] Ephemeral RBD with Havana and Dumpling
On 11/14/2013 09:54 AM, Dmitry Borodaenko wrote:
On Thu, Nov 14, 2013 at 6:00 AM, Haomai Wang haomaiw...@gmail.com wrote:
>> We are using the nova fork by Josh Durgin https://github.com/jdurgin/nova/commits/havana-ephemeral-rbd - are there more patches that need to be integrated?
>
> I hope I can release or push commits to this branch containing live-migration, an incorrect-filesystem-size fix, and ceph-snapshot support in a few days.

Can't wait to see this patch! Are you getting rid of the shared storage requirement for live-migration?

Yes, that's what Haomai's patch will fix for rbd-based ephemeral volumes (bug https://bugs.launchpad.net/nova/+bug/1250751). Note that volume-backed instances have worked with live migration, without a shared fs for ephemeral disks, since Grizzly.
Re: [ceph-users] alternative approaches to CEPH-FS
Hi Yip,

Thanks for the code. With respect to "can't grow": I think I can (with some difficulty perhaps?) resize the VM if I needed to, but I'm really just trying to buy myself time until CEPH-FS is production ready. Point #3 scares me, so I'll have to think about that one. Most likely I'd use a completely different technology to back up this VM (e.g. rsync the key folders to some external, encrypted, cheap RAID storage, such as Iomega). Point #4 is probably not that big a deal for my needs, since CEPH itself should more or less ensure high availability in the face of disk crashes -- and as for the VM crashing, I can tolerate a few minutes/hours of downtime if needed, and so far I have yet to have a Linux VM crash in 3 years of running them.

On Mon, Nov 18, 2013 at 1:26 AM, YIP Wai Peng yi...@comp.nus.edu.sg wrote:
Hi all,

I've uploaded it to github - https://github.com/waipeng/nfsceph. Standard disclaimer applies. :)

Actually #3 is a novel idea, I had not thought of it. Thinking about the differences off the top of my head: comparatively, #3 will
1) have more overhead (because of the additional VM),
2) be unable to grow once you reach the hard limit of 14TB, and if you have multiple such machines, fragmentation becomes a problem,
3) carry the risk of a 14TB partition corruption wiping out all your shares,
4) not be as easy to make highly available. Although I have not worked HA into NFSCEPH yet, it should be doable by drbd-ing the NFS data directory, or any other technique that people use for redundant NFS servers.

- WP

On Fri, Nov 15, 2013 at 10:26 PM, Gautam Saxena gsax...@i-a-inc.com wrote:
Yip, I went to the link. Where can the script (nfsceph) be downloaded? How's the robustness and performance of this technique? (That is, is there any reason to believe it would be more/less robust and/or performant than option #3 mentioned in the original thread?)
On Fri, Nov 15, 2013 at 1:57 AM, YIP Wai Peng yi...@comp.nus.edu.sg wrote:
On Fri, Nov 15, 2013 at 12:08 AM, Gautam Saxena gsax...@i-a-inc.com wrote:
> 1) nfs over rbd (http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/)

We are now running this - basically an intermediate/gateway node that mounts ceph rbd objects and exports them as NFS. http://waipeng.wordpress.com/2013/11/12/nfsceph/

- WP

--
Gautam Saxena
President & CEO, Integrated Analysis Inc.
Making Sense of Data.™
Biomarker Discovery Software | Bioinformatics Services | Data Warehouse Consulting | Data Migration Consulting
www.i-a-inc.com | gsax...@i-a-inc.com
(301) 760-3077 office | (240) 479-4272 direct | (301) 560-3463 fax
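The gateway approach in the linked posts boils down to mapping an image on an intermediate node and exporting the mounted filesystem over NFS. A rough sketch with hypothetical names (pool rbd, image share0, export path /export/share0 - the actual NFSCEPH script may differ):

```shell
rbd map share0 --pool rbd            # appears as /dev/rbd0 on the gateway
mkfs.xfs /dev/rbd0                   # first time only
mkdir -p /export/share0
mount /dev/rbd0 /export/share0
echo '/export/share0 *(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra
```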
Re: [ceph-users] Librados Error Codes
On 11/19/2013 05:28 AM, Behar Veliqi wrote:
> Hi, when using the librados c library, the documentation of the different functions just tells that it returns a negative error code on failure, e.g. the rados_read function (http://ceph.com/docs/master/rados/api/librados/#rados_read). Is there anywhere any further documentation which error code is returned under which condition and how to know _why_ the operation has failed?

For some functions there is, but for most of them there are many common errors that aren't listed, and some errors depend on the OSD backend being used. The error codes are all negative POSIX errno values, so many of them should be self-explanatory (i.e. -ENOENT when an object doesn't exist, -EPERM when you don't have access to a pool, -EROFS if you try to write to a snapshot, etc.). It would be good to document these, though.

If you're looking into librados more, the C header has some more detail in @defgroup blocks that aren't parsed into the web docs:
https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h#L279

Josh
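Since librados returns negative POSIX errno values, the standard errno machinery is enough to decode them. A small sketch (the helper name is ours, not part of librados):

```python
import errno
import os

def describe_rados_rc(ret):
    """Map a librados-style return code to a readable message.

    librados calls return >= 0 on success (often a byte count) and a
    negative POSIX errno value on failure, e.g. -2 for -ENOENT.
    """
    if ret >= 0:
        return "success"
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"

# The "(2) No such file or directory" seen from `rbd map` elsewhere in
# this digest is the same convention:
print(describe_rados_rc(-2))   # → ENOENT: No such file or directory
```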
Re: [ceph-users] alternative approaches to CEPH-FS
On Wednesday, 20 November 2013, Gautam Saxena wrote:
> Hi Yip, thanks for the code. With respect to "can't grow", I think I can
> (with some difficulty perhaps?) resize the VM if I needed to, but I'm really
> just trying to buy myself time until CephFS is production ready. Point #3
> scares me, so I'll have to think about that one. Most likely I'd use a
> completely different technology to back up this VM (e.g. rsync the key
> folders to some external, encrypted, cheap RAID storage, such as Iomega).
> Point #4 is probably not that big a deal for my needs, since Ceph itself
> should more or less ensure high availability in the face of disk crashes --
> and as for the VM crashing, I can tolerate a few minutes or hours of
> downtime if needed, and so far I have yet to have a Linux VM crash in 3
> years of running them.

Sorry, I don't really mean "can't". It should be easy to expand the RBD image and resize2fs the partition, but a drawback is that it will incur downtime. All in all, it sounds like a feasible solution too! :)

- WP
[ceph-users] alternative approaches to CEPH-FS
On Wednesday, 20 November 2013, Dimitri Maziuk wrote:

On 11/18/2013 01:19 AM, YIP Wai Peng wrote: Hi Dima, benchmark FYI.

$ /usr/sbin/bonnie++ -s 0 -n 5:1m:4k
Version 1.97      ------Sequential Create------  --------Random Create--------
altair            -Create-- --Read--- -Delete--  -Create-- --Read--- -Delete--
files:max:min      /sec %CP  /sec %CP  /sec %CP   /sec %CP  /sec %CP  /sec %CP
5:1048576:4096       18   1   609  11   604   9     18   1   436  10   686   9
Latency             1187ms   70907us    261ms     2352ms     205ms     111ms

Is this on an FS on RBD on the NFS server, mounted and exported?

Yes it is. And benchmarked on the client NFS machine.

I get:

Version 1.96      ------Sequential Create------  --------Random Create--------
nautilus          -Create-- --Read--- -Delete--  -Create-- --Read--- -Delete--
files:max:min      /sec %CP  /sec %CP  /sec %CP   /sec %CP  /sec %CP  /sec %CP
5:1048576:4096       19   1  5422  99   123   2     41   3  4525  99   127   2
Latency            86381us    746us   44125us      303ms     746us   60720us

on an ext4 filesystem mounted on top of a DRBD device backed by an LSI RAID (10, battery-backed) on the main node and an mdadm RAID-0 on the passive node. This is on the server with the FS exported and mounted (but not getting much use -- I ran it after hours).

Hm, so maybe this nfsceph is not _that_ bad after all! :) Your read clearly wins, so I'm guessing the DRBD write is the slow one. Which DRBD mode are you using?

On a client during working hours I get:

Version 1.96      ------Sequential Create------  --------Random Create--------
stingray          -Create-- --Read--- -Delete--  -Create-- --Read--- -Delete--
files:max:min      /sec %CP  /sec %CP  /sec %CP   /sec %CP  /sec %CP  /sec %CP
5:1048576:4096        3   0  1952  29  1385  13      2   0  1553  18   575   5
Latency            16383ms  19662us    153ms      8482ms    1935us    4467ms

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
[ceph-users] install three mon nodes, two succeeded, one failed
Hello, I followed the doc here: http://ceph.com/docs/master/start/quick-ceph-deploy/ I just installed three mon nodes, but one failed. The command and output:

ceph@ceph1:~/my-cluster$ ceph-deploy --overwrite-conf mon create ceph3.geocast.net
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy --overwrite-conf mon create ceph3.geocast.net
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph3.geocast.net
[ceph_deploy.mon][DEBUG ] detecting platform for host ceph3 ...
[ceph3.geocast.net][DEBUG ] connected to host: ceph3.geocast.net
[ceph3.geocast.net][DEBUG ] detect platform information from remote host
[ceph3.geocast.net][DEBUG ] detect machine type
[ceph_deploy.mon][INFO ] distro info: Ubuntu 12.04 precise
[ceph3][DEBUG ] determining if provided host has same hostname in remote
[ceph3.geocast.net][DEBUG ] get remote short hostname
[ceph3][DEBUG ] deploying mon to ceph3
[ceph3.geocast.net][DEBUG ] get remote short hostname
[ceph3.geocast.net][DEBUG ] remote hostname: ceph3
[ceph3.geocast.net][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph3.geocast.net][DEBUG ] create the mon path if it does not exist
[ceph3.geocast.net][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph3/done
[ceph3.geocast.net][DEBUG ] create a done file to avoid re-doing the mon deployment
[ceph3.geocast.net][DEBUG ] create the init path if it does not exist
[ceph3.geocast.net][DEBUG ] locating the `service` executable...
[ceph3.geocast.net][INFO ] Running command: sudo initctl emit ceph-mon cluster=ceph id=ceph3
[ceph3.geocast.net][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph3.asok mon_status
[ceph3][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph3][WARNIN] monitor: mon.ceph3, might not be running yet
[ceph3.geocast.net][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph3.asok mon_status
[ceph3][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph3][WARNIN] monitor ceph3 does not exist in monmap
[ceph3][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
[ceph3][WARNIN] monitors may not be able to form quorum

The mon log for ceph3:

2013-11-20 12:32:07.622385 7f0c24b63780  0 ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2267
2013-11-20 12:32:07.711105 7f0c24b63780  0 mon.ceph3 does not exist in monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.711318 7f0c24b63780 -1 no public_addr or public_network specified, and mon.ceph3 not present in monmap or ceph.conf
2013-11-20 12:32:07.730717 7f6b94ed6780  0 ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2277
2013-11-20 12:32:07.820180 7f6b94ed6780  0 mon.ceph3 does not exist in monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.820402 7f6b94ed6780 -1 no public_addr or public_network specified, and mon.ceph3 not present in monmap or ceph.conf
2013-11-20 12:32:07.839424 7f62fa747780  0 ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2287
2013-11-20 12:32:07.929246 7f62fa747780  0 mon.ceph3 does not exist in monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.929533 7f62fa747780 -1 no public_addr or public_network specified, and mon.ceph3 not present in monmap or ceph.conf
2013-11-20 12:32:07.952320 7f32d060e780  0 ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2297
2013-11-20 12:32:08.022799 7f32d060e780  0 mon.ceph3 does not exist in monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.023155 7f32d060e780 -1 no public_addr or public_network specified, and mon.ceph3 not present in monmap or ceph.conf
2013-11-20 12:32:08.042415 7fa5e81dc780  0 ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2307
2013-11-20 12:32:08.115243 7fa5e81dc780  0 mon.ceph3 does not exist in monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.115528 7fa5e81dc780 -1 no public_addr or public_network specified, and mon.ceph3 not present in monmap or ceph.conf
2013-11-20 12:32:08.134854 7fd9929c4780  0 ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2317
2013-11-20 12:32:08.224308 7fd9929c4780  0 mon.ceph3 does not exist in monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.224541 7fd9929c4780 -1 no public_addr or public_network specified, and mon.ceph3 not present in monmap or ceph.conf

I have tried many times with no luck. Can you help? Thanks in advance.
Re: [ceph-users] install three mon nodes, two succeeded, one failed
And this is the ceph.conf:

[global]
fsid = 0615ddc1-abff-4fe2-8919-68448b9f6faa
mon_initial_members = ceph2, ceph3, ceph4
mon_host = 172.17.6.66,172.17.6.67,172.17.6.68
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true

Thanks.

On 2013-11-20 12:47, Dnsbed Ops wrote:
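Both the ceph-deploy warning and the mon log point at the same thing: mon.ceph3 is not in the monmap, and the posted ceph.conf defines neither `public_addr` nor `public_network`. A commonly suggested remedy is to add `public_network` to the `[global]` section and redeploy the monitor. The /24 subnet below is an assumption inferred from the mon_host addresses, so adjust it to the actual monitor network:

```
[global]
fsid = 0615ddc1-abff-4fe2-8919-68448b9f6faa
mon_initial_members = ceph2, ceph3, ceph4
mon_host = 172.17.6.66,172.17.6.67,172.17.6.68
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true
; assumed subnet, inferred from the mon_host addresses above
public_network = 172.17.6.0/24
```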