Re: [ceph-users] a bug of rgw?
rgw_admin.cc, not rgw_main.cc or rgw_code.cc.

2014-12-04 16:14 GMT+08:00 han vincent hang...@gmail.com:
> rgw_admin, not rgw_main.cc or rgw_code.cc.
>
> 2014-12-04 16:02 GMT+08:00 han vincent hang...@gmail.com:
>> I am sorry, I made a mistake. The source file of the code is rgw_code.cc, not rgw_main.cc.
>>
>> 2014-12-04 11:13 GMT+08:00 han vincent hang...@gmail.com:
>>> Hello, everyone. While reading the source code of Ceph version 0.80.1, I found that line 1646 in rgw_main.cc reads as follows:
>>>
>>>     uint64_t total_time = entry.total_time.sec() * 100LL * entry.total_time.usec();
>>>
>>> I do not understand this line. total_time is supposed to combine the seconds and microseconds parts of the duration, so the second "*" should be a "+": multiplying by usec() gives zero whenever the sub-second part is zero. The line should be:
>>>
>>>     uint64_t total_time = entry.total_time.sec() * 100LL + entry.total_time.usec();
>>>
>>> Is it a bug?
Re: [ceph-users] a bug of rgw?
rgw_admin, not rgw_main.cc or rgw_code.cc.

2014-12-04 16:02 GMT+08:00 han vincent hang...@gmail.com:
> I am sorry, I made a mistake. The source file of the code is rgw_code.cc, not rgw_main.cc.
>
> 2014-12-04 11:13 GMT+08:00 han vincent hang...@gmail.com:
>> Hello, everyone. While reading the source code of Ceph version 0.80.1, I found that line 1646 in rgw_main.cc reads as follows:
>>
>>     uint64_t total_time = entry.total_time.sec() * 100LL * entry.total_time.usec();
>>
>> I do not understand this line. I think the second "*" should be changed to "+", so the line should be:
>>
>>     uint64_t total_time = entry.total_time.sec() * 100LL + entry.total_time.usec();
>>
>> Is it a bug?
[ceph-users] Empty Rados log
Hi all! I have a Ceph installation with radosgw, and the radosgw.log file in the /var/log/ceph directory is empty. In ceph.conf, under [client.radosgw.gateway], I have:

    [client.radosgw.gateway]
    log file = /var/log/ceph/radosgw.log
    debug ms = 1
    debug rgw = 20

Any ideas?

Best, George
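For reference, one way to check what the running gateway actually loaded is its admin socket; this is a sketch only, and the socket path and daemon name below are assumptions based on the [client.radosgw.gateway] section above (an "admin socket" setting may need to be added for the client first):

    # Confirm the running radosgw picked up the log settings:
    ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config show | grep -E 'log_file|debug_rgw'

    # Also verify the gateway user can actually write to the log file:
    ls -l /var/log/ceph/radosgw.log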
Re: [ceph-users] Radosgw-Agent
Hello - Please help me here. Where can I locate the source package?

On Tuesday, December 2, 2014 12:41 PM, lakshmi k s lux...@yahoo.com wrote:
> Hello: I am trying to locate the source package used for Debian Wheezy for the radosgw-agent 1.2-1~bpo70+1 that is available from the Ceph repository. Our company requires us to verify package builds from source and to check licenses from those same source packages. However, I have not been able to locate the source package for the 1.2-1~bpo70+1 version that is available as a pre-built package for Debian Wheezy from the current Ceph software repository. Can anyone tell me where the repo is that I can put into my sources.list so I can pull this down to do our required verification steps?
>
> Thank you. Lakshmi.
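For reference, the usual apt workflow for pulling a Debian source package looks like the sketch below. Whether ceph.com actually publishes deb-src entries for wheezy is an assumption to verify; the repository URL shown is hypothetical:

    # /etc/apt/sources.list.d/ceph.list (hypothetical deb-src line)
    deb-src http://ceph.com/debian-firefly/ wheezy main

    apt-get update
    apt-get source radosgw-agent   # fetches the .dsc, .orig tarball and packaging files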
Re: [ceph-users] a bug of rgw?
I am sorry, I made a mistake. The source file of the code is rgw_code.cc, not rgw_main.cc.

2014-12-04 11:13 GMT+08:00 han vincent hang...@gmail.com:
> Hello, everyone. While reading the source code of Ceph version 0.80.1, I found that line 1646 in rgw_main.cc reads as follows:
>
>     uint64_t total_time = entry.total_time.sec() * 100LL * entry.total_time.usec();
>
> I do not understand this line. I think the second "*" should be changed to "+", so the line should be:
>
>     uint64_t total_time = entry.total_time.sec() * 100LL + entry.total_time.usec();
>
> Is it a bug?
[ceph-users] Unable to start OSD service of OSD which is in down state
Hi all,

One of the OSDs in my cluster is in the down state, and I am not able to start the service on it. Is there any way I can start the osd service on it?

ems@rack2-storage-1:~$ sudo ceph osd tree
# id    weight  type name              up/down reweight
-1      46.9    root default
-2      23.45           host rack2-storage-5
1       3.35                    osd.1   up      1
3       3.35                    osd.3   up      1
4       3.35                    osd.4   up      1
5       3.35                    osd.5   up      1
6       3.35                    osd.6   up      1
0       3.35                    osd.0   up      1
2       3.35                    osd.2   up      1
-3      23.45           host rack2-storage-1
7       3.35                    osd.7   up      1
8       3.35                    osd.8   up      1
9       3.35                    osd.9   up      1
10      3.35                    osd.10  up      1
11      3.35                    osd.11  up      1
12      3.35                    osd.12  up      1
13      3.35                    osd.13  down    1

ems@rack2-storage-1:~$ sudo start ceph-osd id=13
ceph-osd (ceph/13) stop/pre-start, process 161610

ems@rack2-storage-1:~$ ps -ef | grep osd
root     32552      1 99 Dec01 ?      3-06:46:30 /usr/bin/ceph-osd --cluster=ceph -i 8 -f
root     38540      1 99 Dec01 ?      4-09:36:39 /usr/bin/ceph-osd --cluster=ceph -i 12 -f
root     93973      1 89 14:42 ?      02:05:17 /usr/bin/ceph-osd --cluster=ceph -i 7 -f
root    101963      1 93 15:16 ?      01:38:59 /usr/bin/ceph-osd --cluster=ceph -i 10 -f
root    160394      1 90 16:59 ?      00:02:01 /usr/bin/ceph-osd --cluster=ceph -i 9 -f
root    160764      1 95 16:59 ?      00:01:48 /usr/bin/ceph-osd --cluster=ceph -i 11 -f
ems     161614  79737  0 17:01 pts/0  00:00:00 grep --color=auto osd

ems@rack2-storage-1:~$ sudo /usr/bin/ceph-osd --cluster=ceph -i 13 -f
[1] 161623
2014-12-04 17:01:50.586525 7fd174248900 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-13: (5) Input/output error
[1]+  Exit 1    sudo /usr/bin/ceph-osd --cluster=ceph -i 13 -f

-Thanks & regards,
Mallikarjun Biradar
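The "(5) Input/output error" points at the underlying device rather than at Ceph itself. A minimal sketch of the usual checks; the device name /dev/sdX is a placeholder for whatever disk backs osd.13:

    dmesg | tail -50            # look for ata/scsi errors on the OSD's disk
    mount | grep ceph-13        # confirm the data partition is still mounted
    sudo smartctl -a /dev/sdX   # SMART health of the disk backing osd.13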
[ceph-users] Tool or any command to inject metadata/data corruption on rbd
Hi all, I would like to know which tool or CLI users are using to simulate metadata/data corruption. This is to test the scrub operation.

-Thanks & regards,
Mallikarjun Biradar
[ceph-users] Virtual traffic on cluster network
Hi, I am wondering about running virtual environment traffic (VM-to-Ceph traffic) on the Ceph cluster network by plugging virtual hosts into this network. Is this a good idea? My thought is no, since VM-to-Ceph traffic would be client traffic from Ceph's perspective. Just want the community's thoughts on this.

thanks - p
[ceph-users] RadosGW and Apache Limits
Hi! On CentOS 6.6 I have installed Ceph and ceph-radosgw. When I try to (re)start the ceph-radosgw service I get the following:

    # service ceph-radosgw restart
    Stopping radosgw instance(s)...                       [  OK  ]
    Starting radosgw instance(s)...
    /usr/bin/dirname: extra operand `-n'
    Try `/usr/bin/dirname --help' for more information.
    bash: line 0: ulimit: open files: cannot modify limit: Operation not permitted
    Starting client.radosgw.gateway...                    [  OK  ]
    /usr/bin/radosgw is running.
    #

Why is this happening? Is it normal? If I change /etc/security/limits.conf and add "apache hard nofile 32768", the ulimit error disappears. Is this the correct way? Should I do something else?

Regards, George
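For reference, limits.conf entries take four fields (domain, type, item, value). A sketch matching the value quoted above, with a soft limit added as a common companion setting (the 32768 value is from the message; tune it to your gateway's load):

    # /etc/security/limits.conf
    apache  soft  nofile  32768
    apache  hard  nofile  32768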
Re: [ceph-users] Tool or any command to inject metadata/data corruption on rbd
AFAIK there is no tool to do this. You can simply rm an object, or dd new content into the object's file (fill it with zeros).

On 04 Dec 2014, at 13:41, Mallikarjun Biradar mallikarjuna.bira...@gmail.com wrote:
> Hi all, I would like to know which tool or CLI users are using to simulate metadata/data corruption. This is to test the scrub operation.
> -Thanks & regards, Mallikarjun Biradar

Cheers.
Sébastien Han
Cloud Architect

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address: 11 bis, rue Roquépine - 75008 Paris
Web: www.enovance.com - Twitter: @enovance
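A sketch of the rm/dd approach described above. The pool and object names are examples, the <pgid> and <object-file> placeholders must be looked up by hand, and the on-disk path assumes a default FileStore layout:

    # Find which PG and OSDs hold the object (pool "rbd", object "myobject" are examples):
    ceph osd map rbd myobject

    # On the acting OSD's host, overwrite part of the object's backing file with zeros:
    dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/current/<pgid>_head/<object-file> bs=4k count=1 conv=notrunc

    # Then trigger a deep scrub of that PG and watch for inconsistencies:
    ceph pg deep-scrub <pgid>
    ceph -w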
Re: [ceph-users] Virtual traffic on cluster network
Hi,

The Ceph cluster network is only used by OSDs. Your VMs only need access to the public network (or client network, if you prefer). My cluster is also in a virtual environment: MONs and MDS are virtual; OSDs are physical, of course.

--
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information

On Thu., 2014-12-04 at 12:45 +, Peter wrote:
> Hi, I am wondering about running virtual environment traffic (VM-to-Ceph traffic) on the Ceph cluster network by plugging virtual hosts into this network. Is this a good idea? My thought is no, since VM-to-Ceph traffic would be client traffic from Ceph's perspective. Just want the community's thoughts on this.
> thanks - p
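For reference, the split Thomas describes is just two ceph.conf options; the subnets below are placeholders:

    [global]
    public network  = 192.168.10.0/24   # clients (VMs), MONs, and the OSDs' front side
    cluster network = 10.10.10.0/24     # OSD-to-OSD replication and heartbeats only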
Re: [ceph-users] Tool or any command to inject metadata/data corruption on rbd
For metadata corruption you would have to modify the object file's extended attributes (with xattr, for example).

--
Tomasz Kuzemko
tomasz.kuze...@ovh.net

On Thu, Dec 04, 2014 at 02:26:56PM +0100, Sebastien Han wrote:
> AFAIK there is no tool to do this. You can simply rm an object, or dd new content into the object's file (fill it with zeros).
>
> On 04 Dec 2014, at 13:41, Mallikarjun Biradar mallikarjuna.bira...@gmail.com wrote:
>> Hi all, I would like to know which tool or CLI users are using to simulate metadata/data corruption. This is to test the scrub operation.
>> -Thanks & regards, Mallikarjun Biradar
>
> Cheers.
> Sébastien Han
> Cloud Architect
>
> "Always give 100%. Unless you're giving blood."
>
> Phone: +33 (0)1 49 70 99 72
> Mail: sebastien@enovance.com
> Address: 11 bis, rue Roquépine - 75008 Paris
> Web: www.enovance.com - Twitter: @enovance
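A sketch of corrupting metadata via extended attributes using the standard attr tools. The path placeholders follow the default FileStore layout, and the attribute name shown (user.ceph._) is what FileStore objects typically carry, but treat both as assumptions to verify on your cluster:

    # List the xattrs on an object's backing file:
    getfattr -d -m - /var/lib/ceph/osd/ceph-0/current/<pgid>_head/<object-file>

    # Overwrite one of them with garbage to simulate metadata corruption:
    setfattr -n user.ceph._ -v 'garbage' /var/lib/ceph/osd/ceph-0/current/<pgid>_head/<object-file>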
Re: [ceph-users] Virtual traffic on cluster network
I was thinking the same thing for the following implementation: I would like to have an RBD volume mounted and accessible at the same time by different VMs (using OCFS2). Therefore I was also thinking that I had to put the VMs on the internal Ceph network by adding a second NIC and plugging it into this network. Is this a bad idea? Do you have something else to propose?

Regards, George

On Thu, 04 Dec 2014 14:31:01 +0100, Thomas Lemarchand wrote:
> Hi,
> The Ceph cluster network is only used by OSDs. Your VMs only need access to the public network (or client network, if you prefer). My cluster is also in a virtual environment: MONs and MDS are virtual; OSDs are physical, of course.
> --
> Thomas Lemarchand
> Cloud Solutions SAS - Responsable des systèmes d'information
>
> On Thu., 2014-12-04 at 12:45 +, Peter wrote:
>> Hi, I am wondering about running virtual environment traffic (VM-to-Ceph traffic) on the Ceph cluster network by plugging virtual hosts into this network. Is this a good idea? My thought is no, since VM-to-Ceph traffic would be client traffic from Ceph's perspective. Just want the community's thoughts on this.
>> thanks - p
Re: [ceph-users] Incomplete PGs
I have a small update to this: after an even closer reading of an offending pg's query, I noticed the following:

    peer: 4
    pgid: 19.6e
    last_update: 51072'48910307
    last_complete: 51072'48910307
    log_tail: 50495'48906592

The log tail seems to have lagged behind last_update/last_complete. I suspect this is what's causing the cluster to reject these pgs. Anyone know how I can go about cleaning this up?

Aaron

On Dec 1, 2014, at 8:12 PM, Aaron Bassett aa...@five3genomics.com wrote:
> Hi all, I have a problem with some incomplete pgs. Here's the backstory: I had a pool that I had accidentally left with a size of 2. On one of the OSD nodes, the system hdd started to fail, and I attempted to rescue it by sacrificing one of my OSD nodes. That went ok and I was able to bring the node back up, minus the one osd. Now I have 11 incomplete pgs. I believe these are mostly from the pool that only had size two, but I can't tell for sure.
>
> I found another thread on here that talked about using ceph_objectstore_tool to add or remove pg data to get out of an incomplete state. Let's start with the one pg I've been playing with; this is a loose description of where I've been. First I saw that it had the missing osd in "down_osds_we_would_probe" when I queried it, and some reading around that told me to recreate the missing osd, so I did that. It (obviously) didn't have the missing data, but it took the pg from down+incomplete to just incomplete. Then I tried pg_force_create and that didn't seem to make a difference. Some more googling then brought me to ceph_objectstore_tool, and I started to take a closer look at the results from pg query. I noticed that the list of probing osds gets longer and longer until the end of the query has something like:
>
>     probing_osds: [0, 3, 4, 16, 23, 26, 35, 41, 44, 51, 56],
>
> So I took a look at those osds and noticed that some of them have data in the directory for the troublesome pg and others don't. So I picked the one with the *most* data and used ceph_objectstore_tool to export the pg. It was 6G, so a fair amount of data is still there. I then imported it (after removing) into all the others in that list. Unfortunately, it is still incomplete. I'm not sure what my next step should be here.
Here's some other stuff from the query on it:

    info: {
        pgid: 0.63b,
        last_update: 50495'8246,
        last_complete: 50495'8246,
        log_tail: 20346'5245,
        last_user_version: 8246,
        last_backfill: MAX,
        purged_snaps: [],
        history: {
            epoch_created: 1,
            last_epoch_started: 51102,
            last_epoch_clean: 50495,
            last_epoch_split: 0,
            same_up_since: 68312,
            same_interval_since: 68312,
            same_primary_since: 68190,
            last_scrub: 28158'8240,
            last_scrub_stamp: 2014-11-18 17:08:49.368486,
            last_deep_scrub: 28158'8240,
            last_deep_scrub_stamp: 2014-11-18 17:08:49.368486,
            last_clean_scrub_stamp: 2014-11-18 17:08:49.368486},
        stats: {
            version: 50495'8246,
            reported_seq: 84279,
            reported_epoch: 69394,
            state: down+incomplete,
            last_fresh: 2014-12-01 23:23:07.355308,
            last_change: 2014-12-01 21:28:52.771807,
            last_active: 2014-11-24 13:37:09.784417,
            last_clean: 2014-11-22 21:59:49.821836,
            last_became_active: 0.00,
            last_unstale: 2014-12-01 23:23:07.355308,
            last_undegraded: 2014-12-01 23:23:07.355308,
            last_fullsized: 2014-12-01 23:23:07.355308,
            mapping_epoch: 68285,
            log_start: 20346'5245,
            ondisk_log_start: 20346'5245,
            created: 1,
            last_epoch_clean: 50495,
            parent: 0.0,
            parent_split_bits: 0,
            last_scrub: 28158'8240,
            last_scrub_stamp: 2014-11-18 17:08:49.368486,
            last_deep_scrub: 28158'8240,
            last_deep_scrub_stamp: 2014-11-18 17:08:49.368486,
            last_clean_scrub_stamp: 2014-11-18 17:08:49.368486,
            log_size: 3001,
            ondisk_log_size: 3001,

Also, in the peering section all the peers now have the same last_update, which makes me think it should just pick up and take off.

There is another thing I'm having problems with, and I'm not sure if it's related or not. I set a crush map manually, as I have a mix of ssd and platter osds, and it seems to work when I set it (the cluster starts rebalancing, etc.), but if I do a "restart ceph-all" on all my nodes, the crush map seems to revert to the one I didn't set. I don't know if it's being blocked from taking effect by these incomplete pgs, or if I'm missing a step to get it to "stick". It makes me think that when I'm stopping and starting these osds to use ceph_objectstore_tool on them, they may be getting out of sync with the cluster.

Any insights would be greatly appreciated,

Aaron
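For reference, the export/remove/import flow Aaron describes maps onto ceph_objectstore_tool roughly as below; the pgid and the source OSD path are taken from the messages above, the target OSD number is a placeholder, and each OSD must be stopped while the tool runs:

    # On the source OSD (stopped), export the pg:
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-4 \
        --journal-path /var/lib/ceph/osd/ceph-4/journal \
        --op export --pgid 0.63b --file /tmp/0.63b.export

    # On each target OSD (stopped), remove any stale copy, then import:
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-16 \
        --journal-path /var/lib/ceph/osd/ceph-16/journal \
        --op remove --pgid 0.63b
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-16 \
        --journal-path /var/lib/ceph/osd/ceph-16/journal \
        --op import --file /tmp/0.63b.export

On the crush map reverting after restarts: one plausible explanation (an assumption, not a diagnosis) is the default behaviour that re-places OSDs in the crush map on startup, which can be disabled:

    [osd]
    osd crush update on start = false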
[ceph-users] Suitable SSDs for journal
Hi all,

Does anyone know of a list of good and bad SSD disks for OSD journals? I was pointed to http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ but I was looking for something more complete. For example, I have a Samsung 840 Pro that gives me even worse performance than a Crucial m550; I even thought it was dying (but that doesn't seem to be the case). Maybe creating a community-contributed list would be a good idea?

Regards,
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997 943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
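For reference, the test in the linked post boils down to small synchronous direct writes, along the lines of the sketch below. Treat the exact invocation as an assumption and check the post itself; the device name is a placeholder, and the second dd is destructive to whatever is on that device:

    # Prepare a file of random data to write from:
    dd if=/dev/urandom of=randfile bs=1M count=4096

    # WARNING: destructive - writes directly to the raw device:
    dd if=randfile of=/dev/sdX bs=4k count=100000 oflag=direct,dsync

A journal-suitable SSD sustains thousands of these 4k dsync writes per second; consumer drives often collapse to a few hundred, which matches the Samsung 840 Pro observation above.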
Re: [ceph-users] Suitable SSDs for journal
Hi Eneko,

There have been various discussions on the list previously as to the best SSD for journal use. All of them have pretty much come to the conclusion that the Intel S3700 models are the best suited, and in fact work out the cheapest in terms of write durability.

Nick

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eneko Lacunza
Sent: 04 December 2014 14:35
To: Ceph Users
Subject: [ceph-users] Suitable SSDs for journal

> Hi all, Does anyone know of a list of good and bad SSD disks for OSD journals? I was pointed to http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ but I was looking for something more complete. For example, I have a Samsung 840 Pro that gives me even worse performance than a Crucial m550; I even thought it was dying (but that doesn't seem to be the case). Maybe creating a community-contributed list would be a good idea?
>
> Regards, Eneko
>
> --
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943575997 943493611
> Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
Re: [ceph-users] Suitable SSDs for journal
Thanks, will look back in the list archive.

On 04/12/14 15:47, Nick Fisk wrote:
> Hi Eneko,
>
> There have been various discussions on the list previously as to the best SSD for journal use. All of them have pretty much come to the conclusion that the Intel S3700 models are the best suited, and in fact work out the cheapest in terms of write durability.
>
> Nick
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eneko Lacunza
> Sent: 04 December 2014 14:35
> To: Ceph Users
> Subject: [ceph-users] Suitable SSDs for journal
>
>> Hi all, Does anyone know of a list of good and bad SSD disks for OSD journals? I was pointed to http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ but I was looking for something more complete. For example, I have a Samsung 840 Pro that gives me even worse performance than a Crucial m550; I even thought it was dying (but that doesn't seem to be the case). Maybe creating a community-contributed list would be a good idea?

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997 943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
Re: [ceph-users] Suitable SSDs for journal
Eneko, I do plan to push a performance initiative section to ceph.com/docs sooner or later, so people can contribute their own results through GitHub PRs.

On 04 Dec 2014, at 16:09, Eneko Lacunza elacu...@binovo.es wrote:
> Thanks, will look back in the list archive.
>
> On 04/12/14 15:47, Nick Fisk wrote:
>> Hi Eneko,
>> There have been various discussions on the list previously as to the best SSD for journal use. All of them have pretty much come to the conclusion that the Intel S3700 models are the best suited, and in fact work out the cheapest in terms of write durability.
>> Nick
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eneko Lacunza
>> Sent: 04 December 2014 14:35
>> To: Ceph Users
>> Subject: [ceph-users] Suitable SSDs for journal
>>
>> Hi all, Does anyone know of a list of good and bad SSD disks for OSD journals? I was pointed to http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ but I was looking for something more complete. For example, I have a Samsung 840 Pro that gives me even worse performance than a Crucial m550; I even thought it was dying (but that doesn't seem to be the case). Maybe creating a community-contributed list would be a good idea?
>
> --
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943575997 943493611
> Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es

Cheers.
Sébastien Han
Cloud Architect

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address: 11 bis, rue Roquépine - 75008 Paris
Web: www.enovance.com - Twitter: @enovance
Re: [ceph-users] Reply: Re: RBD read-ahead didn't improve 4K read performance
Hi, maybe this could be interesting for you:

[Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread
https://www.mail-archive.com/qemu-devel@nongnu.org/msg268718.html

Currently virtio-blk doesn't support merging requests on read (I think virtio-scsi already does). With merging, sequential 4k IOs are aggregated, so fewer, bigger IOs go to Ceph, and performance should improve.

- Original Message -
From: duan xufeng duan.xuf...@zte.com.cn
To: Alexandre DERUMIER aderum...@odiso.com
Cc: ceph-users ceph-us...@ceph.com, si dawei si.da...@zte.com.cn
Sent: Friday, 21 November 2014 09:21:49
Subject: Reply: Re: [ceph-users] RBD read-ahead didn't improve 4K read performance

Hi, I tested in the VM with fio; here is the config:

[global]
direct=1
ioengine=aio
iodepth=1

[sequence read 4K]
rw=read
bs=4K
size=1024m
directory=/mnt
filename=test

sequence read 4K: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.3
Starting 1 process
sequence read 4K: Laying out IO file(s) (1 file(s) / 1024MB)
^CJobs: 1 (f=1): [R] [18.0% done] [1994KB/0KB/0KB /s] [498/0/0 iops] [eta 07m:14s]
fio: terminating on signal 2

sequence read 4K: (groupid=0, jobs=1): err= 0: pid=1156: Fri Nov 21 12:32:53 2014
  read : io=187408KB, bw=1984.1KB/s, iops=496, runt= 94417msec
    slat (usec): min=22, max=878, avg=48.36, stdev=22.63
    clat (usec): min=1335, max=17618, avg=1956.45, stdev=247.26
     lat (usec): min=1371, max=17680, avg=2006.97, stdev=248.47
    clat percentiles (usec):
     |  1.00th=[ 1560],  5.00th=[ 1640], 10.00th=[ 1704], 20.00th=[ 1784],
     | 30.00th=[ 1848], 40.00th=[ 1896], 50.00th=[ 1944], 60.00th=[ 1992],
     | 70.00th=[ 2064], 80.00th=[ 2128], 90.00th=[ 2192], 95.00th=[ 2288],
     | 99.00th=[ 2448], 99.50th=[ 2640], 99.90th=[ 3856], 99.95th=[ 4256],
     | 99.99th=[ 9408]
    bw (KB /s): min= 1772, max= 2248, per=100.00%, avg=1986.55, stdev=85.76
    lat (msec) : 2=60.69%, 4=39.23%, 10=0.07%, 20=0.01%
  cpu : usr=1.92%, sys=2.98%, ctx=47125, majf=0, minf=28
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=46852/w=0/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=187408KB, aggrb=1984KB/s, minb=1984KB/s, maxb=1984KB/s, mint=94417msec, maxt=94417msec

Disk stats (read/write):
  sda: ios=46754/11, merge=0/10, ticks=91144/40, in_queue=91124, util=96.73%

And the rados benchmark:

# rados -p volumes bench 60 seq -b 4096 -t 1
Total time run:     44.922178
Total reads made:   24507
Read size:          4096
Bandwidth (MB/sec): 2.131
Average Latency:    0.00183069
Max latency:        0.004598
Min latency:        0.001224

Re: [ceph-users] RBD read-ahead didn't improve 4K read performance
Alexandre DERUMIER
To: duan xufeng
2014/11/21 14:51
Cc: si dawei, ceph-users

Hi, I haven't tested rbd readahead yet, but maybe you are reaching a qemu limit (by default qemu can use only one thread/one core to manage IOs; check your qemu CPU usage). Do you have some performance results? How many iops?

But I have had a 4x improvement in qemu-kvm with virtio-scsi + num_queues + the latest kernels (4k sequential reads coalesced in qemu were doing bigger IOs to Ceph).
libvirt:

    <controller type='scsi' index='0' model='virtio-scsi' num_queues='8'/>

Regards, Alexandre

- Original Message -
From: duan xufeng duan.xuf...@zte.com.cn
To: ceph-users ceph-us...@ceph.com
Cc: si dawei si.da...@zte.com.cn
Sent: Friday, 21 November 2014 03:58:38
Subject: [ceph-users] RBD read-ahead didn't improve 4K read performance

hi, I upgraded Ceph to 0.87 for rbd readahead, but I can't see any performance improvement in 4K seq read in the VM. How can I know if the readahead has taken effect? thanks.

ceph.conf:

[client]
rbd_cache = true
rbd_cache_size = 335544320
rbd_cache_max_dirty = 251658240
rbd_cache_target_dirty = 167772160
rbd readahead trigger requests = 1
rbd readahead max bytes = 4194304
rbd readahead disable after bytes = 0
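One way to confirm that the readahead settings actually reached the librbd client is its admin socket; a sketch only, assuming an admin socket has been configured in the [client] section (it is not by default), and with a hypothetical socket path:

    # in ceph.conf [client], something like:
    #   admin socket = /var/run/ceph/$cluster-$type.$id.$pid.asok
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config get rbd_readahead_max_bytes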
Re: [ceph-users] Virtual traffic on cluster network
Perhaps I'm confused about it, but what I mean is virtual-host-to-storage traffic, i.e. physical virtual-host machines plugged into the Ceph cluster network.

- P

On 04/12/14 13:54, Georgios Dimitrakakis wrote:
> I was thinking the same thing for the following implementation: I would like to have an RBD volume mounted and accessible at the same time by different VMs (using OCFS2). Therefore I was also thinking that I had to put the VMs on the internal Ceph network by adding a second NIC and plugging it into this network. Is this a bad idea? Do you have something else to propose?
>
> Regards, George
>
> On Thu, 04 Dec 2014 14:31:01 +0100, Thomas Lemarchand wrote:
>> Hi,
>> The Ceph cluster network is only used by OSDs. Your VMs only need access to the public network (or client network, if you prefer). My cluster is also in a virtual environment: MONs and MDS are virtual; OSDs are physical, of course.
>> --
>> Thomas Lemarchand
>> Cloud Solutions SAS - Responsable des systèmes d'information
>>
>> On Thu., 2014-12-04 at 12:45 +, Peter wrote:
>>> Hi, I am wondering about running virtual environment traffic (VM-to-Ceph traffic) on the Ceph cluster network by plugging virtual hosts into this network. Is this a good idea? My thought is no, since VM-to-Ceph traffic would be client traffic from Ceph's perspective. Just want the community's thoughts on this.
>>> thanks - p
Re: [ceph-users] Giant osd problems - loss of IO
On Fri, Nov 14, 2014 at 4:38 PM, Andrei Mikhailovsky and...@arhont.com wrote:
> Any other suggestions why several osds are going down on Giant and causing IO to stall? This was not happening on Firefly. Thanks

I had a very similar problem to yours, which started after upgrading from Firefly to Giant; later I added two new osd nodes, with 7 osds on each. My cluster originally had 4 nodes with 7 osds on each node, 28 osds total, running Giant. I did not have any problems at this time. My problems started after adding the two new nodes, so I had 6 nodes and 42 total osds. It would run fine under low load, but when the request load increased, osds started to fall over. I was able to set debug_ms to 10 and capture the logs from a failed OSD.

There were a few different reasons the osds were going down. This example shows one terminating normally, for an unspecified reason, a minute after it notices it is marked down in the map. Osd 25 actually marks this osd (osd 35) down. For some reason many osds cannot communicate with each other. There are other examples where I see the "heartbeat_check: no reply from osd.blah" message for long periods of time (hours) and neither osd crashes nor terminates.

2014-12-01 16:27:06.772616 7f8b642d1700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:06.056972 (cutoff 2014-12-01 16:26:46.772608)
2014-12-01 16:27:07.772767 7f8b642d1700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:06.056972 (cutoff 2014-12-01 16:26:47.772759)
2014-12-01 16:27:08.772990 7f8b642d1700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:06.056972 (cutoff 2014-12-01 16:26:48.772982)
2014-12-01 16:27:09.559894 7f8b3b1fe700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:06.056972 (cutoff 2014-12-01 16:26:49.559891)
2014-12-01 16:27:09.773177 7f8b642d1700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:09.559087 (cutoff 2014-12-01 16:26:49.773173)
2014-12-01 16:27:10.773307 7f8b642d1700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:09.559087 (cutoff 2014-12-01 16:26:50.773299)
2014-12-01 16:27:11.261557 7f8b3b1fe700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:09.559087 (cutoff 2014-12-01 16:26:51.261554)
2014-12-01 16:27:11.773512 7f8b642d1700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:11.260129 (cutoff 2014-12-01 16:26:51.773504)
2014-12-01 16:27:12.773741 7f8b642d1700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:11.260129 (cutoff 2014-12-01 16:26:52.773733)
2014-12-01 16:27:13.773884 7f8b642d1700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:11.260129 (cutoff 2014-12-01 16:26:53.773876)
2014-12-01 16:27:14.163369 7f8b3b1fe700 -1 osd.35 79679 heartbeat_check: no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 16:27:11.260129 (cutoff 2014-12-01 16:26:54.163366)
2014-12-01 16:27:14.507632 7f8b4fb7f700 0 -- 172.1.2.6:6802/5210 >> 172.1.2.5:6802/2755 pipe(0x2af06940 sd=57 :51521 s=2 pgs=384 cs=1 l=0 c=0x2af094a0).fault with nothing to send, going to standby
2014-12-01 16:27:14.511704 7f8b37af1700 0 -- 172.1.2.6:6802/5210 >> 172.1.2.2:6812/34015988 pipe(0x2af06c00 sd=69 :41512 s=2 pgs=38842 cs=1 l=0 c=0x2af09600).fault with nothing to send, going to standby
2014-12-01 16:27:14.511966 7f8b5030c700 0 -- 172.1.2.6:6802/5210 >> 172.1.2.4:6802/40022302 pipe(0x30cbcdc0 sd=93 :6802 s=2 pgs=66722 cs=3 l=0 c=0x2af091e0).fault with nothing to send, going to standby
2014-12-01 16:27:14.514744 7f8b548a5700 0 -- 172.1.2.6:6802/5210 >> 172.1.2.2:6800/9016639 pipe(0x2af04dc0 sd=38 :60965 s=2 pgs=11747 cs=1 l=0 c=0x2af086e0).fault with nothing to send, going to standby
2014-12-01 16:27:14.516712 7f8b349c7700 0 -- 172.1.2.6:6802/5210 >> 172.1.2.2:6802/25277 pipe(0x2b04cc00 sd=166 :6802 s=2 pgs=62 cs=1 l=0 c=0x2b043080).fault with nothing to send, going to standby
2014-12-01 16:27:14.516814 7f8b2bd3b700 0 -- 172.1.2.6:6802/5210 >> 172.1.2.4:6804/16770 pipe(0x30cbd600 sd=79 :6802 s=2 pgs=607 cs=3 l=0 c=0x2af08c60).fault with nothing to send, going to standby
2014-12-01 16:27:14.518439 7f8b2a422700 0 -- 172.1.2.6:6802/5210 >> 172.1.2.5:6806/31172 pipe(0x30cbc840 sd=28 :6802 s=2 pgs=22 cs=1 l=0 c=0x3041f5a0).fault with nothing to send, going to standby
2014-12-01 16:27:14.518883 7f8b589ba700 0 -- 172.1.2.6:6802/5210 >> 172.1.2.1:6803/4031631 pipe(0x2af042c0 sd=32 :58296 s=2 pgs=35500 cs=3 l=0 c=0x2af08160).fault with nothing to
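A sketch of the usual next steps when OSDs stop hearing each other's heartbeats: raise message debugging on a live OSD without restarting it, and check raw connectivity between the cluster-network addresses seen in the log. The IP and port below are taken from the log excerpt; which host actually carries osd.25's heartbeat address is an assumption to verify:

    # Bump messenger/heartbeat debugging on the affected OSD:
    ceph tell osd.35 injectargs '--debug_ms 10 --debug_osd 20'

    # From osd.35's host, test reachability of the peer's cluster address:
    ping -c 3 172.1.2.5
    nc -zv 172.1.2.5 6802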
[ceph-users] v0.89 released
This is the second development release since Giant. The big items include the first batch of scrub patches from Greg for CephFS, a rework of the librados object listing API to properly handle namespaces, and a pile of bug fixes for RGW. There are also several smaller issues fixed up in the performance area with buffer alignment and memory copies, the osd cache tiering agent, and various CephFS fixes.

Upgrading
---------

* The new ability to list all objects from all namespaces can fail or return incomplete results when not all OSDs have been upgraded. Features rados --all ls, rados cppool, rados export, rados cache-flush-evict-all and rados cache-try-flush-evict-all can also fail or return incomplete results.

Notable Changes
---------------

* buffer: add list::get_contiguous (Sage Weil)
* buffer: avoid rebuild if buffer already contiguous (Jianpeng Ma)
* ceph-disk: improved systemd support (Owen Synge)
* ceph-disk: set guid if reusing journal partition (Dan van der Ster)
* ceph-fuse, libcephfs: allow xattr caps in inject_release_failure (#9800 John Spray)
* ceph-fuse, libcephfs: fix I_COMPLETE_ORDERED checks (#9894 Yan, Zheng)
* ceph-fuse: fix dentry invalidation on 3.18+ kernels (#9997 Yan, Zheng)
* crush: fix detach_bucket (#10095 Sage Weil)
* crush: fix several bugs in adjust_item_weight (Rongze Zhu)
* doc: add dumpling to firefly upgrade section (#7679 John Wilkins)
* doc: document erasure coded pool operations (#9970 Loic Dachary)
* doc: file system osd config settings (Kevin Dalley)
* doc: key/value store config reference (John Wilkins)
* doc: update openstack docs for Juno (Sebastien Han)
* fix cluster logging from non-mon daemons (Sage Weil)
* init-ceph: check for systemd-run before using it (Boris Ranto)
* librados: fix infinite loop with skipped map epochs (#9986 Ding Dinghua)
* librados: fix iterator operator= bugs (#10082 David Zafman, Yehuda Sadeh)
* librados: fix null deref when pool DNE (#9944 Sage Weil)
* librados: fix timer race from recent refactor (Sage Weil)
* libradosstriper: fix shutdown hang (Dongmao Zhang)
* librbd: don't close a closed parent in failure path (#10030 Jason Dillaman)
* librbd: fix diff test (#10002 Josh Durgin)
* librbd: fix locking for readahead (#10045 Jason Dillaman)
* librbd: refactor unit tests to use fixtures (Jason Dillaman)
* many many coverity cleanups (Danny Al-Gaaf)
* mds: a whole bunch of initial scrub infrastructure (Greg Farnum)
* mds: fix compat_version for MClientSession (#9945 John Spray)
* mds: fix reply snapbl (Yan, Zheng)
* mon: allow adding tiers to fs pools (#10135 John Spray)
* mon: fix MDS health status from peons (#10151 John Spray)
* mon: fix caching for min_last_epoch_clean (#9987 Sage Weil)
* mon: fix error output for add_data_pool (#9852 Joao Eduardo Luis)
* mon: include entity name in audit log for forwarded requests (#9913 Joao Eduardo Luis)
* mon: paxos: allow reads while proposing (#9321 #9322 Joao Eduardo Luis)
* msgr: asyncmessenger: add kqueue support (#9926 Haomai Wang)
* osd, librados: revamp PG listing API to handle namespaces (#9031 #9262 #9438 David Zafman)
* osd, mon: send initial pg create time from mon to osd (#9887 David Zafman)
* osd: allow whiteout deletion in cache pool (Sage Weil)
* osd: cache pool: ignore min flush age when cache is full (Xinze Chi)
* osd: erasure coding: allow bench.sh to test ISA backend (Yuan Zhou)
* osd: erasure-code: encoding regression tests, corpus (#9420 Loic Dachary)
* osd: fix journal shutdown race (Sage Weil)
* osd: fix object age eviction (Zhiqiang Wang)
* osd: fix object atime calculation (Xinze Chi)
* osd: fix past_interval display bug (#9752 Loic Dachary)
* osd: journal: fix alignment checks, avoid useless memmove (Jianpeng Ma)
* osd: journal: update committed_thru after replay (#6756 Samuel Just)
* osd: keyvaluestore_dev: optimization (Chendi Xue)
* osd: make misdirected op checks robust for EC pools (#9835 Sage Weil)
* osd: removed some dead code (Xinze Chi)
* qa: parallelize make check (Loic Dachary)
* qa: tolerate nearly-full disk for make check (Loic Dachary)
* rgw: create subuser if needed when creating user (#10103 Yehuda Sadeh)
* rgw: fix If-Modified-Since (VRan Liu)
* rgw: fix content-length update (#9576 Yehuda Sadeh)
* rgw: fix disabling of max_size quota (#9907 Dong Lei)
* rgw: fix incorrect len when len is 0 (#9877 Yehuda Sadeh)
* rgw: fix object copy content type (#9478 Yehuda Sadeh)
* rgw: fix user stags in get-user-info API (#9359 Ray Lv)
* rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda Sadeh)
* rgw: return timestamp on GET/HEAD (#8911 Yehuda Sadeh)
* rgw: set ETag on object copy (#9479 Yehuda Sadeh)
* rgw: update bucket index on attr changes, for multi-site sync (#5595 Yehuda Sadeh)

Getting Ceph
------------

* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.89.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see
[ceph-users] Giant or Firefly for production
Hi Cephers,

Have any of you decided to put Giant into production instead of Firefly? Any gotchas?

Regards
Anthony
[ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Hello,

This morning I decided to reboot a storage node (Debian Jessie, thus 3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after applying some changes. It came back up one OSD short; the last log lines before the reboot are:

---
2014-12-05 09:35:27.700330 7f87e789c700 2 -- 10.0.8.21:6823/29520 >> 10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 c=0x7f881f469020).fault (0) Success
2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288) [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346 active] cancel_copy_ops
---

Quite obviously it didn't complete its shutdown, so unsurprisingly we get:

---
2014-12-05 09:37:40.278128 7f218a7037c0 1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-12-05 09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding journal header
2014-12-05 09:37:40.278479 7f218a7037c0 -1 filestore(/var/lib/ceph/osd/ceph-4) mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (22) Invalid argument
2014-12-05 09:37:40.776203 7f218a7037c0 -1 osd.4 0 OSD:init: unable to mount object store
2014-12-05 09:37:40.776223 7f218a7037c0 -1 ** ERROR: osd init failed: (22) Invalid argument
---

Thankfully this isn't production yet, and I was eventually able to recover the OSD by re-creating the journal (ceph-osd -i 4 --mkjournal), but it leaves me with a rather bad taste in my mouth. So the pertinent questions would be:

1. What caused this? My bet is on the evil systemd just pulling the plug before the poor OSD had finished its shutdown job.

2. How to prevent it from happening again? Is there something the Ceph developers can do with regards to init scripts? Or is this something to be brought up with the Debian maintainer? Debian is transitioning from sysv-init to systemd (booo!) with Jessie, but the OSDs still have a sysvinit magic file in their top directory. Could this have an effect on things?

3. Is it really that easy to trash your OSDs? In case a storage node crashes, am I to expect most if not all OSDs, or at least their journals, to require manual loving?

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
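For reference, a sketch of the recovery Christian describes. Recreating the journal without flushing is only safe here because the journal header was already unreadable; on a healthy OSD you would flush the journal first. The sysvinit-style service invocation is an assumption for this Debian setup:

    service ceph stop osd.4        # make sure the OSD is down
    ceph-osd -i 4 --mkjournal      # write a fresh, empty journal
    service ceph start osd.4

    # On a cleanly stopped OSD, flush before replacing a journal instead:
    #   ceph-osd -i 4 --flush-journal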