[ceph-users] questions about rgw, multiple zones
hi, all,
1. How do I set a region's endpoints? How can I find out how many endpoints there are?
2. I followed the steps of 'create a region', but afterwards, although I can list the new region, the default region is still there.
3. There is one rgw for each zone. After the rgw starts up, should I find that the pools related to its zone have been created?
thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
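For question 1, in the Firefly-era federated setup the endpoints live in the region's JSON: dump it with `radosgw-admin region get`, edit the `endpoints` arrays, and load it back with `radosgw-admin region set`. A sketch of such a region file (the hostnames and region/zone names below are placeholders, not from this thread):

```json
{
  "name": "default",
  "api_name": "default",
  "is_master": "true",
  "endpoints": [
    "http://rgw1.example.com:80/",
    "http://rgw2.example.com:80/"
  ],
  "master_zone": "default",
  "zones": [
    {
      "name": "default",
      "endpoints": ["http://rgw1.example.com:80/"],
      "log_meta": "true",
      "log_data": "true"
    }
  ],
  "placement_targets": [
    { "name": "default-placement", "tags": [] }
  ],
  "default_placement": "default-placement"
}
```

After `radosgw-admin region set < region.json`, run `radosgw-admin regionmap update` so the gateways pick up the change; `radosgw-admin region list` then shows which regions exist.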
Re: [ceph-users] radosgw issues
I had overlooked a detail: set FastCgiWrapper off. Thanks.

At 2014-10-30 10:01:19, "yuelongguang" wrote:

lists ceph, hi:
How did you solve this issue? I run into it when I try to deploy 2 rgws on one ceph cluster in the default region and default zone.
thanks

At 2014-07-01 09:06:24, "Brian Rak" wrote:
>That sounds like you have some kind of odd situation going on. We only
>use radosgw with nginx/tengine so I can't comment on the apache part of it.
>
>My understanding is this:
>
>You start ceph-radosgw, and this creates a fastcgi socket somewhere (verify
>this is created with lsof; there are some permission problems that will
>result in radosgw running, but not opening the socket).
>
>Apache is configured to connect to this socket, and forward any incoming
>requests. Apache should not be launching things.
>
>I did set Apache up once to test a bug, so I took a look at my config.
>I do *not* have a s3gw.fcgi file on disk. Have you tried removing
>that? I think that with FastCgiExternalServer, you don't need
>s3gw.fcgi. The other thing that got me was that the socket path in
>FastCgiExternalServer is relative to whatever you have FastCgiIpcDir set
>to (which the Ceph docs don't seem to take into account).
>
>On 6/30/2014 8:40 PM, lists+c...@deksai.com wrote:
>> On 2014-06-16 13:16, lists+c...@deksai.com wrote:
>>> I've just tried setting up the radosgw on centos6 according to
>>> http://ceph.com/docs/master/radosgw/config/
>>>
>>> While I can run the admin commands just fine to create users etc.,
>>> making a simple wget request to the domain I set up returns a 500 due
>>> to a timeout. Every request I make results in another radosgw process
>>> being created, which seems to start even more processes itself. I
>>> only have to make a few requests to have about 60 radosgw processes.
>>>
>> Guess I'll try again.
I gave this another shot, following the
>> documentation, and still end up with basically a fork bomb rather than
>> the nice ListAllMyBucketsResult output that the docs say I should
>> get. Everything else about the cluster works fine, and I see others
>> talking about the gateway as if it just worked, so I'm led to believe
>> that I'm probably doing something stupid. Has anybody else run into
>> the situation where apache times out while fastcgi just launches more
>> and more processes?
>>
>> The init script launches a process, and the webserver seems to launch
>> the same thing, so I'm not clear on what should be happening here.
>> Either way, I get nothing back when making a simple GET request to the
>> domain.
>>
>> If anybody has suggestions, even if they are "You nincompoop!
>> Everybody knows that you need to do such and such", that would be
>> helpful.
Re: [ceph-users] radosgw issues
lists ceph, hi:
How did you solve this issue? I run into it when I try to deploy 2 rgws on one ceph cluster in the default region and default zone.
thanks

At 2014-07-01 09:06:24, "Brian Rak" wrote:
>That sounds like you have some kind of odd situation going on. We only
>use radosgw with nginx/tengine so I can't comment on the apache part of it.
>
>My understanding is this:
>
>You start ceph-radosgw, and this creates a fastcgi socket somewhere (verify
>this is created with lsof; there are some permission problems that will
>result in radosgw running, but not opening the socket).
>
>Apache is configured to connect to this socket, and forward any incoming
>requests. Apache should not be launching things.
>
>I did set Apache up once to test a bug, so I took a look at my config.
>I do *not* have a s3gw.fcgi file on disk. Have you tried removing
>that? I think that with FastCgiExternalServer, you don't need
>s3gw.fcgi. The other thing that got me was that the socket path in
>FastCgiExternalServer is relative to whatever you have FastCgiIpcDir set
>to (which the Ceph docs don't seem to take into account).
>
>On 6/30/2014 8:40 PM, lists+c...@deksai.com wrote:
>> On 2014-06-16 13:16, lists+c...@deksai.com wrote:
>>> I've just tried setting up the radosgw on centos6 according to
>>> http://ceph.com/docs/master/radosgw/config/
>>>
>>> While I can run the admin commands just fine to create users etc.,
>>> making a simple wget request to the domain I set up returns a 500 due
>>> to a timeout. Every request I make results in another radosgw process
>>> being created, which seems to start even more processes itself. I
>>> only have to make a few requests to have about 60 radosgw processes.
>>>
>> Guess I'll try again. I gave this another shot, following the
>> documentation, and still end up with basically a fork bomb rather than
>> the nice ListAllMyBucketsResult output that the docs say I should
>> get.
Everything else about the cluster works fine, and I see others
>> talking about the gateway as if it just worked, so I'm led to believe
>> that I'm probably doing something stupid. Has anybody else run into
>> the situation where apache times out while fastcgi just launches more
>> and more processes?
>>
>> The init script launches a process, and the webserver seems to launch
>> the same thing, so I'm not clear on what should be happening here.
>> Either way, I get nothing back when making a simple GET request to the
>> domain.
>>
>> If anybody has suggestions, even if they are "You nincompoop!
>> Everybody knows that you need to do such and such", that would be
>> helpful.
[ceph-users] fail to add another rgw
hi, clewis:
My environment: one ceph cluster, 3 nodes; each node has one monitor and one osd. One rgw (rgw1) runs on one of them (osd1). Before I deployed the second rgw (rgw2), the first rgw worked well. After I deploy the second rgw, it cannot start, and the number of radosgw processes increases constantly. The configuration of rgw1 and rgw2 is almost the same, except for the server name and the host option of the client.radosgw.gateway section in ceph.conf. Default region, default zone.
Another test: I shut down rgw1, then try to start rgw2. As before, rgw2 cannot start. Then I try to restart rgw1, and it fails with almost the same errors as rgw2.
I am trying to deploy multiple rgws on one ceph cluster in the default zone and default region.
thanks.

---log---
[root@cephosd2-monb ceph]# /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug-rgw=10 -n client.radosgw.gateway
2014-10-29 21:59:10.763921 7f32d24cf820  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 11887
2014-10-29 21:59:10.767922 7f32d24cf820 -1 asok(0xaa3110) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.radosgw.gateway.asok': (17) File exists
2014-10-29 21:59:10.776185 7f32c17fb700  2 RGWDataChangesLog::ChangesRenewThread: start
2014-10-29 21:59:10.777282 7f32d24cf820 10 cache get: name=.rgw.root+default.region : miss
2014-10-29 21:59:10.780609 7f32d24cf820 10 cache put: name=.rgw.root+default.region
2014-10-29 21:59:10.780617 7f32d24cf820 10 adding .rgw.root+default.region to cache LRU end
2014-10-29 21:59:10.780634 7f32d24cf820 10 cache get: name=.rgw.root+default.region : type miss (requested=1, cached=6)
2014-10-29 21:59:10.780658 7f32d24cf820 10 cache get: name=.rgw.root+default.region : hit
2014-10-29 21:59:10.781820 7f32d24cf820 10 cache put: name=.rgw.root+default.region
2014-10-29 21:59:10.781825 7f32d24cf820 10 moving .rgw.root+default.region to cache LRU end
2014-10-29 21:59:10.781883 7f32d24cf820 10 cache get: name=.rgw.root+region_info.default : miss
2014-10-29 21:59:10.783149 7f32d24cf820 10 cache put: name=.rgw.root+region_info.default
2014-10-29 21:59:10.783156 7f32d24cf820 10 adding .rgw.root+region_info.default to cache LRU end
2014-10-29 21:59:10.783168 7f32d24cf820 10 cache get: name=.rgw.root+region_info.default : type miss (requested=1, cached=6)
2014-10-29 21:59:10.783187 7f32d24cf820 10 cache get: name=.rgw.root+region_info.default : hit
2014-10-29 21:59:10.784622 7f32d24cf820 10 cache put: name=.rgw.root+region_info.default
2014-10-29 21:59:10.784627 7f32d24cf820 10 moving .rgw.root+region_info.default to cache LRU end
2014-10-29 21:59:10.784671 7f32d24cf820 10 cache get: name=.rgw.root+zone_info.default : miss
2014-10-29 21:59:10.788050 7f32d24cf820 10 cache put: name=.rgw.root+zone_info.default
2014-10-29 21:59:10.788071 7f32d24cf820 10 adding .rgw.root+zone_info.default to cache LRU end
2014-10-29 21:59:10.788091 7f32d24cf820 10 cache get: name=.rgw.root+zone_info.default : type miss (requested=1, cached=6)
2014-10-29 21:59:10.788125 7f32d24cf820 10 cache get: name=.rgw.root+zone_info.default : hit
2014-10-29 21:59:10.789630 7f32d24cf820 10 cache put: name=.rgw.root+zone_info.default
2014-10-29 21:59:10.789645 7f32d24cf820 10 moving .rgw.root+zone_info.default to cache LRU end
2014-10-29 21:59:10.789695 7f32d24cf820  2 zone default is master
2014-10-29 21:59:10.789742 7f32d24cf820 10 cache get: name=.rgw.root+region_map : miss
2014-10-29 21:59:10.791929 7f32d24cf820 10 cache put: name=.rgw.root+region_map
2014-10-29 21:59:10.791958 7f32d24cf820 10 adding .rgw.root+region_map to cache LRU end
2014-10-29 21:59:10.898679 7f32c0af7700  2 garbage collection: start
2014-10-29 21:59:10.899114 7f32a35fe700  0 ERROR: can't get key: ret=-2
2014-10-29 21:59:10.899663 7f32a35fe700  0 ERROR: sync_all_users() returned ret=-2
2014-10-29 21:59:10.900019 7f32d24cf820  0 framework: fastcgi
2014-10-29 21:59:10.900046 7f32d24cf820  0 starting handler: fastcgi
2014-10-29 21:59:10.909479 7f32a20fb700 10 allocated request req=0x7f329400b7c0
2014-10-29 21:59:10.926163 7f32c0af7700  0 RGWGC::process() failed to acquire lock on gc.89
2014-10-29 21:59:10.927823 7f32c0af7700  0 RGWGC::process() failed to acquire lock on gc.90
2014-10-29 21:59:10.958487 7f32c0af7700  0 RGWGC::process() failed to acquire lock on gc.93
2014-10-29 21:59:11.002497 7f32c0af7700  0 RGWGC::process() failed to acquire lock on gc.97
2014-10-29 21:59:11.032245 7f32c0af7700  0 RGWGC::process() failed to acquire lock on gc.0
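The "(17) File exists" asok error in the log above is a symptom of two radosgw instances starting under the same client name, so they collide on the admin socket path. A sketch of a ceph.conf giving each gateway its own section, socket, and log (the section names, hosts, and paths here are illustrative, not from the original thread):

```ini
[client.radosgw.rgw1]
host = cephosd1-mona
rgw socket path = /var/run/ceph/radosgw.rgw1.sock
admin socket = /var/run/ceph/ceph-client.radosgw.rgw1.asok
log file = /var/log/ceph/radosgw.rgw1.log

[client.radosgw.rgw2]
host = cephosd2-monb
rgw socket path = /var/run/ceph/radosgw.rgw2.sock
admin socket = /var/run/ceph/ceph-client.radosgw.rgw2.asok
log file = /var/log/ceph/radosgw.rgw2.log
```

Each daemon is then started with its own name, e.g. `radosgw -n client.radosgw.rgw2`, so no two instances bind the same asok or socket.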
Re: [ceph-users] can we deploy multi-rgw on one ceph cluster?
clewis:
1. I do not understand the use case of multiple regions, multiple zones, and multiple rgws. In actual use (upload, download), users deal with the rgw directly; in this process, what do regions and zones do?
2. It is radosgw-agent that syncs data and metadata. Is data synchronization finished right after metadata is synced, or the other way around? If not, what happens when a user accesses a file through another region/zone?
3. When using multiple regions/zones/rgws, do they have any relation to a CDN (Content Delivery Network)? Or how can multiple regions/zones/rgws be used with a CDN?
thanks, looking forward to your reply.

At 2014-10-28 02:29:03, "Craig Lewis" wrote:

On Sun, Oct 26, 2014 at 9:08 AM, yuelongguang wrote:
hi,
1. Does one radosgw daemon correspond to one zone? Is the ratio 1:1?

Not necessarily. You need at least one radosgw daemon per zone, but you can have more. I have two small clusters. The primary has 5 nodes, and the secondary has 4 nodes. Every node in the clusters runs an apache and radosgw.

It's possible (and confusing) to run multiple radosgw daemons on a single node for different clusters. You can either use Apache VHosts, or have CivetWeb listening on different ports. I won't recommend this though, as it introduces a common failure mode to both zones.

2. It seems that we can deploy any number of rgws in a single ceph cluster; those rgws can work separately or cooperate by using radosgw-agent to sync data and metadata. Am I right?

You can deploy as many zones as you want in a single cluster. Each zone needs a set of pools and a radosgw daemon. They can be completely independent, or have a master-slave replication setup using radosgw-agent. Keep in mind that radosgw-agent is not bi-directional replication, and the secondary zone is read-only.

3. Do you know how to set up load balancing for rgws? Is nginx a good choice, and how do you make nginx work with rgw?

Any load balancer should work, since the protocol is just HTTP/HTTPS. Some people on the list had issues with nginx. Search the list archive for radosgw and tengine. I'm using HAProxy, and it's working for me.

I have a slight issue in my secondary cluster, with locking during replication. I believe I need to enable some kind of stickiness, but I haven't gotten around to investigating. In the mean time, I've configured that cluster with a single node in the active backend, and the other nodes in a backup backend. It's not a setup that can work for everybody, but it meets my needs until I fix the real issue.
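Craig's HAProxy layout (one active gateway, the rest only taking over on failure) can be sketched roughly as below; the hostnames, addresses, and ports are illustrative guesses, not his actual configuration:

```conf
frontend rgw_http
    bind *:80
    default_backend rgw

# One server active; the others are marked 'backup' and only receive
# traffic if the active node fails its health check. This sidesteps
# the replication-locking issue that would otherwise need stickiness.
backend rgw
    option httpchk GET /
    server rgw1 10.0.0.11:80 check
    server rgw2 10.0.0.12:80 check backup
    server rgw3 10.0.0.13:80 check backup
```

Because radosgw speaks plain HTTP/HTTPS, the same pattern works with any load balancer; HAProxy's `backup` keyword is simply one convenient way to express active/standby.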
Re: [ceph-users] can we deploy multi-rgw on one ceph cluster?
hi,
1. Does one radosgw daemon correspond to one zone? Is the ratio 1:1?
2. It seems that we can deploy any number of rgws in a single ceph cluster; those rgws can work separately or cooperate by using radosgw-agent to sync data and metadata. Am I right?
3. Do you know how to set up load balancing for rgws? Is nginx a good choice, and how do you make nginx work with rgw?
thanks

At 2014-10-25 05:53:27, "Craig Lewis" wrote:

You can deploy multiple RadosGW in a single cluster. You'll need to set up zones (see http://ceph.com/docs/master/radosgw/federated-config/). Most people seem to be using zones for geo-replication, but local replication works even better. Multiple zones don't have to be replicated either. For example, you could use multiple zones for tiered services: a service with 4x replication on pure SSDs, and a cheaper service with 2x replication on HDDs.

If you do have separate zones in a single cluster, you'll want to configure different OSDs to serve the different zones. You want fault isolation between the zones. The problems this brings are mostly management of the extra complexity.

CivetWeb is embedded into the RadosGW daemon, whereas Apache talks to RadosGW using FastCGI. Overall, CivetWeb should be simpler to set up and manage, since it doesn't require Apache, its configuration, or the overhead. I don't know if CivetWeb is considered production ready. Giant has a bunch of fixes for CivetWeb, so I'm leaning towards "not on Firefly" unless somebody more knowledgeable tells me otherwise.

On Thu, Oct 23, 2014 at 11:04 PM, yuelongguang wrote:
hi, yehuda
1. Can we deploy multiple rgws on one ceph cluster? If so, does it bring us any problems?
2. What is the major difference between apache and civetweb? What is civetweb's advantage?
thanks
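For reference, enabling the embedded CivetWeb frontend Craig mentions is a one-line change in the gateway's ceph.conf section, which removes Apache and FastCGI from the picture entirely; the instance name, host, and port below are just examples:

```ini
[client.radosgw.gateway]
host = gateway-node
keyring = /etc/ceph/ceph.client.radosgw.gateway.keyring
# CivetWeb serves HTTP directly from the radosgw process;
# no Apache, mod_fastcgi, or s3gw.fcgi involved.
rgw frontends = civetweb port=7480
```

With this in place the gateway listens on port 7480 itself, and the FastCgiWrapper/FastCgiExternalServer pitfalls discussed earlier in these threads simply do not apply.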
[ceph-users] can we deploy multi-rgw on one ceph cluster?
hi, yehuda
1. Can we deploy multiple rgws on one ceph cluster? If so, does it bring us any problems?
2. What is the major difference between apache and civetweb? What is civetweb's advantage?
thanks
[ceph-users] can we deploy multi-rgw on one ceph cluster?
hi, all
Can we deploy multiple rgws on one ceph cluster? If so, does it bring us any problems?
thanks
[ceph-users] question about erasure coded pool and rados
1. Why doesn't an erasure-coded pool work with rbd?
2. I used the rados command to put a file into an erasure-coded pool, then removed it. Why does the file remain on the osd's backend fs the whole time?
3. What is the best use case for an erasure-coded pool?
4. The 'rados ls' command lists objects; where are the object names stored?
5. When an rbd image is put on an erasure-coded pool, where is the information (rbd name) about the image stored?
thanks
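On question 1: erasure-coded pools in this era support only full-object writes and appends, while rbd needs small in-place partial overwrites; a partial overwrite would force the OSDs to read back and re-encode the whole stripe's parity. A toy single-XOR-parity sketch (not Ceph's actual jerasure code, which uses k data plus m coding chunks) illustrates the constraint:

```python
from functools import reduce

def encode(data: bytes, k: int):
    """Toy erasure code: split data into k equal chunks plus one XOR
    parity chunk (NOT Ceph's real jerasure plugin, just the idea)."""
    size = -(-len(data) // k)  # ceil division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
    return chunks, parity

def recover(chunks, parity, lost: int):
    """Rebuild one lost data chunk by XOR-ing the survivors with parity."""
    survivors = [c for i, c in enumerate(chunks) if i != lost] + [parity]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

chunks, parity = encode(b"hello erasure world!", k=4)
assert recover(chunks, parity, lost=2) == chunks[2]

# The catch for rbd: overwriting even one byte of a single chunk
# invalidates the parity, so the OSDs would have to read the other
# chunks and re-encode the whole stripe on every small write -- which
# is why partial overwrites (and hence plain rbd on EC pools) are
# unsupported here; the usual pattern is an EC pool behind a
# replicated cache tier.
```

The full-stripe property is also why erasure-coded pools suit large, write-once workloads (object storage, archives) better than random-write block devices.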
Re: [ceph-users] pool size/min_size does not make any effect on erasure-coded pool, right?
hi, all
Pool size/min_size does not have any effect on an erasure-coded pool, right? And does an erasure-coded pool support rbd?
thanks
[ceph-users] pool size/min_size does not make any effect on erasure-coded pool, right?
hi, all
Pool size/min_size does not have any effect on an erasure-coded pool, right?
thanks
Re: [ceph-users] what does osd's ms_objecter do? and who will connect it?
Thanks, Sage Weil. Writing a filesystem is a serious matter; we should make everything clear, including coding style. There are other places we should fix too.
thanks

At 2014-09-29 12:10:52, "Sage Weil" wrote:
>On Mon, 29 Sep 2014, yuelongguang wrote:
>> hi, sage weil
>> 1. You mean that if I use cache tiering, the client's objecter can know how to
>> connect to the osd daemon's objecter? Can I see it through the osdmap?
>> 2. I know that if rbd uses a cache, it uses the objecter, so I thought the
>> objecter was a cache for IO scatter/gather. I do not know the 'COPYFROM
>> operation' you mention.
>> 3. I searched radosclient.cc; I know the objecter is a client for the osd.
>> But for the osd daemon, the code tells me the osd's objecter is a listening
>> socket. ceph-osd.cc : main:
>>     ms_objecter->bind(g_conf->public_addr)
>> OSD::OSD :: objecter_messenger(ms_objecter)
>> objecter_messenger->add_dispatcher_head(&service.objecter_dispatcher) -- this
>> tells me all messages received from pipes (which are accepted by
>> objecter_messenger) are handled by Objecter::dispatch. Right?
>> No code tells me this objecter connects to other osds actively.
>
>Yeah. I mean that the bind() call for ms_objecter is a mistake. It only
>initiates client-side connections, so it doesn't need to bind/listen.
>I'll fix it shortly. (It's pretty harmless, though.. just confusing.)
>
>sage
>
>> Maybe I am missing something.
>> thanks
>>
>> At 2014-09-29 11:23:37, "Sage Weil" wrote:
>> >On Mon, 29 Sep 2014, yuelongguang wrote:
>> >> hi,all
>> >> 1. Who will connect to it? As for the osd, this ms_objecter is a listening
>> >> socket. It is not included in the osdmap, so how does anyone learn
>> >> ms_objecter's listen address and connect to it?
>> >
>> >Ah, that's a mistake. It is only used to connect to other OSDs as a
>> >client for the COPYFROM operation and for cache tiering.
>> >
>> >sage
Re: [ceph-users] what does osd's ms_objecter do? and who will connect it?
hi, sage weil
1. You mean that if I use cache tiering, the client's objecter can know how to connect to the osd daemon's objecter? Can I see it through the osdmap?
2. I know that if rbd uses a cache, it uses the objecter, so I thought the objecter was a cache for IO scatter/gather. I do not know the 'COPYFROM operation' you mention.
3. I searched radosclient.cc; I know the objecter is a client for the osd. But for the osd daemon, the code tells me the osd's objecter is a listening socket.
ceph-osd.cc : main:
    ms_objecter->bind(g_conf->public_addr)
OSD::OSD :: objecter_messenger(ms_objecter)
objecter_messenger->add_dispatcher_head(&service.objecter_dispatcher) -- this tells me all messages received from pipes (which are accepted by objecter_messenger) are handled by Objecter::dispatch. Right?
No code tells me this objecter connects to other osds actively.
Maybe I am missing something.
thanks

At 2014-09-29 11:23:37, "Sage Weil" wrote:
>On Mon, 29 Sep 2014, yuelongguang wrote:
>> hi,all
>> 1. Who will connect to it? As for the osd, this ms_objecter is a listening socket.
>> It is not included in the osdmap, so how does anyone learn ms_objecter's listen
>> address and connect to it?
>
>Ah, that's a mistake. It is only used to connect to other OSDs as a
>client for the COPYFROM operation and for cache tiering.
>
>sage
[ceph-users] what does osd's ms_objecter do? and who will connect it?
hi, all
1. What does the osd's ms_objecter do, and who will connect to it? As for the osd, this ms_objecter is a listening socket. It is not included in the osdmap, so how does anyone learn ms_objecter's listen address and connect to it?
thanks
Re: [ceph-users] bug: ceph-deploy does not support jumbo frame
thanks. I have not configured the switch; I just learned about that.

At 2014-09-25 12:38:48, "Irek Fasikhov" wrote:

Have you configured the switch?

2014-09-25 5:07 GMT+04:00 yuelongguang:
hi, all
After I set mtu=9000, ceph-deploy waits for a reply forever at 'detecting platform for host.' How can I find out what commands ceph-deploy needs that osd to run?
thanks

--
Regards,
Irek Fasikhov
Mob.: +79229045757
[ceph-users] bug: ceph-deploy does not support jumbo frame
hi, all
After I set mtu=9000, ceph-deploy waits for a reply forever at 'detecting platform for host.' How can I find out what commands ceph-deploy needs that osd to run?
thanks
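A hang right after raising the MTU usually means large frames are being silently dropped on the path (host NIC accepts mtu=9000, but the switch does not pass jumbo frames), so ceph-deploy's SSH session stalls mid-transfer. One way to check is a do-not-fragment ping sized to the MTU; the arithmetic below is standard (20-byte IPv4 header plus 8-byte ICMP header), and the hostname in the comment is a placeholder:

```python
# Largest ICMP payload that fits a given MTU without fragmentation:
# the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
def max_ping_payload(mtu: int) -> int:
    return mtu - 20 - 8

print(max_ping_payload(9000))   # 8972
# Then, from one node (hostname is a placeholder):
#   ping -M do -s 8972 cephosd1-mona
# If that fails while `ping -M do -s 1472 cephosd1-mona` succeeds,
# the switch is dropping jumbo frames end-to-end, and every MTU=9000
# connection (including ceph-deploy's) will hang on large packets.
```

Every device on the path, NICs and switch ports alike, must be configured for jumbo frames before mtu=9000 works.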
[ceph-users] question about client's cluster aware
hi, all
My question comes from my testing. Let's take an example: object1 (4MB) -> pg 0.1 -> osds 1,2,3, primary osd1. The client is writing object1, and during the write osd1 goes down. Suppose 2MB has been written.
1. When the connection to osd1 drops, what does the client do? Ask the monitor for a new osdmap, or only the pg map?
2. Now the client gets a newer map and continues the write; the primary osd should be osd2, and the remaining 2MB is written out. What does ceph do to integrate the two parts of the data, and to guarantee that there are enough replicas?
3. Where is the code for this? Please point me at the code.
It is a very difficult question. Thanks so much.
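The key point behind question 1 is that clients never ask the monitor where an object lives; they fetch the full osdmap and deterministically recompute the placement themselves (via CRUSH), so every client holding the same map agrees on the acting set. A toy stand-in for that idea, using a stable hash ranking instead of real CRUSH, with made-up osd ids:

```python
import hashlib

def place(obj: str, osds: list, replicas: int = 3) -> list:
    """Toy deterministic placement: NOT real CRUSH, just a stable
    hash ranking, to show that any client holding the same 'map'
    computes the same acting set without asking a central server."""
    ranked = sorted(
        osds,
        key=lambda o: hashlib.sha256(f"{obj}:{o}".encode()).hexdigest(),
    )
    return ranked[:replicas]

osdmap_v1 = [0, 1, 2, 3, 4]
acting = place("rbd_data.object1", osdmap_v1)

# When an OSD dies, the monitors publish a new map epoch; each client
# recomputes placement locally and all converge on the same new set.
osdmap_v2 = [o for o in osdmap_v1 if o != acting[0]]
new_acting = place("rbd_data.object1", osdmap_v2)
assert acting[0] not in new_acting           # failed osd excluded
assert new_acting == place("rbd_data.object1", osdmap_v2)  # deterministic
```

In real Ceph the "integrate the two parts" step is handled by PG peering and recovery after the map change, not by the client; the client simply resends the outstanding ops to the new primary.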
[ceph-users] question about object replication theory
hi, all
Take a look at this link: http://www.ceph.com/docs/master/architecture/#smart-daemons-enable-hyperscale
Could you explain points 2 and 3 in that picture?
1. At points 2 and 3, before the primary writes the data to the next osd, where is the data? Is it in memory, or on disk already?
2. Where is the code for points 2 and 3, where the primary distributes the data to the others?
thanks
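The flow in points 2 and 3 of that diagram is primary-copy replication: the primary keeps the incoming write buffered (and journals it locally) while forwarding it to the other OSDs in the acting set, and only replies to the client once every replica has acknowledged. A toy sketch of that ordering (illustrative only; in the Ceph source of this era the real path is in the OSD's ReplicatedPG code):

```python
class ToyPrimary:
    """Toy primary-copy replication: buffer the write, fan it out to
    the replica 'OSDs', apply locally, and reply to the client only
    after all replicas have acknowledged. Not the real OSD code."""
    def __init__(self, replicas):
        self.replicas = replicas   # plain dicts standing in for peer OSDs
        self.store = {}

    def write(self, name: str, data: bytes) -> str:
        pending = data             # held in memory/journal while in flight
        acks = 0
        for r in self.replicas:    # diagram points 2 and 3: forward to peers
            r[name] = pending
            acks += 1              # each replica acknowledges the sub-op
        if acks == len(self.replicas):
            self.store[name] = pending   # apply locally, then answer client
            return "ondisk"
        return "retry"

osd2, osd3 = {}, {}
primary = ToyPrimary([osd2, osd3])
print(primary.write("object1", b"payload"))   # ondisk
assert osd2["object1"] == osd3["object1"] == b"payload"
```

So at points 2 and 3 the data is in the primary's memory and journal, not yet acknowledged to the client; the client hears back only after the whole acting set has the write.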
[ceph-users] confusion when kill 3 osds that store the same pg
hi, all
In order to test ceph stability, I have been killing osds. In this case I killed the 3 osds (osd.3, osd.2, osd.0) that store the same pg, 2.30.

---crush---
osdmap e1342 pool 'rbd' (2) object 'rbd_data.19d92ae8944a.' -> pg 2.c59a45b0 (2.30) -> up ([3,2,0], p3) acting ([3,2,0], p3)

[root@cephosd5-gw current]# ceph osd tree
# id    weight  type name       up/down reweight
-1      0.09995 root default
-2      0.01999         host cephosd1-mona
0       0.01999                 osd.0   down    0
-3      0.01999         host cephosd2-monb
1       0.01999                 osd.1   up      1
-4      0.01999         host cephosd3-monc
2       0.01999                 osd.2   down    0
-5      0.01999         host cephosd4-mdsa
3       0.01999                 osd.3   down    0
-6      0.01999         host cephosd5-gw
4       0.01999                 osd.4   up      1

According to the test result, I have some confusion.
1.
[root@cephosd5-gw current]# ceph pg 2.30 query
Error ENOENT: i don't have pgid 2.30
Why can't I query information about this pg? How can I dump this pg?
2.
# ceph osd map rbd rbd_data.19d92ae8944a.
osdmap e1451 pool 'rbd' (2) object 'rbd_data.19d92ae8944a.' -> pg 2.c59a45b0 (2.30) -> up ([4,1], p4) acting ([4,1], p4)
Does the 'ceph osd map' command just calculate the mapping, without checking the real pg state? I do not find 2.30 on osd.1 or osd.4. Given that the client will get the new map, why does the client hang?
thanks very much
[ceph-users] do you have any test case that lost data mostlikely
hi, all
I want to test some scenarios that are most likely to lose data. So far I have just tested killing osds. Do you have any such test cases?
thanks
Re: [ceph-users] question about librbd io (fio parameters)
fio parameters:
--fio--
[global]
ioengine=libaio
direct=1
rw=randwrite
filename=/dev/vdb
time_based
runtime=300
stonewall
[iodepth32]
iodepth=32
bs=4k

At 2014-09-11 05:04:09, "yuelongguang" wrote:

hi, josh durgin:
Please look at my test: inside the vm, using fio to test rbd performance. fio parameters: direct io, bs=4k, iodepth >> 4.
From the information below, the numbers do not match: avgrq-sz is not approximately 8, and avgqu-sz is small and irregular, much less than 32. Why? Which part of ceph might gather/scatter io requests, and why is avgqu-sz so small? Let's work it out.
thanks

--iostat, iodepth=32, blocksize=4k--
Linux 2.6.32-358.el6.x86_64 (cephosd4-mdsa)  2014-09-11  _x86_64_ (2 CPU)

Device:  rrqm/s  wrqm/s  r/s    w/s      rsec/s  wsec/s    avgrq-sz  avgqu-sz  await  svctm  %util
vdd      0.12    5.81    8.19   35.39    132.09  670.65    18.42     0.31      7.06   0.55   2.41
vdd      0.00    291.50  0.00   1151.00  0.00    13091.50  11.37     5.06      4.40   0.23   26.35
vdd      0.00    208.50  0.00   1020.00  0.00    8294.50   8.13      2.52      2.47   0.39   39.30
vdd      0.00    36.00   0.00   1076.00  0.00    17560.00  16.32     0.60      0.56   0.30   32.30
vdd      0.00    242.50  0.00   1143.00  0.00    22402.00  19.60     3.78      3.31   0.25   28.90
vdd      0.00    31.00   0.00   906.50   0.00    5351.50   5.90      0.37      0.40   0.28   25.70
vdd      0.00    294.50  0.00   1148.50  0.00    16620.50  14.47     4.49      3.91   0.21   24.60
vdd      0.00    26.50   0.00   810.50   0.00    4922.50   6.07      0.37      0.45   0.35   28.35
vdd      0.00    45.50   0.00   1022.00  0.00    6117.00   5.99      0.38      0.37   0.28   28.15
vdd      0.00    300.00  0.00   1155.00  0.00    16997.50  14.72     3.58      3.10   0.21   24.30
vdd      0.00    27.00   0.00   962.50   0.00    6846.50   7.11      0.44      0.46   0.35   33.60
vdd      0.00    270.00  0.00   1249.50  0.00    14400.00  11.52     4.61      3.69   0.25   31.25
vdd      0.00    15.00   3.00   660.00   24.00   4247.00   6.44      0.38      0.57   0.45   29.60
vdd      0.00    17.00   24.50  592.50   196.00  8039.00   13.35     0.58      0.94   0.83   51.05

At 2014-09-10 08:37:23, "Josh Durgin" wrote:
>On 09/09/2014 07:06 AM, yuelongguang wrote:
>> hi, josh.durgin:
>> I want to know how librbd launches io requests.
>> Use case: inside the vm, I use fio to test the rbd disk's io performance.
>> fio's parameters are bs=4k, direct io, qemu cache=none.
>> In this case, if librbd just sends what it gets from the vm, I mean no
>> gather/scatter, is the ratio of io inside the vm : io at librbd : io at osd
>> filestore = 1:1:1?
>
>If the rbd image is not a clone, the io issued from the vm's block
>driver will match the io issued by librbd. With caching disabled
>as you have it, the io from the OSDs will be similar, with some
>small amount extra for OSD bookkeeping.
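A quick sanity check on the numbers above: iostat's avgrq-sz is measured in 512-byte sectors, so a pure stream of unmerged 4 KiB requests would show exactly 8. Values above 8 mean the guest block layer merged adjacent requests before they reached the virtio disk. For instance, taking the second sample row's own wsec/s and w/s:

```python
# avgrq-sz in iostat is sectors per request: sectors/s divided by requests/s.
SECTOR = 512

def avg_req_sectors(wsec_per_s: float, w_per_s: float) -> float:
    return wsec_per_s / w_per_s

# Second sample row from the iostat output above.
sz = avg_req_sectors(13091.50, 1151.00)
print(round(sz, 2))          # 11.37 -- matches that row's avgrq-sz column

# An unmerged 4 KiB request is exactly 8 sectors:
assert 4096 // SECTOR == 8
# 11.37 > 8, so the guest elevator merged some adjacent random writes
# before they hit the virtio device -- one reason avgrq-sz is not 8.
# avgqu-sz, meanwhile, is the queue depth at the guest block device,
# which sits below fio's iodepth=32 of in-flight libaio requests, so it
# can legitimately be much smaller than 32 when the device drains fast.
```

So the mismatch lives mostly in the guest block layer (merging and queueing), before librbd ever sees the requests.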
Re: [ceph-users] question about librbd io
hi, josh durgin:
Please look at my test: inside the vm, using fio to test rbd performance. fio parameters: direct io, bs=4k, iodepth >> 4.
From the information below, the numbers do not match: avgrq-sz is not approximately 8, and avgqu-sz is small and irregular, much less than 32. Why? Which part of ceph might gather/scatter io requests, and why is avgqu-sz so small? Let's work it out.
thanks

--iostat, iodepth=32, blocksize=4k--
Linux 2.6.32-358.el6.x86_64 (cephosd4-mdsa)  2014-09-11  _x86_64_ (2 CPU)

Device:  rrqm/s  wrqm/s  r/s    w/s      rsec/s  wsec/s    avgrq-sz  avgqu-sz  await  svctm  %util
vdd      0.12    5.81    8.19   35.39    132.09  670.65    18.42     0.31      7.06   0.55   2.41
vdd      0.00    291.50  0.00   1151.00  0.00    13091.50  11.37     5.06      4.40   0.23   26.35
vdd      0.00    208.50  0.00   1020.00  0.00    8294.50   8.13      2.52      2.47   0.39   39.30
vdd      0.00    36.00   0.00   1076.00  0.00    17560.00  16.32     0.60      0.56   0.30   32.30
vdd      0.00    242.50  0.00   1143.00  0.00    22402.00  19.60     3.78      3.31   0.25   28.90
vdd      0.00    31.00   0.00   906.50   0.00    5351.50   5.90      0.37      0.40   0.28   25.70
vdd      0.00    294.50  0.00   1148.50  0.00    16620.50  14.47     4.49      3.91   0.21   24.60
vdd      0.00    26.50   0.00   810.50   0.00    4922.50   6.07      0.37      0.45   0.35   28.35
vdd      0.00    45.50   0.00   1022.00  0.00    6117.00   5.99      0.38      0.37   0.28   28.15
vdd      0.00    300.00  0.00   1155.00  0.00    16997.50  14.72     3.58      3.10   0.21   24.30
vdd      0.00    27.00   0.00   962.50   0.00    6846.50   7.11      0.44      0.46   0.35   33.60
vdd      0.00    270.00  0.00   1249.50  0.00    14400.00  11.52     4.61      3.69   0.25   31.25
vdd      0.00    15.00   3.00   660.00   24.00   4247.00   6.44      0.38      0.57   0.45   29.60
vdd      0.00    17.00   24.50  592.50   196.00  8039.00   13.35     0.58      0.94   0.83   51.05

At 2014-09-10 08:37:23, "Josh Durgin" wrote:
>On 09/09/2014 07:06 AM, yuelongguang wrote:
>> hi, josh.durgin:
>> I want to know how librbd launches io requests.
>> Use case: inside the vm, I use fio to test the rbd disk's io performance.
>> fio's parameters are bs=4k, direct io, qemu cache=none.
>> In this case, if librbd just sends what it gets from the vm, I mean no
>> gather/scatter, is the ratio of io inside the vm : io at librbd : io at osd
>> filestore = 1:1:1?
>
>If the rbd image is not a clone, the io issued from the vm's block
>driver will match the io issued by librbd. With caching disabled
>as you have it, the io from the OSDs will be similar, with some
>small amount extra for OSD bookkeeping.
Re: [ceph-users] why one osd-op from client can get two osd-op-reply?
As for the second question, could you tell me where the code is? How does ceph make size/min_size copies? thanks

At 2014-09-11 12:19:18, "Gregory Farnum" wrote:
>On Wed, Sep 10, 2014 at 8:29 PM, yuelongguang wrote:
>>
>> as for ack and ondisk, ceph has size and min_size to decide how many replicas there are.
>> if the client receives ack or ondisk, does that mean at least min_size
>> osds have done the ops?
>>
>> i am reading the source code, could you help me with two questions.
>>
>> 1.
>> on the osd, where is the code that replies to ops separately for ack or
>> ondisk?
>> i checked the code, but i thought they were always replied together.
>
>It depends on what journaling mode you're in, but generally they're
>triggered separately (unless it goes on disk first, in which case it
>will skip the ack — this is the mode it uses for non-btrfs
>filesystems). The places where it actually replies are pretty clear
>about doing one or the other, though...
>
>>
>> 2.
>> now i just know how the client writes ops to the primary osd. inside the osd cluster,
>> how does it guarantee that min_size copies are reached?
>> i mean when the primary osd receives ops, how does it spread them to the others, and
>> how does it process their replies?
>
>That's not how it works. The primary for a PG will not go "active"
>with it until it has at least min_size copies that it knows about.
>Once the OSD is doing any processing of the PG, it requires all
>participating members to respond before it sends any messages back to
>the client.
>-Greg
>Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>>
>> greg, thanks very much
>>
>> On 2014-09-11 01:36:39, "Gregory Farnum" wrote:
>>
>> The important bit there is actually near the end of the message output line,
>> where the first says "ack" and the second says "ondisk".
>>
>> I assume you're using btrfs; the ack is returned after the write is applied
>> in-memory and readable by clients.
The ondisk (commit) message is returned >> after it's durable to the journal or the backing filesystem. >> -Greg >> >> On Wednesday, September 10, 2014, yuelongguang wrote: >>> >>> hi,all >>> i recently debug ceph rbd, the log tells that one write to osd can get >>> two if its reply. >>> the difference between them is seq. >>> why? >>> >>> thanks >>> ---log- >>> reader got message 6 0x7f58900010a0 osd_op_reply(15 >>> rbd_data.19d92ae8944a.0001 [set-alloc-hint object_size 4194304 >>> write_size 4194304,write 0~3145728] v211'518 uv518 ack = 0) v6 >>> 2014-09-10 08:47:32.348213 7f58bc16b700 20 -- 10.58.100.92:0/1047669 queue >>> 0x7f58900010a0 prio 127 >>> 2014-09-10 08:47:32.348230 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> >>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 >>> c=0xfae940).reader reading tag... >>> 2014-09-10 08:47:32.348245 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> >>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 >>> c=0xfae940).reader got MSG >>> 2014-09-10 08:47:32.348257 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> >>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 >>> c=0xfae940).reader got envelope type=43 src osd.1 front=247 data=0 off 0 >>> 2014-09-10 08:47:32.348269 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> >>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 >>> c=0xfae940).reader wants 247 from dispatch throttler 247/104857600 >>> 2014-09-10 08:47:32.348286 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> >>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 >>> c=0xfae940).reader got front 247 >>> 2014-09-10 08:47:32.348303 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> >>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 >>> c=0xfae940).aborted = 0 >>> 2014-09-10 08:47:32.348312 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> >>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 >>> 
c=0xfae940).reader got 247 + 0 + 0 byte message >>> 2014-09-10 08:47:32.348332 7f58bc16b700 10 check_message_signature: seq # >>> = 7 front_crc_ = 3699418201 middle_crc = 0 data_crc = 0 >>> 2014-09-10 08:47:32.348369 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> >>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 >>> c=0xfae940).reader got message 7 0x7f5890003660 osd_op_reply(15 >>> rbd_data.19d92ae8944a.0001 [set-alloc-hint object_size 4194304 >>> write_size 4194304,write 0~3145728] v211'518 uv518 ondisk = 0) v6 >>> >>> >> >> >> -- >> Software Engineer #42 @ http://inktank.com | http://ceph.com >> >> >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
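Greg's explanation of the two replies can be summarized in a tiny model (illustrative Python only, not Ceph's actual code): the same write produces an "ack" reply once it is applied in memory and an "ondisk" reply once it is committed to the journal; in the write-ahead journaling mode used on non-btrfs filesystems the commit happens first, so only "ondisk" is sent.

```python
# Minimal model of the two osd_op_reply messages seen in the log above
# (illustrative only; not Ceph's implementation).
def reply_sequence(writeahead_journal: bool):
    """Order of replies the client sees for one write."""
    if writeahead_journal:
        # journal commit happens before the write is readable -> single 'ondisk'
        return ["ondisk"]
    # parallel journaling (btrfs): applied-in-memory ack, then durable commit
    return ["ack", "ondisk"]

print(reply_sequence(False))  # ['ack', 'ondisk'] -- the two replies in the log
print(reply_sequence(True))   # ['ondisk']
```

This matches the log: message 6 carries "ack = 0" and message 7, a moment later, carries "ondisk = 0" for the same op (osd_op_reply 15).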
[ceph-users] osd cpu usage is bigger than 100%
hi, all
I am testing rbd performance. There is currently only one VM, using rbd as its disk, and inside it fio is doing reads/writes. The big difference from before is that I set a large iodepth instead of iodepth=1. According to my tests, the larger the iodepth, the higher the CPU usage. Analysing the output of top:
1. 12% wa: does that mean the disk is not fast enough?
2. How can we tell whether Ceph's thread counts are sufficient?
What do you think is using up the CPU? I want to find the root cause of why a large iodepth leads to high CPU usage.

---default options---
"osd_op_threads": "2",
"osd_disk_threads": "1",
"osd_recovery_threads": "1",
"filestore_op_threads": "2",

thanks

---top, iodepth=16---
top - 15:27:34 up 2 days, 6:03, 2 users, load average: 0.49, 0.56, 0.62
Tasks: 97 total, 1 running, 96 sleeping, 0 stopped, 0 zombie
Cpu(s): 19.0%us, 8.1%sy, 0.0%ni, 59.3%id, 12.1%wa, 0.0%hi, 0.8%si, 0.7%st
Mem: 1922540k total, 1853180k used, 69360k free, 7012k buffers
Swap: 1048568k total, 76796k used, 971772k free, 1034272k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2763 root 20 0 1112m 386m 5028 S 60.8 20.6 200:43.47 ceph-osd

---top---
top - 19:50:08 up 1 day, 10:26, 2 users, load average: 1.55, 0.97, 0.81
Tasks: 97 total, 1 running, 96 sleeping, 0 stopped, 0 zombie
Cpu(s): 37.6%us, 14.2%sy, 0.0%ni, 37.0%id, 9.4%wa, 0.0%hi, 1.3%si, 0.5%st
Mem: 1922540k total, 1820196k used, 102344k free, 23100k buffers
Swap: 1048568k total, 91724k used, 956844k free, 1052292k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4312 root 20 0 1100m 337m 5192 S 107.3 18.0 88:33.27 ceph-osd
1704 root 20 0 514m 272m 3648 S 0.7 14.5 3:27.19 ceph-mon

---iostat---
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
vdd 5.50 137.50 247.00 782.00 2896.00 8773.00 11.34 7.08 3.55 0.63 65.05
vdd 9.50 119.00 327.50 458.50 3940.00 4733.50 11.03 12.03 19.66 0.70 55.40
vdd 15.50 10.50 324.00 559.50 3784.00 3398.00 8.13 1.98 2.22 0.81 71.25
vdd 4.50 253.50 273.50 803.00 3056.00 12155.00 14.13 4.70 4.32 0.55 59.55
vdd 10.00 6.00 294.00 488.00 3200.00 2933.50 7.84 1.10 1.49 0.70 54.85
vdd 10.00 14.00 333.00 645.00 3780.00 3846.00 7.80 2.13 2.15 0.90 87.55
vdd 11.00 240.50 259.00 579.00 3144.00 10035.50 15.73 8.51 10.18 0.84 70.20
vdd 10.50 17.00 318.50 707.00 3876.00 4084.50 7.76 1.32 1.30 0.61 62.65
vdd 4.50 208.00 233.50 918.00 2648.00 19214.50 18.99 5.43 4.71 0.55 63.20
vdd 7.00 1.50 306.00 212.00 3376.00 2176.50 10.72 1.03 1.83 0.96 49.70
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
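One way to interpret the two top snapshots above: if a larger iodepth also produces proportionally more IOPS, the per-operation CPU cost may be roughly constant and the "high CPU" is simply more work being done. A sketch of that sanity check (illustrative Python; the IOPS figures below are placeholders, not measured values from this test):

```python
# Rough per-op CPU cost from top snapshots (a sanity check, not a profiler).
def cpu_per_kiops(cpu_percent, iops):
    """CPU percent consumed per 1000 IOPS."""
    return cpu_percent / (iops / 1000.0)

# CPU values from the top output above; IOPS values are hypothetical examples.
low = cpu_per_kiops(60.8, 1000)    # iodepth=16 snapshot, assumed ~1000 IOPS
high = cpu_per_kiops(107.3, 2000)  # iodepth=32 snapshot, assumed ~2000 IOPS
print(low, high)  # if these are close, CPU scales with IOPS, not with iodepth itself
```

Comparing the per-kIOPS numbers against the actual fio IOPS for each run would show whether the OSD's per-op cost is growing, which would point at thread contention rather than raw workload.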
Re: [ceph-users] why one osd-op from client can get two osd-op-reply?
As for ack and ondisk, ceph has size and min_size to decide how many replicas there are. If the client receives ack or ondisk, does that mean at least min_size osds have done the ops?
I am reading the source code; could you help me with two questions?
1. On the osd, where is the code that replies to ops separately for ack or ondisk? I checked the code, but I thought they were always replied together.
2. Now I just know how the client writes ops to the primary osd. Inside the osd cluster, how does it guarantee that min_size copies are reached? I mean, when the primary osd receives ops, how does it spread them to the others, and how does it process their replies?
greg, thanks very much

On 2014-09-11 01:36:39, "Gregory Farnum" wrote:
The important bit there is actually near the end of the message output line, where the first says "ack" and the second says "ondisk". I assume you're using btrfs; the ack is returned after the write is applied in-memory and readable by clients. The ondisk (commit) message is returned after it's durable to the journal or the backing filesystem.
-Greg

On Wednesday, September 10, 2014, yuelongguang wrote:
hi, all
I am recently debugging ceph rbd. The log shows that one write to an OSD can get two replies; the difference between them is the seq. Why?
thanks
---log---
reader got message 6 0x7f58900010a0 osd_op_reply(15 rbd_data.19d92ae8944a.0001 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~3145728] v211'518 uv518 ack = 0) v6
2014-09-10 08:47:32.348213 7f58bc16b700 20 -- 10.58.100.92:0/1047669 queue 0x7f58900010a0 prio 127
2014-09-10 08:47:32.348230 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader reading tag...
2014-09-10 08:47:32.348245 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got MSG 2014-09-10 08:47:32.348257 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got envelope type=43 src osd.1 front=247 data=0 off 0 2014-09-10 08:47:32.348269 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader wants 247 from dispatch throttler 247/104857600 2014-09-10 08:47:32.348286 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got front 247 2014-09-10 08:47:32.348303 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).aborted = 0 2014-09-10 08:47:32.348312 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got 247 + 0 + 0 byte message 2014-09-10 08:47:32.348332 7f58bc16b700 10 check_message_signature: seq # = 7 front_crc_ = 3699418201 middle_crc = 0 data_crc = 0 2014-09-10 08:47:32.348369 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got message 7 0x7f5890003660 osd_op_reply(15 rbd_data.19d92ae8944a.0001 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~3145728] v211'518 uv518 ondisk = 0) v6 -- Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] osd cpu usage is bigger than 100%
hi, all
I am testing rbd performance. There is only one VM, using rbd as its disk, and inside it fio is doing reads/writes. The big difference is that I set a large iodepth instead of iodepth=1. What do you think is using up the CPU? I want to find the root cause.
---default options---
"osd_op_threads": "2",
"osd_disk_threads": "1",
"osd_recovery_threads": "1",
"filestore_op_threads": "2",
thanks
top - 19:50:08 up 1 day, 10:26, 2 users, load average: 1.55, 0.97, 0.81
Tasks: 97 total, 1 running, 96 sleeping, 0 stopped, 0 zombie
Cpu(s): 37.6%us, 14.2%sy, 0.0%ni, 37.0%id, 9.4%wa, 0.0%hi, 1.3%si, 0.5%st
Mem: 1922540k total, 1820196k used, 102344k free, 23100k buffers
Swap: 1048568k total, 91724k used, 956844k free, 1052292k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4312 root 20 0 1100m 337m 5192 S 107.3 18.0 88:33.27 ceph-osd
1704 root 20 0 514m 272m 3648 S 0.7 14.5 3:27.19 ceph-mon
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] why one osd-op from client can get two osd-op-reply?
hi, all
I am recently debugging ceph rbd. The log shows that one write to an OSD can get two replies; the difference between them is the seq. Why?
thanks
---log---
reader got message 6 0x7f58900010a0 osd_op_reply(15 rbd_data.19d92ae8944a.0001 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~3145728] v211'518 uv518 ack = 0) v6
2014-09-10 08:47:32.348213 7f58bc16b700 20 -- 10.58.100.92:0/1047669 queue 0x7f58900010a0 prio 127
2014-09-10 08:47:32.348230 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader reading tag...
2014-09-10 08:47:32.348245 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got MSG
2014-09-10 08:47:32.348257 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got envelope type=43 src osd.1 front=247 data=0 off 0
2014-09-10 08:47:32.348269 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader wants 247 from dispatch throttler 247/104857600
2014-09-10 08:47:32.348286 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got front 247
2014-09-10 08:47:32.348303 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).aborted = 0
2014-09-10 08:47:32.348312 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got 247 + 0 + 0 byte message
2014-09-10 08:47:32.348332 7f58bc16b700 10 check_message_signature: seq # = 7 front_crc_ = 3699418201 middle_crc = 0 data_crc = 0
2014-09-10 08:47:32.348369 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1
l=1 c=0xfae940).reader got message 7 0x7f5890003660 osd_op_reply(15 rbd_data.19d92ae8944a.0001 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~3145728] v211'518 uv518 ondisk = 0) v6 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] question about librbd io
hi, josh.durgin:
I want to know how librbd issues I/O requests.
Use case: inside a VM, I use fio to test the rbd disk's I/O performance. fio's parameters are bs=4k, direct I/O; qemu cache=none. In this case, if librbd just sends on what it gets from the VM, I mean no gather/scatter, is the ratio of I/O inside the VM : I/O at librbd : I/O at the OSD filestore = 1:1:1?
thanks

fio job file:
[global]
ioengine=libaio
buffered=0
rw=randrw
#size=3g
#directory=/data1
filename=/dev/vdb
[file0]
iodepth=1
bs=4k
time_based
runtime=300
stonewall
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
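One piece of the picture worth making explicit: librbd stripes the image over fixed-size RADOS objects (4 MiB by default, visible as object_size 4194304 in the osd_op_reply log lines elsewhere in this thread), so each guest write is directed to the object covering its offset. A sketch of that mapping (illustrative Python; the image prefix is taken from the log, and the 16-hex-digit object-number suffix is the usual rbd naming scheme):

```python
# Sketch of how librbd maps a block-device offset to a RADOS object
# (default 4 MiB objects, no fancy striping; simplified for illustration).
OBJECT_SIZE = 4 << 20  # 4194304 bytes, matching object_size in the log

def map_offset(image_prefix: str, offset: int):
    """Return (object name, offset within that object) for a guest I/O offset."""
    obj_no, obj_off = divmod(offset, OBJECT_SIZE)
    return f"{image_prefix}.{obj_no:016x}", obj_off

name, off = map_offset("rbd_data.19d92ae8944a", 5 * 1024 * 1024)  # 5 MiB into the image
print(name, off)  # lands in the second object, 1 MiB in
```

A single 4k direct write never spans two objects unless it straddles a 4 MiB boundary, which is consistent with Josh's point that the VM-to-librbd I/O count is 1:1 for a non-clone image.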
[ceph-users] all my osds are down, but ceph -s tells they are up and in.
hi, all
This is crazy:
1. All my OSDs are down, but ceph -s says they are up and in. Why?
2. All OSDs are down while a VM is using rbd as its disk, and inside the VM fio is reading/writing the disk, but it hangs and cannot be killed. Why?
thanks
[root@cephosd2-monb ~]# ceph -v
ceph version 0.81 (8de9501df275a5fe29f2c64cb44f195130e4a8fc)
[root@cephosd2-monb ~]# ceph -s
cluster 508634f6-20c9-43bb-bc6f-b777f4bb1651
health HEALTH_WARN mds 0 is laggy
monmap e13: 3 mons at {cephosd1-mona=10.154.249.3:6789/0,cephosd2-monb=10.154.249.4:6789/0,cephosd3-monc=10.154.249.5:6789/0}, election epoch 154, quorum 0,1,2 cephosd1-mona,cephosd2-monb,cephosd3-monc
mdsmap e21: 1/1/1 up {0=0=up:active(laggy or crashed)}
osdmap e196: 5 osds: 5 up, 5 in
pgmap v21836: 512 pgs, 5 pools, 3115 MB data, 805 objects
9623 MB used, 92721 MB / 102344 MB avail
512 active+clean
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
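One plausible explanation for the stale "5 up, 5 in" (a hedged sketch, not a definitive diagnosis): OSDs are normally marked down because surviving peer OSDs notice missed heartbeats and report the failure to the monitors. If every OSD dies at once, no peer is left to report, so the map stays stale until a much longer monitor-side timeout fires. A toy model of that logic (parameter names and the 900 s default are assumptions for illustration, not exact Ceph option names):

```python
# Toy model of OSD down-detection (illustrative only; not Ceph's code).
def osds_marked_down(alive_osds, dead_osds, seconds_since_death, mon_timeout=900):
    """Which dead OSDs the monitor would have marked down by now."""
    reported = set()
    for osd in dead_osds:
        if alive_osds:
            # surviving peers miss heartbeats and report the failure quickly
            reported.add(osd)
        elif seconds_since_death >= mon_timeout:
            # with no survivors, only the monitor-side timeout catches it
            reported.add(osd)
    return reported

print(osds_marked_down(alive_osds=[], dead_osds=["osd.0", "osd.1"], seconds_since_death=60))
# empty set: right after a total failure the cluster map can still say "up"
```

This also explains the hung fio: the kernel/QEMU rbd client keeps retrying the in-flight I/O against OSDs the map still claims are up, and the process sits in uninterruptible I/O wait, which is why it cannot be killed.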
Re: [ceph-users] what does monitor data directory include?
hi Joao, Mark Nelson:
Where is the monmap stored? How can I dump the monitor's data in /var/lib/ceph/mon/ceph-cephosd1-mona/store.db/?
thanks

At 2014-08-28 09:00:41, "Mark Nelson" wrote:
>On 08/28/2014 07:48 AM, yuelongguang wrote:
>> hi, all
>> what is in the directory /var/lib/ceph/mon/ceph-cephosd1-mona/store.db/?
>> how to dump it? where is the monmap stored?
>
>That directory is typically a leveldb store, though potentially could be
>rocksdb or maybe something else after firefly. You can use the leveldb
>api to access it. There may be other convenience tools to extract the
>data out of it too. Joao may know more.
>
>> thanks
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] what does monitor data directory include?
hi, all
What is in the directory /var/lib/ceph/mon/ceph-cephosd1-mona/store.db/? How can I dump it? Where is the monmap stored?
thanks
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph can not repair itself after accidental power down, half of pgs are peering
The next day it returned to normal. I have no idea why.

At 2014-08-27 00:38:29, "Michael" wrote:
How far out are your clocks? It's showing a clock skew; if they're too far out it can cause issues with cephx. Otherwise you're probably going to need to check your cephx auth keys.
-Michael

On 26/08/2014 12:26, yuelongguang wrote:
hi, all
I have 5 osds and 3 mons, and the status was OK before. To be mentioned, this cluster holds no data; I just deployed it to get familiar with the command lines. What is the problem and how do I fix it? thanks
---environment---
ceph-release-1-0.el6.noarch
ceph-deploy-1.5.11-0.noarch
ceph-0.81.0-5.el6.x86_64
ceph-libs-0.81.0-5.el6.x86_64
---ceph -s---
[root@cephosd1-mona ~]# ceph -s
cluster 508634f6-20c9-43bb-bc6f-b777f4bb1651
health HEALTH_WARN 183 pgs peering; 183 pgs stuck inactive; 183 pgs stuck unclean; clock skew detected on mon.cephosd2-monb, mon.cephosd3-monc
monmap e13: 3 mons at {cephosd1-mona=10.154.249.3:6789/0,cephosd2-monb=10.154.249.4:6789/0,cephosd3-monc=10.154.249.5:6789/0}, election epoch 74, quorum 0,1,2 cephosd1-mona,cephosd2-monb,cephosd3-monc
osdmap e151: 5 osds: 5 up, 5 in
pgmap v499: 384 pgs, 4 pools, 0 bytes data, 0 objects
201 MB used, 102143 MB / 102344 MB avail
167 peering
201 active+clean
16 remapped+peering
---log, osd.0---
2014-08-26 19:16:13.926345 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found
2014-08-26 19:16:13.926355 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2a80 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d5960).accept: got bad authorizer
2014-08-26 19:16:28.928023 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found
2014-08-26 19:16:28.928050 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2800 sd=25 :6800 s=0 pgs=0
cs=0 l=0 c=0x45d56a0).accept: got bad authorizer 2014-08-26 19:16:28.929139 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2014-08-26 19:16:28.929237 7f114c009700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :38071 s=1 pgs=0 cs=0 l=0 c=0x45d23c0).failed verifying authorize reply 2014-08-26 19:16:43.930846 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found 2014-08-26 19:16:43.930899 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2580 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d0b00).accept: got bad authorizer 2014-08-26 19:16:43.932204 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2014-08-26 19:16:43.932230 7f114c009700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :38073 s=1 pgs=0 cs=0 l=0 c=0x45d23c0).failed verifying authorize reply 2014-08-26 19:16:58.933526 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found 2014-08-26 19:16:58.935094 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2300 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d0840).accept: got bad authorizer 2014-08-26 19:16:58.936239 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2014-08-26 19:16:58.936261 7f114c009700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :38074 s=1 pgs=0 cs=0 l=0 c=0x45d23c0).failed verifying authorize reply 2014-08-26 19:17:13.937335 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found 2014-08-26 19:17:13.937368 7f114a8d2700 0 
-- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2080 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d1b80).accept: got bad authorizer 2014-08-26 19:17:13.937923 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2014-08-26 19:17:13.937933 7f114c009700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :38075 s=1 pgs=0 cs=0 l=0 c=0x45d23c0).failed verifying authorize reply 2014-08-26 19:17:28.939439 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block paddi
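Michael's clock-skew diagnosis can be made concrete with a small sketch (illustrative Python, not Ceph's implementation): cephx tickets are only valid for a time window, so a peer whose clock has drifted far enough evaluates a freshly issued ticket as expired (or not yet valid), which surfaces as the decrypt/verify failures in the log above.

```python
# Illustrative cephx-style ticket validity check (not Ceph's actual code).
def ticket_valid(now: float, issued: float, ttl: float, clock_skew: float) -> bool:
    """A peer whose clock is off by `clock_skew` evaluates the ticket at now+skew."""
    local_now = now + clock_skew
    return issued <= local_now < issued + ttl

# ticket issued at t=0 with a 3600 s lifetime:
print(ticket_valid(now=100.0, issued=0.0, ttl=3600.0, clock_skew=0.0))     # True
print(ticket_valid(now=100.0, issued=0.0, ttl=3600.0, clock_skew=4000.0))  # False: skewed clock rejects it
```

That would also explain why "the next day it returned to normal": once NTP (or a reboot) pulled the clocks back together, ticket verification started succeeding and the PGs could peer.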
[ceph-users] ceph can not repair itself after accidental power down, half of pgs are peering
hi, all
I have 5 osds and 3 mons, and the status was OK before. To be mentioned, this cluster holds no data; I just deployed it to get familiar with the command lines. What is the problem and how do I fix it? thanks
---environment---
ceph-release-1-0.el6.noarch
ceph-deploy-1.5.11-0.noarch
ceph-0.81.0-5.el6.x86_64
ceph-libs-0.81.0-5.el6.x86_64
---ceph -s---
[root@cephosd1-mona ~]# ceph -s
cluster 508634f6-20c9-43bb-bc6f-b777f4bb1651
health HEALTH_WARN 183 pgs peering; 183 pgs stuck inactive; 183 pgs stuck unclean; clock skew detected on mon.cephosd2-monb, mon.cephosd3-monc
monmap e13: 3 mons at {cephosd1-mona=10.154.249.3:6789/0,cephosd2-monb=10.154.249.4:6789/0,cephosd3-monc=10.154.249.5:6789/0}, election epoch 74, quorum 0,1,2 cephosd1-mona,cephosd2-monb,cephosd3-monc
osdmap e151: 5 osds: 5 up, 5 in
pgmap v499: 384 pgs, 4 pools, 0 bytes data, 0 objects
201 MB used, 102143 MB / 102344 MB avail
167 peering
201 active+clean
16 remapped+peering
---log, osd.0---
2014-08-26 19:16:13.926345 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found
2014-08-26 19:16:13.926355 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2a80 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d5960).accept: got bad authorizer
2014-08-26 19:16:28.928023 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found
2014-08-26 19:16:28.928050 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2800 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d56a0).accept: got bad authorizer
2014-08-26 19:16:28.929139 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2014-08-26 19:16:28.929237 7f114c009700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :38071 s=1 pgs=0 cs=0 l=0
c=0x45d23c0).failed verifying authorize reply 2014-08-26 19:16:43.930846 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found 2014-08-26 19:16:43.930899 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2580 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d0b00).accept: got bad authorizer 2014-08-26 19:16:43.932204 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2014-08-26 19:16:43.932230 7f114c009700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :38073 s=1 pgs=0 cs=0 l=0 c=0x45d23c0).failed verifying authorize reply 2014-08-26 19:16:58.933526 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found 2014-08-26 19:16:58.935094 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2300 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d0840).accept: got bad authorizer 2014-08-26 19:16:58.936239 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2014-08-26 19:16:58.936261 7f114c009700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :38074 s=1 pgs=0 cs=0 l=0 c=0x45d23c0).failed verifying authorize reply 2014-08-26 19:17:13.937335 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found 2014-08-26 19:17:13.937368 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc2080 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d1b80).accept: got bad authorizer 2014-08-26 19:17:13.937923 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2014-08-26 19:17:13.937933 7f114c009700 0 -- 
11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :38075 s=1 pgs=0 cs=0 l=0 c=0x45d23c0).failed verifying authorize reply 2014-08-26 19:17:28.939439 7f114a8d2700 0 cephx: verify_authorizer could not decrypt ticket info: error: decryptor.MessageEnd::Exception: StreamTransformationFilter: invalid PKCS #7 block padding found 2014-08-26 19:17:28.939455 7f114a8d2700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x4dc1e00 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x45d5540).accept: got bad authorizer 2014-08-26 19:17:28.939716 7f114c009700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2014-08-26 19:17:28.939731 7f114c009700 0 -- 11.154.249.2:6800/1667 >> 11.154.249.7:6800/1599 pipe(0x3edb700 sd=24 :3807
Re: [ceph-users] enrich ceph test methods, what is your concern about ceph. thanks
thanks, Irek Fasikhov. Is fio the only way to test ceph rbd? An important aim of the test is to find where the bottleneck is: qemu, librbd, or ceph itself. Could you share your test results with me? thanks

On 2014-08-26 04:22:22, "Irek Fasikhov" wrote:
Hi. I and many people use fio. Ceph rbd has a special engine for it: https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html

2014-08-26 12:15 GMT+04:00 yuelongguang:
hi, all
I am planning to test ceph, covering performance, throughput, scalability, and availability. In order to get a full test result, I hope you can all give me some advice; meanwhile, I can send the results to you if you like. For each test category (performance, throughput, scalability, availability), do you have test ideas and test tools? Basically, I already know some tools to test throughput and IOPS, but please tell me the tools you prefer and the results you expect.
thanks very much
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Best regards, Irek Fasikhov, Mob.: +79229045757
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] enrich ceph test methods, what is your concern about ceph. thanks
hi, all
I am planning to test ceph, covering performance, throughput, scalability, and availability. In order to get a full test result, I hope you can all give me some advice; meanwhile, I can send the results to you if you like. For each test category (performance, throughput, scalability, availability), do you have test ideas and test tools? Basically, I already know some tools to test throughput and IOPS, but please tell me the tools you prefer and the results you expect.
thanks very much
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] question about getting rbd.ko and ceph.ko
hi, all
Is there a way to get rbd.ko and ceph.ko for CentOS 6.x, or do I have to build them from source? What is the minimum kernel version required?
thanks
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] help to confirm if journal includes everything a OP has
hi, all
By reading the code, I notice that everything in an Op is encoded into a Transaction, which is later written to the journal. Does the journal record everything (metadata, xattrs, file data, ...) of an Op? If so, everything is written to disk twice, and the journal would always be reaching a full state, right?
thanks
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
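On the "always full" worry: with a write-ahead filestore journal the payload is indeed written twice (journal, then backing filesystem), but committed entries are trimmed once the filestore has applied them, so the journal only fills up when ingest outruns the apply rate for long enough. A rough sizing sketch (illustrative Python; the 5 GiB / 200 MB/s / 120 MB/s figures are made-up example numbers):

```python
# Rough sizing sketch for a write-ahead journal (assumed example numbers).
def journal_fill_seconds(journal_bytes, ingest_Bps, drain_Bps):
    """Seconds until the journal fills if client ingest outpaces the filestore apply rate."""
    if ingest_Bps <= drain_Bps:
        return float("inf")  # the journal trims as fast as it fills; it never stays full
    return journal_bytes / (ingest_Bps - drain_Bps)

# 5 GiB journal, clients writing 200 MB/s, backing filestore applying 120 MB/s:
print(round(journal_fill_seconds(5 * 2**30, 200e6, 120e6), 1))  # seconds of burst absorbed
```

When the journal does fill, writes stall until the filestore catches up, so in steady state the journal oscillates rather than sitting permanently full.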
Re: [ceph-users] can osd start up if journal is lost and it has not been replayed?
hi
Could you tell me the reason why "if the journal is lost, the OSD is lost"? If the journal is lost, only the part that had not yet been replayed is actually lost. Take a similar case: an OSD is down for some time and its journal is out of date (it has lost part of the history), yet it can catch up with the other OSDs. Why? That example suggests that either an outdated OSD can get the missing journal from the others, or "catching up" works on a different principle from the journal. Could you explain?
thanks

At 2014-08-14 05:21:20, "Craig Lewis" wrote:
If the journal is lost, the OSD is lost. This can be a problem if you use 1 SSD for journals for many OSDs. There has been some discussion about making the OSDs able to recover from a lost journal, but I haven't heard anything else about it. I haven't been paying much attention to the developer mailing list though.
For your second question, I'd start by looking at the source code in src/osd/ReplicatedPG.cc (for standard replication), or src/osd/ECBackend.cc (for Erasure Coding). I'm not a Ceph developer though, so that might not be the right place to start.
On Tue, Aug 12, 2014 at 7:08 PM, yuelongguang wrote:
hi, all
1. Can an OSD start up if its journal is lost and has not been replayed?
2. How does it catch up to the latest epoch? Take an OSD as an example: where is the code? Please consider both cases, journal lost or not. In my mind the journal only includes metadata and read/write operations; it does not include the file data.
thanks
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
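The apparent contradiction in the question has a hedged answer worth sketching: a temporarily-down OSD does not catch up by replaying anyone's journal. It compares its per-PG log with an authoritative peer's log and copies the objects it missed; the journal's job is only to guarantee that the OSD's own local state (and thus its PG log) is self-consistent. Lose the journal and the OSD can no longer trust what it claims to have, which is why the whole OSD is treated as lost. A simplified model (illustrative Python; real recovery is far more involved):

```python
# Sketch of PG-log-based catch-up (simplified; not Ceph's actual recovery code).
def missing_objects(local_log, authoritative_log):
    """Entries the returning OSD must copy from peers: everything past its last entry."""
    last_local = local_log[-1] if local_log else None
    if last_local is None:
        return list(authoritative_log)  # no history at all: full copy (backfill)
    idx = authoritative_log.index(last_local)
    return authoritative_log[idx + 1:]

# OSD was down while versions 4 and 5 were written:
print(missing_objects([1, 2, 3], [1, 2, 3, 4, 5]))  # [4, 5]
```

So "catching up" is peer-to-peer object copy driven by log divergence, a different mechanism from journal replay, which resolves the example in the question.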
[ceph-users] could you tell the call flow of pg state migration from log
2014-08-11 10:17:04.591497 7f0ec9b4f7a0 10 osd.0 pg_epoch: 153 pg[5.63( empty local-les=153 n=0 ec=81 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] null
2014-08-11 10:17:04.591501 7f0eb2b8f700 5 osd.0 pg_epoch: 155 pg[0.10( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] enter Started
2014-08-11 10:17:04.591501 7f0ec9b4f7a0 10 osd.0 pg_epoch: 153 pg[2.65( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] null
2014-08-11 10:17:04.591505 7f0eb2b8f700 5 osd.0 pg_epoch: 155 pg[0.10( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] enter Start
2014-08-11 10:17:04.591508 7f0ec9b4f7a0 10 osd.0 pg_epoch: 153 pg[1.66( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] null
2014-08-11 10:17:04.591509 7f0eb2b8f700 1 osd.0 pg_epoch: 155 pg[0.10( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] state: transitioning to Primary
2014-08-11 10:17:04.591513 7f0ec9b4f7a0 10 osd.0 pg_epoch: 153 pg[3.64( empty local-les=153 n=0 ec=76 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] null
2014-08-11 10:17:04.591517 7f0ec9b4f7a0 10 osd.0 pg_epoch: 153 pg[4.63( empty local-les=153 n=0 ec=79 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] null
2014-08-11 10:17:04.591518 7f0eb2b8f700 5 osd.0 pg_epoch: 155 pg[0.10( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] exit Start 0.13 0 0.00
2014-08-11 10:17:04.591521 7f0ec9b4f7a0 10 osd.0 pg_epoch: 153 pg[4.6c( empty local-les=153 n=0 ec=79 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] null
2014-08-11 10:17:04.591524 7f0eb2b8f700 5 osd.0 pg_epoch: 155 pg[0.10( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
2014-08-11 10:17:04.591526 7f0ec9b4f7a0 10 osd.0 pg_epoch: 153 pg[1.68( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] null
2014-08-11 10:17:04.591529 7f0eb2b8f700 5 osd.0 pg_epoch: 155 pg[0.10( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
2014-08-11 10:17:04.591531 7f0ec9b4f7a0 10 osd.0 pg_epoch: 153 pg[1.6b( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 inactive] null
2014-08-11 10:17:04.591535 7f0eb2b8f700 5 osd.0 pg_epoch: 155 pg[0.10( empty local-les=153 n=0 ec=1 les/c 153/153 152/152/152) [0] r=0 lpr=153 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo

hi,all

In the log above, the PG's state changes: Start --> Started/Primary --> Started/Primary/Peering/GetInfo. Why is that? What triggers each transition, and which thread handles the change? Where is the code?

thanks
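The nested transitions visible in the log can be sketched as a toy state machine. This is NOT the real implementation (Ceph's PG state machine is a boost::statechart hierarchy in src/osd/PG.h / PG.cc, driven by peering events on the OSD's worker threads); the sketch only mirrors the sequence of states the log shows: when the OSD learns it is primary for a PG, it leaves Start and descends into the Peering substates to query its peers.

```python
# Toy sketch of the nested PG state transitions -- hypothetical,
# not the real boost::statechart machine in src/osd/PG.cc.

class ToyPGStateMachine:
    def __init__(self, is_primary):
        self.is_primary = is_primary
        self.history = ["Start"]      # every PG begins in Start

    def _enter(self, state):
        self.history.append(state)    # mirrors the "enter <state>" log lines

    def handle_activate(self):
        # mirrors the log: Start -> Started/Primary -> .../Peering/GetInfo
        if self.is_primary:
            self._enter("Started/Primary")
            self._enter("Started/Primary/Peering")
            self._enter("Started/Primary/Peering/GetInfo")
        else:
            self._enter("Started/Stray")
```

In the real code each "enter X" log line corresponds to the constructor of a nested statechart state firing as an event (here collapsed into one hypothetical `handle_activate` call) is processed.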
[ceph-users] can osd start up if journal is lost and it has not been replayed?
hi,all
1. Can an OSD start up if its journal is lost and has not been replayed?
2. How does it catch up to the latest epoch? Taking the OSD as an example, where is the code? Please consider both cases, journal lost and not lost. In my mind, the journal only includes metadata and read/write operations; it does not include file data.
thanks
[ceph-users] ceph network
hi,all
I know Ceph differentiates its networks; mostly it uses a public network and a cluster (heartbeat) network. Do the mon and mds use those networks too? I only know that the osd does. Is there a document that introduces Ceph's networking?
thanks.
[ceph-users] best practice of installing ceph(large-scale deployment)
hi,all
I am using ceph-rbd with OpenStack as its backend storage. Is there a best practice?
1. How many osds and mons does it need at minimum, and in what proportion?
2. How do you deploy the network? Public network, cluster network...
3. As for performance, what do you do? Journal placement...
4. Anything else that improves Ceph performance.
thanks.
[ceph-users] what is collection(COLL) and cid
hi,all
Look at the code:

case Transaction::OP_MKCOLL:
  {
    coll_t cid = i.get_cid();
    ...
  }

1. What is a collection (COLL) and a cid? Is a collection a PG, and is the cid a pgid?
2. What is the relation between a cid and 'current/meta'? What is in current/meta?
thanks very much.
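A toy model may make the question concrete (hypothetical, not Ceph's coll_t): a collection is a named group of objects inside the object store. Most collections correspond to one PG, so the cid encodes a pgid (directory names like "5.63_head" under current/), while a separate "meta" collection holds OSD-wide objects such as osdmaps. OP_MKCOLL simply creates such a group.

```python
# Toy sketch of collections -- hypothetical, not Ceph's real coll_t.
# A collection is just a named bucket of objects; the cid is its name.

class ToyObjectStore:
    def __init__(self):
        self.collections = {}            # cid -> {oid: data}

    def mkcoll(self, cid):
        # what a Transaction::OP_MKCOLL conceptually does
        self.collections.setdefault(cid, {})

    def write(self, cid, oid, data):
        self.collections[cid][oid] = data
```

So a cid is not always a pgid: PG collections have pgid-derived cids, but special collections like "meta" do not belong to any PG.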
[ceph-users] question about ApplyManager, SubmitManager and FileJournal
hi,all
Recently I have been diving into the source code, and I am a little confused by these, maybe because of the many threads, waits, and seqs.
1. What does ApplyManager do? It is related to FileStore and FileJournal.
2. What does SubmitManager do?
3. How do they interact and work together?
What a big question :), thanks very much.
[ceph-users] how ceph store xattr
hi,all
1. It seems there are two kinds of functions that get/set xattrs: one kind starts with collection_*, the other starts with omap_*. What is the difference between them, and which xattrs use which kind of function?
2. There is an xattr that tells whether the xattrs are stored in leveldb; what is that xattr?
thanks.
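The split the question describes can be sketched with a toy object (hypothetical limits and names, not FileStore's exact logic): small attrs are kept as inline filesystem xattrs, attrs that are too large spill into the key/value DB (omap; leveldb in FileStore), and a marker xattr on the object records that spilled attrs exist, so reads know to check both places.

```python
# Toy sketch of xattr spill-out -- hypothetical, not FileStore's real
# logic. INLINE_LIMIT and the "spill_out" marker name are made up for
# illustration; FileStore uses its own marker xattr for this purpose.

INLINE_LIMIT = 64  # hypothetical size cutoff, in bytes

class ToyXattrObject:
    def __init__(self):
        self.fs_xattrs = {}   # inline filesystem xattrs
        self.omap = {}        # key/value DB entries (leveldb)

    def setxattr(self, name, value):
        if len(value) <= INLINE_LIMIT:
            self.fs_xattrs[name] = value
        else:
            self.omap[name] = value
            self.fs_xattrs["spill_out"] = b"1"   # marker: "look in omap too"

    def getxattr(self, name):
        if name in self.fs_xattrs:
            return self.fs_xattrs[name]
        return self.omap[name]
```

The design point is that filesystem xattrs are fast but size-limited, while the key/value DB has no practical size limit, so the store splits attrs between them and leaves a breadcrumb saying which path to take.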