Re: [ceph-users] cache-tier do not evict
What ceph version do you use?

Regards,

On 9 Apr 2015 18:58, "Patrik Plank" wrote:
> Hi,
>
> I have built a cache-tier pool (replica 2) with 3 x 512 GB SSDs for my kvm pool.
>
> These are my settings:
>
>     ceph osd tier add kvm cache-pool
>     ceph osd tier cache-mode cache-pool writeback
>     ceph osd tier set-overlay kvm cache-pool
>
>     ceph osd pool set cache-pool hit_set_type bloom
>     ceph osd pool set cache-pool hit_set_count 1
>     ceph osd pool set cache-pool hit_set_period 3600
>
>     ceph osd pool set cache-pool target_max_bytes 751619276800
>     ceph osd pool set cache-pool target_max_objects 100
>
>     ceph osd pool set cache-pool cache_min_flush_age 1800
>     ceph osd pool set cache-pool cache_min_evict_age 600
>
>     ceph osd pool set cache-pool cache_target_dirty_ratio .4
>     ceph osd pool set cache-pool cache_target_full_ratio .8
>
> So the problem is, the cache tier does not evict automatically.
> If I copy some kvm images to the ceph cluster, the cache OSDs always run full.
>
> Is that normal? Is there a misconfiguration?
>
> thanks
> best regards
> Patrik

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
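For reference, a consolidated sketch of the eviction-related options from the setup above, annotated with how the tiering agent interprets them (the values are the ones from the post, not recommendations):

```shell
# The agent only flushes/evicts relative to absolute targets; if neither
# target_max_bytes nor target_max_objects is set, it does nothing at all.
ceph osd pool set cache-pool target_max_bytes 751619276800   # = 700 GiB
ceph osd pool set cache-pool target_max_objects 100

# Ratios are fractions of the targets above: start flushing dirty objects
# at 40% of the target, start evicting clean objects at 80%.
ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
ceph osd pool set cache-pool cache_target_full_ratio 0.8

# Minimum ages: objects younger than these are not flushed/evicted, so
# under sustained writes the pool can overshoot the ratios above.
ceph osd pool set cache-pool cache_min_flush_age 1800   # seconds
ceph osd pool set cache-pool cache_min_evict_age 600    # seconds
```

Two things in the posted values stand out (an observation, not a confirmed diagnosis): target_max_objects 100 asks the agent to keep the pool under 100 objects, which is almost certainly not intended for a pool of KVM images, and cache_min_evict_age 600 prevents freshly written objects from being evicted for 10 minutes even when the pool is full.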
Re: [ceph-users] ceph -s slow return result
Thank you very much!

On 29 Mar 2015 11:25, "Kobi Laredo" wrote:
> I'm glad it worked.
> You can set a warning to catch this early next time (1GB):
>
>     mon leveldb size warn = 10
>
> Kobi Laredo
> Cloud Systems Engineer | (408) 409-KOBI
>
> On Fri, Mar 27, 2015 at 5:45 PM, Chu Duc Minh wrote:
>> @Kobi Laredo: thank you! It's exactly my problem. [snip]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph -s slow return result
@Kobi Laredo: thank you! It's exactly my problem.

    # du -sh /var/lib/ceph/mon/
    2.6G    /var/lib/ceph/mon/
    # ceph tell mon.a compact
    compacted leveldb in 10.197506
    # du -sh /var/lib/ceph/mon/
    461M    /var/lib/ceph/mon/

Now my "ceph -s" returns a result immediately.

Maybe the monitors' LevelDB store grew so big because I pushed 13 million files into a bucket (over radosgw). When a bucket holds an extremely large number of files, can the state of the ceph cluster become unstable? (I'm running Giant.)

Regards,

On Sat, Mar 28, 2015 at 12:57 AM, Kobi Laredo wrote:
> What's the current health of the cluster?
> It may help to compact the monitors' LevelDB store if they have grown in size:
> http://www.sebastien-han.fr/blog/2014/10/27/ceph-mon-store-taking-up-a-lot-of-space/
> Depending on the size of the mon's store it may take some time to compact; make sure to do only one at a time.
>
> Kobi Laredo
> Cloud Systems Engineer | (408) 409-KOBI
>
> On Fri, Mar 27, 2015 at 10:31 AM, Chu Duc Minh wrote:
>> All my monitors running. [snip]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
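The compact-when-too-big step above can be turned into a simple guard, e.g. run from cron. This is only a sketch under assumptions (the store path is the one from the thread, the 1 GB threshold mirrors Kobi's warning suggestion, and mon.a is hypothetical — adjust to your mon id):

```shell
#!/bin/sh
# Compact a monitor's LevelDB store when it grows past ~1 GB.
# Only compact one monitor at a time, per Kobi's advice above.
STORE=/var/lib/ceph/mon        # path from the thread
LIMIT_KB=$((1024 * 1024))      # 1 GB, expressed in KB for du -k

used_kb=$(du -sk "$STORE" | awk '{print $1}')
if [ "$used_kb" -gt "$LIMIT_KB" ]; then
    echo "mon store is ${used_kb} KB, compacting..."
    ceph tell mon.a compact
fi
```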
Re: [ceph-users] ceph -s slow return result
All my monitors are running.
But I am deleting pool .rgw.buckets, which now holds 13 million objects (just test data).
The reason I must delete this pool is that my cluster became unstable: sometimes an OSD goes down, PGs go peering, incomplete, ...
Therefore I must delete this pool to re-stabilize my cluster. (radosgw is too slow at deleting objects once one of my buckets reaches a few million objects.)

Regards,

On Sat, Mar 28, 2015 at 12:23 AM, Gregory Farnum wrote:
> Are all your monitors running? Usually a temporary hang means that the Ceph client tries to reach a monitor that isn't up, then times out and contacts a different one.
>
> I have also seen it just be slow if the monitors are processing so many updates that they're behind, but that's usually on a very unhappy cluster.
> -Greg
>
> On Fri, Mar 27, 2015 at 8:50 AM Chu Duc Minh wrote:
>> On my CEPH cluster, "ceph -s" return result quite slow. [snip]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph -s slow return result
On my CEPH cluster, "ceph -s" returns its result quite slowly. Sometimes it returns immediately, sometimes it hangs for a few seconds before returning.

Do you think this problem (slow "ceph -s" return) relates only to the ceph-mon processes, or could it relate to the ceph-osds too? (I am deleting a big bucket, .rgw.buckets, and ceph-osd disk utilization is quite high.)

Regards,

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [SPAM] Changing pg_num => RBD VM down !
@Michael Kuriger: when ceph/librbd operates normally, I know that doubling the pg_num is the safe way. But when it has a problem, I think doubling it can make many, many VMs die (maybe >= 50%?).

On Mon, Mar 16, 2015 at 9:53 PM, Michael Kuriger wrote:
> I always keep my pg number a power of 2. So I'd go from 2048 to 4096.
> I'm not sure if this is the safest way, but it's worked for me.
>
> Michael Kuriger
> Sr. Unix Systems Engineer
> mk7...@yp.com | 818-649-7235
>
> From: Chu Duc Minh
> Date: Monday, March 16, 2015 at 7:49 AM
> To: Florent B
> Cc: "ceph-users@lists.ceph.com"
> Subject: Re: [ceph-users] [SPAM] Changing pg_num => RBD VM down !
>
> I'm using the latest Giant and have the same issue. [snip]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
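Michael's power-of-two rule and the question about step size can be made concrete with a dry run that prints the adjustment schedule. A sketch only — the pool name and the step of 256 are assumptions, not advice from the thread; remove the echo (and set pgp_num to match) only after each previous step has gone active+clean:

```shell
# Dry run: print the pg_num steps from 2048 up to the next power of two
# (4096) in bounded increments, instead of doubling in one jump.
POOL=volumes   # hypothetical pool name
CUR=2048
TARGET=4096
STEP=256
for n in $(seq $((CUR + STEP)) "$STEP" "$TARGET"); do
    echo "ceph osd pool set $POOL pg_num $n   # then pgp_num, then wait for active+clean"
done
```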
Re: [ceph-users] [SPAM] Changing pg_num => RBD VM down !
I'm using the latest Giant and have the same issue. When I increase the pg_num of a pool from 2048 to 2148, my VMs are still OK. When I increase it from 2148 to 2400, some VMs die (the qemu-kvm process dies). My physical servers (hosting the VMs) run kernel 3.13 and use librbd. I think it's a bug in librbd with the crushmap. (I set crush_tunables3 on my ceph cluster; does it make sense?)

Do you know a way to safely increase pg_num? (I don't think increasing pg_num by 100 each time is a safe & good way.)

Regards,

On Mon, Mar 16, 2015 at 8:50 PM, Florent B wrote:
> We are on Giant.
>
> On 03/16/2015 02:03 PM, Azad Aliyar wrote:
> > May I know your ceph version? The latest version of firefly, 0.80.9, has patches to avoid excessive data migration during reweighting osds. You may need to set a tunable in order to make this patch active.
> >
> > This is a bugfix release for firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs.
> >
> > We recommend that all Firefly users upgrade.
> >
> > For more detailed information, see http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt
> >
> > Adjusting CRUSH maps
> >
> > * This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases.
> >
> >   However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt-in to the fixed behavior.
> >
> >   In order to set the new tunable to correct the behavior:
> >
> >       ceph osd crush set-tunable straw_calc_version 1
> >
> >   Note that this change will have no immediate effect. However, from this point forward, any 'straw' bucket in your CRUSH map that is adjusted will get non-buggy internal weights, and that transition may trigger some rebalancing.
> >
> >   You can estimate how much rebalancing will eventually be necessary on your cluster with:
> >
> >       ceph osd getcrushmap -o /tmp/cm
> >       crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
> >       crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
> >       crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
> >       crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
> >       wc -l /tmp/a                         # num total mappings
> >       diff -u /tmp/a /tmp/b | grep -c ^+   # num changed mappings
> >
> >   Divide the total number of lines in /tmp/a with the number of lines changed. We've found that most clusters are under 10%.
> >
> >   You can force all of this rebalancing to happen at once with:
> >
> >       ceph osd crush reweight-all
> >
> >   Otherwise, it will happen at some unknown point in the future when CRUSH weights are next adjusted.
> >
> > Notable Changes
> >
> > * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
> > * crush: fix straw bucket weight calculation, add straw_calc_version tunable (#10095 Sage Weil)
> > * crush: fix tree bucket (Rongzu Zhu)
> > * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
> > * crushtool: add --reweight (Sage Weil)
> > * librbd: complete pending operations before losing image (#10299 Jason Dillaman)
> > * librbd: fix read caching performance regression (#9854 Jason Dillaman)
> > * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
> > * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
> > * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
> > * osd: handle no-op write with snapshot (#10262 Sage Weil)
> > * radosgw-admi
> >
> > On 03/16/2015 12:37 PM, Alexandre DERUMIER wrote:
> > >>> VMs are running on the same nodes than OSD
> > > Are you sure that you didn't have some kind of out of memory? pg rebalance can be memory hungry (depends how many osd you have).
> >
> > 2 OSD per host, and 5 hosts in this cluster.
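The last two commands of the quoted estimation procedure yield a total mapping count and a changed-mapping count; the expected movement is just their ratio. A self-contained illustration with hypothetical counts (substitute the real wc/diff output from /tmp/a and /tmp/b):

```shell
# Hypothetical counts for illustration: 196608 total mappings, 11000 changed.
total=196608
changed=11000
awk -v c="$changed" -v t="$total" \
    'BEGIN { printf "%.1f%% of PG mappings would move\n", 100*c/t }'
# prints: 5.6% of PG mappings would move
# In real use:  total=$(wc -l < /tmp/a)
#               changed=$(diff -u /tmp/a /tmp/b | grep -c '^+')
```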
Re: [ceph-users] [URGENT] My CEPH cluster is dying (due to "incomplete" PG)
I had no choice except to re-create this PG:

    # ceph pg force_create_pg 6.9d8

But it is still stuck at creating:

    # ceph pg dump | grep creating
    dumped all in format plain
    6.9d8  0  0  0  0  0  0  0  0  creating  2014-11-09 03:27:23.611838  0'0  0:0  []  -1  []  -1  0'0  0.00  0'0  0.00

Do you have any suggestion? Thank you very much indeed!

On Sun, Nov 9, 2014 at 12:52 AM, Chu Duc Minh wrote:
> My ceph cluster has a pg in the "incomplete" state and I cannot query it any more. [snip]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
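One possible path for the "data still exists in 6.9d8_head on osd.43" situation is the objectstore tool that ships with Giant. This is a sketch of the general approach only, not a recipe — the binary name, paths, and OSD ids are assumptions for this era/cluster, the exact sequence depends on the cluster state, and it is easy to make things worse; the main point is getting a file-level backup of the surviving PG copy before doing anything destructive:

```shell
# With osd.43 stopped, export the surviving copy of PG 6.9d8 to a file.
service ceph stop osd.43
ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-43 \
    --journal-path /var/lib/ceph/osd/ceph-43/journal \
    --pgid 6.9d8 --op export --file /root/pg.6.9d8.export

# The export can later be imported into a (stopped) OSD in the PG's acting
# set; if an empty copy of the PG already exists there, remove it first.
service ceph stop osd.88
ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-88 \
    --journal-path /var/lib/ceph/osd/ceph-88/journal \
    --pgid 6.9d8 --op remove
ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-88 \
    --journal-path /var/lib/ceph/osd/ceph-88/journal \
    --op import --file /root/pg.6.9d8.export
```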
[ceph-users] [URGENT] My CEPH cluster is dying (due to "incomplete" PG)
My ceph cluster has a pg in the "incomplete" state and I cannot query it any more.

    # ceph pg 6.9d8 query    (hangs forever)

All my volumes may lose data because of this PG.

    # ceph pg dump_stuck inactive
    ok
    pg_stat  objects  mip  degr  misp/unf  bytes        log   disklog  state       state_stamp                 v               reported     up          up_primary  acting      acting_primary  last_scrub     scrub_stamp                 last_deep_scrub  deep_scrub_stamp
    6.9d8    30730    0    0     0         12678708736  4865  4865     incomplete  2014-11-09 00:07:20.838531  109334'8040089  110376:1118  [88,43,32]  88          [88,43,32]  88              91654'7460936  2014-10-10 10:36:25.433016  81667'5815892    2014-08-29 09:44:14.012219

My folders .../current/6.9d8_head still have data on some OSDs (osd.43); how can I force CEPH to use the data on osd.43 for this PG? I tried repair, marking the osd lost, etc., but nothing helped.

Can you give me some suggestions? My data is dying :(

Thank you very much!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RBD command crash & can't delete volume!
Hi, I will start a new email thread, but I think it is related to this rbd bug.

Do you have any suggestion for a quick fix for this buggy volume (e.g. a way to safely delete it, ...)? Maybe it is the reason I cannot start the last OSD.

Thank you very much!

On Fri, Nov 7, 2014 at 10:14 PM, Jason Dillaman wrote:
> It appears you have discovered a bug in librbd that occurs when a child's parent image is missing or corrupt. I have opened the following ticket for this issue: http://tracker.ceph.com/issues/10030.
>
> For the OSD failure, can you start a new email thread with the supporting details of that issue?
>
> --
> Jason Dillaman
> Red Hat
> dilla...@redhat.com
> http://www.redhat.com
>
> From: "Chu Duc Minh"
> To: ceph-de...@vger.kernel.org, "ceph-users@lists.ceph.com"
> Sent: Friday, November 7, 2014 7:05:58 AM
> Subject: [ceph-users] RBD command crash & can't delete volume!
>
>> Hi folks, some volumes in my ceph cluster have problem and I can NOT delete it by rbd command. [snip]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RBD command crash & can't delete volume!
Hi folks, some volumes in my ceph cluster have a problem and I can NOT delete them with the rbd command. When I show info or try to delete one, the rbd command crashes.

Commands I used:

    # rbd -p volumes info volume-e110b0a5-5116-46f2-99c7-84bb546f15c2
    # rbd -p volumes rm volume-e110b0a5-5116-46f2-99c7-84bb546f15c2

I attach the crash-dump log in this email, please help me investigate it.

PS: This problem makes me unable to start an osd too. (If I start this osd, it dies a few minutes after that. I retried many times but got the same result.)

Thank you!

    root@ceph-mon-02:~# rbd -p volumes info volume-e110b0a5-5116-46f2-99c7-84bb546f15c2
    2014-11-07 18:53:28.075090 7fa78f90d780 -1 librbd::ImageCtx: error reading immutable metadata: (2) No such file or directory
    2014-11-07 18:53:28.075397 7fa78f90d780 -1 librbd: error opening parent image: (2) No such file or directory
    ./log/SubsystemMap.h: In function 'bool ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread 7fa78f90d780 time 2014-11-07 18:53:28.075425
    ./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
     ceph version 0.87-6-gdba7def (dba7defc623474ad17263c9fccfec60fe7a439f0)
     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7fa78c9f39ff]
     2: (()+0x2eca0) [0x7fa78edf8ca0]
     3: (()+0x36679) [0x7fa78ee00679]
     4: (librbd::close_image(librbd::ImageCtx*)+0x36) [0x7fa78ee13146]
     5: (librbd::open_parent(librbd::ImageCtx*)+0x663) [0x7fa78ee1a7f3]
     6: (librbd::refresh_parent(librbd::ImageCtx*)+0x138) [0x7fa78ee1aca8]
     7: (librbd::ictx_refresh(librbd::ImageCtx*)+0xa41) [0x7fa78ee0e361]
     8: (librbd::open_image(librbd::ImageCtx*)+0x1a4) [0x7fa78ee19fc4]
     9: (librbd::RBD::open_read_only(librados::IoCtx&, librbd::Image&, char const*, char const*)+0x8a) [0x7fa78edf989a]
     10: (main()+0x14b3) [0x40e0b3]
     11: (__libc_start_main()+0xed) [0x7fa78bad276d]
     12: rbd() [0x413d49]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

    --- begin dump of recent events ---
       -82> 2014-11-07 18:53:27.989029 7fa78f90d780  5 asok(0x23d6c60) register_command perfcounters_dump hook 0x23d8af0
       -81> 2014-11-07 18:53:27.989059 7fa78f90d780  5 asok(0x23d6c60) register_command 1 hook 0x23d8af0
       -80> 2014-11-07 18:53:27.989063 7fa78f90d780  5 asok(0x23d6c60) register_command perf dump hook 0x23d8af0
       -79> 2014-11-07 18:53:27.989069 7fa78f90d780  5 asok(0x23d6c60) register_command perfcounters_schema hook 0x23d8af0
       -78> 2014-11-07 18:53:27.989074 7fa78f90d780  5 asok(0x23d6c60) register_command 2 hook 0x23d8af0
       -77> 2014-11-07 18:53:27.989082 7fa78f90d780  5 asok(0x23d6c60) register_command perf schema hook 0x23d8af0
       -76> 2014-11-07 18:53:27.989092 7fa78f90d780  5 asok(0x23d6c60) register_command config show hook 0x23d8af0
       -75> 2014-11-07 18:53:27.989095 7fa78f90d780  5 asok(0x23d6c60) register_command config set hook 0x23d8af0
       -74> 2014-11-07 18:53:27.989102 7fa78f90d780  5 asok(0x23d6c60) register_command config get hook 0x23d8af0
       -73> 2014-11-07 18:53:27.989107 7fa78f90d780  5 asok(0x23d6c60) register_command config diff hook 0x23d8af0
       -72> 2014-11-07 18:53:27.989110 7fa78f90d780  5 asok(0x23d6c60) register_command log flush hook 0x23d8af0
       -71> 2014-11-07 18:53:27.989116 7fa78f90d780  5 asok(0x23d6c60) register_command log dump hook 0x23d8af0
       -70> 2014-11-07 18:53:27.989120 7fa78f90d780  5 asok(0x23d6c60) register_command log reopen hook 0x23d8af0
       -69> 2014-11-07 18:53:27.995981 7fa78f90d780 10 monclient(hunting): build_initial_monmap
       -68> 2014-11-07 18:53:27.996113 7fa78f90d780  1 librados: starting msgr at :/0
       -67> 2014-11-07 18:53:27.996121 7fa78f90d780  1 librados: starting objecter
       -66> 2014-11-07 18:53:27.996189 7fa78f90d780  5 asok(0x23d6c60) register_command objecter_requests hook
Re: [ceph-users] OSDs vanishing from Ceph cluster?
You should run "ceph osd tree" and post the output here.

BR,

On March 28, 2014, at 5:54 AM, Dan Koren wrote:

Just ran into this problem: a week ago I set up a Ceph cluster on 4 systems, with one admin node and 3 mon+osd nodes, then ran a few casual IO tests. I returned to work after a few days out of town at a conference, and now my Ceph cluster appears to have no OSDs!

    root@rts24:/var/log/ceph# ceph status
        cluster 284dbfe0-e612-4732-9d26-2c5909f0fbd1
         health HEALTH_ERR 119 pgs degraded; 192 pgs stale; 192 pgs stuck stale; 119 pgs stuck unclean; recovery 2/4 objects degraded (50.000%); no osds
         monmap e1: 3 mons at {rts21=172.29.0.21:6789/0,rts22=172.29.0.22:6789/0,rts23=172.29.0.23:6789/0}, election epoch 32, quorum 0,1,2 rts21,rts22,rts23
         osdmap e33: 0 osds: 0 up, 0 in
          pgmap v2774: 192 pgs, 3 pools, 135 bytes data, 2 objects
                0 kB used, 0 kB / 0 kB avail
                2/4 objects degraded (50.000%)
                      73 stale+active+clean
                     119 stale+active+degraded

I would appreciate it if anyone could explain how something like this can happen, or where to look for any evidence that might help me understand what happened. The log files in /var/log/ceph/ show no activity except for the monitors' Paxos chatter.

Thx,

Dan Koren
Director of Software
DATERA | 650.210.7910 | @dateranews
dnk@datera.io

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] "rbd import" so slow
When using the RBD backend for Openstack volumes, I can easily surpass 200MB/s. But when using the "rbd import" command, e.g.:

    # rbd import --pool test Debian.raw volume-Debian-1 --new-format --id volumes

I can only import at ~ 30MB/s.

I don't know why rbd import is so slow. What can I do to improve the import speed?

Thank you very much!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
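One thing worth trying — an assumption on my part, not something verified on this cluster: `rbd import` writes the image largely serially, whereas `qemu-img convert` goes through librbd and can benefit from client-side write caching, which is often noticeably faster for raw images. Pool, image, and user names below are taken from the post; requires a qemu-img built with rbd support:

```shell
# Alternative import path via qemu-img's rbd driver; the colon-separated
# key=value pairs after the image name are passed to librados/librbd,
# so rbd_cache=true enables the client-side writeback cache for the copy.
qemu-img convert -O raw Debian.raw \
    "rbd:test/volume-Debian-1:id=volumes:rbd_cache=true"
```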
Re: [ceph-users] Speed limit on RadosGW?
My cluster has 3 MON nodes & 6 DATA nodes, all nodes have 2Gbps connectivity (bonding). Each Data node has 14 SATA HDD (osd), each journal on the same disk as OSD. Each MON node run RadosGW too. On Oct 15, 2013 12:34 AM, "Kyle Bader" wrote: > I've personally saturated 1Gbps links on multiple radosgw nodes on a large > cluster, if I remember correctly, Yehuda has tested it up into the 7Gbps > range with 10Gbps gear. Could you describe your clusters hardware and > connectivity? > > > On Mon, Oct 14, 2013 at 3:34 AM, Chu Duc Minh wrote: > >> Hi sorry, i missed this mail. >> >> >> > During writes, does the CPU usage on your RadosGW node go way up? >> No, CPU stay the same & very low (< 10%) >> >> When upload small files(300KB/file) over RadosGW: >> - using 1 process: upload bandwidth ~ 3MB/s >> - using 100 processes: upload bandwidth ~ 15MB/s >> >> When upload big files(3GB/file) over RadosGW: >> - using 1 process: upload bandwidth ~ 70MB/s >> (Therefore i don't upload big files using multi-processes any more :D) >> >> Maybe, RadosGW have a problem when write many smail files. Or it's a >> problem of CEPH when simultaneously write many smail files into a bucket, >> that already have millions files? 
>> >> >> On Wed, Sep 25, 2013 at 7:24 PM, Mark Nelson wrote: >> >>> On 09/25/2013 02:49 AM, Chu Duc Minh wrote: >>> >>>> I have a CEPH cluster with 9 nodes (6 data nodes & 3 mon/mds nodes) >>>> And i setup 4 separate nodes to test performance of Rados-GW: >>>> - 2 node run Rados-GW >>>> - 2 node run multi-process put file to [multi] Rados-GW >>>> >>>> Result: >>>> a) When i use 1 RadosGW node & 1 upload-node, speed upload = 50MB/s >>>> /upload-node, Rados-GW input/output speed = 50MB/s >>>> >>>> b) When i use 2 RadosGW node & 1 upload-node, speed upload = 50MB/s >>>> /upload-node; each RadosGW have input/output = 25MB/s ==> sum >>>> input/ouput of 2 Rados-GW = 50MB/s >>>> >>>> c) When i use 1 RadosGW node & 2 upload-node, speed upload = 25MB/s >>>> /upload-node ==> sum output of 2 upload-node = 50MB/s, RadosGW have >>>> input/output = 50MB/s >>>> >>>> d) When i use 2 RadosGW node & 2 upload-node, speed upload = 25MB/s >>>> /upload-node ==> sum output of 2 upload-node = 50MB/s; each RadosGW have >>>> input/output = 25MB/s ==> sum input/ouput of 2 Rados-GW = 50MB/s >>>> >>>> _*Problem*_: i can pass limit 50MB/s when put file over Rados-GW, >>>> >>>> regardless of the number Rados-GW nodes and upload-nodes. >>>> When i use this CEPH cluster over librados (openstack/kvm), i can easily >>>> achieve > 300MB/s >>>> >>>> I don't know why performance of RadosGW is so low. What's bottleneck? >>>> >>> >>> During writes, does the CPU usage on your RadosGW node go way up? >>> >>> If this is a test cluster, you might want to try the wip-6286 build from >>> our gitbuilder site. There is a fix that depending on the size of your >>> objects, could have a big impact on performance. We're currently >>> investigating some other radosgw performance issues as well, so stay tuned. >>> :) >>> >>> Mark >>> >>> >>> >>>> Thank you very much! 
>
> --
> Kyle

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
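For reference, the 1-process vs 100-process comparison described above can be driven by a small harness like the one below. This is only a sketch, not the poster's actual tool: it uses threads rather than separate processes, a local file write stands in for the HTTP PUT to RadosGW (swap in your S3 client call there), and the names `upload_one` and `run_benchmark` are made up for illustration.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

OBJECT_SIZE = 300 * 1024  # 300KB objects, matching the small-file test above

def upload_one(dest_dir, i):
    # Stand-in for the HTTP PUT to RadosGW: write the payload locally.
    payload = os.urandom(OBJECT_SIZE)
    with open(os.path.join(dest_dir, "obj-%d" % i), "wb") as f:
        f.write(payload)
    return len(payload)

def run_benchmark(dest_dir, num_objects, num_workers):
    # Upload num_objects concurrently and return aggregate MB/s.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=num_workers) as ex:
        futures = [ex.submit(upload_one, dest_dir, i) for i in range(num_objects)]
        total_bytes = sum(f.result() for f in futures)
    elapsed = time.monotonic() - start
    return total_bytes / (1024 * 1024) / elapsed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("aggregate: %.1f MB/s" % run_benchmark(d, 50, 8))
```

Comparing `run_benchmark(d, 1000, 1)` against `run_benchmark(d, 1000, 100)` would mirror the 3MB/s vs 15MB/s measurement once the local write is replaced with a real PUT.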
Re: [ceph-users] Speed limit on RadosGW?
Hi, sorry, I missed this mail.

> During writes, does the CPU usage on your RadosGW node go way up?
No, CPU usage stays the same & very low (< 10%).

When uploading small files (300KB/file) over RadosGW:
- using 1 process: upload bandwidth ~ 3MB/s
- using 100 processes: upload bandwidth ~ 15MB/s

When uploading big files (3GB/file) over RadosGW:
- using 1 process: upload bandwidth ~ 70MB/s
(Therefore I don't upload big files using multiple processes any more :D)

Maybe RadosGW has a problem when writing many small files. Or it's a problem in Ceph when simultaneously writing many small files into a bucket that already has millions of files?

On Wed, Sep 25, 2013 at 7:24 PM, Mark Nelson wrote:
> On 09/25/2013 02:49 AM, Chu Duc Minh wrote:
>> I have a Ceph cluster with 9 nodes (6 data nodes & 3 mon/mds nodes),
>> and I set up 4 separate nodes to test the performance of RadosGW:
>> - 2 nodes run RadosGW
>> - 2 nodes run multi-process PUTs of files to [multiple] RadosGW
>>
>> Result:
>> a) When I use 1 RadosGW node & 1 upload node, upload speed = 50MB/s per
>> upload node; the RadosGW input/output speed = 50MB/s
>>
>> b) When I use 2 RadosGW nodes & 1 upload node, upload speed = 50MB/s per
>> upload node; each RadosGW has input/output = 25MB/s ==> summed
>> input/output of the 2 RadosGWs = 50MB/s
>>
>> c) When I use 1 RadosGW node & 2 upload nodes, upload speed = 25MB/s per
>> upload node ==> summed output of the 2 upload nodes = 50MB/s; the
>> RadosGW has input/output = 50MB/s
>>
>> d) When I use 2 RadosGW nodes & 2 upload nodes, upload speed = 25MB/s
>> per upload node ==> summed output of the 2 upload nodes = 50MB/s; each
>> RadosGW has input/output = 25MB/s ==> summed input/output of the 2
>> RadosGWs = 50MB/s
>>
>> *Problem*: I can't get past the 50MB/s limit when putting files over
>> RadosGW, regardless of the number of RadosGW nodes and upload nodes.
>> When I use this Ceph cluster over librados (openstack/kvm), I can easily
>> achieve > 300MB/s.
>>
>> I don't know why the performance of RadosGW is so low. What's the
>> bottleneck?
>>
>
> During writes, does the CPU usage on your RadosGW node go way up?
>
> If this is a test cluster, you might want to try the wip-6286 build from
> our gitbuilder site. There is a fix that, depending on the size of your
> objects, could have a big impact on performance. We're currently
> investigating some other radosgw performance issues as well, so stay
> tuned. :)
>
> Mark
>
>> Thank you very much!
[ceph-users] Speed limit on RadosGW?
I have a Ceph cluster with 9 nodes (6 data nodes & 3 mon/mds nodes), and I set up 4 separate nodes to test the performance of RadosGW:
- 2 nodes run RadosGW
- 2 nodes run multi-process PUTs of files to [multiple] RadosGW

Result:
a) When I use 1 RadosGW node & 1 upload node, upload speed = 50MB/s per upload node; the RadosGW input/output speed = 50MB/s

b) When I use 2 RadosGW nodes & 1 upload node, upload speed = 50MB/s per upload node; each RadosGW has input/output = 25MB/s ==> summed input/output of the 2 RadosGWs = 50MB/s

c) When I use 1 RadosGW node & 2 upload nodes, upload speed = 25MB/s per upload node ==> summed output of the 2 upload nodes = 50MB/s; the RadosGW has input/output = 50MB/s

d) When I use 2 RadosGW nodes & 2 upload nodes, upload speed = 25MB/s per upload node ==> summed output of the 2 upload nodes = 50MB/s; each RadosGW has input/output = 25MB/s ==> summed input/output of the 2 RadosGWs = 50MB/s

*Problem*: I can't get past the 50MB/s limit when putting files over RadosGW, regardless of the number of RadosGW nodes and upload nodes. When I use this Ceph cluster over librados (openstack/kvm), I can easily achieve > 300MB/s.

I don't know why the performance of RadosGW is so low. What's the bottleneck?

Thank you very much!
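A quick back-of-the-envelope check (not from the original mail; it only restates figures reported elsewhere in this thread) suggests the small-file ceiling is an operations-per-second limit rather than a bandwidth limit:

```python
# Figures reported in this thread: 300KB objects reach ~15MB/s aggregate
# across 100 processes, while a single 3GB upload reaches ~70MB/s.
small_obj_mb = 300 / 1024        # object size in MB
small_rate_mb_s = 15.0           # aggregate small-file bandwidth
big_rate_mb_s = 70.0             # single big-file bandwidth

# At 15MB/s of 300KB objects, the gateway is creating ~51 objects/s, so
# per-object overhead (bucket index updates, metadata ops) rather than
# raw bandwidth is the likelier bottleneck for the small-file workload.
ops_per_s = small_rate_mb_s / small_obj_mb
print("object creates/s: %.1f" % ops_per_s)                            # -> 51.2
print("bandwidth gap:    %.1fx" % (big_rate_mb_s / small_rate_mb_s))   # -> 4.7x
```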