[ceph-users] Multiple OSDs in each node with replica 2
I have a question. In a scenario of 3 nodes x 4 OSDs each with replica 2, I tested with one node down and, as long as space was available, all objects were still there. Is it possible for all replicas of an object to be saved on the same node? Is it possible to lose any data that way? Is there a mechanism that prevents replicas from being stored on OSDs within the same node? I would love for someone to answer this; any information is highly appreciated.

--
Warm Regards,
Azad Aliyar
Linux Server Engineer, SparkSupport
Re: [ceph-users] [SPAM] Changing pg_num = RBD VM down !
May I know your Ceph version? The latest Firefly release, v0.80.9, has patches to avoid excessive data migration when reweighting OSDs. You may need to set a tunable in order to make this fix active.

This is a bugfix release for Firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs. We recommend that all Firefly users upgrade. For more detailed information, see http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt

Adjusting CRUSH maps
--------------------

* This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases.

  However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt in to the fixed behavior.

  In order to set the new tunable to correct the behavior::

      ceph osd crush set-tunable straw_calc_version 1

  Note that this change will have no immediate effect. However, from this point forward, any 'straw' bucket in your CRUSH map that is adjusted will get non-buggy internal weights, and that transition may trigger some rebalancing.

  You can estimate how much rebalancing will eventually be necessary on your cluster with::

      ceph osd getcrushmap -o /tmp/cm
      crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
      crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
      crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
      crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
      wc -l /tmp/a                         # num total mappings
      diff -u /tmp/a /tmp/b | grep -c ^+   # num changed mappings

  Divide the number of changed mappings by the total number of mappings in /tmp/a. We've found that most clusters are under 10%.

  You can force all of this rebalancing to happen at once with::

      ceph osd crush reweight-all

  Otherwise, it will happen at some unknown point in the future when CRUSH weights are next adjusted.

Notable Changes
---------------

* ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
* crush: fix straw bucket weight calculation, add straw_calc_version tunable (#10095 Sage Weil)
* crush: fix tree bucket (Rongzu Zhu)
* crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
* crushtool: add --reweight (Sage Weil)
* librbd: complete pending operations before closing image (#10299 Jason Dillaman)
* librbd: fix read caching performance regression (#9854 Jason Dillaman)
* librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
* mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
* osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
* osd: handle no-op write with snapshot (#10262 Sage Weil)
* radosgw-admi

On 03/16/2015 12:37 PM, Alexandre DERUMIER wrote:
>> VMs are running on the same nodes as the OSDs
> Are you sure you didn't hit some kind of out-of-memory condition? PG rebalancing can be memory hungry (depending on how many OSDs you have).

2 OSDs per host, and 5 hosts in this cluster. hosts h
[ceph-users] Doesn't support QCOW2 disk images
Could someone in the community please explain the second warning on this page: http://ceph.com/docs/master/rbd/rbd-openstack/

  Important: Ceph doesn't support QCOW2 for hosting a virtual machine disk. Thus, if you want to boot virtual machines in Ceph (ephemeral backend or boot from volume), the Glance image format must be RAW.

--
Warm Regards,
Azad Aliyar
Linux Server Engineer, SparkSupport
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Check the maximum thread count: if you have a node with a lot of OSDs, you may be hitting the default kernel limit on the number of threads (kernel.pid_max, usually 32768), especially during recovery. You can raise it with sysctl to see whether the maximum allowed value (4194303) helps. For example:

    sysctl -w kernel.pid_max=4194303

If increasing the maximum thread count resolves the issue, make it permanent by adding a kernel.pid_max setting to the /etc/sysctl.conf file. For example:

    kernel.pid_max = 4194303

On Mon, Mar 9, 2015 at 4:11 PM, Karan Singh karan.si...@csc.fi wrote:

Hello Community, I need help fixing a long-standing Ceph problem. The cluster is unhealthy and multiple OSDs are down. When I try to restart the OSDs I get this error:

    2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
    common/Thread.cc: 129: FAILED assert(ret == 0)

Environment: 4 nodes, OSD+Monitor, latest Firefly, CentOS 6.5, kernel 3.17.2-1.el6.elrepo.x86_64
- Tried upgrading from 0.80.7 to 0.80.8, but no luck
- Tried the CentOS stock kernel 2.6.32, but no luck
- Memory is not the problem; more than 150 GB is free

Has anyone ever faced this problem?

Cluster status:

    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
    health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
    monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
    osdmap e26633: 239 osds: 85 up, 196 in
    pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
    4699 GB used, 707 TB / 711 TB avail
    6061/31080 objects degraded (19.501%)
          14 down+remapped+peering
          39 active
        3289 active+clean
         547 peering
         663 stale+down+peering
         705 stale+active+remapped
           1 active+degraded+remapped
           1 stale+down+incomplete
         484 down+peering
         455 active+remapped
        3696 stale+active+degraded
           4 remapped+peering
          23 stale+down+remapped+peering
          51 stale+active
        3637 active+degraded
        3799 stale+active+clean

OSD logs:

    2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
    common/Thread.cc: 129: FAILED assert(ret == 0)
    ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
    1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
    2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
    3: (Accepter::entry()+0x265) [0xb5c635]
    4: /lib64/libpthread.so.0() [0x3c8a6079d1]
    5: (clone()+0x6d) [0x3c8a2e89dd]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

More information at Ceph tracker issue: http://tracker.ceph.com/issues/10988#change-49018

Karan Singh
Systems Specialist, Storage Platforms
CSC - IT Center for Science
Keilaranta 14, P.O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758, tel. +358 9 4572001, fax +358 9 4572302
http://www.csc.fi/
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Great, Karan.

On Mon, Mar 9, 2015 at 9:32 PM, Karan Singh karan.si...@csc.fi wrote:

Thanks guys, kernel.pid_max=4194303 did the trick.

- Karan

On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote:

Hi Karan,

as you actually describe in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has about 66,000 threads in an idle state (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think is what triggered your problem.

Set kernel.pid_max to 4194303 (the maximum), as Azad Aliyar suggested, and the problem should be gone.

Regards,
Christian Eichelmann
Systemadministrator, 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting