Re: [ceph-users] tcmalloc use a lot of CPU
Hi,

Is this phenomenon normal? Is there any idea about this problem?

It's a known problem with tcmalloc (search the ceph mailing list archives). Starting the OSDs with the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M environment variable should help.

Another way is to compile Ceph with jemalloc instead of tcmalloc (./configure --with-jemalloc ...).

----- Original Message -----
From: YeYin ey...@qq.com
To: ceph-users ceph-users@lists.ceph.com
Sent: Monday, 17 August 2015 11:58:26
Subject: [ceph-users] tcmalloc use a lot of CPU

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
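For reference, a minimal sketch of setting that variable when starting an OSD by hand; the value is 128 MiB expressed in bytes, and the OSD id and paths below are placeholders, not anything from the original post. How you wire it into your init script or /etc/default/ceph is up to your distribution.

  # rough sketch: export the tcmalloc thread-cache limit, then start one OSD
  # (only takes effect with gperftools/tcmalloc builds that honour this variable)
  export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128 MiB
  /usr/bin/ceph-osd -i 12 --cluster ceph                   # osd id 12 is just an example

  # verify a running daemon picked it up
  tr '\0' '\n' < /proc/$(pidof ceph-osd | awk '{print $1}')/environ | grep TCMALLOC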
[ceph-users] RE: tcmalloc use a lot of CPU
Hi!

We also observe the same behavior on our test Hammer install, and I wrote about it some time ago:
http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22609

Jan Schermer gave us some suggestions in that thread, but we still have not gotten any positive results - TCMalloc usage is high. The usage drops to 10% when we disable crc in messages, disable debug and disable cephx auth, but this is of course not for production use.

We also got a different trace while performing FIO-RBD benchmarks on the ssd pool:
---
46,07% [kernel]             [k] _raw_spin_lock
 6,51% [kernel]             [k] mb_cache_entry_alloc
 5,74% libtcmalloc.so.4.2.2 [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
 5,50% libtcmalloc.so.4.2.2 [.] tcmalloc::SLL_Next(void*)
 3,86% libtcmalloc.so.4.2.2 [.] TCMalloc_PageMap3<35>::get(unsigned long) const
 2,73% libtcmalloc.so.4.2.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
 0,69% libtcmalloc.so.4.2.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
 0,69% libtcmalloc.so.4.2.2 [.] tcmalloc::PageHeap::GetDescriptor(unsigned long) const
 0,64% libtcmalloc.so.4.2.2 [.] tcmalloc::SLL_PopRange(void**, int, void**, void**)
---

I don't clearly understand what's happening in this case: the ssd pool is connected to the same host, but through a different controller (C60X onboard instead of LSI2208), the io scheduler is set to noop, and the pool is built from 4 x 400Gb Intel DC S3700, so it should perform better, I think - more than 30-40 kiops. But we get the trace above and no more than 12-15 kiops. Where can the problem be?

Megov Igor
CIO, Yuterra

From: ceph-users ceph-users-boun...@lists.ceph.com on behalf of YeYin ey...@qq.com
Sent: 17 August 2015 12:58
To: ceph-users
Subject: [ceph-users] tcmalloc use a lot of CPU

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
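For anyone wanting to reproduce this kind of test, a minimal fio job for the librbd engine looks roughly like the sketch below. Pool name, image name and client name are placeholders, and the image has to exist beforehand (e.g. created with "rbd create"):

  [ssd-pool-randwrite]
  ioengine=rbd
  clientname=admin
  pool=ssdpool          # placeholder: the SSD-backed pool
  rbdname=fio-test      # placeholder: a pre-created test image
  rw=randwrite
  bs=4k
  iodepth=32
  direct=1
  runtime=60
  time_based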
[ceph-users] RE: RE: CEPH cache layer. Very slow
Hi!

6 nodes, 70 OSDs (1-2-4Tb sata drives). Ceph is used as an RBD backstore for VM images (~100 VMs).

Megov Igor
CIO, Yuterra

From: Ben Hines bhi...@gmail.com
Sent: 14 August 2015 21:01
To: Межов Игорь Александрович
Cc: Voloshanenko Igor; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RE: CEPH cache layer. Very slow

Nice to hear that you have had no SSD failures yet in 10 months. How many OSDs are you running, and what is your primary ceph workload? (RBD, rgw, etc?)

-Ben

On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович me...@yuterra.ru wrote:

Hi!

Of course it isn't cheap at all, but we use Intel DC S3700 200Gb for ceph journals and DC S3700 400Gb in the SSD pool: same hosts, separate root in the crushmap. The SSD pool is not yet in production; the journalling SSDs have worked under production load for 10 months. They're in good condition - no faults, no degradation. We deliberately took 200Gb SSDs for journals to reduce costs, and also run a higher than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs, while the recommendation is 1/3 to 1/6.

So, as a conclusion - I recommend you get a bigger budget and buy durable and fast SSDs for Ceph.

Megov Igor
CIO, Yuterra

From: ceph-users ceph-users-boun...@lists.ceph.com on behalf of Voloshanenko Igor igor.voloshane...@gmail.com
Sent: 13 August 2015 15:54
To: Jan Schermer
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CEPH cache layer. Very slow

So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than the Intel S3500 240G ((( Any other models? (((

2015-08-13 15:45 GMT+03:00 Jan Schermer j...@schermer.cz:

I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO and not just PRO or DC EVO!). Those were very cheap but are out of stock at the moment (here). Faster than Intels, cheaper, and slightly different technology (3D V-NAND) which IMO makes them superior without needing many tricks to do their job.

Jan

On 13 Aug 2015, at 14:40, Voloshanenko Igor igor.voloshane...@gmail.com wrote:

Tnx, Irek! Will try!

But another question to all: which SSDs are good enough for Ceph now? I'm looking into the S3500 240G (I have some S3500 120G which show great results, around 8x better than the Samsungs). Can you advise other vendors/models at the same or lower price level as the S3500 240G?

2015-08-13 12:11 GMT+03:00 Irek Fasikhov malm...@gmail.com:

Hi, Igor.

Try the patch here: http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov

P.S.
I no longer track changes in this direction (kernel), because we already use the recommended SSDs.

Best regards, Irek Fasikhov
Mob.: +79229045757

2015-08-13 11:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

So, after testing the SSD (I wiped 1 SSD and used it for tests):

root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.3
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42 2015
  write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
    clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
    lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
    clat percentiles (usec):
     |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
     | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
     | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
     | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
     | 99.99th=[14912]
    bw (KB /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
    lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
  cpu : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued : total=r=0/w=17243/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s, mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%

So, it's painful... the SSD does only 287 iops at 4K... 1.1 MB/s.

I tried to change the cache mode:

echo "temporary write through" > /sys/class/scsi_disk/2:0:0:0/cache_type
echo "temporary write through" > /sys/class/scsi_disk/3:0:0:0/cache_type

No luck, still the same poor results. I also found this article: https://lkml.org/lkml/2013/11/20/264 pointed
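Related to the cache_type experiments above, it can also be worth checking what the drive's volatile write cache is set to before and after a run; a rough sketch with hdparm (the device name is an example, and results vary a lot between SSD models, so treat this as a diagnostic, not a recommendation):

  hdparm -W /dev/sda     # show whether the volatile write cache is enabled
  hdparm -W0 /dev/sda    # turn it off, then re-run the O_SYNC/O_DSYNC fio test
  hdparm -W1 /dev/sda    # turn it back on afterwards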
Re: [ceph-users] ceph distributed osd
Hi All,

Can anybody help with this issue?

Regards
Prabu

On Mon, 17 Aug 2015 12:08:28 +0530 gjprabu <gjpr...@zohocorp.com> wrote:

Hi All,

Also please find the osd information.

ceph osd dump | grep 'replicated size'
pool 2 'repo' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 126 pgp_num 126 last_change 21573 flags hashpspool stripe_width 0

Regards
Prabu

On Mon, 17 Aug 2015 11:58:55 +0530 gjprabu <gjpr...@zohocorp.com> wrote:

Hi All,

We need to test three OSDs and one image with replica 2 (size 1GB). While testing, data is not written above 1GB. Is there any option to write on the third OSD?

ceph osd pool get repo pg_num
pg_num: 126

# rbd showmapped
id pool image          snap device
0  rbd  integdownloads -    /dev/rbd0   -- already there
2  repo integrepotest  -    /dev/rbd2   -- newly created

[root@hm2 repository]# df -Th
Filesystem           Type      Size  Used Avail Use% Mounted on
/dev/sda5            ext4      289G   18G  257G   7% /
devtmpfs             devtmpfs  252G     0  252G   0% /dev
tmpfs                tmpfs     252G     0  252G   0% /dev/shm
tmpfs                tmpfs     252G  538M  252G   1% /run
tmpfs                tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/sda2            ext4      488M  212M  241M  47% /boot
/dev/sda4            ext4      1.9T   20G  1.8T   2% /var
/dev/mapper/vg0-zoho ext4      8.6T  1.7T  6.5T  21% /zoho
/dev/rbd0            ocfs2     977G  101G  877G  11% /zoho/build/downloads
/dev/rbd2            ocfs2    1000M 1000M     0 100% /zoho/build/repository

@:~$ scp -r sample.txt root@integ-hm2:/zoho/build/repository/
root@integ-hm2's password:
sample.txt 100% 1024MB 4.5MB/s 03:48
scp: /zoho/build/repository//sample.txt: No space left on device

Regards
Prabu

On Thu, 13 Aug 2015 19:42:11 +0530 gjprabu <gjpr...@zohocorp.com> wrote:

Dear Team,

We are using two ceph OSDs with replica 2 and it is working properly. My doubt is: (Pool A - image size will be 10GB) is replicated across two OSDs; what will happen if the size reaches that limit? Is there any way to make the data continue writing on another two OSDs?

Regards
Prabu

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] tcmalloc use a lot of CPU
Hi, all,

When I do a performance test with rados bench, I found tcmalloc consumed a lot of CPU:

Samples: 265K of event 'cycles', Event count (approx.): 104385445900
+ 27.58% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::FetchFromSpans()
+ 15.25% libtcmalloc.so.4.1.0 [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long,
+ 12.20% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
+  1.63% perf                 [.] append_chain
+  1.39% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
+  1.02% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
+  0.85% libtcmalloc.so.4.1.0 [.] 0x00017e6f
+  0.75% libtcmalloc.so.4.1.0 [.] tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
+  0.67% libc-2.12.so         [.] memcpy
+  0.53% libtcmalloc.so.4.1.0 [.] operator delete(void*)

Ceph version:
# ceph --version
ceph version 0.87.2 (87a7cec9ab11c677de2ab23a7668a77d2f5b955e)

Kernel version: 3.10.83

Is this phenomenon normal? Is there any idea about this problem?

Thanks.
Ye

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to improve single thread sequential reads?
Thanks for the replies guys.

The client is set to 4MB; I haven't played with the OSD side yet as I wasn't sure it would make much difference, but I will give it a go. If the client is already passing a 4MB request down to the OSD, will it be able to read ahead any further? The next 4MB object will in theory be on another OSD, so I'm not sure reading ahead any further on the OSD side would help.

As I see the problem, the RBD client will only read from 1 OSD at a time, because the RBD readahead can't be set any higher than max_hw_sectors_kb, which is the object size of the RBD. Please correct me if I'm wrong on this. If you could set the RBD readahead much higher than the object size, then this would probably give the desired effect where the buffer could be populated by reading from several OSDs in advance, giving much higher performance. That, or wait for striping to appear in the kernel client.

I've also found that BareOS (a fork of Bacula) seems to have a direct RADOS feature that supports radosstriper. I might try this and see how it performs as well.

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Somnath Roy
Sent: 17 August 2015 03:36
To: Alex Gorbachev a...@iss-integration.com; Nick Fisk n...@fisk.me.uk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to improve single thread sequential reads?

Have you tried setting read_ahead_kb to a bigger number on both the client and OSD side if you are using krbd? In case of librbd, try the different config options for rbd cache.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alex Gorbachev
Sent: Sunday, August 16, 2015 7:07 PM
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to improve single thread sequential reads?

Hi Nick,

On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk n...@fisk.me.uk wrote:

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick Fisk
Sent: 13 August 2015 18:04
To: ceph-users@lists.ceph.com
Subject: [ceph-users] How to improve single thread sequential reads?

Hi,

I'm trying to use a RBD as a staging area for some data before pushing it down to some LTO6 tapes. As I cannot use striping with the kernel client, I tend to max out at around 80MB/s reads when testing with dd. Has anyone got any clever suggestions for giving this a bit of a boost? I think I need to get it up to around 200MB/s to make sure there is always a steady flow of data to the tape drive.

I've just tried the testing kernel with the blk-mq fixes in it for full size IOs; this, combined with bumping readahead up to 4MB, is now getting me on average 150MB/s to 200MB/s, so this might suffice. Out of personal interest, I would still like to know if anyone has ideas on how to push much higher bandwidth through a RBD.

Some settings in our ceph.conf that may help:

osd_op_threads = 20
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
filestore_queue_max_ops = 9
filestore_flusher = false
filestore_max_sync_interval = 10
filestore_sync_flush = false

Regards,
Alex

Rbd-fuse seems to top out at 12MB/s, so there goes that option. I'm thinking that mapping multiple RBDs and then combining them into an mdadm RAID0 stripe might work, but it seems a bit messy.

Any suggestions?
Thanks,
Nick

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
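For the readahead tuning discussed above, a rough sketch of the knobs involved (rbd0, sdb and the values are examples only; a udev rule is one way to make them persistent across remaps and reboots):

  # client side: per-device readahead for a mapped krbd device
  echo 4096 > /sys/block/rbd0/queue/read_ahead_kb

  # the same thing via blockdev (value is in 512-byte sectors, so 8192 = 4 MB)
  blockdev --setra 8192 /dev/rbd0

  # OSD side: readahead on the data disks, e.g. sdb
  echo 1024 > /sys/block/sdb/queue/read_ahead_kb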
Re: [ceph-users] any recommendation of using EnhanceIO?
What about https://github.com/Frontier314/EnhanceIO? Last commit 2 months ago, but no external contributors :(

The nice thing about EnhanceIO is that there is no need to change the device name, unlike bcache, flashcache, etc.

Best regards,
Alex

On Thu, Jul 23, 2015 at 11:02 AM, Daniel Gryniewicz d...@redhat.com wrote:

I did some (non-ceph) work on these, and concluded that bcache was the best supported, most stable, and fastest. This was ~1 year ago, so take it with a grain of salt, but that's what I would recommend.

Daniel

From: Dominik Zalewski dzalew...@optlink.net
To: German Anders gand...@despegar.com
Cc: ceph-users ceph-users@lists.ceph.com
Sent: Wednesday, July 1, 2015 5:28:10 PM
Subject: Re: [ceph-users] any recommendation of using EnhanceIO?

Hi,

I've asked the same question a week or so ago (just search the mailing list archives for EnhanceIO :) and got some interesting answers. It looks like the project is pretty much dead since it was bought out by HGST. Even their website has some broken links in regards to EnhanceIO.

I'm keen to try flashcache or bcache (it's been in the mainline kernel for some time).

Dominik

On 1 Jul 2015, at 21:13, German Anders gand...@despegar.com wrote:

Hi cephers,

Is anyone out there running enhanceIO in a production environment? Any recommendation? Any perf output to share showing the difference between using it and not?

Thanks in advance,

German

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
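For comparison, a minimal bcache setup sketch (device names are placeholders, bcache-tools must be installed, and formatting wipes the devices, so only run this on scratch disks):

  make-bcache -B /dev/sdb                 # format the backing (slow) device
  make-bcache -C /dev/nvme0n1             # format the caching (SSD) device
  echo /dev/sdb      > /sys/fs/bcache/register
  echo /dev/nvme0n1  > /sys/fs/bcache/register
  # attach the cache set (UUID from bcache-super-show or /sys/fs/bcache/) to the backing device
  echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
  echo writeback        > /sys/block/bcache0/bcache/cache_mode
  # the OSD / filestore then lives on /dev/bcache0 instead of /dev/sdb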
Re: [ceph-users] Is there a way to configure a cluster_network for a running cluster?
Thinking this through, I'm pretty sure you would need to take your cluster offline to do this. I can't think of a scenario where you could reliably keep quorum as you swap your monitors over to use the cluster network.

On 8/10/15, 8:59 AM, Daniel Marks daniel.ma...@codecentric.de wrote:

Hi all,

we just found out that our ceph cluster communicates over the ceph public network only. It looks like we forgot to configure the cluster_network parameter during deployment ( :facepalm: ). We are running ceph version 0.94.1 on ubuntu 14.04.1.

Is there any documentation or any known procedure to properly configure a ceph cluster network for a running cluster (maybe via injectargs)? In which order should OSDs, MONs and MDSs be configured?

Best regards,
Daniel Marks

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
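For reference, the two ceph.conf keys involved look roughly like this (the subnets are made-up examples). OSDs only start using the cluster network after a restart, and the monitors stay on the public network either way:

  [global]
  public network  = 192.168.10.0/24   # client and monitor traffic (example subnet)
  cluster network = 192.168.20.0/24   # OSD replication and heartbeat traffic (example subnet)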
[ceph-users] Question
Hi,

Maybe this seems like a strange question, but I could not find this info in the docs, so I have the following question.

For a ceph cluster you need osd daemons and monitor daemons. On a host you can run several osd daemons (best one per drive, as I read in the docs). Now my question: can you run the monitor daemon on the same host where you already run some osd daemons? Is this possible, and what are the implications of doing this?

Met Vriendelijke Groeten / Cordialement / Kind Regards / Cordialmente

This message (including any attachments) may be privileged or confidential. If you have received it by mistake, please notify the sender by return e-mail and delete this message from your system. Any unauthorized use or dissemination of this message in whole or in part is strictly prohibited. S3S rejects any liability for the improper, incomplete or delayed transmission of the information contained in this message, as well as for damages resulting from this e-mail message. S3S cannot guarantee that the message received by you has not been intercepted by third parties and/or manipulated by computer programs used to transmit messages and viruses.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Question
Yes. The issue is resource sharing, as usual: the MONs will use disk I/O, memory and CPU. If the cluster is small (a test cluster?) then there's no problem in using the same disks. If the cluster starts to get bigger you may want to dedicate resources (e.g. the disk used by the MON isn't used by an OSD). If the cluster is big enough you may want to dedicate a whole node to being a MON.

On Mon, Aug 17, 2015 at 2:56 PM, Kris Vaes k...@s3s.eu wrote:

Hi,

Maybe this seems like a strange question, but I could not find this info in the docs: can you run the monitor daemon on the same host where you already run some osd daemons, and what are the implications of doing this?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] radosgw keystone integration
After setting up radosgw federated configuration last week and integrating with openstack keystone auth, I have a question regarding the configuration. In the Keystone setup instructions for Kilo, the admin token auth method is disabled: http://docs.openstack.org/kilo/install-guide/install/apt/content/keystone-verify.html For security reasons, disable the temporary authentication token mechanism: Edit the /etc/keystone/keystone-paste.ini file and remove admin_token_auth from the [pipeline:public_api], [pipeline:admin_api], and [pipeline:api_v3] sections. So after using this setup guide for kilo, the environment is not compatible with radosgw because apparently radosgw requires admin token auth. This is not documented at http://ceph.com/docs/master/radosgw/keystone/ and resulted in a really frustrating day of troubleshooting why keystone was rejecting radosgw's attempts to load the token revocation list. So first, I think this requirement should be listed on the radosgw/keystone integration setup instructions. Long term, I am curious if ceph intends to continue using this temporary authentication mechanism that is recommended to be disabled after bootstrapping Keystone's setup by openstack. For reference, these are the kinds of errors seen when the admin token auth is disabled as recommended: ceph rgw node: T 10.13.32.6:42533 - controller:5000 [AP] GET /v2.0/tokens/revoked HTTP/1.1..Host: controller:5000..Accept: */*..Transfer-Encoding: chunked..X-Auth-Token: removed..Expect: 100-continue ## T controller:5000 - 10.13.32.6:42533 [AP] HTTP/1.1 100 Continue ## T 10.13.32.6:42533 - controller:5000 [AP] 0 # T controller:5000 - 10.13.32.6:42533 [AP] HTTP/1.1 403 Forbidden..Date: Sat, 15 Aug 2015 00:46:58 GMT..Server: Apache/2.4.7 (Ubuntu)..Vary: X-Auth-Token..X-Distribution: Ubuntu..x-openstack-request-id: req-869523c8-12bb-46d4-9d5b -89e0efd1dc38..Content-Length: 141..Content-Type: application/json{error: {message: You are not authorized to perform the requested action: identity:revocation_list, code: 403 , title: Forbidden}} root@radosgw-template:~# radosgw --id radosgw.us-dfw-1 -d 2015-08-15 00:51:17.992497 7ff2281e0840 0 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3), process radosgw, pid 15381 2015-08-15 00:51:18.515909 7ff2281e0840 0 framework: fastcgi 2015-08-15 00:51:18.515927 7ff2281e0840 0 framework: civetweb 2015-08-15 00:51:18.515946 7ff2281e0840 0 framework conf key: port, val: 7480 2015-08-15 00:51:18.515958 7ff2281e0840 0 starting handler: civetweb 2015-08-15 00:51:18.529113 7ff2281e0840 0 starting handler: fastcgi 2015-08-15 00:51:18.541553 7ff1a67fc700 0 revoked tokens response is missing signed section 2015-08-15 00:51:18.541573 7ff1a67fc700 0 ERROR: keystone revocation processing returned error r=-22 2015-08-15 00:51:21.222619 7ff1a6ffd700 0 ERROR: can't read user header: ret=-2 2015-08-15 00:51:21.222648 7ff1a6ffd700 0 ERROR: sync_user() failed, user=us-dfw ret=-2 keystone error log: 2015-08-14 19:46:58.582172 2015-08-14 19:46:58.582 8782 WARNING keystone.common.wsgi [-] You are not authorized to perform the requested action: identity:revocation_list ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
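For anyone hitting the same thing, the radosgw side of the Keystone integration is configured roughly like this in ceph.conf (the section name matches the --id used above; URL, token and roles are placeholders), and it is exactly the "rgw keystone admin token" entry that depends on the admin_token_auth mechanism being left enabled in keystone-paste.ini:

  [client.radosgw.us-dfw-1]
  rgw keystone url = http://controller:35357
  rgw keystone admin token = <ADMIN_TOKEN from keystone.conf>   # placeholder
  rgw keystone accepted roles = admin,_member_
  rgw keystone token cache size = 500
  rgw keystone revocation interval = 600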
Re: [ceph-users] RE: tcmalloc use a lot of CPU
How big are those OPS? Are they random? How many nodes? How many SSDs/OSDs? What are you using to run the tests? Using atop on the OSD nodes, where is your bottleneck?

On Mon, Aug 17, 2015 at 1:05 PM, Межов Игорь Александрович me...@yuterra.ru wrote:

Hi!

We also observe the same behavior on our test Hammer install; see the trace from the FIO-RBD benchmarks on the ssd pool above - more than 30-40 kiops expected, but no more than 12-15 kiops. Where can the problem be?

Megov Igor
CIO, Yuterra

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph distributed osd
I don't understand your question. You created a 1G RBD/disk and it's full. You are able to grow it though - but that's a Linux management issue, not ceph. As everything is thin-provisioned you can create an RBD with an arbitrary size - I've created one with 1PB when the cluster only had 600G raw available.

On Mon, Aug 17, 2015 at 1:18 PM, gjprabu gjpr...@zohocorp.com wrote:

Hi All,

Can anybody help with this issue? (Three OSDs, one image with replica 2 and size 1GB; data stops writing above 1GB: /dev/rbd2 ocfs2 1000M 1000M 0 100% /zoho/build/repository, scp reports "No space left on device".)

Regards
Prabu

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
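A rough sketch of growing that image (sizes are in MB on this Ceph release, and the 4096 below is just an example). The filesystem on top then has to be grown separately with its own tool - for the OCFS2 volume above that would be tunefs.ocfs2, for ext4 resize2fs - so check the man page of whichever you actually use:

  rbd resize --size 4096 repo/integrepotest    # grow the RBD image to 4 GB
  rbd info repo/integrepotest                  # confirm the new size
  # then grow the filesystem on the mapped device, e.g. for OCFS2 (per its docs):
  # tunefs.ocfs2 -S /dev/rbd2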
Re: [ceph-users] Stuck creating pg
1) ~# ceph pg 5.6c7 query
Error ENOENT: i don't have pgid 5.6c7

In the osd log:
2015-08-17 16:11:45.185363 7f311be40700 0 osd.19 64706 do_command r=-2 i don't have pgid 5.6c7
2015-08-17 16:11:45.185380 7f311be40700 0 log_channel(cluster) log [INF] : i don't have pgid 5.6c7

2) I do not see anything wrong with this rule:

{ "rule_id": 0,
  "rule_name": "data",
  "ruleset": 0,
  "type": 1,
  "min_size": 1,
  "max_size": 10,
  "steps": [
    { "op": "take", "item": -1, "item_name": "default" },
    { "op": "chooseleaf_firstn", "num": 0, "type": "host" },
    { "op": "emit" } ] },

3) I rebooted all machines in the cluster and increased the replication level of the affected pool to 3, to be more sure. After recovery from this reboot we are currently in the following state:

HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; 103 requests are blocked > 32 sec; 2 osds have slow requests; pool volumes pg_num 2048 pgp_num 1400
pg 5.6c7 is stuck inactive since forever, current state creating, last acting [19,25,17]
pg 5.6c7 is stuck unclean since forever, current state creating, last acting [19,25,17]
103 ops are blocked > 524.288 sec
19 ops are blocked > 524.288 sec on osd.19
84 ops are blocked > 524.288 sec on osd.25
2 osds have slow requests
pool volumes pg_num 2048 pgp_num 1400

Thanks,
Bart

On 08/17/2015 03:44 PM, minchen wrote:

It looks like the crush rule doesn't work properly after the osdmap changed; there are 3 unclean pgs: 5.6c7, 5.2c7, 15.2bd. I think you can try the following to help locate the problem:

1st, "ceph pg <pgid> query" to look up the details of the pg state, e.g. blocked by which osd?

2nd, check the crush rules with "ceph osd crush rule dump" and check the crush_ruleset for pools 5 and 15, e.g. chooseleaf may not be choosing the right osd?

minchen

-- Original --
From: Bart Vanbrabant b...@vanbrabant.eu
Date: Sun, Aug 16, 2015 07:27 PM
To: ceph-users ceph-users@lists.ceph.com
Subject: [ceph-users] Stuck creating pg
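Three commands that are often useful with a pg stuck in "creating" like 5.6c7 - listing what is stuck, seeing what CRUSH maps it to, and asking the monitors to retry creation. Whether the retry actually helps depends on why creation is blocked in the first place (e.g. a CRUSH rule that cannot be satisfied), so treat this as a diagnostic sketch rather than a fix:

  ceph pg dump_stuck inactive       # list pgs that never went active
  ceph pg map 5.6c7                 # show which OSDs CRUSH maps the pg to
  ceph pg force_create_pg 5.6c7     # ask the mons to re-issue the create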
Re: [ceph-users] tcmalloc use a lot of CPU
On 08/17/2015 07:03 AM, Alexandre DERUMIER wrote:

>> Is this phenomenon normal? Is there any idea about this problem?
>
> It's a known problem with tcmalloc (search the ceph mailing list). Starting the osd with the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M environment variable should help.

Note that this only works if you use a version of gperftools/tcmalloc newer than 2.1.

> Another way is to compile ceph with jemalloc instead of tcmalloc (./configure --with-jemalloc ...)

Yep! At least from what I've seen so far, jemalloc is still a little faster for 4k random writes, even compared to tcmalloc with the patch + 128MB thread cache. Should have some data soon (mostly just a reproduction of Sandisk's and Intel's work).

> ----- Original Message -----
> From: YeYin ey...@qq.com
> To: ceph-users ceph-users@lists.ceph.com
> Sent: Monday, 17 August 2015 11:58:26
> Subject: [ceph-users] tcmalloc use a lot of CPU

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
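A quick way to check which allocator (and which tcmalloc) an existing ceph-osd binary is actually linked against; the paths are the usual Debian/Ubuntu ones and may differ on other distributions:

  ldd /usr/bin/ceph-osd | egrep 'tcmalloc|jemalloc'
  # on Debian/Ubuntu, the installed gperftools version can be checked with e.g.:
  dpkg -l | grep -i 'perftools\|libtcmalloc'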
Re: [ceph-users] Repair inconsistent pgs..
Hi, Igor.

You need to repair the PGs:

for i in `ceph pg dump | grep inconsistent | grep -v 'inconsistent+repair' | awk {'print$1'}`; do ceph pg repair $i; done

Best regards, Irek Fasikhov
Mob.: +79229045757

2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Repair inconsistent pgs..
Hi Irek,

Please read carefully ))) Your proposal was the first thing I tried... That's why I asked for help... (

2015-08-18 8:34 GMT+03:00 Irek Fasikhov malm...@gmail.com:

Hi, Igor.

You need to repair the PGs:

for i in `ceph pg dump | grep inconsistent | grep -v 'inconsistent+repair' | awk {'print$1'}`; do ceph pg repair $i; done

Best regards, Irek Fasikhov
Mob.: +79229045757

2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Repair inconsistent pgs..
Hi all,

At our production cluster, due to heavy rebalancing ((( we have 2 pgs in an inconsistent state...

root@temp:~# ceph health detail | grep inc
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]

From the OSD logs, after a repair attempt:

root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done
dumped all in format plain
instructing pg 2.490 on osd.56 to repair
instructing pg 2.c4 on osd.56 to repair

/var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2
/var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2
/var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2
/var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
/var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2
/var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141//2
/var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
/var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.01d9/141//2
/var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors

So, how can I resolve this "expected clone" situation by hand?

Thanks in advance!

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
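Before touching anything by hand, it can help to see what osd.56 actually has on disk for that pg. A sketch with ceph-objectstore-tool (the OSD must be stopped first, the paths below are the defaults and may differ, and this only lists objects - any destructive follow-up should be taken with great care and ideally after backing up the pg directory):

  # on the node hosting osd.56, with the osd stopped
  # (stop ceph-osd id=56, or service ceph stop osd.56, depending on the init system)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 \
      --journal-path /var/lib/ceph/osd/ceph-56/journal \
      --pgid 2.490 --op list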
Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped
I added a couple OSD's and rebalanced, as well as added a new pool (id 10). # ceph health detail HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 5 pgs stuck unclean; 1 pgs stuck undersized; 1 pgs undersized; recovery 24379/66089446 objects misplaced (0.037%) pg 10.4f is stuck unclean since forever, current state active+undersized+degraded, last acting [35] pg 2.e7f is stuck unclean for 500733.746009, current state active+remapped, last acting [58,5] pg 2.b16 is stuck unclean for 263130.699428, current state active+remapped, last acting [40,90] pg 10.668 is stuck unclean for 253554.833477, current state active+remapped, last acting [34,101] pg 2.782 is stuck unclean for 253561.405193, current state active+remapped, last acting [76,101] pg 10.4f is stuck undersized for 300.523795, current state active+undersized+degraded, last acting [35] pg 10.4f is stuck degraded for 300.523977, current state active+undersized+degraded, last acting [35] pg 10.4f is active+undersized+degraded, acting [35] recovery 24379/66089446 objects misplaced (0.037%) I figured the logs for osd.35 might be most interesting first as it doesn't come out of a degraded state. After setting debug to 0/5 on osd.35 and restarting the osd I grep'd for the degraded placement group: # grep 10.4f\( ceph-osd.35.log 2015-08-17 09:27:03.945350 7f0eb1a7f700 30 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] lock 2015-08-17 09:27:03.945357 7f0eb1a7f700 10 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] on_shutdown 2015-08-17 09:27:03.945371 7f0eb1a7f700 10 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] cancel_copy_ops 2015-08-17 09:27:03.945378 7f0eb1a7f700 10 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] cancel_flush_ops 2015-08-17 09:27:03.945387 7f0eb1a7f700 10 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] cancel_proxy_read_ops 2015-08-17 09:27:03.945392 7f0eb1a7f700 10 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] on_change 2015-08-17 09:27:03.945397 7f0eb1a7f700 10 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] clear_primary_state 2015-08-17 09:27:03.945404 7f0eb1a7f700 20 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] agent_stop 2015-08-17 09:27:03.945409 7f0eb1a7f700 10 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] cancel_recovery 2015-08-17 09:27:03.945413 7f0eb1a7f700 10 osd.35 pg_epoch: 186424 pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 
185079/185079 185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0 active+undersized+degraded] clear_recovery_state Full logs of osd.35 part1: http://pastebin.com/6ymD4Gx6 part2: http://pastebin.com/h4aRwniF osd.76 # grep 2.782 /var/log/ceph/ceph-osd.76.log 2015-08-17 09:52:21.205316 7fc3b6cce700 20 osd.76 186548 kicking pg 2.782 2015-08-17 09:52:21.205319 7fc3b6cce700 30 osd.76 pg_epoch: 186548 pg[2.782( v 185988'161310 (183403'153055,185988'161310] local-les=186320 n=8163 ec=736 les/c 186320/186320 186318/186319/185008) [76]/[76,101] r=0 lpr=186319 crt=185986'161303 lcod 185988'161309 mlcod 0'0 active+remapped] lock 2015-08-17 09:52:21.205338 7fc3b6cce700 10 osd.76 pg_epoch: 186548 pg[2.782( v 185988'161310 (183403'153055,185988'161310] local-les=186320 n=8163 ec=736 les/c 186320/186320 186318/186319/185008) [76]/[76,101] r=0 lpr=186319 crt=185986'161303 lcod 185988'161309 mlcod 0'0 active+remapped] on_shutdown 2015-08-17 09:52:21.205347 7fc3b6cce700 10 osd.76 pg_epoch: 186548 pg[2.782( v 185988'161310 (183403'153055,185988'161310] local-les=186320 n=8163 ec=736 les/c 186320/186320 186318/186319/185008) [76]/[76,101] r=0 lpr=186319 crt=185986'161303 lcod 185988'161309 mlcod 0'0 active+remapped] cancel_copy_ops 2015-08-17 09:52:21.205354 7fc3b6cce700 10 osd.76 pg_epoch: 186548 pg[2.782( v 185988'161310 (183403'153055,185988'161310] local-les=186320 n=8163 ec=736 les/c 186320/186320 186318/186319/185008)
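Since pg 10.4f only ever lists a single OSD in its acting set, it is worth checking what CRUSH computes for it and which ruleset pool 10 uses - if the rule cannot find enough hosts, or the new pool's ruleset points somewhere unexpected, the pg stays undersized no matter how long you wait. The pool name below is a placeholder for whatever pool id 10 is called:

  ceph pg map 10.4f                            # up/acting sets CRUSH computes right now
  ceph osd pool get <poolname> crush_ruleset   # which rule pool 10 uses
  ceph osd crush rule dump                     # inspect that rule (type, min/max size, steps)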
[ceph-users] docker distribution
Hi,

Docker changed the old docker-registry project to docker-distribution and its API to v2. It now uses librados instead of radosgw to save data. In some ceph installations it is easier to get access to radosgw than to the cluster itself, so I've made a pull request to add radosgw support; it would be great if you could test it.

https://hub.docker.com/r/lorieri/docker-distribution-generic-s3/

Note: if you already use the old docker-registry you must create another bucket and push the images again, because the API changed to v2. There is a shell script to help: https://github.com/docker/migrator

How I tested it:

docker run -d -p 5000:5000 -e REGISTRY_STORAGE=s3 \
  -e REGISTRY_STORAGE_S3_REGION=generic \
  -e REGISTRY_STORAGE_S3_REGIONENDPOINT=http://myradosgw.mydomain.com \
  -e REGISTRY_STORAGE_S3_BUCKET=registry \
  -e REGISTRY_STORAGE_S3_ACCESSKEY=XXX \
  -e REGISTRY_STORAGE_S3_SECRETKEY=XXX \
  -e REGISTRY_STORAGE_S3_SECURE=false \
  -e REGISTRY_STORAGE_S3_ENCRYPT=false \
  -e REGISTRY_STORAGE_S3_REGIONSUPPORTSHEAD=false \
  lorieri/docker-distribution-generic-s3

thanks,
-lorieri

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
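In case it saves someone a lookup: the S3 access/secret key pair that goes into REGISTRY_STORAGE_S3_ACCESSKEY/SECRETKEY can be created on the radosgw side roughly like this (the uid and display name are arbitrary examples):

  radosgw-admin user create --uid=registry --display-name="Docker Registry"
  # the JSON output contains the access_key / secret_key to plug into the container env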
Re: [ceph-users] Stuck creating pg
Many operations in the OpenStack cluster are stuck because of this. For example, a VM cannot be removed because of operations stuck on osd.19: 2015-08-17 09:34:08.116274 7fa61e57a700 0 log_channel(cluster) log [WRN] : slow request 1920.261825 seconds old, received at 2015-08-17 09:02:07.853997: osd_op(client.4705573.0:4384 rbd_data.47a42a1fba00d3.110d [delete] 5.283a4ec7 ack+ondisk+write+known_if_redirected e61799) currently no flag points reached 2015-08-17 09:34:08.116279 7fa61e57a700 0 log_channel(cluster) log [WRN] : slow request 1920.157696 seconds old, received at 2015-08-17 09:02:07.958126: osd_op(client.4705573.0:4897 rbd_data.47a42a1fba00d3.130e [delete] 5.868caac7 ack+ondisk+write+known_if_redirected e61799) currently no flag points reached 2015-08-17 09:34:09.116537 7fa61e57a700 0 log_channel(cluster) log [WRN] : 38 slow requests, 9 included below; oldest blocked for 68721.775549 secs 2015-08-17 09:34:09.116553 7fa61e57a700 0 log_channel(cluster) log [WRN] : slow request 1920.842824 seconds old, received at 2015-08-17 09:02:08.273620: osd_op(client.4705573.0:5846 rbd_data.47a42a1fba00d3.16c3 [delete] 5.dbd736c7 ack+ondisk+write+known_if_redirected e61799) currently no flag points reached rbd_data.47a42a1fba00d3.130e is an object in an VM disk that openstack is trying to delete. gr, Bart On Sun, Aug 16, 2015 at 1:27 PM Bart Vanbrabant b...@vanbrabant.eu wrote: Hi, I have a ceph cluster with 26 osd's in 4 hosts only use for rbd for an OpenStack cluster (started at 0.48 I think), currently running 0.94.2 on Ubuntu 14.04. A few days ago one of the osd's was at 85% disk usage while only 30% of the raw disk space is used. I ran reweight-by-utilization with 150 was cutoff level. This reshuffled the data. I also noticed that the number of pg was still at the level when there were less disks in the cluster (1300). Based on the current guidelines I increased pg_num to 2048. It created the placement groups except for the last one. To try to force the creation of the pg I removed the OSD's (ceph osd out) assigned to that pg but that makes no difference. 
Currently all OSDs are back in, and two PGs are also stuck in an unclean state. ceph health detail:

HEALTH_WARN 2 pgs degraded; 2 pgs stale; 2 pgs stuck degraded; 1 pgs stuck inactive; 2 pgs stuck stale; 3 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; 59 requests are blocked 32 sec; 3 osds have slow requests; recovery 221/549658 objects degraded (0.040%); recovery 221/549658 objects misplaced (0.040%); pool volumes pg_num 2048 pgp_num 1400
pg 5.6c7 is stuck inactive since forever, current state creating, last acting [19,25]
pg 5.6c7 is stuck unclean since forever, current state creating, last acting [19,25]
pg 5.2c7 is stuck unclean for 313513.609864, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck unclean for 313513.610368, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck undersized for 308381.750768, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck undersized for 308381.751913, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck degraded for 308381.750876, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck degraded for 308381.752021, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck stale for 281750.295301, current state stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck stale for 281750.295293, current state stale+active+undersized+degraded+remapped, last acting [9]
16 ops are blocked 268435 sec
10 ops are blocked 134218 sec
10 ops are blocked 1048.58 sec
23 ops are blocked 524.288 sec
16 ops are blocked 268435 sec on osd.1
8 ops are blocked 134218 sec on osd.17
2 ops are blocked 134218 sec on osd.19
10 ops are blocked 1048.58 sec on osd.19
23 ops are blocked 524.288 sec on osd.19
3 osds have slow requests
recovery 221/549658 objects degraded (0.040%)
recovery 221/549658 objects misplaced (0.040%)
pool volumes pg_num 2048 pgp_num 1400

OSD 9 was the primary when the PG creation process got stuck. This OSD has been removed and added again (not only marked out, but also removed from the CRUSH map and re-added). The bad data distribution was probably caused by the low number of PGs and mainly by bad weighting of the OSDs. I changed the CRUSH map to give the same weight to each of the OSDs, but that does not fix these problems either. ceph osd tree:

ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 6.5     pool default
-6 2.0         host droplet4
16 0.25000         osd.16         up 1.0      1.0
20 0.25000         osd.20         up 1.0      1.0
21 0.25000         osd.21
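A side note on the health output above (my observation, not something stated in the thread): pgp_num for the volumes pool is still 1400 while pg_num is 2048, so the newly split PGs are not yet placed independently of their parents, and the cluster's view of the stuck PG can be dumped directly. A sketch, using the pool and PG ids shown above:

# bring pgp_num up to pg_num so the new PGs actually get placed
ceph osd pool set volumes pgp_num 2048

# ask for details on the PG that is stuck creating
ceph pg 5.6c7 query
ceph pg dump_stuck inactive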
Re: [ceph-users] rbd map failed
On Thu, Aug 13, 2015 at 1:59 PM, Adir Lev ad...@mellanox.com wrote:

Hi,

I have a Ceph cluster running on 4 physical servers; the cluster is up and healthy. So far I have been unable to connect any client to the cluster using krbd or the fio rbd plugin. My clients can see and create images in the rbd pool but cannot map them:

root@r-dcs68 ~ # rbd ls
fio_test
foo
foo1
foo_test
root@r-dcs68 ~ # rbd map foo
rbd: sysfs write failed
rbd: map failed: (95) Operation not supported

Using strace I see that there are no write permissions to /sys/bus/rbd/add:

root@r-dcs68 ~ # echo "192.168.57.102:16789 name=admin,key=client.admin rbd foo" > /sys/bus/rbd/add
-bash: echo: write error: Operation not permitted

It doesn't look like a file permissions problem, please paste the entire strace output. What's your ceph version (ceph --version)?

Thanks,

Ilya
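Not part of the reply above, just a sketch of checks that often narrow down EOPNOTSUPP from krbd on older kernels (image names are taken from the rbd ls output; the format-1 test image is hypothetical):

# check whether foo is a format 2 image and which features it carries,
# since an older krbd may not support them
rbd info foo

# the kernel normally logs why the map failed
dmesg | tail

# a plain format 1 test image should map on any krbd; if this works,
# the problem is likely image format/features rather than connectivity
rbd create --image-format 1 --size 128 krbd-test
rbd map krbd-test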
[ceph-users] Memory-Usage
Hi,

I have a Ceph cluster with three nodes and 32 OSDs. The three nodes have 16 GB of memory each, but only 5 GB is in use. The nodes are Dell PowerEdge R510.

My ceph.conf:

[global]
mon_initial_members = ceph01
mon_host = 10.0.0.20,10.0.0.21,10.0.0.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
filestore_op_threads = 32
public_network = 10.0.0.0/24
cluster_network = 10.0.1.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
osd_max_write_size = 200
osd_map_cache_size = 1024
osd_map_cache_bl_size = 128
osd_recovery_op_priority = 1
osd_max_recovery_max_active = 1
osd_recovery_max_backfills = 1
osd_op_threads = 32
osd_disk_threads = 8

Is that normal, or a bottleneck?

best regards
Patrik
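If it helps to see where the memory is going, a small sketch of what one might check on an OSD node (osd.0 is just an example id, and the heap stats command assumes the OSDs are linked against tcmalloc):

# resident memory of each ceph-osd process on this node
ps -o pid,rss,cmd -C ceph-osd

# tcmalloc's view of one daemon's heap
ceph tell osd.0 heap stats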
[ceph-users] Broken snapshots... CEPH 0.94.2
Hi all, can you please help me with an unexplained situation? All snapshots inside Ceph are broken.

As an example, we have a VM template stored as an RBD image inside Ceph. We can map and mount it to check that everything is OK with it:

root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
/dev/rbd0
root@test:~# parted /dev/rbd0 print
Model: Unknown (unknown)
Disk /dev/rbd0: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 1049kB 525MB 524MB primary ext4 boot
2 525MB 10.7GB 10.2GB primary lvm

Then I create a snapshot:

root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap

And now I want to map it:

root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
/dev/rbd1
root@test:~# parted /dev/rbd1 print
Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only.
Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only.
Error: /dev/rbd1: unrecognised disk label

Even the md5 is different:

root@ix-s2:~# md5sum /dev/rbd0
9a47797a07fee3a3d71316e22891d752 /dev/rbd0
root@ix-s2:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce /dev/rbd1

OK, now I protect the snapshot and create a clone... but it's the same thing: the md5 of the clone matches the snapshot, not the original image:

root@test:~# rbd unmap /dev/rbd1
root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image
root@test:~# rbd map cold-storage/test-image
/dev/rbd1
root@test:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce /dev/rbd1

But it's broken:

root@test:~# parted /dev/rbd1 print
Error: /dev/rbd1: unrecognised disk label

Tech details:

root@test:~# ceph -v
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

We have 2 inconsistent PGs, but none of these images are placed on those PGs:

root@test:~# ceph health detail
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
18 scrub errors

root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up ([37,15,14], p37) acting ([37,15,14], p37)
root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) -> up ([12,23,17], p12) acting ([12,23,17], p12)
root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9) -> up ([12,44,23], p12) acting ([12,44,23], p12)

Also, we use a cache tier, which is currently in forward mode.

Can you please help me with this? I no longer understand what is going on. Thanks in advance!
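Not an answer, only a sketch of how one might rule the cache tier in or out, since client I/O here is handled by the tiering layer (the cache pool name below is a placeholder for whatever the tier is actually called):

# see which pool is the cache tier and what cache_mode it is in
ceph osd dump | grep -E 'tier|cache_mode'

# flush and evict everything from the cache pool, then re-test the snapshot md5
rados -p cache-pool-name cache-flush-evict-all

# if acceptable, switch the tier back to writeback instead of forward
ceph osd tier cache-mode cache-pool-name writeback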
Re: [ceph-users] ceph distributed osd
Hi All,

We need to test three OSDs and one image with replica 2 (size 1 GB). While testing, data cannot be written beyond 1 GB. Is there any option to continue writing onto the third OSD?

ceph osd pool get repo pg_num
pg_num: 126

# rbd showmapped
id pool image          snap device
0  rbd  integdownloads -    /dev/rbd0   -- already existing
2  repo integrepotest  -    /dev/rbd2   -- newly created

[root@hm2 repository]# df -Th
Filesystem           Type      Size  Used Avail Use% Mounted on
/dev/sda5            ext4      289G   18G  257G   7% /
devtmpfs             devtmpfs  252G     0  252G   0% /dev
tmpfs                tmpfs     252G     0  252G   0% /dev/shm
tmpfs                tmpfs     252G  538M  252G   1% /run
tmpfs                tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/sda2            ext4      488M  212M  241M  47% /boot
/dev/sda4            ext4      1.9T   20G  1.8T   2% /var
/dev/mapper/vg0-zoho ext4      8.6T  1.7T  6.5T  21% /zoho
/dev/rbd0            ocfs2     977G  101G  877G  11% /zoho/build/downloads
/dev/rbd2            ocfs2    1000M 1000M     0 100% /zoho/build/repository

@:~$ scp -r sample.txt root@integ-hm2:/zoho/build/repository/
root@integ-hm2's password:
sample.txt 100% 1024MB 4.5MB/s 03:48
scp: /zoho/build/repository//sample.txt: No space left on device

Regards
Prabu

On Thu, 13 Aug 2015 19:42:11 +0530 gjprabu <gjpr...@zohocorp.com> wrote:

Dear Team,

We are using two Ceph OSDs with replica 2 and it is working properly. My doubt is this: if a pool's image size is 10 GB and it is replicated across two OSDs, what will happen when that size limit is reached? Is there any way to make the data continue writing onto another two OSDs?

Regards
Prabu
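A sketch of what generally has to happen here (my note, not from the thread): an RBD image is a fixed-size block device, so writes stop at the image size no matter how many OSDs the cluster has; to keep writing, grow the image and then the filesystem on it. The target size and the ocfs2 grow step are assumptions about this particular setup:

# grow the 1 GB image to, say, 2 GB (rbd sizes are in megabytes)
rbd resize --size 2048 repo/integrepotest

# then grow the ocfs2 filesystem to fill the enlarged device
tunefs.ocfs2 -S /dev/rbd2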
Re: [ceph-users] ceph distributed osd
Hi All,

Also please find the OSD pool information:

ceph osd dump | grep 'replicated size'
pool 2 'repo' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 126 pgp_num 126 last_change 21573 flags hashpspool stripe_width 0

Regards
Prabu
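One more side note prompted by the dump above (my addition, not from the thread): with size 2 and min_size 2, the pool stops accepting I/O as soon as either replica's OSD is unavailable, which is often surprising during testing. If that behaviour is not wanted:

# allow I/O to continue with a single healthy replica
ceph osd pool set repo min_size 1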