Re: [ceph-users] tcmalloc use a lot of CPU

2015-08-17 Thread Alexandre DERUMIER
Hi,

Is this phenomenon normal? Is there any idea about this problem?

It's a known problem with tcmalloc (search the ceph mailing list archives).

Starting the OSD with the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M environment 
variable should help.
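
A minimal sketch of how that could look (assuming you launch the daemon by hand
or from a wrapper script; the exact init integration depends on your packaging,
and the numeric value below is just 128 MB expressed in bytes):

# example only: export the variable before starting the OSD
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
/usr/bin/ceph-osd -i 0 --cluster ceph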


Another way is to compile ceph with jemalloc instead of tcmalloc (./configure 
--with-jemalloc ...)
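
A rough build sketch (assuming the usual Ceph build dependencies plus the
jemalloc development package are installed; flags may differ per release):

./autogen.sh
./configure --with-jemalloc
make -j4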



- Original Mail -
From: YeYin ey...@qq.com
To: ceph-users ceph-users@lists.ceph.com
Sent: Monday 17 August 2015 11:58:26
Subject: [ceph-users] tcmalloc use a lot of CPU

Hi, all, 
When I do performance test with rados bench, I found tcmalloc consumed a lot of 
CPU: 

Samples: 265K of event 'cycles', Event count (approx.): 104385445900 
+ 27.58% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::FetchFromSpans() 
+ 15.25% libtcmalloc.so.4.1.0 [.] 
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, 
unsigned long, 
+ 12.20% libtcmalloc.so.4.1.0 [.] 
tcmalloc::CentralFreeList::ReleaseToSpans(void*) 
+ 1.63% perf [.] append_chain 
+ 1.39% libtcmalloc.so.4.1.0 [.] 
tcmalloc::CentralFreeList::ReleaseListToSpans(void*) 
+ 1.02% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::RemoveRange(void**, 
void**, int) 
+ 0.85% libtcmalloc.so.4.1.0 [.] 0x00017e6f 
+ 0.75% libtcmalloc.so.4.1.0 [.] 
tcmalloc::ThreadCache::IncreaseCacheLimitLocked() 
+ 0.67% libc-2.12.so [.] memcpy 
+ 0.53% libtcmalloc.so.4.1.0 [.] operator delete(void*) 

Ceph version: 
# ceph --version 
ceph version 0.87.2 (87a7cec9ab11c677de2ab23a7668a77d2f5b955e) 

Kernel version: 
3.10.83 

Is this phenomenon normal? Is there any idea about this problem? 

Thanks. 
Ye 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RE: tcmalloc use a lot of CPU

2015-08-17 Thread Межов Игорь Александрович
Hi!

We also observe the same behavior on our test Hammer install, and I wrote about 
it some time ago:

http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22609

Jan Schermer gave us some suggestions in that thread, but we still have not got 
any positive results - TCMalloc usage is high. The usage is lowered to 10% when 
we disable crc in messages, disable debug and disable cephx auth, but this is of 
course not for production use. Also, we got a different trace while performing 
FIO-RBD benchmarks on an SSD pool:
---
  46,07%  [kernel]  [k] _raw_spin_lock
   6,51%  [kernel]  [k] mb_cache_entry_alloc
   5,74%  libtcmalloc.so.4.2.2  [.] 
tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
   5,50%  libtcmalloc.so.4.2.2  [.] tcmalloc::SLL_Next(void*)
   3,86%  libtcmalloc.so.4.2.2  [.] TCMalloc_PageMap335::get(unsigned long) 
const
   2,73%  libtcmalloc.so.4.2.2  [.] 
tcmalloc::CentralFreeList::ReleaseToSpans(void*)
   0,69%  libtcmalloc.so.4.2.2  [.] 
tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
   0,69%  libtcmalloc.so.4.2.2  [.] tcmalloc::PageHeap::GetDescriptor(unsigned 
long) const
   0,64%  libtcmalloc.so.4.2.2  [.] tcmalloc::SLL_PopRange(void**, int, void**, 
void**)
---

I don't clearly understand what's happening in this case: the SSD pool is 
connected to the same host, but a different controller (C60X onboard instead of 
LSI2208), the io scheduler is set to noop, and the pool is built from 4 x 400GB 
Intel DC S3700, so it should perform better, I think - more than 30-40 kIOPS.
But we got the trace above and no more than 12-15 kIOPS. Where could the problem be?
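
For reference, an FIO-RBD benchmark of this kind is typically driven by fio's
rbd ioengine with a command along the lines below (the pool, image and client
names are placeholders, not our actual setup):

fio --ioengine=rbd --clientname=admin --pool=ssdpool --rbdname=fio-test \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
    --runtime=60 --time_based --name=rbd-4k-randwrite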






Megov Igor
CIO, Yuterra


From: ceph-users ceph-users-boun...@lists.ceph.com on behalf of YeYin ey...@qq.com
Sent: 17 August 2015 12:58
To: ceph-users
Subject: [ceph-users] tcmalloc use a lot of CPU

Hi, all,
  When I do performance test with rados bench, I found tcmalloc consumed a lot 
of CPU:

Samples: 265K of event 'cycles', Event count (approx.): 104385445900
+  27.58%  libtcmalloc.so.4.1.0[.] 
tcmalloc::CentralFreeList::FetchFromSpans()
+  15.25%  libtcmalloc.so.4.1.0[.] 
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, 
unsigned long,
+  12.20%  libtcmalloc.so.4.1.0[.] 
tcmalloc::CentralFreeList::ReleaseToSpans(void*)
+   1.63%  perf[.] append_chain
+   1.39%  libtcmalloc.so.4.1.0[.] 
tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
+   1.02%  libtcmalloc.so.4.1.0[.] 
tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
+   0.85%  libtcmalloc.so.4.1.0[.] 0x00017e6f
+   0.75%  libtcmalloc.so.4.1.0[.] 
tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
+   0.67%  libc-2.12.so[.] memcpy
+   0.53%  libtcmalloc.so.4.1.0[.] operator delete(void*)

Ceph version:
# ceph --version
ceph version 0.87.2 (87a7cec9ab11c677de2ab23a7668a77d2f5b955e)

Kernel version:
3.10.83

Is this phenomenon normal? Is there any idea about this problem? 

Thanks.
Ye

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RE: RE: CEPH cache layer. Very slow

2015-08-17 Thread Межов Игорь Александрович
Hi!

6 nodes, 70 OSDs (1-2-4Tb sata drives).
Ceph used as RBD backstore for VM images (~100VMs).

Megov Igor
CIO, Yuterra



From: Ben Hines bhi...@gmail.com
Sent: 14 August 2015 21:01
To: Межов Игорь Александрович
Cc: Voloshanenko Igor; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RE: CEPH cache layer. Very slow

Nice to hear that you have had no SSD failures yet in 10 months.

How many OSDs are you running, and what is your primary ceph workload?
(RBD, rgw, etc?)

-Ben

On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович
me...@yuterra.ru wrote:
 Hi!


 Of course, it isn't cheap at all, but we use Intel DC S3700 200Gb for ceph
 journals
 and DC S3700 400Gb in the SSD pool: same hosts, separate root in crushmap.

 The SSD pool is not yet in production; the journalling SSDs have worked under
 production load for 10 months. They're in good condition - no faults, no
 degradation.

 We specifically chose 200GB SSDs for journals to reduce costs, and also have a
 higher than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs, while the
 recommendation is 1/3 to 1/6.

 So, in conclusion - I recommend you get a bigger budget and buy durable and
 fast SSDs for Ceph.

 Megov Igor
 CIO, Yuterra

 
 From: ceph-users ceph-users-boun...@lists.ceph.com on behalf of Voloshanenko
 Igor igor.voloshane...@gmail.com
 Sent: 13 August 2015 15:54
 To: Jan Schermer
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] CEPH cache layer. Very slow

 So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than the
 Intel S3500 240G (((

 Any other models? (((

 2015-08-13 15:45 GMT+03:00 Jan Schermer j...@schermer.cz:

 I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO
 and not just PRO or DC EVO!).
 Those were very cheap but are out of stock at the moment (here).
 Faster than Intels, cheaper, and slightly different technology (3D V-NAND)
 which IMO makes them superior without needing many tricks to do its job.

 Jan

 On 13 Aug 2015, at 14:40, Voloshanenko Igor igor.voloshane...@gmail.com
 wrote:

 Tnx, Irek! Will try!

 But another question to all: which SSDs are good enough for Ceph now?

 I'm looking into the S3500 240G (I have some S3500 120G which show great
 results - around 8x better than the Samsung)

 Could you advise on other vendors/models at the same or lower price level
 as the S3500 240G?

 2015-08-13 12:11 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 Hi, Igor.
 Try to roll the patch here:

 http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov

 P.S. I no longer track changes in this direction (kernel), because we
 already use the recommended SSDs

 Best regards, Irek Fasikhov
 Mobile: +79229045757

 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor
 igor.voloshane...@gmail.com:

 So, after testing the SSD (I wiped 1 SSD and used it for tests):

 root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1
 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
 --group_reporting --name=journal-test
 journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
 iodepth=1
 fio-2.1.3
 Starting 1 process
 Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta
 00m:00s]
 journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13
 10:46:42 2015
   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
 clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
  lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
 clat percentiles (usec):
  |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
  | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
  | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
  | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
  | 99.99th=[14912]
 bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
 lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
   cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0

 Run status group 0 (all jobs):
   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
 mint=60001msec, maxt=60001msec

 Disk stats (read/write):
   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576,
 util=99.30%

 So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s

 I tried to change the cache mode:
 echo "temporary write through" > /sys/class/scsi_disk/2:0:0:0/cache_type
 echo "temporary write through" > /sys/class/scsi_disk/3:0:0:0/cache_type

 No luck, still the same poor results. Also I found this article:
 https://lkml.org/lkml/2013/11/20/264 pointed 

Re: [ceph-users] ceph distributed osd

2015-08-17 Thread gjprabu
Hi All,



   Can anybody help with this issue?



Regards

Prabu


  On Mon, 17 Aug 2015 12:08:28 +0530 gjprabu <gjpr...@zohocorp.com> 
wrote 




Hi All,



   Also please find osd information.



ceph osd dump | grep 'replicated size'

pool 2 'repo' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 126 pgp_num 126 last_change 21573 flags hashpspool stripe_width 0



Regards

Prabu










 On Mon, 17 Aug 2015 11:58:55 +0530 gjprabu <gjpr...@zohocorp.com> 
wrote 











Hi All,



   We need to test three OSDs and one image with replica 2 (size 1GB). While 
testing, data does not write above 1GB. Is there any option to write to the third 
OSD?



ceph osd pool get  repo  pg_num

pg_num: 126



# rbd showmapped 

id pool image  snap device

0  rbd  integdownloads -    /dev/rbd0   -- already existing one

2  repo integrepotest  -    /dev/rbd2   -- newly created





[root@hm2 repository]# df -Th

Filesystem   Type  Size  Used Avail Use% Mounted on

/dev/sda5ext4  289G   18G  257G   7% /

devtmpfs devtmpfs  252G 0  252G   0% /dev

tmpfstmpfs 252G 0  252G   0% /dev/shm

tmpfstmpfs 252G  538M  252G   1% /run

tmpfstmpfs 252G 0  252G   0% /sys/fs/cgroup

/dev/sda2ext4  488M  212M  241M  47% /boot

/dev/sda4ext4  1.9T   20G  1.8T   2% /var

/dev/mapper/vg0-zoho ext4  8.6T  1.7T  6.5T  21% /zoho

/dev/rbd0ocfs2 977G  101G  877G  11% /zoho/build/downloads

/dev/rbd2            ocfs2    1000M 1000M     0 100% /zoho/build/repository



@:~$ scp -r sample.txt root@integ-hm2:/zoho/build/repository/

root@integ-hm2's password: 

sample.txt  
   100% 1024MB   4.5MB/s   03:48

scp: /zoho/build/repository//sample.txt: No space left on device



Regards

Prabu










 On Thu, 13 Aug 2015 19:42:11 +0530 gjprabu <gjpr...@zohocorp.com> 
wrote 











Dear Team,



 We are using two Ceph OSDs with replica 2 and it is working properly. 
My doubt is this: (Pool A - image size will be 10GB) is replicated to two 
OSDs. What will happen if the size reaches the limit? Is there any 
chance to make the data continue writing to another two OSDs?



Regards

Prabu


















___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] tcmalloc use a lot of CPU

2015-08-17 Thread YeYin
Hi, all,
  When I do performance test with rados bench, I found tcmalloc consumed a lot 
of CPU:


Samples: 265K of event 'cycles', Event count (approx.): 104385445900
+  27.58%  libtcmalloc.so.4.1.0[.] 
tcmalloc::CentralFreeList::FetchFromSpans()
+  15.25%  libtcmalloc.so.4.1.0[.] 
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, 
unsigned long,
+  12.20%  libtcmalloc.so.4.1.0[.] 
tcmalloc::CentralFreeList::ReleaseToSpans(void*)
+   1.63%  perf[.] append_chain
+   1.39%  libtcmalloc.so.4.1.0[.] 
tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
+   1.02%  libtcmalloc.so.4.1.0[.] 
tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
+   0.85%  libtcmalloc.so.4.1.0[.] 0x00017e6f
+   0.75%  libtcmalloc.so.4.1.0[.] 
tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
+   0.67%  libc-2.12.so[.] memcpy
+   0.53%  libtcmalloc.so.4.1.0[.] operator delete(void*)



Ceph version:
# ceph --version
ceph version 0.87.2 (87a7cec9ab11c677de2ab23a7668a77d2f5b955e)



Kernel version:
3.10.83


Is this phenomenon normal? Is there any idea about this problem?


Thanks.
Ye
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to improve single thread sequential reads?

2015-08-17 Thread Nick Fisk
Thanks for the replies guys.

The client is set to 4MB, I haven't played with the OSD side yet as I wasn't
sure if it would make much difference, but I will give it a go. If the
client is already passing a 4MB request down through to the OSD, will it be
able to readahead any further? The next 4MB object in theory will be on
another OSD and so I'm not sure if reading ahead any further on the OSD side
would help.

How I see the problem is that the RBD client will only read 1 OSD at a time
as the RBD readahead can't be set any higher than max_hw_sectors_kb, which
is the object size of the RBD. Please correct me if I'm wrong on this.

If you could set the RBD readahead to much higher than the object size, then
this would probably give the desired effect where the buffer could be
populated by reading from several OSD's in advance to give much higher
performance. That or wait for striping to appear in the Kernel client.

I've also found that BareOS (a fork of Bacula) seems to have a direct RADOS
feature that supports radosstriper. I might try this and see how it performs
as well.
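
For anyone wanting to reproduce the readahead bump mentioned above, it is a
sysfs tweak along these lines (the device name rbd0 is just an example):

# set 4 MB readahead on a mapped krbd device
echo 4096 > /sys/block/rbd0/queue/read_ahead_kb
# verify the new value
cat /sys/block/rbd0/queue/read_ahead_kb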


 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Somnath Roy
 Sent: 17 August 2015 03:36
 To: Alex Gorbachev a...@iss-integration.com; Nick Fisk n...@fisk.me.uk
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] How to improve single thread sequential reads?
 
 Have you tried setting read_ahead_kb to bigger number for both client/OSD
 side if you are using krbd ?
 In case of librbd, try the different config options for rbd cache..
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Alex Gorbachev
 Sent: Sunday, August 16, 2015 7:07 PM
 To: Nick Fisk
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] How to improve single thread sequential reads?
 
 Hi Nick,
 
 On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk n...@fisk.me.uk wrote:
  -Original Message-
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
  Of Nick Fisk
  Sent: 13 August 2015 18:04
  To: ceph-users@lists.ceph.com
  Subject: [ceph-users] How to improve single thread sequential reads?
 
  Hi,
 
  I'm trying to use a RBD to act as a staging area for some data before
  pushing
  it down to some LTO6 tapes. As I cannot use striping with the kernel
  client I
  tend to be maxing out at around 80MB/s reads testing with DD. Has
  anyone got any clever suggestions of giving this a bit of a boost, I
  think I need
  to get it
  up to around 200MB/s to make sure there is always a steady flow of
  data to the tape drive.
 
  I've just tried the testing kernel with the blk-mq fixes in it for
  full size IO's, this combined with bumping readahead up to 4MB, is now
  getting me on average 150MB/s to 200MB/s so this might suffice.
 
  On a personal interest, I would still like to know if anyone has ideas
  on how to really push much higher bandwidth through a RBD.
 
 Some settings in our ceph.conf that may help:
 
 osd_op_threads = 20
 osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
 filestore_queue_max_ops = 9 filestore_flusher = false
 filestore_max_sync_interval = 10 filestore_sync_flush = false
 
 Regards,
 Alex
 
 
 
  Rbd-fuse seems to top out at 12MB/s, so there goes that option.
 
  I'm thinking mapping multiple RBD's and then combining them into a
  mdadm
  RAID0 stripe might work, but seems a bit messy.
 
  Any suggestions?
 
  Thanks,
  Nick
 
 
 
 
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 PLEASE NOTE: The information contained in this electronic mail message is
 intended only for the use of the designated recipient(s) named above. If
the
 reader of this message is not the intended recipient, you are hereby
notified
 that you have received this message in error and that any review,
 dissemination, distribution, or copying of this message is strictly
prohibited. If
 you have received this communication in error, please notify the sender by
 telephone or e-mail (as shown above) immediately and destroy any and all
 copies of this message in your possession (whether hard copies or
 electronically stored copies).
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] any recommendation of using EnhanceIO?

2015-08-17 Thread Alex Gorbachev
What about https://github.com/Frontier314/EnhanceIO?  Last commit 2
months ago, but no external contributors :(

The nice thing about EnhanceIO is there is no need to change device
name, unlike bcache, flashcache etc.

Best regards,
Alex

On Thu, Jul 23, 2015 at 11:02 AM, Daniel Gryniewicz d...@redhat.com wrote:
 I did some (non-ceph) work on these, and concluded that bcache was the best
 supported, most stable, and fastest.  This was ~1 year ago, so take it with
 a grain of salt, but that's what I would recommend.

 Daniel


 
 From: Dominik Zalewski dzalew...@optlink.net
 To: German Anders gand...@despegar.com
 Cc: ceph-users ceph-users@lists.ceph.com
 Sent: Wednesday, July 1, 2015 5:28:10 PM
 Subject: Re: [ceph-users] any recommendation of using EnhanceIO?


 Hi,

 I asked the same question a week or so ago (just search the mailing list
 archives for EnhanceIO :) and got some interesting answers.

 Looks like the project is pretty much dead since it was bought out by HGST.
 Even their website has some broken links in regard to EnhanceIO.

 I'm keen to try flashcache or bcache (it's been in the mainline kernel for
 some time)

 Dominik

 On 1 Jul 2015, at 21:13, German Anders gand...@despegar.com wrote:

 Hi cephers,

 Is anyone out there who has implemented EnhanceIO in a production environment?
 Any recommendations? Any perf output to share showing the difference between
 using it and not?

 Thanks in advance,

 German
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is there a way to configure a cluster_network for a running cluster?

2015-08-17 Thread Will . Boege
Thinking this through, I'm pretty sure you would need to take your cluster
offline to do this. I can't think of a scenario where you could reliably
keep quorum as you swap your monitors to use the cluster network.
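
For reference, the settings in question live in ceph.conf; the subnets below
are placeholders only:

[global]
    public network  = 192.168.1.0/24    # client-facing traffic (placeholder)
    cluster network = 192.168.2.0/24    # OSD replication/heartbeat traffic (placeholder)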

On 8/10/15, 8:59 AM, Daniel Marks daniel.ma...@codecentric.de wrote:

Hi all,

we just found out that our ceph-cluster communicates over the ceph public
network only. Looks like we forgot to configure the cluster_network
parameter during deployment ( :facepalm: ). We are running ceph version
0.94.1 on ubuntu 14.04.1

Is there any documentation or any known procedure to properly configure a
ceph_cluster network for a running cluster (maybe via injectargs)? In
which order should OSDs, MONs and MDSs be configured?

Best regards,
Daniel Marks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Question

2015-08-17 Thread Kris Vaes
Hi,

Maybe this seems like a strange question, but I could not find this info in the 
docs, so I have the following question.

For a Ceph cluster you need OSD daemons and monitor daemons.

On a host you can run several OSD daemons (best one per drive, as I read in the 
docs).

But now my question: can you run the monitor daemon on the same host where you 
already run some OSD daemons?

Is this possible, and what are the implications of doing this?



Met Vriendelijke Groeten
Cordialement
Kind Regards
Cordialmente
С приятелски поздрави


This message (including any attachments) may be privileged or confidential. If 
you have received it by mistake, please notify the sender by return e-mail and 
delete this message from your system. Any unauthorized use or dissemination of 
this message in whole or in part is strictly prohibited. S3S rejects any 
liability for the improper, incomplete or delayed transmission of the 
information contained in this message, as well as for damages resulting from 
this e-mail message. S3S cannot guarantee that the message received by you has 
not been intercepted by third parties and/or manipulated by computer programs 
used to transmit messages and viruses.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question

2015-08-17 Thread Luis Periquito
yes. The issue is resource sharing as usual: the MONs will use disk I/O,
memory and CPU. If the cluster is small (test?) then there's no problem in
using the same disks. If the cluster starts to get bigger you may want to
dedicate resources (e.g. the disk for the MONs isn't used by an OSD). If
the cluster is big enough you may want to dedicate a node for being a MON.

On Mon, Aug 17, 2015 at 2:56 PM, Kris Vaes k...@s3s.eu wrote:

 Hi,

 Maybe this seems like a strange question but i could not find this info in
 the docs , i have following question,

 For the ceph cluster you need osd daemons and monitor daemons,

 On a host you can run several osd daemons (best one per drive as read in
 the docs) on one host

 But now my question  can you run on the same host where you run already
 some osd daemons the monitor daemon

 Is this possible and what are the implications of doing this



 Met Vriendelijke Groeten
 Cordialement
 Kind Regards
 Cordialmente
 С приятелски поздрави


 This message (including any attachments) may be privileged or
 confidential. If you have received it by mistake, please notify the sender
 by return e-mail and delete this message from your system. Any unauthorized
 use or dissemination of this message in whole or in part is strictly
 prohibited. S3S rejects any liability for the improper, incomplete or
 delayed transmission of the information contained in this message, as well
 as for damages resulting from this e-mail message. S3S cannot guarantee
 that the message received by you has not been intercepted by third parties
 and/or manipulated by computer programs used to transmit messages and
 viruses.

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw keystone integration

2015-08-17 Thread Logan V.
After setting up radosgw federated configuration last week and
integrating with openstack keystone auth, I have a question regarding
the configuration.

In the Keystone setup instructions for Kilo, the admin token auth
method is disabled:
http://docs.openstack.org/kilo/install-guide/install/apt/content/keystone-verify.html
For security reasons, disable the temporary authentication token mechanism:

Edit the /etc/keystone/keystone-paste.ini file and remove
admin_token_auth from the [pipeline:public_api], [pipeline:admin_api],
and [pipeline:api_v3] sections.

So after using this setup guide for kilo, the environment is not
compatible with radosgw because apparently radosgw requires admin
token auth. This is not documented at
http://ceph.com/docs/master/radosgw/keystone/ and resulted in a really
frustrating day of troubleshooting why keystone was rejecting
radosgw's attempts to load the token revocation list.

So first, I think this requirement should be listed on the
radosgw/keystone integration setup instructions.

Long term, I am curious whether ceph intends to continue relying on this
temporary authentication mechanism, which OpenStack recommends disabling
once Keystone's setup has been bootstrapped.
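
For context, the radosgw side of the integration is configured in ceph.conf
roughly as below; the admin token line is the part that relies on the mechanism
Keystone recommends disabling, and all values are placeholders:

[client.radosgw.us-dfw-1]
    rgw keystone url = http://controller:5000
    rgw keystone admin token = <keystone admin_token>
    rgw keystone accepted roles = Member, admin
    rgw keystone token cache size = 500
    rgw keystone revocation interval = 900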

For reference, these are the kinds of errors seen when the admin token
auth is disabled as recommended:
ceph rgw node:
T 10.13.32.6:42533 -> controller:5000 [AP]
  GET /v2.0/tokens/revoked HTTP/1.1..Host: controller:5000..Accept:
*/*..Transfer-Encoding: chunked..X-Auth-Token: removed..Expect:
100-continue
##
T controller:5000 -> 10.13.32.6:42533 [AP]
  HTTP/1.1 100 Continue
##
T 10.13.32.6:42533 -> controller:5000 [AP]
  0
#
T controller:5000 -> 10.13.32.6:42533 [AP]
  HTTP/1.1 403 Forbidden..Date: Sat, 15 Aug 2015 00:46:58 GMT..Server:
Apache/2.4.7 (Ubuntu)..Vary: X-Auth-Token..X-Distribution:
Ubuntu..x-openstack-request-id: req-869523c8-12bb-46d4-9d5b
-89e0efd1dc38..Content-Length: 141..Content-Type:
application/json..{"error": {"message": "You are not authorized to
perform the requested action: identity:revocation_list", "code": 403,
"title": "Forbidden"}}

root@radosgw-template:~# radosgw --id radosgw.us-dfw-1 -d
2015-08-15 00:51:17.992497 7ff2281e0840  0 ceph version 0.94.2
(5fb85614ca8f354284c713a2f9c610860720bbf3), process radosgw, pid 15381
2015-08-15 00:51:18.515909 7ff2281e0840  0 framework: fastcgi
2015-08-15 00:51:18.515927 7ff2281e0840  0 framework: civetweb
2015-08-15 00:51:18.515946 7ff2281e0840  0 framework conf key: port, val: 7480
2015-08-15 00:51:18.515958 7ff2281e0840  0 starting handler: civetweb
2015-08-15 00:51:18.529113 7ff2281e0840  0 starting handler: fastcgi
2015-08-15 00:51:18.541553 7ff1a67fc700  0 revoked tokens response is
missing signed section
2015-08-15 00:51:18.541573 7ff1a67fc700  0 ERROR: keystone revocation
processing returned error r=-22
2015-08-15 00:51:21.222619 7ff1a6ffd700  0 ERROR: can't read user header: ret=-2
2015-08-15 00:51:21.222648 7ff1a6ffd700  0 ERROR: sync_user() failed,
user=us-dfw ret=-2


keystone error log:
2015-08-14 19:46:58.582172 2015-08-14 19:46:58.582 8782 WARNING
keystone.common.wsgi [-] You are not authorized to perform the
requested action: identity:revocation_list
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RE: tcmalloc use a lot of CPU

2015-08-17 Thread Luis Periquito
How big are those ops? Are they random? How many nodes? How many SSDs/OSDs?
What are you using to run the tests? Using atop on the OSD nodes, where is
your bottleneck?

On Mon, Aug 17, 2015 at 1:05 PM, Межов Игорь Александрович me...@yuterra.ru
 wrote:

 Hi!

 We also observe the same behavior on our test Hammer install, and I wrote
 about it some time ago:

 http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22609
 http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22609

 Jan Schremes give us some suggestions in thread, but we still not got any
 positive results - TCMalloc usage is
 high. The usage is lowered to 10%, when disable crc in messages, disable
 debug and disable cephx auth,
 but this is od course not for production use. Also we got a different
 trace, while performin FIO-RBD benchmarks
 on ssd pool:
 ---
   46,07%  [kernel]  [k] _raw_spin_lock
6,51%  [kernel]  [k] mb_cache_entry_alloc
5,74%  libtcmalloc.so.4.2.2  [.]
 tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
5,50%  libtcmalloc.so.4.2.2  [.] tcmalloc::SLL_Next(void*)
3,86%  libtcmalloc.so.4.2.2  [.] TCMalloc_PageMap335::get(unsigned
 long) const
2,73%  libtcmalloc.so.4.2.2  [.]
 tcmalloc::CentralFreeList::ReleaseToSpans(void*)
0,69%  libtcmalloc.so.4.2.2  [.]
 tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
0,69%  libtcmalloc.so.4.2.2  [.]
 tcmalloc::PageHeap::GetDescriptor(unsigned long) const
0,64%  libtcmalloc.so.4.2.2  [.] tcmalloc::SLL_PopRange(void**, int,
 void**, void**)
 ---

 I dont clearly understand, what's happening in this case: ssd pool is
 connected to the same host,
 but different controller (C60X onboard instead of LSI2208), io scheduler
 set to noop, pool is gathered
 from 4х400Gb Intel DC S3700 and have to perform better, I think - more
 than 30-40 kops.
 But we got the trace above and no more then 12-15 kiops. Where can be a
 problem?






 Megov Igor
 CIO, Yuterra

 --
  *From:* ceph-users ceph-users-boun...@lists.ceph.com on behalf of YeYin 
  ey...@qq.com
  *Sent:* 17 August 2015 12:58
  *To:* ceph-users
  *Subject:* [ceph-users] tcmalloc use a lot of CPU

 Hi, all,
   When I do performance test with rados bench, I found tcmalloc consumed a
 lot of CPU:

 Samples: 265K of event 'cycles', Event count (approx.): 104385445900
 +  27.58%  libtcmalloc.so.4.1.0[.]
 tcmalloc::CentralFreeList::FetchFromSpans()
 +  15.25%  libtcmalloc.so.4.1.0[.]
 tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
 unsigned long,
 +  12.20%  libtcmalloc.so.4.1.0[.]
 tcmalloc::CentralFreeList::ReleaseToSpans(void*)
 +   1.63%  perf[.] append_chain
 +   1.39%  libtcmalloc.so.4.1.0[.]
 tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
 +   1.02%  libtcmalloc.so.4.1.0[.]
 tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
 +   0.85%  libtcmalloc.so.4.1.0[.] 0x00017e6f
 +   0.75%  libtcmalloc.so.4.1.0[.]
 tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
 +   0.67%  libc-2.12.so[.] memcpy
 +   0.53%  libtcmalloc.so.4.1.0[.] operator delete(void*)

 Ceph version:
 # ceph --version
 ceph version 0.87.2 (87a7cec9ab11c677de2ab23a7668a77d2f5b955e)

 Kernel version:
 3.10.83

  Is this phenomenon normal? Is there any idea about this problem?

 Thanks.
 Ye


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph distributed osd

2015-08-17 Thread Luis Periquito
I don't quite understand your question: you created a 1G RBD/disk and it's full.
You are able to grow it though - but that's a Linux management issue, not
ceph.

As everything is thin-provisioned you can create an RBD with an arbitrary
size - I've created one of 1PB when the cluster only had 600G raw
available.
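
A minimal sketch of growing the RBD from the earlier mail (the image and pool
names are taken from that output, the target size is just an example; with this
rbd version --size is interpreted in MB, and the filesystem on top still has to
be grown separately with its own tools):

rbd resize --size 10240 repo/integrepotest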

On Mon, Aug 17, 2015 at 1:18 PM, gjprabu gjpr...@zohocorp.com wrote:

 Hi All,

Anybody can help on this issue.

 Regards
 Prabu

  On Mon, 17 Aug 2015 12:08:28 +0530 *gjprabu gjpr...@zohocorp.com
 gjpr...@zohocorp.com* wrote 

 Hi All,

Also please find osd information.

 ceph osd dump | grep 'replicated size'
 pool 2 'repo' replicated size 2 min_size 2 crush_ruleset 0 object_hash
 rjenkins pg_num 126 pgp_num 126 last_change 21573 flags hashpspool
 stripe_width 0

 Regards
 Prabu




  On Mon, 17 Aug 2015 11:58:55 +0530 *gjprabu gjpr...@zohocorp.com
 gjpr...@zohocorp.com* wrote 



 Hi All,

We need to test three OSD and one image with replica 2(size 1GB). While
 testing data is not writing above 1GB. Is there any option to write on
 third OSD.

 *ceph osd pool get  repo  pg_num*
 *pg_num: 126*

 *# rbd showmapped *
 *id pool image  snap device*
 *0  rbd  integdownloads -/dev/rbd0 *-- *Already one*
 *2  repo integrepotest  -/dev/rbd2  -- newly created*


 [root@hm2 repository]# df -Th
 Filesystem   Type  Size  Used Avail Use% Mounted on
 /dev/sda5ext4  289G   18G  257G   7% /
 devtmpfs devtmpfs  252G 0  252G   0% /dev
 tmpfstmpfs 252G 0  252G   0% /dev/shm
 tmpfstmpfs 252G  538M  252G   1% /run
 tmpfstmpfs 252G 0  252G   0% /sys/fs/cgroup
 /dev/sda2ext4  488M  212M  241M  47% /boot
 /dev/sda4ext4  1.9T   20G  1.8T   2% /var
 /dev/mapper/vg0-zoho ext4  8.6T  1.7T  6.5T  21% /zoho
 /dev/rbd0ocfs2 977G  101G  877G  11% /zoho/build/downloads
 */dev/rbd2ocfs21000M 1000M 0 100%
 /zoho/build/repository*

 @:~$ scp -r sample.txt root@integ-hm2:/zoho/build/repository/
 root@integ-hm2's password:
 sample.txt
 100% 1024MB   4.5MB/s   03:48
 scp: /zoho/build/repository//sample.txt: *No space left on device*

 Regards
 Prabu




  On Thu, 13 Aug 2015 19:42:11 +0530 *gjprabu gjpr...@zohocorp.com
 gjpr...@zohocorp.com* wrote 



 Dear Team,

  We are using two ceph OSD with replica 2 and it is working
 properly. Here my doubt is (Pool A -image size will be 10GB) and its
 replicated with two OSD, what will happen suppose if the size reached the
 limit, Is there any chance to make the data to continue writing in another
 two OSD's.

 Regards
 Prabu







 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stuck creating pg

2015-08-17 Thread Bart Vanbrabant

1)

~# ceph pg 5.6c7 query
Error ENOENT: i don't have pgid 5.6c7

In the osd log:

2015-08-17 16:11:45.185363 7f311be40700  0 osd.19 64706 do_command r=-2 
i don't have pgid 5.6c7
2015-08-17 16:11:45.185380 7f311be40700  0 log_channel(cluster) log 
[INF] : i don't have pgid 5.6c7


2) I do not see anything wrong with this rule:

{
    "rule_id": 0,
    "rule_name": "data",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
},

3) I rebooted all machines in the cluster and increased the replication 
level of the affected pool to 3, to be safer. After recovery from this 
reboot we are currently in the following state:


HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; 103 requests are 
blocked > 32 sec; 2 osds have slow requests; pool volumes pg_num 2048 > 
pgp_num 1400
pg 5.6c7 is stuck inactive since forever, current state creating, last 
acting [19,25,17]
pg 5.6c7 is stuck unclean since forever, current state creating, last 
acting [19,25,17]

103 ops are blocked > 524.288 sec
19 ops are blocked > 524.288 sec on osd.19
84 ops are blocked > 524.288 sec on osd.25
2 osds have slow requests
pool volumes pg_num 2048 > pgp_num 1400

Thanks,

Bart

On 08/17/2015 03:44 PM, minchen wrote:

It looks like the crush rule doesn't work properly after the osdmap changed;
 there are 3 unclean pgs: 5.6c7, 5.2c7, 15.2bd.
I think you can try the following to help locate the problem:
1st, "ceph pg <pgid> query" to look up the detailed pg state,
e.g. blocked by which osd?
2nd, check the crush rule:
ceph osd crush rule dump
and check the crush_ruleset for pools 5 and 15,
e.g. the chooseleaf step may not be choosing the right osd?
minchen
-- Original --
*From:* Bart Vanbrabant b...@vanbrabant.eu
*Date:* Sun, Aug 16, 2015 07:27 PM
*To:* ceph-users ceph-users@lists.ceph.com
*Subject:* [ceph-users] Stuck creating pg

Hi,

I have a ceph cluster with 26 osd's in 4 hosts, used only for rbd for an 
OpenStack cluster (started at 0.48 I think), currently running 0.94.2 
on Ubuntu 14.04. A few days ago one of the osd's was at 85% disk usage 
while only 30% of the raw disk space is used. I 
ran reweight-by-utilization with 150 as the cutoff level. This reshuffled 
the data. I also noticed that the number of pgs was still at the level 
from when there were fewer disks in the cluster (1300).


Based on the current guidelines I increased pg_num to 2048. It created 
the placement groups except for the last one. To try to force the 
creation of that pg I removed the OSD's (ceph osd out) assigned to it, 
but that made no difference. Currently all OSD's are back in and 
two pg's are also stuck in an unclean state:


ceph health detail:

HEALTH_WARN 2 pgs degraded; 2 pgs stale; 2 pgs stuck degraded; 1 pgs 
stuck inactive; 2 pgs stuck stale; 3 pgs stuck unclean; 2 pgs stuck 
undersized; 2 pgs undersized; 59 requests are blocked > 32 sec; 3 osds 
have slow requests; recovery 221/549658 objects degraded (0.040%); 
recovery 221/549658 objects misplaced (0.040%); pool volumes pg_num 
2048 > pgp_num 1400
pg 5.6c7 is stuck inactive since forever, current state creating, last 
acting [19,25]
pg 5.6c7 is stuck unclean since forever, current state creating, last 
acting [19,25]
pg 5.2c7 is stuck unclean for 313513.609864, current state 
stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck unclean for 313513.610368, current state 
stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck undersized for 308381.750768, current state 
stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck undersized for 308381.751913, current state 
stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck degraded for 308381.750876, current state 
stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck degraded for 308381.752021, current state 
stale+active+undersized+degraded+remapped, last acting [9]
pg 5.2c7 is stuck stale for 281750.295301, current state 
stale+active+undersized+degraded+remapped, last acting [9]
pg 15.2bd is stuck stale for 281750.295293, current state 
stale+active+undersized+degraded+remapped, last acting [9]

16 ops are blocked > 268435 sec
10 ops are blocked > 134218 sec
10 ops are blocked > 1048.58 sec
23 ops are blocked > 524.288 sec
16 ops are blocked > 268435 sec on osd.1
8 ops are blocked > 134218 sec on osd.17
2 ops are blocked > 134218 sec on osd.19
10 ops are blocked > 1048.58 sec on osd.19
23 ops are blocked > 524.288 sec on osd.19
3 osds have slow requests
recovery 221/549658 objects degraded (0.040%)
recovery 221/549658 objects misplaced (0.040%)
pool volumes pg_num 2048 > pgp_num 

Re: [ceph-users] tcmalloc use a lot of CPU

2015-08-17 Thread Mark Nelson

On 08/17/2015 07:03 AM, Alexandre DERUMIER wrote:

Hi,


Is this phenomenon normal? Is there any idea about this problem?


It's a known problem with tcmalloc (search on the ceph mailing).

starting osd with TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M environment 
variable should help.


Note that this only works if you use a version of gperftools/tcmalloc 
newer than 2.1.





Another way is to compile ceph with jemalloc instead of tcmalloc (./configure 
--with-jemalloc ...)


Yep!  At least from what I've seen so far, jemalloc is still a little 
faster for 4k random writes even compared to tcmalloc with the patch + 
128MB thread cache.  Should have some data soon (mostly just a 
reproduction of Sandisk and Intel's work).






- Original Mail -
From: YeYin ey...@qq.com
To: ceph-users ceph-users@lists.ceph.com
Sent: Monday 17 August 2015 11:58:26
Subject: [ceph-users] tcmalloc use a lot of CPU

Hi, all,
When I do performance test with rados bench, I found tcmalloc consumed a lot of 
CPU:

Samples: 265K of event 'cycles', Event count (approx.): 104385445900
+ 27.58% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::FetchFromSpans()
+ 15.25% libtcmalloc.so.4.1.0 [.] 
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, 
unsigned long,
+ 12.20% libtcmalloc.so.4.1.0 [.] 
tcmalloc::CentralFreeList::ReleaseToSpans(void*)
+ 1.63% perf [.] append_chain
+ 1.39% libtcmalloc.so.4.1.0 [.] 
tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
+ 1.02% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::RemoveRange(void**, 
void**, int)
+ 0.85% libtcmalloc.so.4.1.0 [.] 0x00017e6f
+ 0.75% libtcmalloc.so.4.1.0 [.] 
tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
+ 0.67% libc-2.12.so [.] memcpy
+ 0.53% libtcmalloc.so.4.1.0 [.] operator delete(void*)

Ceph version:
# ceph --version
ceph version 0.87.2 (87a7cec9ab11c677de2ab23a7668a77d2f5b955e)

Kernel version:
3.10.83

Is this phenomenon normal? Is there any idea about this problem?

Thanks.
Ye


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repair inconsistent pgs..

2015-08-17 Thread Irek Fasikhov
Hi, Igor.

You need to repair the PG.

for i in `ceph pg dump| grep inconsistent | grep -v 'inconsistent+repair' |
awk {'print$1'}`;do ceph pg repair $i;done

Best regards, Irek Fasikhov
Mobile: +79229045757

2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

 Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs
 in inconsistent state...

 root@temp:~# ceph health detail | grep inc
 HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
 pg 2.490 is active+clean+inconsistent, acting [56,15,29]
 pg 2.c4 is active+clean+inconsistent, acting [56,10,42]

 From OSD logs, after recovery attempt:

 root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do
 ceph pg repair ${i} ; done
 dumped all in format plain
 instructing pg 2.490 on osd.56 to repair
 instructing pg 2.c4 on osd.56 to repair

 /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone
 90c59490/rbd_data.eb486436f2beb.7a65/141//2
 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone
 f5759490/rbd_data.1631755377d7e.04da/141//2
 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone
 fee49490/rbd_data.12483d3ba0794b.522f/141//2
 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone
 a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone
 bac19490/rbd_data.1238e82ae8944a.032e/141//2
 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone
 98519490/rbd_data.123e9c2ae8944a.0807/141//2
 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone
 c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone
 28809490/rbd_data.edea7460fe42b.01d9/141//2
 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700
 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors

 So, how i can solve expected clone situation by hand?
 Thank in advance!



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repair inconsistent pgs..

2015-08-17 Thread Voloshanenko Igor
Hi Irek, please read carefully )))
Your proposal was the first thing I tried...  That's why I asked for
help... (

2015-08-18 8:34 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 Hi, Igor.

 You need to repair the PG.

 for i in `ceph pg dump| grep inconsistent | grep -v 'inconsistent+repair'
 | awk {'print$1'}`;do ceph pg repair $i;done

 Best regards, Irek Fasikhov
 Mobile: +79229045757

 2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

 Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs
 in inconsistent state...

 root@temp:~# ceph health detail | grep inc
 HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
 pg 2.490 is active+clean+inconsistent, acting [56,15,29]
 pg 2.c4 is active+clean+inconsistent, acting [56,10,42]

 From OSD logs, after recovery attempt:

 root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do
 ceph pg repair ${i} ; done
 dumped all in format plain
 instructing pg 2.490 on osd.56 to repair
 instructing pg 2.c4 on osd.56 to repair

 /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone
 90c59490/rbd_data.eb486436f2beb.7a65/141//2
 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone
 f5759490/rbd_data.1631755377d7e.04da/141//2
 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone
 fee49490/rbd_data.12483d3ba0794b.522f/141//2
 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone
 a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone
 bac19490/rbd_data.1238e82ae8944a.032e/141//2
 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone
 98519490/rbd_data.123e9c2ae8944a.0807/141//2
 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone
 c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone
 28809490/rbd_data.edea7460fe42b.01d9/141//2
 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700
 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors

 So, how i can solve expected clone situation by hand?
 Thank in advance!



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Repair inconsistent pgs..

2015-08-17 Thread Voloshanenko Igor
Hi all, at our production cluster, due to high rebalancing ((( we have 2 pgs
in an inconsistent state...

root@temp:~# ceph health detail | grep inc
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]

From OSD logs, after recovery attempt:

root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do
ceph pg repair ${i} ; done
dumped all in format plain
instructing pg 2.490 on osd.56 to repair
instructing pg 2.c4 on osd.56 to repair

/var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone
90c59490/rbd_data.eb486436f2beb.7a65/141//2
/var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone
f5759490/rbd_data.1631755377d7e.04da/141//2
/var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone
fee49490/rbd_data.12483d3ba0794b.522f/141//2
/var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone
a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
/var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone
bac19490/rbd_data.1238e82ae8944a.032e/141//2
/var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone
98519490/rbd_data.123e9c2ae8944a.0807/141//2
/var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone
c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
/var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone
28809490/rbd_data.edea7460fe42b.01d9/141//2
/var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1
log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors

So, how can I solve the expected clone situation by hand?
Thanks in advance!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-17 Thread Steve Dainard
I added a couple OSD's and rebalanced, as well as added a new pool (id 10).

# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 5 pgs stuck unclean;
1 pgs stuck undersized; 1 pgs undersized; recovery 24379/66089446
objects misplaced (0.037%)
pg 10.4f is stuck unclean since forever, current state
active+undersized+degraded, last acting [35]
pg 2.e7f is stuck unclean for 500733.746009, current state
active+remapped, last acting [58,5]
pg 2.b16 is stuck unclean for 263130.699428, current state
active+remapped, last acting [40,90]
pg 10.668 is stuck unclean for 253554.833477, current state
active+remapped, last acting [34,101]
pg 2.782 is stuck unclean for 253561.405193, current state
active+remapped, last acting [76,101]
pg 10.4f is stuck undersized for 300.523795, current state
active+undersized+degraded, last acting [35]
pg 10.4f is stuck degraded for 300.523977, current state
active+undersized+degraded, last acting [35]
pg 10.4f is active+undersized+degraded, acting [35]
recovery 24379/66089446 objects misplaced (0.037%)

I figured the logs for osd.35 might be most interesting first as it
doesn't come out of a degraded state. After setting debug to 0/5 on
osd.35 and restarting the osd I grep'd for the degraded placement
group:

# grep 10.4f\(  ceph-osd.35.log
2015-08-17 09:27:03.945350 7f0eb1a7f700 30 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] lock
2015-08-17 09:27:03.945357 7f0eb1a7f700 10 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] on_shutdown
2015-08-17 09:27:03.945371 7f0eb1a7f700 10 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] cancel_copy_ops
2015-08-17 09:27:03.945378 7f0eb1a7f700 10 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] cancel_flush_ops
2015-08-17 09:27:03.945387 7f0eb1a7f700 10 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] cancel_proxy_read_ops
2015-08-17 09:27:03.945392 7f0eb1a7f700 10 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] on_change
2015-08-17 09:27:03.945397 7f0eb1a7f700 10 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] clear_primary_state
2015-08-17 09:27:03.945404 7f0eb1a7f700 20 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] agent_stop
2015-08-17 09:27:03.945409 7f0eb1a7f700 10 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] cancel_recovery
2015-08-17 09:27:03.945413 7f0eb1a7f700 10 osd.35 pg_epoch: 186424
pg[10.4f( empty local-les=185079 n=0 ec=185075 les/c 185079/185079
185075/185075/185075) [35] r=0 lpr=185075 crt=0'0 mlcod 0'0
active+undersized+degraded] clear_recovery_state

Full logs of osd.35
part1: http://pastebin.com/6ymD4Gx6
part2: http://pastebin.com/h4aRwniF

osd.76

# grep 2.782 /var/log/ceph/ceph-osd.76.log
2015-08-17 09:52:21.205316 7fc3b6cce700 20 osd.76 186548  kicking pg 2.782
2015-08-17 09:52:21.205319 7fc3b6cce700 30 osd.76 pg_epoch: 186548
pg[2.782( v 185988'161310 (183403'153055,185988'161310]
local-les=186320 n=8163 ec=736 les/c 186320/186320
186318/186319/185008) [76]/[76,101] r=0 lpr=186319 crt=185986'161303
lcod 185988'161309 mlcod 0'0 active+remapped] lock
2015-08-17 09:52:21.205338 7fc3b6cce700 10 osd.76 pg_epoch: 186548
pg[2.782( v 185988'161310 (183403'153055,185988'161310]
local-les=186320 n=8163 ec=736 les/c 186320/186320
186318/186319/185008) [76]/[76,101] r=0 lpr=186319 crt=185986'161303
lcod 185988'161309 mlcod 0'0 active+remapped] on_shutdown
2015-08-17 09:52:21.205347 7fc3b6cce700 10 osd.76 pg_epoch: 186548
pg[2.782( v 185988'161310 (183403'153055,185988'161310]
local-les=186320 n=8163 ec=736 les/c 186320/186320
186318/186319/185008) [76]/[76,101] r=0 lpr=186319 crt=185986'161303
lcod 185988'161309 mlcod 0'0 active+remapped] cancel_copy_ops
2015-08-17 09:52:21.205354 7fc3b6cce700 10 osd.76 pg_epoch: 186548
pg[2.782( v 185988'161310 (183403'153055,185988'161310]
local-les=186320 n=8163 ec=736 les/c 186320/186320
186318/186319/185008) 

[ceph-users] docker distribution

2015-08-17 Thread Lorieri
Hi,

Docker changed the old docker-registry project to docker-distribution
and its API to v2.
It now uses librados instead of radosgw to save data.

In some ceph installations it is easier to get access to radosgw than
to the cluster, so I've made a pull request to add radosgw support; it
would be great if you could test it.
https://hub.docker.com/r/lorieri/docker-distribution-generic-s3/

Note: if you already use the old docker-registry you must create
another bucket and push the images again, the API changed to v2.

There is a shellscript to help https://github.com/docker/migrator

How I tested it:

docker run -d -p 5000:5000 -e REGISTRY_STORAGE=s3 \
-e REGISTRY_STORAGE_S3_REGION=generic \
-e REGISTRY_STORAGE_S3_REGIONENDPOINT=http://myradosgw.mydomain.com \
-e REGISTRY_STORAGE_S3_BUCKET=registry \
-e REGISTRY_STORAGE_S3_ACCESSKEY=XXX \
-e REGISTRY_STORAGE_S3_SECRETKEY=XXX \
-e REGISTRY_STORAGE_S3_SECURE=false \
-e REGISTRY_STORAGE_S3_ENCRYPT=false \
-e REGISTRY_STORAGE_S3_REGIONSUPPORTSHEAD=false \
lorieri/docker-distribution-generic-s3
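
Not part of the original post, but a quick way to sanity-check the result once the container is up is to push a small image through the v2 API and list the catalogue. The busybox image and the localhost:5000 endpoint are placeholder choices, and if the docker daemon refuses plain HTTP you may need to start it with --insecure-registry localhost:5000:

docker pull busybox
docker tag busybox localhost:5000/busybox
docker push localhost:5000/busybox            # objects should now appear in the radosgw bucket
curl http://localhost:5000/v2/_catalog        # v2 catalogue endpoint, should list "busybox"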


thanks,
-lorieri
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stuck creating pg

2015-08-17 Thread Bart Vanbrabant
Many operations in the OpenStack cluster are stuck because of this. For
example, a VM cannot be removed because of operations stuck on osd.19:

2015-08-17 09:34:08.116274 7fa61e57a700  0 log_channel(cluster) log [WRN] :
slow request 1920.261825 seconds old, received at 2015-08-17
09:02:07.853997: osd_op(client.4705573.0:4384
rbd_data.47a42a1fba00d3.110d [delete] 5.283a4ec7
ack+ondisk+write+known_if_redirected e61799) currently no flag points
reached
2015-08-17 09:34:08.116279 7fa61e57a700  0 log_channel(cluster) log [WRN] :
slow request 1920.157696 seconds old, received at 2015-08-17
09:02:07.958126: osd_op(client.4705573.0:4897
rbd_data.47a42a1fba00d3.130e [delete] 5.868caac7
ack+ondisk+write+known_if_redirected e61799) currently no flag points
reached
2015-08-17 09:34:09.116537 7fa61e57a700  0 log_channel(cluster) log [WRN] :
38 slow requests, 9 included below; oldest blocked for  68721.775549 secs
2015-08-17 09:34:09.116553 7fa61e57a700  0 log_channel(cluster) log [WRN] :
slow request 1920.842824 seconds old, received at 2015-08-17
09:02:08.273620: osd_op(client.4705573.0:5846
rbd_data.47a42a1fba00d3.16c3 [delete] 5.dbd736c7
ack+ondisk+write+known_if_redirected e61799) currently no flag points
reached

rbd_data.47a42a1fba00d3.130e is an object in a VM disk that
OpenStack is trying to delete.
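
Not from Bart's mail, but a sketch of how the blocked requests can be inspected directly on the node that hosts osd.19, via the OSD admin socket (the daemon name is taken from the log lines above):

ceph daemon osd.19 dump_ops_in_flight    # each blocked op with the flag points it has reached
ceph daemon osd.19 dump_historic_ops     # recently completed slow ops, with per-step timings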

gr,
Bart

On Sun, Aug 16, 2015 at 1:27 PM Bart Vanbrabant b...@vanbrabant.eu wrote:

 Hi,

 I have a ceph cluster with 26 OSDs in 4 hosts, used only for RBD for an
 OpenStack cluster (started at 0.48 I think), currently running 0.94.2 on
 Ubuntu 14.04. A few days ago one of the OSDs was at 85% disk usage while
 only 30% of the raw disk space was used. I ran reweight-by-utilization with
 150 as the cutoff level. This reshuffled the data. I also noticed that the
 number of PGs was still at the level from when there were fewer disks in the
 cluster (1300).

 Based on the current guidelines I increased pg_num to 2048. It created the
 placement groups except for the last one. To try to force the creation of
 that PG I removed the OSDs (ceph osd out) assigned to it, but that made no
 difference. Currently all OSDs are back in, and two PGs are also stuck in an
 unclean state:

 ceph health detail:

 HEALTH_WARN 2 pgs degraded; 2 pgs stale; 2 pgs stuck degraded; 1 pgs stuck
 inactive; 2 pgs stuck stale; 3 pgs stuck unclean; 2 pgs stuck undersized; 2
 pgs undersized; 59 requests are blocked > 32 sec; 3 osds have slow
 requests; recovery 221/549658 objects degraded (0.040%); recovery
 221/549658 objects misplaced (0.040%); pool volumes pg_num 2048 > pgp_num
 1400
 pg 5.6c7 is stuck inactive since forever, current state creating, last
 acting [19,25]
 pg 5.6c7 is stuck unclean since forever, current state creating, last
 acting [19,25]
 pg 5.2c7 is stuck unclean for 313513.609864, current state
 stale+active+undersized+degraded+remapped, last acting [9]
 pg 15.2bd is stuck unclean for 313513.610368, current state
 stale+active+undersized+degraded+remapped, last acting [9]
 pg 5.2c7 is stuck undersized for 308381.750768, current state
 stale+active+undersized+degraded+remapped, last acting [9]
 pg 15.2bd is stuck undersized for 308381.751913, current state
 stale+active+undersized+degraded+remapped, last acting [9]
 pg 5.2c7 is stuck degraded for 308381.750876, current state
 stale+active+undersized+degraded+remapped, last acting [9]
 pg 15.2bd is stuck degraded for 308381.752021, current state
 stale+active+undersized+degraded+remapped, last acting [9]
 pg 5.2c7 is stuck stale for 281750.295301, current state
 stale+active+undersized+degraded+remapped, last acting [9]
 pg 15.2bd is stuck stale for 281750.295293, current state
 stale+active+undersized+degraded+remapped, last acting [9]
 16 ops are blocked > 268435 sec
 10 ops are blocked > 134218 sec
 10 ops are blocked > 1048.58 sec
 23 ops are blocked > 524.288 sec
 16 ops are blocked > 268435 sec on osd.1
 8 ops are blocked > 134218 sec on osd.17
 2 ops are blocked > 134218 sec on osd.19
 10 ops are blocked > 1048.58 sec on osd.19
 23 ops are blocked > 524.288 sec on osd.19
 3 osds have slow requests
 recovery 221/549658 objects degraded (0.040%)
 recovery 221/549658 objects misplaced (0.040%)
 pool volumes pg_num 2048 > pgp_num 1400
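
One detail the health output itself points at (a sketch, not advice from the thread): pgp_num was never raised to match the new pg_num, so the split placement groups will not be rebalanced until pgp_num catches up. The usual follow-up, which does trigger data movement, would be:

ceph osd pool set volumes pgp_num 2048    # pool name and value taken from the warning above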

 OSD 9 was the primary when the pg creation process got stuck. This OSD has
 been removed and added again (not only marked out with ceph osd out, but
 also removed from the crush map and re-added).

 The bad data distribution was probably caused by the low number of PGs and
 mainly by bad weighting of the OSDs. I changed the crush map to give the
 same weight to each of the OSDs, but that does not fix these problems
 either:

 ceph osd tree:
 ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 6.5 pool default
 -6 2.0 host droplet4
 16 0.25000 osd.16 up  1.0  1.0
 20 0.25000 osd.20 up  1.0  1.0
 21 0.25000 osd.21  

Re: [ceph-users] rbd map failed

2015-08-17 Thread Ilya Dryomov
On Thu, Aug 13, 2015 at 1:59 PM, Adir Lev ad...@mellanox.com wrote:
 Hi,



 I have a CEPH cluster running on 4 physical servers; the cluster is up and
 healthy.

 So far I have been unable to connect any client to the cluster using krbd or
 the fio rbd plugin.

 My clients can see and create images in the rbd pool but cannot map them:

 root@r-dcs68 ~ # rbd ls

 fio_test

 foo

 foo1

 foo_test



 root@r-dcs68 ~ # rbd map foo

 rbd: sysfs write failed

 rbd: map failed: (95) Operation not supported



 Using strace I see that there are no write permissions to /sys/bus/rbd/add:

 root@r-dcs68 ~ # echo 192.168.57.102:16789 name=admin,key=client.admin rbd
 foo - > /sys/bus/rbd/add

 -bash: echo: write error: Operation not permitted

It doesn't look like a file permissions problem, please paste the
entire strace output.  What's your ceph version (ceph --version)?

Thanks,

Ilya
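
As a sketch of what that information-gathering could look like on the client (the image name foo is from the post above; the feature check is an extra guess, since EOPNOTSUPP from krbd on a 3.x kernel is often caused by format 2 image features the kernel does not support):

ceph --version
strace -f rbd map foo 2>&1 | tail -n 50    # capture the failing sysfs write and its errno
rbd info foo                               # check "format:" and "features:" against what the kernel supports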
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Memory-Usage

2015-08-17 Thread Patrik Plank
Hi,



I have a ceph cluster with three nodes and 32 OSDs.

The three nodes have 16 GB of memory each, but only 5 GB is in use.

The nodes are Dell PowerEdge R510s.



my ceph.conf:



[global]
mon_initial_members = ceph01
mon_host = 10.0.0.20,10.0.0.21,10.0.0.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
filestore_op_threads = 32
public_network = 10.0.0.0/24
cluster_network = 10.0.1.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
osd_max_write_size = 200
osd_map_cache_size = 1024
osd_map_cache_bl_size = 128
osd_recovery_op_priority = 1
osd_max_recovery_max_active = 1
osd_recovery_max_backfills = 1
osd_op_threads = 32
osd_disk_threads = 8


Is that normal, or a bottleneck?
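
For what it's worth (not part of the original mail), two quick checks that show where the memory actually sits, assuming the OSDs are built against tcmalloc as usual:

free -m                      # see how much of the 16 GB is really free vs. used as page cache
ceph tell osd.0 heap stats   # per-OSD tcmalloc heap summary; repeat for other OSD ids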



best regards

Patrik

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-17 Thread Voloshanenko Igor
Hi all, can you please help me with an unexplained situation...

All snapshots inside ceph are broken...

So, as an example, we have a VM template stored as an RBD image inside ceph.
We can map and mount it to check that everything is OK with it:

root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
/dev/rbd0
root@test:~# parted /dev/rbd0 print
Model: Unknown (unknown)
Disk /dev/rbd0: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End SizeType File system  Flags
 1  1049kB  525MB   524MB   primary  ext4 boot
 2  525MB   10.7GB  10.2GB  primary   lvm

Then I want to create a snapshot, so I do:
root@test:~# rbd snap create
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap

And now I want to map it:

root@test:~# rbd map
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
/dev/rbd1
root@test:~# parted /dev/rbd1 print
Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
 /dev/rbd1 has been opened read-only.
Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
 /dev/rbd1 has been opened read-only.
Error: /dev/rbd1: unrecognised disk label

Even the md5 sums are different...
root@ix-s2:~# md5sum /dev/rbd0
9a47797a07fee3a3d71316e22891d752  /dev/rbd0
root@ix-s2:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
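
An extra check, not in the original mail, that can help tell whether the mismatch comes from krbd or from the objects themselves is to hash the image and the snapshot through librbd instead of the kernel client:

rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 - | md5sum
rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap - | md5sum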


OK, now I protect the snapshot and create a clone... but it's the same thing...
the md5 for the clone is the same as for the snapshot:

root@test:~# rbd unmap /dev/rbd1
root@test:~# rbd snap protect
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
root@test:~# rbd clone
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
cold-storage/test-image
root@test:~# rbd map cold-storage/test-image
/dev/rbd1
root@test:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1

...but it's broken:
root@test:~# parted /dev/rbd1 print
Error: /dev/rbd1: unrecognised disk label


=

tech details:

root@test:~# ceph -v
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

We have 2 inconsistent PGs, but none of these images are placed on those PGs...

root@test:~# ceph health detail
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
18 scrub errors
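
For the inconsistent PGs themselves, the usual follow-up (sketched here, not advice from the thread, and worth doing only after checking the OSD logs to see which copy is bad, since repair on this release generally trusts the primary copy) is to query and repair them:

ceph pg 2.490 query    # look at the scrub/peering state first
ceph pg repair 2.490
ceph pg repair 2.c4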



root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5
osdmap e16770 pool 'cold-storage' (2) object
'0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
([37,15,14], p37) acting ([37,15,14], p37)
root@test:~# ceph osd map cold-storage
0e23c701-401d-4465-b9b4-c02939d57bb5@snap
osdmap e16770 pool 'cold-storage' (2) object
'0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) -> up
([12,23,17], p12) acting ([12,23,17], p12)
root@test:~# ceph osd map cold-storage
0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
osdmap e16770 pool 'cold-storage' (2) object
'0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9)
-> up ([12,44,23], p12) acting ([12,44,23], p12)


We also use a cache tier, which at the moment is in forward mode...
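
For reference (not from the original mail), the cache tier state can be checked, and the tier flushed so reads hit the base pool, with something like the following; the cache pool name is a placeholder:

ceph osd dump | grep cache_mode                    # confirm which pool is the tier and its current mode
rados -p <cache-pool-name> cache-flush-evict-all   # flush and evict everything from the cache tier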

Can you please help me with this, as I can no longer understand what is
going on...

Thanks in advance!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph distributed osd

2015-08-17 Thread gjprabu
Hi All,



   We need to test three OSDs and one image with replica 2 (size 1 GB). While 
testing, data is not being written above 1 GB. Is there any option to write to the 
third OSD?



ceph osd pool get  repo  pg_num

pg_num: 126



# rbd showmapped 

id pool image  snap device

0  rbd  integdownloads -    /dev/rbd0    -- the existing one

2  repo integrepotest  -    /dev/rbd2    -- newly created





[root@hm2 repository]# df -Th

Filesystem           Type      Size  Used Avail Use% Mounted on
/dev/sda5            ext4      289G   18G  257G   7% /
devtmpfs             devtmpfs  252G     0  252G   0% /dev
tmpfs                tmpfs     252G     0  252G   0% /dev/shm
tmpfs                tmpfs     252G  538M  252G   1% /run
tmpfs                tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/sda2            ext4      488M  212M  241M  47% /boot
/dev/sda4            ext4      1.9T   20G  1.8T   2% /var
/dev/mapper/vg0-zoho ext4      8.6T  1.7T  6.5T  21% /zoho
/dev/rbd0            ocfs2     977G  101G  877G  11% /zoho/build/downloads
/dev/rbd2            ocfs2    1000M 1000M     0 100% /zoho/build/repository



@:~$ scp -r sample.txt root@integ-hm2:/zoho/build/repository/

root@integ-hm2's password: 

sample.txt                                    100% 1024MB   4.5MB/s   03:48

scp: /zoho/build/repository//sample.txt: No space left on device
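
The "No space left on device" error appears to be simply the 1000 MB RBD image filling up, not the cluster running out of space. If the goal is more usable capacity on /dev/rbd2, a sketch would be to grow the image and then the filesystem on top of it; the 10 GB target is an arbitrary example, --size is given in MB on this rbd version, and the OCFS2 filesystem would still need to be grown with its own tooling (e.g. tunefs.ocfs2) before the extra space becomes usable:

rbd resize repo/integrepotest --size 10240   # grow the repo image from 1 GB to 10 GB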



Regards

Prabu










  On Thu, 13 Aug 2015 19:42:11 +0530 gjprabu <gjpr...@zohocorp.com> wrote:




Dear Team,



 We are using two ceph OSDs with replica 2 and it is working properly. 
My doubt is this: Pool A's image size will be 10 GB and it is replicated across two 
OSDs; what will happen if the size reaches that limit? Is there any 
way to make the data continue to be written to another two OSDs?



Regards

Prabu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph distributed osd

2015-08-17 Thread gjprabu
Hi All,



   Also, please find the OSD information below.



ceph osd dump | grep 'replicated size'

pool 2 'repo' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 126 pgp_num 126 last_change 21573 flags hashpspool stripe_width 0
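
If the intent is for every object to have a copy on all three OSDs, the pool's replica count can be raised (just a sketch; note that replication adds copies, it does not add usable capacity to the 1 GB image):

ceph osd pool set repo size 3   # three replicas; min_size can stay at 2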



Regards

Prabu

  On Mon, 17 Aug 2015 11:58:55 +0530 gjprabu <gjpr...@zohocorp.com> wrote:

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com