Re: [ceph-users] RBD performance slowly degrades :-(

2015-08-12 Thread Pieter Koorts

Hi Irek,

Thanks for the link. I have removed the SSDs for now and performance is up to 
30MB/s on a benchmark now. To be honest, I knew the Samsung SSDs weren't great 
but did not expect them to be worse than plain hard disks.
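For reference, detaching a writeback cache tier before pulling the SSDs usually
looks roughly like this; the pool names ssd-cache and rbd below are placeholders,
not taken from this thread:

# stop new writes landing in the cache, flush/evict it, then unhook it
ceph osd tier cache-mode ssd-cache forward
rados -p ssd-cache cache-flush-evict-all
ceph osd tier remove-overlay rbd
ceph osd tier remove rbd ssd-cache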

Pieter

On Aug 12, 2015, at 01:09 PM, Irek Fasikhov malm...@gmail.com wrote:

Hi.
Read this thread here:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg17360.html

Best regards, Irek Fasikhov
Mob.: +79229045757

2015-08-12 14:52 GMT+03:00 Pieter Koorts pieter.koo...@me.com:
Hi

Something that's been bugging me for a while is that I am trying to diagnose iowait 
time within KVM guests. Guests doing reads or writes tend to do about 50% to 90% 
iowait but the host itself is only doing about 1% to 2% iowait. So the result 
is the guests are extremely slow.

I currently run 3x hosts each with a single SSD and single HDD OSD in 
cache-tier writeback mode. Although the SSD (Samsung 850 EVO 120GB) is not a 
great one it should at least perform reasonably compared to a hard disk and 
doing some direct SSD tests I get approximately 100MB/s write and 200MB/s read 
on each SSD.

When I run rados bench though, the benchmark starts with a not great but okay 
speed and as the benchmark progresses it just gets slower and slower till it's 
worse than a USB hard drive. The SSD cache pool is 120GB in size (360GB RAW) 
and in use at about 90GB. I have tried tuning the XFS mount options as well but 
it has had little effect.

Understandably the server spec is not great but I don't expect performance to 
be that bad.

OSD config:
[osd]
osd crush update on start = false
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

Servers spec:
Dual Quad Core XEON E5410 and 32GB RAM in each server
10GBE @ 10G speed with 8000byte Jumbo Frames.

Rados bench result: (starts at 50MB/s average and plummets down to 11MB/s)
sudo rados bench -p rbd 50 write --no-cleanup -t 1
 Maintaining 1 concurrent writes of 4194304 bytes for up to 50 seconds or 0 
objects
 Object prefix: benchmark_data_osc-mgmt-1_10007
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1       1        14        13   51.9906        52 0.0671911  0.074661
     2       1        27        26   51.9908        52 0.0631836 0.0751152
     3       1        37        36   47.9921        40 0.0691167 0.0802425
     4       1        51        50   49.9922        56 0.0816432 0.0795869
     5       1        56        55   43.9934        20  0.208393  0.088523
     6       1        61        60    39.994        20  0.241164 0.0999179
     7       1        64        63   35.9934        12  0.239001  0.106577
     8       1        66        65   32.4942         8  0.214354  0.122767
     9       1        72        71     31.55        24  0.132588  0.125438
    10       1        77        76   30.3948        20  0.256474  0.128548
    11       1        79        78   28.3589         8  0.183564  0.138354
    12       1        82        81   26.9956        12  0.345809  0.145523
    13       1        85        84    25.842        12  0.373247  0.151291
    14       1        86        85   24.2819         4  0.950586  0.160694
    15       1        86        85   22.6632         0         -  0.160694
    16       1        90        89   22.2466         8  0.204714  0.178352
    17       1        94        93   21.8791        16  0.282236  0.180571
    18       1        98        97   21.5524        16  0.262566  0.183742
    19       1       101       100   21.0495        12  0.357659  0.187477
    20       1       104       103    20.597        12  0.369327  0.192479
    21       1       105       104   19.8066         4  0.373233  0.194217
    22       1       105       104   18.9064         0         -  0.194217
    23       1       106       105   18.2582         2   2.35078  0.214756
    24       1       107       106   17.6642         4  0.680246  0.219147
    25       1       109       108   17.2776         8  0.677688  0.229222
    26       1       113       112   17.2283        16   0.29171  0.230487
    27       1       117       116   17.1828        16  0.255915  0.231101
    28       1       120       119   16.9976        12  0.412411  0.235122
    29       1       120       119   16.4115         0         -  0.235122
    30       1       120       119   15.8645         0         -  0.235122
    31       1       120       119   15.3527         0         -  0.235122
    32       1       122       121   15.1229         2  0.319309  0.262822
    33       1       124       123   14.9071         8  0.344094  0.266201
    34       1       127       126   14.8215        12   0.33534  0.267913
    35       1       129       128   14.6266         8  0.355403  0.269241
    36       1       132       131   14.5536        12  0.581528  0.274327
    37       1       132       131   14.1603         0         -  0.274327
   

Re: [ceph-users] Is there a limit for object size in CephFS?

2015-08-12 Thread Hadi Montakhabi
4.0.6-300.fc22.x86_64

On Tue, Aug 11, 2015 at 10:24 PM, Yan, Zheng uker...@gmail.com wrote:

 On Wed, Aug 12, 2015 at 5:33 AM, Hadi Montakhabi h...@cs.uh.edu wrote:

 [sequential read]
 readwrite=read
 size=2g
 directory=/mnt/mycephfs
 ioengine=libaio
 direct=1
 blocksize=${BLOCKSIZE}
 numjobs=1
 iodepth=1
 invalidate=1 # causes the kernel buffer and page cache to be invalidated
 #nrfiles=1
 [sequential write]
 readwrite=write # randread randwrite
 size=2g
 directory=/mnt/mycephfs
 ioengine=libaio
 direct=1
 blocksize=${BLOCKSIZE}
 numjobs=1
 iodepth=1
 invalidate=1
 [random read]
 readwrite=randread
 size=2g
 directory=/mnt/mycephfs
 ioengine=libaio
 direct=1
 blocksize=${BLOCKSIZE}
 numjobs=1
 iodepth=1
 invalidate=1
 [random write]
 readwrite=randwrite
 size=2g
 directory=/mnt/mycephfs
 ioengine=libaio
 direct=1
 blocksize=${BLOCKSIZE}
 numjobs=1
 iodepth=1
 invalidate=1


 I just tried the 4.2-rc kernel and everything went well. Which kernel version
 were you using?






 On Sun, Aug 9, 2015 at 9:27 PM, Yan, Zheng uker...@gmail.com wrote:


 On Sun, Aug 9, 2015 at 8:57 AM, Hadi Montakhabi h...@cs.uh.edu wrote:

 I am using fio.
 I use the kernel module to mount CephFS.


 Please send the fio job file to us.



 On Aug 8, 2015 10:52 AM, Ketor D d.ke...@gmail.com wrote:

 Hi Hadi,
   Which bench tool do you use? And how do you mount CephFS, ceph-fuse
 or kernel-cephfs?

 On Fri, Aug 7, 2015 at 11:50 PM, Hadi Montakhabi h...@cs.uh.edu
 wrote:

 Hello Cephers,

 I am benchmarking CephFS. In one of my experiments, I change the
 object size.
 I start from 64kb. Every time I do reads and writes with different block
 sizes.
 By increasing the object size to 64MB and increasing the block size
 to 64MB, CephFS crashes (shown in the chart below). What I mean by crash is
 that when I do ceph -s or ceph -w it gets stuck constantly reporting
 reads and never finishes the operation (even after a few days!).
 I have repeated this experiment for different underlying file systems
 (xfs and btrfs), and the same thing happens in both cases.
 What could be the reason CephFS crashes? Is there a limit on
 object size in CephFS?
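For context, the object size being varied above is normally set per directory
through CephFS layout xattrs on a kernel mount; a minimal sketch, assuming a
mount at /mnt/mycephfs and a test directory whose new files should use 64 MB
objects:

# show the current layout; new files under the directory inherit it
getfattr -n ceph.dir.layout /mnt/mycephfs/testdir
# set a 64 MB object size (value is in bytes) for files created afterwards
setfattr -n ceph.dir.layout.object_size -v 67108864 /mnt/mycephfs/testdir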

 Thank you,
 Hadi

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD performance slowly degrades :-(

2015-08-12 Thread Max A. Krasilnikov
Hello!

On Wed, Aug 12, 2015 at 02:30:59PM +, pieter.koorts wrote:

 Hi Irek,

 Thanks for the link. I have removed the SSDs for now and performance is up 
 to 30MB/s on a benchmark now. To be honest, I knew the Samsung SSDs weren't 
 great but did not expect them to be worse than plain hard disks.

I had the same trouble with the Samsung 840 EVO 1TB. 15 of 16 disks were terribly
slow (about 3000 iops and up to 200 MBps per drive). All the drives were
replaced with 850 EVO 250 GB and the problem was fixed.
My SSDs had the latest firmware and were brand new at the time of the test.

 Pieter

 Something that's been bugging me for a while is that I am trying to diagnose 
 iowait time within KVM guests. Guests doing reads or writes tend to do about 50% 
 to 90% iowait but the host itself is only doing about 1% to 2% iowait. So the 
 result is the guests are extremely slow.

 I currently run 3x hosts each with a single SSD and single HDD OSD in 
 cache-tier writeback mode. Although the SSD (Samsung 850 EVO 120GB) is not a 
 great one it should at least perform reasonably compared to a hard disk and 
 doing some direct SSD tests I get approximately 100MB/s write and 200MB/s 
 read on each SSD.

 When I run rados bench though, the benchmark starts with a not great but okay 
 speed and as the benchmark progresses it just gets slower and slower till 
 it's worse than a USB hard drive. The SSD cache pool is 120GB in size (360GB 
 RAW) and in use at about 90GB. I have tried tuning the XFS mount options as 
 well but it has had little effect.

-- 
WBR, Max A. Krasilnikov
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd rename snaps?

2015-08-12 Thread Stefan Priebe - Profihost AG
Hi,

For MDS there is the ability to rename snapshots, but for RBD I can't
see one.

Is there a way to rename a snapshot?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH cache layer. Very slow

2015-08-12 Thread Voloshanenko Igor
Hi all, we have set up a Ceph cluster with 60 OSDs of 2 different types (5 nodes,
12 disks on each: 10 HDD, 2 SSD).

We also cover this with a custom CRUSH map with 2 roots:

ID   WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 5.0 root ssd
-102 1.0 host ix-s2-ssd
   2 1.0 osd.2   up  1.0  1.0
   9 1.0 osd.9   up  1.0  1.0
-103 1.0 host ix-s3-ssd
   3 1.0 osd.3   up  1.0  1.0
   7 1.0 osd.7   up  1.0  1.0
-104 1.0 host ix-s5-ssd
   1 1.0 osd.1   up  1.0  1.0
   6 1.0 osd.6   up  1.0  1.0
-105 1.0 host ix-s6-ssd
   4 1.0 osd.4   up  1.0  1.0
   8 1.0 osd.8   up  1.0  1.0
-106 1.0 host ix-s7-ssd
   0 1.0 osd.0   up  1.0  1.0
   5 1.0 osd.5   up  1.0  1.0
  -1 5.0 root platter
  -2 1.0 host ix-s2-platter
  13 1.0 osd.13  up  1.0  1.0
  17 1.0 osd.17  up  1.0  1.0
  21 1.0 osd.21  up  1.0  1.0
  27 1.0 osd.27  up  1.0  1.0
  32 1.0 osd.32  up  1.0  1.0
  37 1.0 osd.37  up  1.0  1.0
  44 1.0 osd.44  up  1.0  1.0
  48 1.0 osd.48  up  1.0  1.0
  55 1.0 osd.55  up  1.0  1.0
  59 1.0 osd.59  up  1.0  1.0
  -3 1.0 host ix-s3-platter
  14 1.0 osd.14  up  1.0  1.0
  18 1.0 osd.18  up  1.0  1.0
  23 1.0 osd.23  up  1.0  1.0
  28 1.0 osd.28  up  1.0  1.0
  33 1.0 osd.33  up  1.0  1.0
  39 1.0 osd.39  up  1.0  1.0
  43 1.0 osd.43  up  1.0  1.0
  47 1.0 osd.47  up  1.0  1.0
  54 1.0 osd.54  up  1.0  1.0
  58 1.0 osd.58  up  1.0  1.0
  -4 1.0 host ix-s5-platter
  11 1.0 osd.11  up  1.0  1.0
  16 1.0 osd.16  up  1.0  1.0
  22 1.0 osd.22  up  1.0  1.0
  26 1.0 osd.26  up  1.0  1.0
  31 1.0 osd.31  up  1.0  1.0
  36 1.0 osd.36  up  1.0  1.0
  41 1.0 osd.41  up  1.0  1.0
  46 1.0 osd.46  up  1.0  1.0
  51 1.0 osd.51  up  1.0  1.0
  56 1.0 osd.56  up  1.0  1.0
  -5 1.0 host ix-s6-platter
  12 1.0 osd.12  up  1.0  1.0
  19 1.0 osd.19  up  1.0  1.0
 24 1.0 osd.24  up  1.0  1.0
  29 1.0 osd.29  up  1.0  1.0
  34 1.0 osd.34  up  1.0  1.0
  38 1.0 osd.38  up  1.0  1.0
  42 1.0 osd.42  up  1.0  1.0
  50 1.0 osd.50  up  1.0  1.0
  53 1.0 osd.53  up  1.0  1.0
  57 1.0 osd.57  up  1.0  1.0
  -6 1.0 host ix-s7-platter
  10 1.0 osd.10  up  1.0  1.0
  15 1.0 osd.15  up  1.0  1.0
  20 1.0 osd.20  up  1.0  1.0
  25 1.0 osd.25  up  1.0  1.0
  30 1.0 osd.30  up  1.0  1.0
  35 1.0 osd.35  up  1.0  1.0
  40 1.0 osd.40  up  1.0  1.0
  45 1.0 osd.45  up  1.0  1.0
  49 1.0 osd.49  up  1.0  1.0
  52 1.0 osd.52  up  1.0  1.0


Then we create 2 pools, 1 on HDD (platters) and 1 on SSD,
and put the SSD pool in front of the HDD pool (cache tier).
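The exact commands used are not included in the thread; a minimal sketch of the
usual cache-tier wiring, with pool names, PG counts and the size target purely
as assumptions:

ceph osd pool create cold-storage 2048 2048
ceph osd pool create hot-cache 512 512
ceph osd tier add cold-storage hot-cache
ceph osd tier cache-mode hot-cache writeback
ceph osd tier set-overlay cold-storage hot-cache
# the cache needs a hit set and a size target to flush/evict sensibly
ceph osd pool set hot-cache hit_set_type bloom
ceph osd pool set hot-cache target_max_bytes 100000000000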

Now we receive very bad performance results from the cluster.
Even with rados 

[ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-12 Thread Steve Dainard
I ran a ceph osd reweight-by-utilization yesterday and partway through
had a network interruption. After the network was restored the cluster
continued to rebalance, but this morning the cluster has stopped
rebalancing and the status will not change from:

# ceph status
cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
 health HEALTH_WARN
1 pgs degraded
1 pgs stuck degraded
2 pgs stuck unclean
1 pgs stuck undersized
1 pgs undersized
recovery 8163/66089054 objects degraded (0.012%)
recovery 8194/66089054 objects misplaced (0.012%)
 monmap e24: 3 mons at
{mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
election epoch 250, quorum 0,1,2 mon1,mon2,mon3
 osdmap e184486: 100 osds: 100 up, 100 in; 1 remapped pgs
  pgmap v3010985: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
251 TB used, 111 TB / 363 TB avail
8163/66089054 objects degraded (0.012%)
8194/66089054 objects misplaced (0.012%)
4142 active+clean
   1 active+undersized+degraded
   1 active+remapped


# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 2 pgs stuck unclean;
1 pgs stuck undersized; 1 pgs undersized; recovery 8163/66089054
objects degraded (0.012%); recovery 8194/66089054 objects misplaced
(0.012%)
pg 2.e7f is stuck unclean for 65125.554509, current state
active+remapped, last acting [58,5]
pg 2.782 is stuck unclean for 65140.681540, current state
active+undersized+degraded, last acting [76]
pg 2.782 is stuck undersized for 60568.221461, current state
active+undersized+degraded, last acting [76]
pg 2.782 is stuck degraded for 60568.221549, current state
active+undersized+degraded, last acting [76]
pg 2.782 is active+undersized+degraded, acting [76]
recovery 8163/66089054 objects degraded (0.012%)
recovery 8194/66089054 objects misplaced (0.012%)

# ceph pg 2.e7f query
recovery_state: [
{
name: Started\/Primary\/Active,
enter_time: 2015-08-11 15:43:09.190269,
might_have_unfound: [],
recovery_progress: {
backfill_targets: [],
waiting_on_backfill: [],
last_backfill_started: 0\/\/0\/\/-1,
backfill_info: {
begin: 0\/\/0\/\/-1,
end: 0\/\/0\/\/-1,
objects: []
},
peer_backfill_info: [],
backfills_in_flight: [],
recovering: [],
pg_backend: {
pull_from_peer: [],
pushing: []
}
},
scrub: {
scrubber.epoch_start: 0,
scrubber.active: 0,
scrubber.waiting_on: 0,
scrubber.waiting_on_whom: []
}
},
{
name: Started,
enter_time: 2015-08-11 15:43:04.955796
}
],


# ceph pg 2.782 query
  recovery_state: [
{
name: Started\/Primary\/Active,
enter_time: 2015-08-11 15:42:42.178042,
might_have_unfound: [
{
osd: 5,
status: not queried
}
],
recovery_progress: {
backfill_targets: [],
waiting_on_backfill: [],
last_backfill_started: 0\/\/0\/\/-1,
backfill_info: {
begin: 0\/\/0\/\/-1,
end: 0\/\/0\/\/-1,
objects: []
},
peer_backfill_info: [],
backfills_in_flight: [],
recovering: [],
pg_backend: {
pull_from_peer: [],
pushing: []
}
},
scrub: {
scrubber.epoch_start: 0,
scrubber.active: 0,
scrubber.waiting_on: 0,
scrubber.waiting_on_whom: []
}
},
{
name: Started,
enter_time: 2015-08-11 15:42:41.139709
}
],
agent_state: {}

I tried restarting osd.5/58/76 but no change.

Any suggestions?
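For reference, a few commands commonly used to dig into this kind of stuck state;
the reweight value at the end is only an example:

ceph pg dump_stuck unclean    # list the stuck PGs
ceph pg map 2.782             # where CRUSH wants them vs. where they are acting
ceph pg map 2.e7f
ceph osd tree                 # the REWEIGHT column shows any half-applied reweights
ceph osd reweight 76 1.0      # reset an individual OSD's reweight if needed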
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds server(s) crashed

2015-08-12 Thread Bob Ababurko
If I am using a more recent client (kernel OR ceph-fuse), should I still be
worried about the MDSs crashing?  I have added RAM to my MDS hosts and it's
my understanding this will also help mitigate any issues, in addition to
setting mds_bal_frag = true.  Not having used cephfs before, do I always
need to worry about my MDS servers crashing all the time, thus the need for
setting mds_reconnect_timeout to 0?  This is not ideal for us, nor is the
idea of clients not being able to access their mounts after an MDS recovery.

I am actually looking for the most stable way to implement cephfs at this
point.   My cephfs cluster contains millions of small files, so many inodes,
if that needs to be taken into account.  Perhaps I should only be using one
MDS node for stability at this point?  Is this the best way forward to get
a handle on stability?  I'm also curious whether I should set my mds cache
size to a number greater than the number of files in the cephfs cluster.  If you
can give some key points on configuring cephfs for the best stability and,
if possible, availability, this would be helpful to me.
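For reference, the two settings mentioned above live under [mds] in ceph.conf; a
minimal sketch, with the cache value purely illustrative (it is counted in
inodes, not bytes):

[mds]
mds bal frag = true
mds cache size = 1000000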

thanks again for the help.

thanks,
Bob
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds server(s) crashed

2015-08-12 Thread yangyongp...@bwstor.com.cn
I also encountered a problem: a standby MDS cannot be promoted to active when the 
active MDS service is stopped, which has bothered me for several days. Maybe a multi-MDS 
cluster could solve this problem, but the Ceph team hasn't released that feature yet.



yangyongp...@bwstor.com.cn
 
From: Yan, Zheng
Date: 2015-08-13 10:21
To: Bob Ababurko
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mds server(s) crashed
On Thu, Aug 13, 2015 at 7:05 AM, Bob Ababurko b...@ababurko.net wrote:

 If I am using a more recent client(kernel OR ceph-fuse), should I still be
 worried about the MDS's crashing?  I have added RAM to my MDS hosts and its
 my understanding this will also help mitigate any issues, in addition to
 setting mds_bal_frag = true.  Not having used cephfs before, do I always
 need to worry about my MDS servers crashing all the time, thus the need for
 setting mds_reconnect_timeout to 0?  This is not ideal for us nor is the
 idea of clients not able to access their mounts after a MDS recovery.

 
It's unlikely this issue will happen again. But I can't  guarantee no
other issue.
 
no need to set mds_reconnect_timeout to 0.
 
 
 I am actually looking for the most stable way to implement cephfs at this
 point.   My cephfs cluster contains millions of small files, so many inodes
 if that needs to be taken into account.  Perhaps I should only be using one
 MDS node for stability at this point?  Is this the best way forward to get a
 handle on stability?  I'm also curious if I should I set my mds cache size
 to a number greater than files I have in the cephfs cluster?  If you can
 give some key points to configure cephfs to get the best stability and if
 possible, availability.this would be helpful to me.
 
One active MDS is the most stable setup. Adding a few standby MDS
should not hurt stability.
 
You can't set  mds cache size to a number greater than files in the
fs, it requires lots of memory.
 
 
Yan, Zheng
 

 thanks again for the help.

 thanks,
 Bob

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds server(s) crashed

2015-08-12 Thread Bob Ababurko
On Wed, Aug 12, 2015 at 7:21 PM, Yan, Zheng uker...@gmail.com wrote:

 On Thu, Aug 13, 2015 at 7:05 AM, Bob Ababurko b...@ababurko.net wrote:
 
  If I am using a more recent client(kernel OR ceph-fuse), should I still
 be
  worried about the MDS's crashing?  I have added RAM to my MDS hosts and
 its
  my understanding this will also help mitigate any issues, in addition to
  setting mds_bal_frag = true.  Not having used cephfs before, do I always
  need to worry about my MDS servers crashing all the time, thus the need
 for
  setting mds_reconnect_timeout to 0?  This is not ideal for us nor is the
  idea of clients not able to access their mounts after a MDS recovery.
 

 It's unlikely this issue will happen again. But I can't  guarantee no
 other issue.

 no need to set mds_reconnect_timeout to 0.


OK, good to know.




  I am actually looking for the most stable way to implement cephfs at this
  point.   My cephfs cluster contains millions of small files, so many
 inodes
  if that needs to be taken into account.  Perhaps I should only be using
 one
  MDS node for stability at this point?  Is this the best way forward to
 get a
  handle on stability?  I'm also curious if I should I set my mds cache
 size
  to a number greater than files I have in the cephfs cluster?  If you can
  give some key points to configure cephfs to get the best stability and if
  possible, availability.this would be helpful to me.

 One active MDS is the most stable setup. Adding a few standby MDS
 should not hurt stability.

 You can't set  mds cache size to a number greater than files in the
 fs, it requires lots of memory.



I'm not sure what amount of RAM you consider to be 'lots', but I would
really like to understand a bit more about this.  Perhaps a rule of thumb?
Is there an advantage to more RAM and a larger mds cache size?  We plan on
putting close to a billion small files in this pool via cephfs, so what
should we be considering when sizing our MDS hosts or changing the MDS
config?  Basically, what should we or should we not be doing when we have a
cluster with this many files?  Thanks!


 Yan, Zheng

 
  thanks again for the help.
 
  thanks,
  Bob
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds server(s) crashed

2015-08-12 Thread Yan, Zheng
On Thu, Aug 13, 2015 at 7:05 AM, Bob Ababurko b...@ababurko.net wrote:

 If I am using a more recent client(kernel OR ceph-fuse), should I still be
 worried about the MDS's crashing?  I have added RAM to my MDS hosts and its
 my understanding this will also help mitigate any issues, in addition to
 setting mds_bal_frag = true.  Not having used cephfs before, do I always
 need to worry about my MDS servers crashing all the time, thus the need for
 setting mds_reconnect_timeout to 0?  This is not ideal for us nor is the
 idea of clients not able to access their mounts after a MDS recovery.


It's unlikely this issue will happen again, but I can't guarantee there will be
no other issues.

no need to set mds_reconnect_timeout to 0.


 I am actually looking for the most stable way to implement cephfs at this
 point.   My cephfs cluster contains millions of small files, so many inodes
 if that needs to be taken into account.  Perhaps I should only be using one
 MDS node for stability at this point?  Is this the best way forward to get a
 handle on stability?  I'm also curious if I should I set my mds cache size
 to a number greater than files I have in the cephfs cluster?  If you can
 give some key points to configure cephfs to get the best stability and if
 possible, availability.this would be helpful to me.

One active MDS is the most stable setup. Adding a few standby MDS
should not hurt stability.

You can't set mds cache size to a number greater than the number of files in
the fs; it would require lots of memory.


Yan, Zheng


 thanks again for the help.

 thanks,
 Bob

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds server(s) crashed

2015-08-12 Thread John Spray
On Wed, Aug 12, 2015 at 5:08 AM, Bob Ababurko b...@ababurko.net wrote:
 What is risky about enabling mds_bal_frag on a cluster with data and will
 there be any performance degradation if enabled?

No specific gotchas, just that it is not something that has especially
good coverage in our automated tests.  We recently enabled it for our
general purpose tests (i.e. not specifically exercising fragmentation,
just normal fs workloads) and nothing's blown up horribly, but that's
about as far as the assurance goes.

Performance wise, there's a cost associated with splitting things up
and merging them, but a benefit associated with having smaller
fragments in general.  Probably doesn't make a huge difference outside
of the initial part where your existing large directories would all be
getting fragmented all of a sudden.

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Semi-reproducible crash of ceph-fuse

2015-08-12 Thread Jörg Henne
Jörg Henne hennejg@... writes:

 we are running ceph version 0.94.2 with a cephfs mounted using ceph-fuse on
 Ubuntu 14.04 LTS. I think we have found a bug that lets us semi-reproducibly
 crash the ceph-fuse process. 

Reported as http://tracker.ceph.com/issues/12674
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd out

2015-08-12 Thread chmind
Hello. 
Could you please help me remove an OSD from the cluster?

 # ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.02998 root default
-2 0.00999 host ceph1
 0 0.00999 osd.0   up  1.0  1.0
-3 0.00999 host ceph2
 1 0.00999 osd.1   up  1.0  1.0
-4 0.00999 host ceph3
 2 0.00999 osd.2   up  1.0  1.0


# ceph -s
cluster 64f87255-d56e-499d-8ebc-65e0f577e0aa
 health HEALTH_OK
 monmap e1: 3 mons at 
{ceph1=10.0.0.101:6789/0,ceph2=10.0.0.102:6789/0,ceph3=10.0.0.103:6789/0}
election epoch 10, quorum 0,1,2 ceph1,ceph2,ceph3
 osdmap e76: 3 osds: 3 up, 3 in
  pgmap v328: 128 pgs, 1 pools, 10 bytes data, 1 objects
120 MB used, 45926 MB / 46046 MB avail
 128 active+clean


# ceph osd out 0
marked out osd.0.

# ceph -w
cluster 64f87255-d56e-499d-8ebc-65e0f577e0aa
 health HEALTH_WARN
128 pgs stuck unclean
recovery 1/3 objects misplaced (33.333%)
 monmap e1: 3 mons at 
{ceph1=10.0.0.101:6789/0,ceph2=10.0.0.102:6789/0,ceph3=10.0.0.103:6789/0}
election epoch 10, quorum 0,1,2 ceph1,ceph2,ceph3
 osdmap e79: 3 osds: 3 up, 2 in; 128 remapped pgs
  pgmap v332: 128 pgs, 1 pools, 10 bytes data, 1 objects
89120 kB used, 30610 MB / 30697 MB avail
1/3 objects misplaced (33.333%)
 128 active+remapped

2015-08-12 18:43:12.412286 mon.0 [INF] pgmap v332: 128 pgs: 128 
active+remapped; 10 bytes data, 89120 kB used, 30610 MB / 30697 MB avail; 1/3 
objects misplaced (33.333%)
2015-08-12 18:43:20.362337 mon.0 [INF] HEALTH_WARN; 128 pgs stuck unclean; 
recovery 1/3 objects misplaced (33.333%)
2015-08-12 18:44:15.055825 mon.0 [INF] pgmap v333: 128 pgs: 128 
active+remapped; 10 bytes data, 89120 kB used, 30610 MB / 30697 MB avail; 1/3 
objects misplaced (33.333%)


and it never becomes active+clean. 
What am I doing wrong? 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: OSD crashes after upgrade to 0.80.10

2015-08-12 Thread Gerd Jakobovitsch

An update:

It seems that I am running into memory shortage. Even with 32 GB for 20 
OSDs and 2 GB of swap, ceph-osd uses all available memory.
I created another swap device with 10 GB, and I managed to get the 
failed OSD running without crashing, but it consumed an extra 5 GB.
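Adding that extra swap device was presumably something along these lines (the
file path is an assumption):

fallocate -l 10G /var/ceph-swap
chmod 600 /var/ceph-swap
mkswap /var/ceph-swap
swapon /var/ceph-swap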

Are there known issues regarding memory usage on ceph-osd?

But I still get the problem of the incomplete+inactive PG.

Regards.

Gerd

On 12-08-2015 10:11, Gerd Jakobovitsch wrote:

I tried it; the error propagates to whichever OSD gets the errored PG.

For the moment, this is my worst problem. I have one PG 
incomplete+inactive, and the OSD with the highest priority for it gets 
100 blocked requests (I guess that is the maximum) and, although 
running, doesn't accept other requests - for example, ceph tell osd.21 
injectargs '--osd-max-backfills 1'. After some time it crashes, and 
the blocked requests move to the second OSD for the errored PG. I can't 
get rid of these slow requests.


I suspected a problem with leveldb; I checked, and I had the default 
version for Debian wheezy (0+20120530.gitdd0d562-1). I updated it to the 
wheezy-backports version (1.17-1~bpo70+1), but the error was the same.


I use the regular wheezy kernel (3.2+46).

On 11-08-2015 23:52, Haomai Wang wrote:

It seems like a leveldb problem. Could you just kick it out and add a
new OSD to make the cluster healthy first?

On Wed, Aug 12, 2015 at 1:31 AM, Gerd Jakobovitsch g...@mandic.net.br  wrote:

Dear all,

I run a ceph system with 4 nodes and ~80 OSDs using xfs, currently at 75%
usage, running firefly. On Friday I upgraded it from 0.80.8 to 0.80.10, and
since then I have had several OSDs crashing and never recovering: trying to run
them ends up crashing as follows.

Is this problem known? Is there any configuration that should be checked?
Any way to try to recover these OSDs without losing all data?

After that, setting the OSD to lost, I got one incomplete, inactive PG. Is
there any way to recover it? Data still exists in crashed OSDs.

Regards.

[(12:58:13) root@spcsnp3 ~]# service ceph start osd.7
=== osd.7 ===
2015-08-11 12:58:21.003876 7f17ed52b700  1 monclient(hunting): found
mon.spcsmp2
2015-08-11 12:58:21.003915 7f17ef493700  5 monclient: authenticate success,
global_id 206010466
create-or-move updated item name 'osd.7' weight 3.64 at location
{host=spcsnp3,root=default} to crush map
Starting Ceph osd.7 on spcsnp3...
2015-08-11 12:58:21.279878 7f200fa8f780  0 ceph version 0.80.10
(ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 31918
starting osd.7 at :/0 osd_data /var/lib/ceph/osd/ceph-7
/var/lib/ceph/osd/ceph-7/journal
[(12:58:21) root@spcsnp3 ~]# 2015-08-11 12:58:21.348094 7f200fa8f780 10
filestore(/var/lib/ceph/osd/ceph-7) dump_stop
2015-08-11 12:58:21.348291 7f200fa8f780  5
filestore(/var/lib/ceph/osd/ceph-7) basedir /var/lib/ceph/osd/ceph-7 journal
/var/lib/ceph/osd/ceph-7/journal
2015-08-11 12:58:21.348326 7f200fa8f780 10
filestore(/var/lib/ceph/osd/ceph-7) mount fsid is
54c136da-c51c-4799-b2dc-b7988982ee00
2015-08-11 12:58:21.349010 7f200fa8f780  0
filestore(/var/lib/ceph/osd/ceph-7) mount detected xfs (libxfs)
2015-08-11 12:58:21.349026 7f200fa8f780  1
filestore(/var/lib/ceph/osd/ceph-7)  disabling 'filestore replica fadvise'
due to known issues with fadvise(DONTNEED) on xfs
2015-08-11 12:58:21.353277 7f200fa8f780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP
ioctl is supported and appears to work
2015-08-11 12:58:21.353302 7f200fa8f780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2015-08-11 12:58:21.362106 7f200fa8f780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features:
syscall(SYS_syncfs, fd) fully supported
2015-08-11 12:58:21.362195 7f200fa8f780  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_feature: extsize is
disabled by conf
2015-08-11 12:58:21.362701 7f200fa8f780  5
filestore(/var/lib/ceph/osd/ceph-7) mount op_seq is 35490995
2015-08-11 12:58:59.383179 7f200fa8f780 -1 *** Caught signal (Aborted) **
  in thread 7f200fa8f780

  ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
  1: /usr/bin/ceph-osd() [0xab7562]
  2: (()+0xf0a0) [0x7f200efcd0a0]
  3: (gsignal()+0x35) [0x7f200db3f165]
  4: (abort()+0x180) [0x7f200db423e0]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f200e39589d]
  6: (()+0x63996) [0x7f200e393996]
  7: (()+0x639c3) [0x7f200e3939c3]
  8: (()+0x63bee) [0x7f200e393bee]
  9: (tc_new()+0x48e) [0x7f200f213aee]
  10: (std::string::_Rep::_S_create(unsigned long, unsigned long,
std::allocatorchar const)+0x59) [0x7f200e3ef999]
  11: (std::string::_Rep::_M_clone(std::allocatorchar const, unsigned
long)+0x28) [0x7f200e3f0708]
  12: (std::string::reserve(unsigned long)+0x30) [0x7f200e3f07f0]
  13: (std::string::append(char const*, unsigned long)+0xb5) [0x7f200e3f0ab5]
  14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2)
[0x7f200f46ffa2]
  15: 

Re: [ceph-users] osd out

2015-08-12 Thread GuangYang
If you are using the default configuration to create the pool (3 replicas), 
then after losing 1 OSD and having 2 left, CRUSH would not be able to find enough 
OSDs (at least 3) to map the PG, so it stays stuck unclean.
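In practice that leaves two ways forward on a 3-OSD test cluster; a sketch,
assuming the pool is the default rbd pool:

# either relax the pool's replication requirement...
ceph osd pool set rbd size 2
ceph osd pool set rbd min_size 1
# ...or remove the OSD completely so CRUSH stops counting it
ceph osd out 0
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0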


Thanks,
Guang



 From: chm...@yandex.ru
 Date: Wed, 12 Aug 2015 19:46:01 +0300
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] osd out

 Hello.
 Could you please help me to remove osd from cluster;

 # ceph osd tree
 ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 0.02998 root default
 -2 0.00999 host ceph1
 0 0.00999 osd.0 up 1.0 1.0
 -3 0.00999 host ceph2
 1 0.00999 osd.1 up 1.0 1.0
 -4 0.00999 host ceph3
 2 0.00999 osd.2 up 1.0 1.0


 # ceph -s
 cluster 64f87255-d56e-499d-8ebc-65e0f577e0aa
 health HEALTH_OK
 monmap e1: 3 mons at 
 {ceph1=10.0.0.101:6789/0,ceph2=10.0.0.102:6789/0,ceph3=10.0.0.103:6789/0}
 election epoch 10, quorum 0,1,2 ceph1,ceph2,ceph3
 osdmap e76: 3 osds: 3 up, 3 in
 pgmap v328: 128 pgs, 1 pools, 10 bytes data, 1 objects
 120 MB used, 45926 MB / 46046 MB avail
 128 active+clean


 # ceph osd out 0
 marked out osd.0.

 # ceph -w
 cluster 64f87255-d56e-499d-8ebc-65e0f577e0aa
 health HEALTH_WARN
 128 pgs stuck unclean
 recovery 1/3 objects misplaced (33.333%)
 monmap e1: 3 mons at 
 {ceph1=10.0.0.101:6789/0,ceph2=10.0.0.102:6789/0,ceph3=10.0.0.103:6789/0}
 election epoch 10, quorum 0,1,2 ceph1,ceph2,ceph3
 osdmap e79: 3 osds: 3 up, 2 in; 128 remapped pgs
 pgmap v332: 128 pgs, 1 pools, 10 bytes data, 1 objects
 89120 kB used, 30610 MB / 30697 MB avail
 1/3 objects misplaced (33.333%)
 128 active+remapped

 2015-08-12 18:43:12.412286 mon.0 [INF] pgmap v332: 128 pgs: 128 
 active+remapped; 10 bytes data, 89120 kB used, 30610 MB / 30697 MB avail; 1/3 
 objects misplaced (33.333%)
 2015-08-12 18:43:20.362337 mon.0 [INF] HEALTH_WARN; 128 pgs stuck unclean; 
 recovery 1/3 objects misplaced (33.333%)
 2015-08-12 18:44:15.055825 mon.0 [INF] pgmap v333: 128 pgs: 128 
 active+remapped; 10 bytes data, 89120 kB used, 30610 MB / 30697 MB avail; 1/3 
 objects misplaced (33.333%)


 and it never become active+clean .
 What I’m doing wrong ?
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH cache layer. Very slow

2015-08-12 Thread Pieter Koorts

Hi Igor

I suspect you have very much the same problem as me.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html

Basically, Samsung drives (like many SATA SSDs) are very much hit and miss, so 
you will need to test them as described here to see if they are any good: 
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
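The test in that post is essentially a single-threaded O_DSYNC 4k write with fio;
roughly like the following, where the device name is a placeholder and the run
will overwrite whatever is on it:

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test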

To give you an idea, my average write performance went from 11MB/s (with the Samsung 
SSDs) to 30MB/s (without any SSD). This is a very small cluster.

Pieter

On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor igor.voloshane...@gmail.com 
wrote:

Hi all, we have setup CEPH cluster with 60 OSD (2 diff types) (5 nodes, 12 
disks on each, 10 HDD, 2 SSD)

Also we cover this with custom crushmap with 2 root leaf

ID   WEIGHT  TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 5.0 root ssd
-102 1.0     host ix-s2-ssd
   2 1.0         osd.2               up  1.0          1.0
   9 1.0         osd.9               up  1.0          1.0
-103 1.0     host ix-s3-ssd
   3 1.0         osd.3               up  1.0          1.0
   7 1.0         osd.7               up  1.0          1.0
-104 1.0     host ix-s5-ssd
   1 1.0         osd.1               up  1.0          1.0
   6 1.0         osd.6               up  1.0          1.0
-105 1.0     host ix-s6-ssd
   4 1.0         osd.4               up  1.0          1.0
   8 1.0         osd.8               up  1.0          1.0
-106 1.0     host ix-s7-ssd
   0 1.0         osd.0               up  1.0          1.0
   5 1.0         osd.5               up  1.0          1.0
  -1 5.0 root platter
  -2 1.0     host ix-s2-platter
  13 1.0         osd.13              up  1.0          1.0
  17 1.0         osd.17              up  1.0          1.0
  21 1.0         osd.21              up  1.0          1.0
  27 1.0         osd.27              up  1.0          1.0
  32 1.0         osd.32              up  1.0          1.0
  37 1.0         osd.37              up  1.0          1.0
  44 1.0         osd.44              up  1.0          1.0
  48 1.0         osd.48              up  1.0          1.0
  55 1.0         osd.55              up  1.0          1.0
  59 1.0         osd.59              up  1.0          1.0
  -3 1.0     host ix-s3-platter
  14 1.0         osd.14              up  1.0          1.0
  18 1.0         osd.18              up  1.0          1.0
  23 1.0         osd.23              up  1.0          1.0
  28 1.0         osd.28              up  1.0          1.0
  33 1.0         osd.33              up  1.0          1.0
  39 1.0         osd.39              up  1.0          1.0
  43 1.0         osd.43              up  1.0          1.0
  47 1.0         osd.47              up  1.0          1.0
  54 1.0         osd.54              up  1.0          1.0
  58 1.0         osd.58              up  1.0          1.0
  -4 1.0     host ix-s5-platter
  11 1.0         osd.11              up  1.0          1.0
  16 1.0         osd.16              up  1.0          1.0
  22 1.0         osd.22              up  1.0          1.0
  26 1.0         osd.26              up  1.0          1.0
  31 1.0         osd.31              up  1.0          1.0
  36 1.0         osd.36              up  1.0          1.0
  41 1.0         osd.41              up  1.0          1.0
  46 1.0         osd.46              up  1.0          1.0
  51 1.0         osd.51              up  1.0          1.0
  56 1.0         osd.56              up  1.0          1.0
  -5 1.0     host ix-s6-platter
  12 1.0         osd.12              up  1.0          1.0
  19 1.0         osd.19              up  1.0          1.0
 24 1.0         osd.24              up  1.0          1.0
  29 1.0         osd.29              up  1.0          1.0
  34 1.0         osd.34              up  1.0          1.0
  38 1.0         osd.38              up  1.0          1.0
  42 1.0         osd.42              up  1.0          1.0
  50 1.0         osd.50              up  1.0          1.0
  53 1.0         osd.53              up  1.0          1.0
  57 1.0         osd.57              up  1.0          1.0
  -6 1.0     host ix-s7-platter
  10 1.0         osd.10              up  1.0          1.0
  15 1.0         osd.15              up  1.0          1.0
  20 1.0         osd.20              up  1.0          1.0
  25 1.0    

Re: [ceph-users] Fwd: OSD crashes after upgrade to 0.80.10

2015-08-12 Thread Gerd Jakobovitsch

I tried it; the error propagates to whichever OSD gets the errored PG.

For the moment, this is my worst problem. I have one PG 
incomplete+inactive, and the OSD with the highest priority for it gets 
100 blocked requests (I guess that is the maximum) and, although 
running, doesn't accept other requests - for example, ceph tell osd.21 
injectargs '--osd-max-backfills 1'. After some time it crashes, and the 
blocked requests move to the second OSD for the errored PG. I can't get 
rid of these slow requests.


I suspected a problem with leveldb; I checked, and I had the default version 
for Debian wheezy (0+20120530.gitdd0d562-1). I updated it to the 
wheezy-backports version (1.17-1~bpo70+1), but the error was the same.


I use the regular wheezy kernel (3.2+46).

On 11-08-2015 23:52, Haomai Wang wrote:

It seems like a leveldb problem. Could you just kick it out and add a
new OSD to make the cluster healthy first?

On Wed, Aug 12, 2015 at 1:31 AM, Gerd Jakobovitsch g...@mandic.net.br wrote:


Dear all,

I run a ceph system with 4 nodes and ~80 OSDs using xfs, with currently 75%
usage, running firefly. On friday I upgraded it from 0.80.8 to 0.80.10, and
since then I got several OSDs crashing and never recovering: trying to run
it, ends up crashing as follows.

Is this problem known? Is there any configuration that should be checked?
Any way to try to recover these OSDs without losing all data?

After that, setting the OSD to lost, I got one incomplete, inactive PG. Is
there any way to recover it? Data still exists in crashed OSDs.

Regards.

[(12:58:13) root@spcsnp3 ~]# service ceph start osd.7
=== osd.7 ===
2015-08-11 12:58:21.003876 7f17ed52b700  1 monclient(hunting): found
mon.spcsmp2
2015-08-11 12:58:21.003915 7f17ef493700  5 monclient: authenticate success,
global_id 206010466
create-or-move updated item name 'osd.7' weight 3.64 at location
{host=spcsnp3,root=default} to crush map
Starting Ceph osd.7 on spcsnp3...
2015-08-11 12:58:21.279878 7f200fa8f780  0 ceph version 0.80.10
(ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 31918
starting osd.7 at :/0 osd_data /var/lib/ceph/osd/ceph-7
/var/lib/ceph/osd/ceph-7/journal
[(12:58:21) root@spcsnp3 ~]# 2015-08-11 12:58:21.348094 7f200fa8f780 10
filestore(/var/lib/ceph/osd/ceph-7) dump_stop
2015-08-11 12:58:21.348291 7f200fa8f780  5
filestore(/var/lib/ceph/osd/ceph-7) basedir /var/lib/ceph/osd/ceph-7 journal
/var/lib/ceph/osd/ceph-7/journal
2015-08-11 12:58:21.348326 7f200fa8f780 10
filestore(/var/lib/ceph/osd/ceph-7) mount fsid is
54c136da-c51c-4799-b2dc-b7988982ee00
2015-08-11 12:58:21.349010 7f200fa8f780  0
filestore(/var/lib/ceph/osd/ceph-7) mount detected xfs (libxfs)
2015-08-11 12:58:21.349026 7f200fa8f780  1
filestore(/var/lib/ceph/osd/ceph-7)  disabling 'filestore replica fadvise'
due to known issues with fadvise(DONTNEED) on xfs
2015-08-11 12:58:21.353277 7f200fa8f780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP
ioctl is supported and appears to work
2015-08-11 12:58:21.353302 7f200fa8f780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2015-08-11 12:58:21.362106 7f200fa8f780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features:
syscall(SYS_syncfs, fd) fully supported
2015-08-11 12:58:21.362195 7f200fa8f780  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_feature: extsize is
disabled by conf
2015-08-11 12:58:21.362701 7f200fa8f780  5
filestore(/var/lib/ceph/osd/ceph-7) mount op_seq is 35490995
2015-08-11 12:58:59.383179 7f200fa8f780 -1 *** Caught signal (Aborted) **
  in thread 7f200fa8f780

  ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
  1: /usr/bin/ceph-osd() [0xab7562]
  2: (()+0xf0a0) [0x7f200efcd0a0]
  3: (gsignal()+0x35) [0x7f200db3f165]
  4: (abort()+0x180) [0x7f200db423e0]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f200e39589d]
  6: (()+0x63996) [0x7f200e393996]
  7: (()+0x639c3) [0x7f200e3939c3]
  8: (()+0x63bee) [0x7f200e393bee]
  9: (tc_new()+0x48e) [0x7f200f213aee]
  10: (std::string::_Rep::_S_create(unsigned long, unsigned long,
std::allocatorchar const)+0x59) [0x7f200e3ef999]
  11: (std::string::_Rep::_M_clone(std::allocatorchar const, unsigned
long)+0x28) [0x7f200e3f0708]
  12: (std::string::reserve(unsigned long)+0x30) [0x7f200e3f07f0]
  13: (std::string::append(char const*, unsigned long)+0xb5) [0x7f200e3f0ab5]
  14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2)
[0x7f200f46ffa2]
  15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*,
unsigned long*)+0x180) [0x7f200f468360]
  16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2)
[0x7f200f46adf2]
  17: (leveldb::DB::Open(leveldb::Options const, std::string const,
leveldb::DB**)+0xff) [0x7f200f46b11f]
  18: (LevelDBStore::do_open(std::ostream, bool)+0xd8) [0xa123a8]
  19: (FileStore::mount()+0x18e0) [0x9b7080]
  20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
  

Re: [ceph-users] RBD performance slowly degrades :-(

2015-08-12 Thread Irek Fasikhov
Hi.
Read this thread here:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg17360.html

Best regards, Irek Fasikhov
Mob.: +79229045757

2015-08-12 14:52 GMT+03:00 Pieter Koorts pieter.koo...@me.com:

 Hi

 Something that's been bugging me for a while is that I am trying to diagnose
 iowait time within KVM guests. Guests doing reads or writes tend to do about
 50% to 90% iowait but the host itself is only doing about 1% to 2% iowait.
 So the result is the guests are extremely slow.

 I currently run 3x hosts each with a single SSD and single HDD OSD in
  cache-tier writeback mode. Although the SSD (Samsung 850 EVO 120GB) is not
 a great one it should at least perform reasonably compared to a hard disk
 and doing some direct SSD tests I get approximately 100MB/s write and
 200MB/s read on each SSD.

 When I run rados bench though, the benchmark starts with a not great but
 okay speed and as the benchmark progresses it just gets slower and slower
 till it's worse than a USB hard drive. The SSD cache pool is 120GB in size
 (360GB RAW) and in use at about 90GB. I have tried tuning the XFS mount
 options as well but it has had little effect.

 Understandably the server spec is not great but I don't expect performance
 to be that bad.

 *OSD config:*
 [osd]
 osd crush update on start = false
 osd mount options xfs =
 rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

 *Servers spec:*
 Dual Quad Core XEON E5410 and 32GB RAM in each server
 10GBE @ 10G speed with 8000byte Jumbo Frames.

 *Rados bench result:* (starts at 50MB/s average and plummets down to
 11MB/s)
 sudo rados bench -p rbd 50 write --no-cleanup -t 1
  Maintaining 1 concurrent writes of 4194304 bytes for up to 50 seconds or
 0 objects
  Object prefix: benchmark_data_osc-mgmt-1_10007
sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1       1        14        13   51.9906        52 0.0671911  0.074661
     2       1        27        26   51.9908        52 0.0631836 0.0751152
     3       1        37        36   47.9921        40 0.0691167 0.0802425
     4       1        51        50   49.9922        56 0.0816432 0.0795869
     5       1        56        55   43.9934        20  0.208393  0.088523
     6       1        61        60    39.994        20  0.241164 0.0999179
     7       1        64        63   35.9934        12  0.239001  0.106577
     8       1        66        65   32.4942         8  0.214354  0.122767
     9       1        72        71     31.55        24  0.132588  0.125438
    10       1        77        76   30.3948        20  0.256474  0.128548
    11       1        79        78   28.3589         8  0.183564  0.138354
    12       1        82        81   26.9956        12  0.345809  0.145523
    13       1        85        84    25.842        12  0.373247  0.151291
    14       1        86        85   24.2819         4  0.950586  0.160694
    15       1        86        85   22.6632         0         -  0.160694
    16       1        90        89   22.2466         8  0.204714  0.178352
    17       1        94        93   21.8791        16  0.282236  0.180571
    18       1        98        97   21.5524        16  0.262566  0.183742
    19       1       101       100   21.0495        12  0.357659  0.187477
    20       1       104       103    20.597        12  0.369327  0.192479
    21       1       105       104   19.8066         4  0.373233  0.194217
    22       1       105       104   18.9064         0         -  0.194217
    23       1       106       105   18.2582         2   2.35078  0.214756
    24       1       107       106   17.6642         4  0.680246  0.219147
    25       1       109       108   17.2776         8  0.677688  0.229222
    26       1       113       112   17.2283        16   0.29171  0.230487
    27       1       117       116   17.1828        16  0.255915  0.231101
    28       1       120       119   16.9976        12  0.412411  0.235122
    29       1       120       119   16.4115         0         -  0.235122
    30       1       120       119   15.8645         0         -  0.235122
    31       1       120       119   15.3527         0         -  0.235122
    32       1       122       121   15.1229         2  0.319309  0.262822
    33       1       124       123   14.9071         8  0.344094  0.266201
    34       1       127       126   14.8215        12   0.33534  0.267913
    35       1       129       128   14.6266         8  0.355403  0.269241
    36       1       132       131   14.5536        12  0.581528  0.274327
    37       1       132       131   14.1603         0         -  0.274327
    38       1       133       132   13.8929         2   1.43621   0.28313
    39       1       134       133   13.6392         4  0.894817  0.287729
    40       1       134       133   13.2982         0         -  0.287729
    41       1

[ceph-users] RBD performance slowly degrades :-(

2015-08-12 Thread Pieter Koorts

Hi

Something that's been bugging me for a while is that I am trying to diagnose iowait 
time within KVM guests. Guests doing reads or writes tend to do about 50% to 90% 
iowait but the host itself is only doing about 1% to 2% iowait. So the result 
is the guests are extremely slow.

I currently run 3x hosts each with a single SSD and single HDD OSD in 
cache-tier writeback mode. Although the SSD (Samsung 850 EVO 120GB) is not a 
great one it should at least perform reasonably compared to a hard disk and 
doing some direct SSD tests I get approximately 100MB/s write and 200MB/s read 
on each SSD.

When I run rados bench though, the benchmark starts with a not great but okay 
speed and as the benchmark progresses it just gets slower and slower till it's 
worse than a USB hard drive. The SSD cache pool is 120GB in size (360GB RAW) 
and in use at about 90GB. I have tried tuning the XFS mount options as well but 
it has had little effect.

Understandably the server spec is not great but I don't expect performance to 
be that bad.

OSD config:
[osd]
osd crush update on start = false
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

Servers spec:
Dual Quad Core XEON E5410 and 32GB RAM in each server
10GBE @ 10G speed with 8000byte Jumbo Frames.

Rados bench result: (starts at 50MB/s average and plummets down to 11MB/s)
sudo rados bench -p rbd 50 write --no-cleanup -t 1
 Maintaining 1 concurrent writes of 4194304 bytes for up to 50 seconds or 0 
objects
 Object prefix: benchmark_data_osc-mgmt-1_10007
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1       1        14        13   51.9906        52 0.0671911  0.074661
     2       1        27        26   51.9908        52 0.0631836 0.0751152
     3       1        37        36   47.9921        40 0.0691167 0.0802425
     4       1        51        50   49.9922        56 0.0816432 0.0795869
     5       1        56        55   43.9934        20  0.208393  0.088523
     6       1        61        60    39.994        20  0.241164 0.0999179
     7       1        64        63   35.9934        12  0.239001  0.106577
     8       1        66        65   32.4942         8  0.214354  0.122767
     9       1        72        71     31.55        24  0.132588  0.125438
    10       1        77        76   30.3948        20  0.256474  0.128548
    11       1        79        78   28.3589         8  0.183564  0.138354
    12       1        82        81   26.9956        12  0.345809  0.145523
    13       1        85        84    25.842        12  0.373247  0.151291
    14       1        86        85   24.2819         4  0.950586  0.160694
    15       1        86        85   22.6632         0         -  0.160694
    16       1        90        89   22.2466         8  0.204714  0.178352
    17       1        94        93   21.8791        16  0.282236  0.180571
    18       1        98        97   21.5524        16  0.262566  0.183742
    19       1       101       100   21.0495        12  0.357659  0.187477
    20       1       104       103    20.597        12  0.369327  0.192479
    21       1       105       104   19.8066         4  0.373233  0.194217
    22       1       105       104   18.9064         0         -  0.194217
    23       1       106       105   18.2582         2   2.35078  0.214756
    24       1       107       106   17.6642         4  0.680246  0.219147
    25       1       109       108   17.2776         8  0.677688  0.229222
    26       1       113       112   17.2283        16   0.29171  0.230487
    27       1       117       116   17.1828        16  0.255915  0.231101
    28       1       120       119   16.9976        12  0.412411  0.235122
    29       1       120       119   16.4115         0         -  0.235122
    30       1       120       119   15.8645         0         -  0.235122
    31       1       120       119   15.3527         0         -  0.235122
    32       1       122       121   15.1229         2  0.319309  0.262822
    33       1       124       123   14.9071         8  0.344094  0.266201
    34       1       127       126   14.8215        12   0.33534  0.267913
    35       1       129       128   14.6266         8  0.355403  0.269241
    36       1       132       131   14.5536        12  0.581528  0.274327
    37       1       132       131   14.1603         0         -  0.274327
    38       1       133       132   13.8929         2   1.43621   0.28313
    39       1       134       133   13.6392         4  0.894817  0.287729
    40       1       134       133   13.2982         0         -  0.287729
    41       1       135       134   13.0714         2   1.87878  0.299602
    42       1       138       137   13.0459        12  0.309637  0.304601
    43       1       140       139   12.9285         8  0.302935  0.304491
    44       1       141       140   12.7256         4    1.5538  0.313415
    45 

[ceph-users] Cache tier best practices

2015-08-12 Thread Dominik Zalewski
Hi,

I would like to hear from people who use cache tier in Ceph about best
practices and things I should avoid.

I remember hearing that it wasn't that stable back then. Has it changed in
Hammer release?

Any tips and tricks are much appreciated!

Thanks

Dominik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache tier best practices

2015-08-12 Thread Nick Fisk
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Dominik Zalewski
 Sent: 12 August 2015 14:40
 To: ceph-us...@ceph.com
 Subject: [ceph-users] Cache tier best practices
 
 Hi,
 
 I would like to hear from people who use cache tier in Ceph about best
 practices and things I should avoid.
 
 I remember hearing that it wasn't that stable back then. Has it changed in
 Hammer release?

It's not so much the stability, but the performance. If your working set will 
sit mostly in the cache tier and won't tend to change then you might be 
alright. Otherwise you will find that performance is very poor.

Only tip I can really give is that I have found dropping the RBD block size 
down to 512kb-1MB helps quite a bit as it makes the cache more effective and 
also minimises the amount of data transferred on each promotion/flush.
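For the record, that smaller object size is chosen at image-creation time via the
order parameter; a sketch with placeholder names (order 22 is the 4MB default,
20 is 1MB, 19 is 512kB; existing images keep their original order):

rbd create rbd/vm-disk-1 --size 20480 --order 20
rbd info rbd/vm-disk-1    # shows the resulting object size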

 
 Any tips and tricks are much appreciated!
 
 Thanks
 
 Dominik




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd rename snaps?

2015-08-12 Thread Jason Dillaman
There currently is no mechanism to rename snapshots without hex editing the RBD 
image header data structure.  I created a new Ceph feature request [1] to add 
this ability in the future.

[1] http://tracker.ceph.com/issues/12678
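For reference, the snapshot operations the rbd CLI does expose today (names below
are only examples); rename is simply not among them:

rbd snap create rbd/vm-disk-1@before-upgrade
rbd snap ls rbd/vm-disk-1
rbd snap rollback rbd/vm-disk-1@before-upgrade
rbd snap rm rbd/vm-disk-1@before-upgrade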

-- 

Jason Dillaman 
Red Hat Ceph Storage Engineering 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message -
 From: Stefan Priebe - Profihost AG s.pri...@profihost.ag
 To: ceph-users@lists.ceph.com
 Sent: Wednesday, August 12, 2015 11:10:07 AM
 Subject: [ceph-users] rbd rename snaps?
 
 Hi,
 
 for mds there is the ability to rename snapshots. But for rbd i can't
 see one.
 
 Is there a way to rename a snapshot?
 
 Greets,
 Stefan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd out

2015-08-12 Thread chmind
Yeah. You are right. Thank you. 

 On Aug 12, 2015, at 19:53, GuangYang yguan...@outlook.com wrote:
 
 If you are using the default configuration to create the pool (3 replicas), 
 after losing 1 OSD and having 2 left, CRUSH would not be able to find enough 
 OSDs (at least 3) to map the PG thus it would stuck at unclean.
 
 
 Thanks,
 Guang
 
 
 
 From: chm...@yandex.ru
 Date: Wed, 12 Aug 2015 19:46:01 +0300
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] osd out
 
 Hello.
 Could you please help me to remove osd from cluster;
 
 # ceph osd tree
 ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 0.02998 root default
 -2 0.00999 host ceph1
 0 0.00999 osd.0 up 1.0 1.0
 -3 0.00999 host ceph2
 1 0.00999 osd.1 up 1.0 1.0
 -4 0.00999 host ceph3
 2 0.00999 osd.2 up 1.0 1.0
 
 
 # ceph -s
 cluster 64f87255-d56e-499d-8ebc-65e0f577e0aa
 health HEALTH_OK
 monmap e1: 3 mons at 
 {ceph1=10.0.0.101:6789/0,ceph2=10.0.0.102:6789/0,ceph3=10.0.0.103:6789/0}
 election epoch 10, quorum 0,1,2 ceph1,ceph2,ceph3
 osdmap e76: 3 osds: 3 up, 3 in
 pgmap v328: 128 pgs, 1 pools, 10 bytes data, 1 objects
 120 MB used, 45926 MB / 46046 MB avail
 128 active+clean
 
 
 # ceph osd out 0
 marked out osd.0.
 
 # ceph -w
 cluster 64f87255-d56e-499d-8ebc-65e0f577e0aa
 health HEALTH_WARN
 128 pgs stuck unclean
 recovery 1/3 objects misplaced (33.333%)
 monmap e1: 3 mons at 
 {ceph1=10.0.0.101:6789/0,ceph2=10.0.0.102:6789/0,ceph3=10.0.0.103:6789/0}
 election epoch 10, quorum 0,1,2 ceph1,ceph2,ceph3
 osdmap e79: 3 osds: 3 up, 2 in; 128 remapped pgs
 pgmap v332: 128 pgs, 1 pools, 10 bytes data, 1 objects
 89120 kB used, 30610 MB / 30697 MB avail
 1/3 objects misplaced (33.333%)
 128 active+remapped
 
 2015-08-12 18:43:12.412286 mon.0 [INF] pgmap v332: 128 pgs: 128 
 active+remapped; 10 bytes data, 89120 kB used, 30610 MB / 30697 MB avail; 
 1/3 objects misplaced (33.333%)
 2015-08-12 18:43:20.362337 mon.0 [INF] HEALTH_WARN; 128 pgs stuck unclean; 
 recovery 1/3 objects misplaced (33.333%)
 2015-08-12 18:44:15.055825 mon.0 [INF] pgmap v333: 128 pgs: 128 
 active+remapped; 10 bytes data, 89120 kB used, 30610 MB / 30697 MB avail; 
 1/3 objects misplaced (33.333%)
 
 
 and it never become active+clean .
 What I’m doing wrong ?
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com