Re: [ceph-users] What is in the mon leveldb?

2018-03-27 Thread Tracy Reed
>   health: HEALTH_WARN
>   recovery 1230/13361271 objects misplaced (0.009%)
> 
> and no recovery is happening. I'm not sure why. This hasn't happened
> before. But the mon db had been growing since long before this
> circumstance.

Hmm, ok. The recent trouble started a few days ago when we removed a
node containing 4 OSDs from the cluster. The OSDs on that node were shut
down but were not removed from the crush map. So apparently this has
caused some issues. I just removed the OSDs properly and now there is
recovery happening. Unfortunately it now says 30% of my objects are
misplaced so I'm looking at 24 hours of recovery. Maybe the store.db
will be smaller when it finally finishes.
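
(For anyone hitting the same thing: the usual sequence for taking an OSD out of
the CRUSH map completely -- <id> is a placeholder -- is roughly:

ceph osd out <id>
ceph osd crush remove osd.<id>
ceph auth del osd.<id>
ceph osd rm <id>

The "ceph osd crush remove" step is the part that had been skipped here.)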

-- 
Tracy Reed
http://tracyreed.org
Digital signature attached for your safety.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is in the mon leveldb?

2018-03-27 Thread Tracy Reed
On Mon, Mar 26, 2018 at 11:15:34PM PDT, Wido den Hollander spake thusly:
> The MONs keep a history of OSDMaps and other maps. Normally these maps
> are trimmed from the database, but if one or more PGs are not
> active+clean the MONs will keep a large history to get old OSDs up to
> speed, which might be needed to bring those PGs to a clean state again.
>
> What is the status of your Ceph cluster (ceph -s) and what version are
> you running?

Ah...well. That leads to my next question which may resolve this issue:

Current state of my cluster is:

  health: HEALTH_WARN
  recovery 1230/13361271 objects misplaced (0.009%)

and no recovery is happening. I'm not sure why. This hasn't happened
before. But the mon db had been growing since long before this
circumstance.

Any idea why it might be stuck like this? I suppose I need to clear this
up before I can know if this is the cause of the disk usage.
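
(In case it helps to narrow it down, these are the first things I'd look at;
just a guess at starting points:

ceph health detail
ceph pg dump_stuck unclean
ceph osd tree

health detail should name the PGs involved, and the osd tree output shows
whether any down OSDs are still in the CRUSH map.)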

> And yes, make sure your MONs do have tens of GBs available should they
> need it for a very long recovery.

Yeah...I've temporarily moved the store.db to another disk and symlinked
it back but I'm working towards rebuilding my mons.
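
(The move itself was nothing fancy; roughly, with the mon id and the target
path as placeholders:

systemctl stop ceph-mon@<id>
mv /var/lib/ceph/mon/ceph-<id>/store.db /mnt/bigdisk/store.db
ln -s /mnt/bigdisk/store.db /var/lib/ceph/mon/ceph-<id>/store.db
chown -R ceph:ceph /mnt/bigdisk/store.db
systemctl start ceph-mon@<id>
)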

> For example, I'm working on a 2200 OSD cluster which has been doing a
> recovery operation for a week now and the MON DBs are about 50GB now.

Wow. My cluster is only around 70 OSDs.

Thanks!

-- 
Tracy Reed
http://tracyreed.org
Digital signature attached for your safety.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error getting attr on : 32.5_head, #-34:a0000000:::scrub_32.5:head#, (61) No data available bad?

2018-03-27 Thread Marc Roos

Is this bad? Or is it expected because an OSD is down?

[@c01 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 
--pool rbd rbd_data.1f114174b0dc51.0974 remove
Error getting attr on : 32.5_head,#-34:a000:::scrub_32.5:head#, (61) 
No data available
Error getting attr on : 34.7_head,#-36:e000:::scrub_34.7:head#, (61) 
No data available
Error getting attr on : 31.3_head,#-33:c000:::scrub_31.3:head#, (61) 
No data available
Error getting attr on : 31.4_head,#-33:2000:::scrub_31.4:head#, (61) 
No data available
Error getting attr on : 33.1_head,#-35:8000:::scrub_33.1:head#, (61) 
No data available
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-27 Thread Peter Linder

Did you upgrade from an earlier version?

With 12.2.4 the option "mon_max_pg_per_osd" is set at 200, I think, and 
no OSD will allow more than 2x that, and I *think* that includes the PGs 
that an OSD already has as well as the ones it is about to get during a 
rebalance. With that said, you should be below 400 anyway, but who knows? 
I don't remember what "indep" really does.
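
(If it is overdose protection kicking in, the limit can be raised while the new
node is integrated; the value below is only an example. As far as I understand
it, in 12.2.x the hard cap the OSDs enforce is mon_max_pg_per_osd *
osd_max_pg_per_osd_hard_ratio, so the setting needs to reach both mons and
OSDs, e.g. via ceph.conf plus a restart:

[global]
mon_max_pg_per_osd = 300
)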


If you add one OSD at a time and let the rebalancing process finish in 
between, perhaps it can complete as the number of PGs per OSD 
decreases?


By the way, if you change your failure domain to "host", will you not 
need at least 10+3 hosts?





Den 2018-03-27 kl. 20:56, skrev Jon Light:
Oops, sorry about not including the version. Everything is running 
12.2.4 on Ubuntu 16.04.


Below is the output from ceph osd df. The OSDs are pretty full, hence 
adding a new OSD node. I did have to bump up the nearfull ratio to .90 
and reweight a few OSDs to bring them a little closer to the average.


ID  CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE  VAR PGS
  0   ssd 1.74649  1.0 1788G 15688M 1773G  0.86 0.01 88
  1   ssd 1.74649  1.0 1788G 16489M 1772G  0.90 0.01 96
  2   ssd 1.74649  1.0 1788G 17224M 1771G  0.94 0.01 86
  3   ssd 1.74649  1.0 1788G 16745M 1772G  0.91 0.01 100
  4   ssd 1.74649  1.0 1788G 17016M 1771G  0.93 0.01 109
  5   ssd 1.74649  1.0 1788G 15964M 1772G  0.87 0.01 101
  6   ssd 1.74649  1.0 1788G 15612M 1773G  0.85 0.01 95
  7   ssd 1.74649  1.0 1788G 16109M 1772G  0.88 0.01 93
  8   hdd 9.09560  1.0 9313G  7511G 1802G 80.65 1.21 169
  9   hdd 9.09560  1.0 9313G  7155G 2158G 76.83 1.16 161
 10   hdd 9.09560  1.0 9313G  7953G 1360G 85.39 1.28 179
 11   hdd 9.09560  0.95000 9313G  7821G 1492G 83.98 1.26 176
 12   hdd 9.09560  1.0 9313G  7193G 2120G 77.24 1.16 162
 13   hdd 9.09560  1.0 9313G  8131G 1182G 87.30 1.31 183
 14   hdd 9.09560  1.0 9313G  7643G 1670G 82.07 1.23 172
 15   hdd 9.09560  1.0 9313G  7019G 2294G 75.36 1.13 158
 16   hdd 9.09560  1.0 9313G  7419G 1894G 79.66 1.20 167
 17   hdd 9.09560  1.0 9313G  7333G 1980G 78.74 1.18 165
 18   hdd 9.09560  1.0 9313G  7107G 2206G 76.31 1.15 160
 19   hdd 9.09560  1.0 9313G  7288G 2025G 78.25 1.18 164
 20   hdd 9.09560  1.0 9313G  8133G 1180G 87.32 1.31 183
 21   hdd 9.09560  1.0 9313G  7374G 1939G 79.17 1.19 166
 22   hdd 9.09560  1.0 9313G  7550G 1763G 81.07 1.22 170
 23   hdd 9.09560  1.0 9313G  7552G 1761G 81.08 1.22 170
 24   hdd 9.09560  1.0 9313G  7955G 1358G 85.42 1.28 179
 25   hdd 9.09560  1.0 9313G  7909G 1404G 84.92 1.28 178
 26   hdd 9.09560  1.0 9313G  7685G 1628G 82.51 1.24 173
 27   hdd 9.09560  1.0 9313G  7284G 2029G 78.21 1.18 164
 28   hdd 9.09560  1.0 9313G  7243G 2070G 77.77 1.17 163
 29   hdd 9.09560  1.0 9313G  7509G 1804G 80.63 1.21 169
 30   hdd 9.09560  1.0 9313G  7065G 2248G 75.86 1.14 159
 31   hdd 9.09560  1.0 9313G  7155G 2158G 76.83 1.16 161
 32   hdd 9.09560  1.0 9313G  6932G 2381G 74.43 1.12 156
 33   hdd 9.09560  1.0 9313G  6756G 2557G 72.54 1.09 152
 34   hdd 9.09560  1.0 9313G  7687G 1626G 82.54 1.24 173
 35   hdd 9.09560  1.0 9313G  6665G 2648G 71.57 1.08 150
 36   hdd 9.09560  1.0 9313G  7954G 1359G 85.41 1.28 179
 37   hdd 9.09560  1.0 9313G  7113G 2199G 76.38 1.15 160
 38   hdd 9.09560  1.0 9313G  7286G 2027G 78.23 1.18 164
 39   hdd 9.09560  1.0 9313G  7198G 2115G 77.28 1.16 162
 40   hdd 9.09560  1.0 9313G  7953G 1360G 85.39 1.28 179
 41   hdd 9.09560  1.0 9313G  6756G 2557G 72.54 1.09 152
 42   hdd 9.09560  1.0 9313G  7241G 2072G 77.75 1.17 163
 43   hdd 9.09560  1.0 9313G  7063G 2250G 75.84 1.14 159
 44   hdd 9.09560  1.0 9313G  7951G 1362G 85.38 1.28 179
 45   hdd 9.09560  1.0 9313G  6708G 2605G 72.03 1.08 151
 46   hdd 9.09560  1.0 9313G  7598G 1715G 81.58 1.23 171
 47   hdd 9.09560  1.0 9313G  7065G 2248G 75.86 1.14 159
 48   hdd 9.09560  1.0 9313G  7868G 1445G 84.48 1.27 177
 49   hdd 9.09560  1.0 9313G  7331G 1982G 78.72 1.18 165
 50   hdd 9.09560  1.0 9313G  7377G 1936G 79.21 1.19 166
 51   hdd 9.09560  1.0 9313G  7065G 2248G 75.86 1.14 159
 52   hdd 9.09560  1.0 9313G  8041G 1272G 86.34 1.30 181
 53   hdd 9.09560  1.0 9313G  7152G 2161G 76.79 1.15 161
 54   hdd 9.09560  1.0 9313G  7505G 1808G 80.58 1.21 169
 55   hdd 9.09560  1.0 9313G  7556G 1757G 81.13 1.22 170
 56   hdd 9.09560  1.0 9313G  6841G 2472G 73.46 1.10 154
 57   hdd 9.09560  1.0 9313G  7598G 1715G 81.58 1.23 171
 58   hdd 9.09560  1.0 9313G  7245G 2068G 77.79 1.17 163
 59   hdd 9.09560  1.0 9313G  7152G 2161G 76.79 1.15 161
 60   hdd 9.09560  1.0 9313G  7864G 1449G 84.44 1.27 177
 61   hdd 9.09560  1.0 9313G  6890G 2423G 73.98 1.11 155
 62   hdd 9.09560  1.0 9313G  6884G 2429G 73.92 1.11 155
 63   hdd 9.09560  1.0 9313G  7776G 1537G 83.49 1.26 175
 64   hdd 9.09560  1.0 9313G  7597G 1716G 81.57 1.23 

Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-27 Thread Jon Light
Oops, sorry about not including the version. Everything is running 12.2.4
on Ubuntu 16.04.

Below is the output from ceph osd df. The OSDs are pretty full, hence
adding a new OSD node. I did have to bump up the nearfull ratio to .90 and
reweight a few OSDs to bring them a little closer to the average.
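
For reference, those two changes were along these lines (the ratio and OSD ids
are examples):

ceph osd set-nearfull-ratio 0.90
ceph osd reweight <osd-id> 0.95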

ID  CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE  VAR  PGS
  0   ssd 1.74649  1.0 1788G 15688M 1773G  0.86 0.01  88
  1   ssd 1.74649  1.0 1788G 16489M 1772G  0.90 0.01  96
  2   ssd 1.74649  1.0 1788G 17224M 1771G  0.94 0.01  86
  3   ssd 1.74649  1.0 1788G 16745M 1772G  0.91 0.01 100
  4   ssd 1.74649  1.0 1788G 17016M 1771G  0.93 0.01 109
  5   ssd 1.74649  1.0 1788G 15964M 1772G  0.87 0.01 101
  6   ssd 1.74649  1.0 1788G 15612M 1773G  0.85 0.01  95
  7   ssd 1.74649  1.0 1788G 16109M 1772G  0.88 0.01  93
  8   hdd 9.09560  1.0 9313G  7511G 1802G 80.65 1.21 169
  9   hdd 9.09560  1.0 9313G  7155G 2158G 76.83 1.16 161
 10   hdd 9.09560  1.0 9313G  7953G 1360G 85.39 1.28 179
 11   hdd 9.09560  0.95000 9313G  7821G 1492G 83.98 1.26 176
 12   hdd 9.09560  1.0 9313G  7193G 2120G 77.24 1.16 162
 13   hdd 9.09560  1.0 9313G  8131G 1182G 87.30 1.31 183
 14   hdd 9.09560  1.0 9313G  7643G 1670G 82.07 1.23 172
 15   hdd 9.09560  1.0 9313G  7019G 2294G 75.36 1.13 158
 16   hdd 9.09560  1.0 9313G  7419G 1894G 79.66 1.20 167
 17   hdd 9.09560  1.0 9313G  7333G 1980G 78.74 1.18 165
 18   hdd 9.09560  1.0 9313G  7107G 2206G 76.31 1.15 160
 19   hdd 9.09560  1.0 9313G  7288G 2025G 78.25 1.18 164
 20   hdd 9.09560  1.0 9313G  8133G 1180G 87.32 1.31 183
 21   hdd 9.09560  1.0 9313G  7374G 1939G 79.17 1.19 166
 22   hdd 9.09560  1.0 9313G  7550G 1763G 81.07 1.22 170
 23   hdd 9.09560  1.0 9313G  7552G 1761G 81.08 1.22 170
 24   hdd 9.09560  1.0 9313G  7955G 1358G 85.42 1.28 179
 25   hdd 9.09560  1.0 9313G  7909G 1404G 84.92 1.28 178
 26   hdd 9.09560  1.0 9313G  7685G 1628G 82.51 1.24 173
 27   hdd 9.09560  1.0 9313G  7284G 2029G 78.21 1.18 164
 28   hdd 9.09560  1.0 9313G  7243G 2070G 77.77 1.17 163
 29   hdd 9.09560  1.0 9313G  7509G 1804G 80.63 1.21 169
 30   hdd 9.09560  1.0 9313G  7065G 2248G 75.86 1.14 159
 31   hdd 9.09560  1.0 9313G  7155G 2158G 76.83 1.16 161
 32   hdd 9.09560  1.0 9313G  6932G 2381G 74.43 1.12 156
 33   hdd 9.09560  1.0 9313G  6756G 2557G 72.54 1.09 152
 34   hdd 9.09560  1.0 9313G  7687G 1626G 82.54 1.24 173
 35   hdd 9.09560  1.0 9313G  6665G 2648G 71.57 1.08 150
 36   hdd 9.09560  1.0 9313G  7954G 1359G 85.41 1.28 179
 37   hdd 9.09560  1.0 9313G  7113G 2199G 76.38 1.15 160
 38   hdd 9.09560  1.0 9313G  7286G 2027G 78.23 1.18 164
 39   hdd 9.09560  1.0 9313G  7198G 2115G 77.28 1.16 162
 40   hdd 9.09560  1.0 9313G  7953G 1360G 85.39 1.28 179
 41   hdd 9.09560  1.0 9313G  6756G 2557G 72.54 1.09 152
 42   hdd 9.09560  1.0 9313G  7241G 2072G 77.75 1.17 163
 43   hdd 9.09560  1.0 9313G  7063G 2250G 75.84 1.14 159
 44   hdd 9.09560  1.0 9313G  7951G 1362G 85.38 1.28 179
 45   hdd 9.09560  1.0 9313G  6708G 2605G 72.03 1.08 151
 46   hdd 9.09560  1.0 9313G  7598G 1715G 81.58 1.23 171
 47   hdd 9.09560  1.0 9313G  7065G 2248G 75.86 1.14 159
 48   hdd 9.09560  1.0 9313G  7868G 1445G 84.48 1.27 177
 49   hdd 9.09560  1.0 9313G  7331G 1982G 78.72 1.18 165
 50   hdd 9.09560  1.0 9313G  7377G 1936G 79.21 1.19 166
 51   hdd 9.09560  1.0 9313G  7065G 2248G 75.86 1.14 159
 52   hdd 9.09560  1.0 9313G  8041G 1272G 86.34 1.30 181
 53   hdd 9.09560  1.0 9313G  7152G 2161G 76.79 1.15 161
 54   hdd 9.09560  1.0 9313G  7505G 1808G 80.58 1.21 169
 55   hdd 9.09560  1.0 9313G  7556G 1757G 81.13 1.22 170
 56   hdd 9.09560  1.0 9313G  6841G 2472G 73.46 1.10 154
 57   hdd 9.09560  1.0 9313G  7598G 1715G 81.58 1.23 171
 58   hdd 9.09560  1.0 9313G  7245G 2068G 77.79 1.17 163
 59   hdd 9.09560  1.0 9313G  7152G 2161G 76.79 1.15 161
 60   hdd 9.09560  1.0 9313G  7864G 1449G 84.44 1.27 177
 61   hdd 9.09560  1.0 9313G  6890G 2423G 73.98 1.11 155
 62   hdd 9.09560  1.0 9313G  6884G 2429G 73.92 1.11 155
 63   hdd 9.09560  1.0 9313G  7776G 1537G 83.49 1.26 175
 64   hdd 9.09560  1.0 9313G  7597G 1716G 81.57 1.23 171
 65   hdd 9.09560  1.0 9313G  6706G 2607G 72.00 1.08 151
 66   hdd 9.09560  0.95000 9313G  7820G 1493G 83.97 1.26 176
 67   hdd 9.09560  0.95000 9313G  8043G 1270G 86.36 1.30 181
 68   hdd 9.09560  1.0 9313G  7643G 1670G 82.07 1.23 172
 69   hdd 9.09560  1.0 9313G  6620G 2693G 71.08 1.07 149
 70   hdd 9.09560  1.0 9313G  7775G 1538G 83.48 1.26 175
 71   hdd 9.09560  1.0 9313G  7731G 1581G 83.02 1.25 174
 72   hdd 9.09560  1.0 9313G  7598G 1715G 81.58 1.23 171
 73   hdd 9.09560  1.0 9313G  6575G 2738G 70.60 1.06 148
 74   hdd 9.09560  1.0 9313G  7155G 2158G 76.83 1.16 161
 75   hdd 9.09560  1.0 9313G  6220G 3093G 66.79 1.00 

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-03-27 Thread Patrick Donnelly
Hello Alexandre,

On Thu, Mar 22, 2018 at 2:29 AM, Alexandre DERUMIER  wrote:
> Hi,
>
> I'm running cephfs since 2 months now,
>
> and my active MDS memory usage is around 20G now (still growing).
>
> ceph 1521539 10.8 31.2 20929836 20534868 ?   Ssl  janv.26 8573:34 
> /usr/bin/ceph-mds -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
> USER PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>
>
> this is on luminous 12.2.2
>
> only tuning done is:
>
> mds_cache_memory_limit = 5368709120
>
>
> (5GB). I know it's a soft limit, but 20G seems quite huge vs 5GB 
>
>
> Is it normal ?

No, that's definitely not normal!


> # ceph daemon mds.2 perf dump mds
> {
> "mds": {
> "request": 1444009197,
> "reply": 1443999870,
> "reply_latency": {
> "avgcount": 1443999870,
> "sum": 1657849.656122933,
> "avgtime": 0.001148095
> },
> "forward": 0,
> "dir_fetch": 51740910,
> "dir_commit": 9069568,
> "dir_split": 64367,
> "dir_merge": 58016,
> "inode_max": 2147483647,
> "inodes": 2042975,
> "inodes_top": 152783,
> "inodes_bottom": 138781,
> "inodes_pin_tail": 1751411,
> "inodes_pinned": 1824714,
> "inodes_expired": 7258145573,
> "inodes_with_caps": 1812018,
> "caps": 2538233,
> "subtrees": 2,
> "traverse": 1591668547,
> "traverse_hit": 1259482170,
> "traverse_forward": 0,
> "traverse_discover": 0,
> "traverse_dir_fetch": 30827836,
> "traverse_remote_ino": 7510,
> "traverse_lock": 86236,
> "load_cent": 144401980319,
> "q": 49,
> "exported": 0,
> "exported_inodes": 0,
> "imported": 0,
> "imported_inodes": 0
> }
> }

Can you also share `ceph daemon mds.2 cache status`, the full `ceph
daemon mds.2 perf dump`, and `ceph status`?

Note [1] will be in 12.2.5 and may help with your issue.

[1] https://github.com/ceph/ceph/pull/20527
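
(To be explicit, the cache limit the daemon is actually running with can be
checked over the admin socket on the MDS host, e.g.:

ceph daemon mds.2 config get mds_cache_memory_limit
ceph daemon mds.2 cache status
)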

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-27 Thread Peter Linder
I've had similar issues, but I think your problem might be something 
else. Could you send the output of "ceph osd df"?


Other people will probably be interested in what version you are using 
as well.



Den 2018-03-27 kl. 20:07, skrev Jon Light:

Hi all,

I'm adding a new OSD node with 36 OSDs to my cluster and have run into 
some problems. Here are some of the details of the cluster:


1 OSD node with 80 OSDs
1 EC pool with k=10, m=3
pg_num 1024
osd failure domain

I added a second OSD node and started creating OSDs with ceph-deploy, 
one by one. The first 2 added fine, but each subsequent new OSD 
resulted in more and more PGs stuck activating. I've added a total of 
14 new OSDs, but had to set 12 of those with a weight of 0 to get the 
cluster healthy and usable until I get it fixed.


I have read some things about similar behavior due to PG overdose 
protection, but I don't think that's the case here because the failure 
domain is set to osd. Instead, I think my CRUSH rule needs some attention:


rule main-storage {
        id 1
        type erasure
        min_size 3
        max_size 13
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 0 type osd
        step emit
}

I don't believe I have modified anything from the automatically 
generated rule except for the addition of the hdd class.


I have been reading the documentation on CRUSH rules, but am having 
trouble figuring out if the rule is set up properly. After a few more 
nodes are added I do want to change the failure domain to host, but 
osd is sufficient for now.


Can anyone help out to see if the rule is causing the problems or if I 
should be looking at something else?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PGs stuck activating after adding new OSDs

2018-03-27 Thread Jon Light
Hi all,

I'm adding a new OSD node with 36 OSDs to my cluster and have run into some
problems. Here are some of the details of the cluster:

1 OSD node with 80 OSDs
1 EC pool with k=10, m=3
pg_num 1024
osd failure domain

I added a second OSD node and started creating OSDs with ceph-deploy, one
by one. The first 2 added fine, but each subsequent new OSD resulted in
more and more PGs stuck activating. I've added a total of 14 new OSDs, but
had to set 12 of those with a weight of 0 to get the cluster healthy and
usable until I get it fixed.

I have read some things about similar behavior due to PG overdose
protection, but I don't think that's the case here because the failure
domain is set to osd. Instead, I think my CRUSH rule needs some attention:

rule main-storage {
id 1
type erasure
min_size 3
max_size 13
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step choose indep 0 type osd
step emit
}

I don't believe I have modified anything from the automatically generated
rule except for the addition of the hdd class.

I have been reading the documentation on CRUSH rules, but am having trouble
figuring out if the rule is set up properly. After a few more nodes are
added I do want to change the failure domain to host, but osd is sufficient
for now.

Can anyone help out to see if the rule is causing the problems or if I
should be looking at something else?
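
In case it is useful, the rule can also be exercised offline with crushtool;
rule id 1 and the replica count 13 below are taken from the rule above, the
file names are arbitrary:

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
crushtool -i crush.bin --test --rule 1 --num-rep 13 --show-mappings
crushtool -i crush.bin --test --rule 1 --num-rep 13 --show-bad-mappings

--show-bad-mappings should print any inputs for which the rule fails to pick
13 OSDs.
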
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Memory leak in Ceph OSD?

2018-03-27 Thread David Turner
With default memory settings, the assumed memory requirements of Ceph are
1GB RAM/1TB of OSD size.  Increasing any settings from default will
increase that baseline.
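
On BlueStore the largest knob on top of that baseline is the cache size; a
rough ceph.conf sketch, with the values only as examples:

[osd]
bluestore_cache_size_hdd = 1073741824   # 1 GiB per HDD OSD
bluestore_cache_size_ssd = 3221225472   # 3 GiB per SSD OSD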

On Tue, Mar 27, 2018 at 1:10 AM Alex Gorbachev 
wrote:

> On Mon, Mar 26, 2018 at 3:08 PM, Igor Fedotov  wrote:
> > Hi Alex,
> >
> > I can see your bug report: https://tracker.ceph.com/issues/23462
> >
> > if your settings from there are applicable for your comment here then you
> > have bluestore cache size limit set to 5 Gb that totals in 90 Gb RAM
> for  18
> > OSD for BlueStore cache only.
> >
> > There is also additional memory overhead per OSD hence the amount of free
> > memory you should expect isn't that much. If any at all...
> >
> > Can you reduce bluestore cache size limits and check if out-of-memory
> issue
> > is still happening?
> >
>
> Thank you Igor, reducing to 3GB now and will advise.  I did not
> realize there's additional memory on top of the 90GB, the nodes each
> have 128 GB.
>
>
> --
> Alex Gorbachev
> Storcium
>
> >
> > Thanks,
> >
> > Igor
> >
> >
> >
> > On 3/26/2018 5:09 PM, Alex Gorbachev wrote:
> >>
> >> On Wed, Mar 21, 2018 at 2:26 PM, Kjetil Joergensen  >
> >> wrote:
> >>>
> >>> I retract my previous statement(s).
> >>>
> >>> My current suspicion is that this isn't a leak as much as it being
> >>> load-driven, after enough waiting - it generally seems to settle around
> >>> some
> >>> equilibrium. We do seem to sit on the mempools x 2.4 ~ ceph-osd RSS,
> >>> which
> >>> is on the higher side (I see documentation alluding to expecting
> ~1.5x).
> >>>
> >>> -KJ
> >>>
> >>> On Mon, Mar 19, 2018 at 3:05 AM, Konstantin Shalygin 
> >>> wrote:
> 
> 
> > We don't run compression as far as I know, so that wouldn't be it. We
> > do
> > actually run a mix of bluestore & filestore - due to the rest of the
> > cluster predating a stable bluestore by some amount.
> 
> 
> 
>  12.2.2 -> 12.2.4 at 2018/03/10: I don't see increase of memory usage.
> No
>  any compressions of course.
> 
> 
> 
> 
> 
> http://storage6.static.itmages.com/i/18/0319/h_1521453809_9131482_859b1fb0a5.png
> 
> >> I am seeing these entries under load - should be plenty of RAM on a
> >> node with 128GB RAM and 18 OSDs
> >>
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193331] winbindd
> >> cpuset=/ mems_allowed=0-1
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193337] CPU: 3 PID:
> >> 3406 Comm: winbindd Not tainted 4.14.14-041414-generic #201801201219
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193338] Hardware name:
> >> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
> >> 03/04/2015
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193339] Call Trace:
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193347]
> >> dump_stack+0x5c/0x85
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193351]
> >> dump_header+0x94/0x229
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193355]  ?
> >> do_try_to_free_pages+0x2a1/0x330
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193357]  ?
> >> get_page_from_freelist+0xa3/0xb20
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193359]
> >> oom_kill_process+0x213/0x410
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193361]
> >> out_of_memory+0x2af/0x4d0
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193363]
> >> __alloc_pages_slowpath+0xab2/0xe40
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193366]
> >> __alloc_pages_nodemask+0x261/0x280
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193370]
> >> filemap_fault+0x33f/0x6b0
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193373]  ?
> >> filemap_map_pages+0x18a/0x3a0
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193376]
> >> ext4_filemap_fault+0x2c/0x40
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193379]
> >> __do_fault+0x19/0xe0
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193381]
> >> __handle_mm_fault+0xcd6/0x1180
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193383]
> >> handle_mm_fault+0xaa/0x1f0
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193387]
> >> __do_page_fault+0x25d/0x4e0
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193391]  ?
> >> page_fault+0x36/0x60
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193393]
> >> page_fault+0x4c/0x60
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193396] RIP:
> >> 0033:0x56443d3d1239
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193397] RSP:
> >> 002b:7ffe6e44b3a0 EFLAGS: 00010246
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193399] Mem-Info:
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]
> >> active_anon:30843938 inactive_anon:1403277 isolated_anon:0
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]
> >> active_file:121 inactive_file:977 isolated_file:18
> >> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]
> 

Re: [ceph-users] Instructions for manually adding a object gateway node ?

2018-03-27 Thread Marc Roos
 

And if you don't have the pools, I created these. But from a little 
testing I noticed that lots of them are not used, so you have to 
investigate what is necessary or not. I think it is because I read both 
old and new docs; I was not able to find a definitive list of necessary 
pools. So share if you found something ;)

ceph osd pool create default.rgw 8
ceph osd pool create default.rgw.meta 8
ceph osd pool create default.rgw.control 8
ceph osd pool create default.rgw.log 8
ceph osd pool create .rgw.root 8
ceph osd pool create .rgw.gc 8
ceph osd pool create .rgw.buckets 16 
ceph osd pool create .rgw.buckets.index 8
ceph osd pool create .rgw.buckets.extra 8
ceph osd pool create .intent-log 8
ceph osd pool create .usage 8
ceph osd pool create .users 8
ceph osd pool create .users.email 8
ceph osd pool create .users.swift 8
ceph osd pool create .users.uid 8



POOL_NAME                  USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD     WR_OPS  WR
.intent-log                0     0       0      0      0                  0       0        0        0      0       0
.rgw.buckets               0     0       0      0      0                  0       0        0        0      0       0
.rgw.buckets.extra         0     0       0      0      0                  0       0        0        0      0       0
.rgw.buckets.index         0     0       0      0      0                  0       0        0        0      0       0
.rgw.gc                    0     0       0      0      0                  0       0        0        0      0       0
.rgw.root                  1113  4       0      12     0                  0       0        6501     4334k  12      8192
.usage                     0     0       0      0      0                  0       0        0        0      0       0
.users                     0     0       0      0      0                  0       0        0        0      0       0
.users.email               0     0       0      0      0                  0       0        0        0      0       0
.users.swift               0     0       0      0      0                  0       0        0        0      0       0
.users.uid                 0     0       0      0      0                  0       0        0        0      0       0
default.rgw                0     0       0      0      0                  0       0        0        0      0       0
default.rgw.buckets.data   3814M 1596    0      4788   0                  0       0        1546     85953k 10850   4008M
default.rgw.buckets.index  0     4       0      12     0                  0       0        11070    11865k 6681    0
default.rgw.buckets.non-ec 0     0       0      0      0                  0       0        2505     1624k  1952    0
default.rgw.control        0     8       0      24     0                  0       0        0        0      0       0
default.rgw.log            2431  210     0      630    0                  0       0        12914226 12611M 8603543 13312
default.rgw.meta           2462  14      0      42     0                  0       0        1832     1575k  185     18432



-Original Message-
From: Marc Roos 
Sent: Tuesday, 27 March 2018 16:35
To: ceph-users; massimo.sgaravatto
Subject: Re: [ceph-users] Instructions for manually adding a object 
gateway node ?

 
This is how I did it (centos7), but beware I have client.rgw1 not
client.radosgw.rgw1


yum install ceph-radosgw

Creating the gw node user:

ceph auth get-or-create client.rgw1

ceph auth caps client.rgw1 osd 'allow rwx' mon 'allow rwx'

#limit access
#ceph auth caps client.rgw1 mon 'allow rx' osd 'allow rwx 
pool=default.rgw, allow rwx pool=default.rgw.meta, allow rwx 
pool=.rgw.root, allow rwx pool=default.rgw.control, allow rwx 
pool=.rgw.gc, allow rwx pool=.rgw.buckets, allow rwx 
pool=.rgw.buckets.index, allow rwx pool=.rgw.buckets.extra, allow rwx 
pool=default.rgw.log, allow rwx pool=.intent-log, allow rwx pool=.usage, 
allow rwx pool=.users, allow rwx pool=.users.email, allow rwx 
pool=.users.swift, allow rwx pool=.users.uid'

mkdir -p /var/lib/ceph/radosgw/ceph-rgw1
chown ceph.ceph -R /var/lib/ceph/radosgw

service ceph-radosgw@rgw1 start

systemctl enable ceph-radosgw@rgw1

Adding the configuration to /etc/ceph/ceph.conf:

[client.rgw1]
rgw_frontends = civetweb port=80+7480s ssl_certificate=/etc/ceph/xxx.pem


-Original Message-
From: Massimo Sgaravatto [mailto:massimo.sgarava...@gmail.com] 
Sent: Tuesday, 27 March 2018 16:03
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Instructions for manually adding a object gateway 
node ?

Hi

Are there somewhere some instructions on how to *MANUALLY* add an object 
gateway node on a Luminous cluster, that was manually installed (i.e. 
not using ceph-deploy) ?

In the official doc I can find instruction 

Re: [ceph-users] Instructions for manually adding a object gateway node ?

2018-03-27 Thread Marc Roos
 
This is how I did it (centos7), but beware I have client.rgw1 not 
client.radosgw.rgw1


yum install ceph-radosgw

Creating the gw node user:

ceph auth get-or-create client.rgw1

ceph auth caps client.rgw1 osd 'allow rwx' mon 'allow rwx'

#limit access
#ceph auth caps client.rgw1 mon 'allow rx' osd 'allow rwx 
pool=default.rgw, allow rwx pool=default.rgw.meta, allow rwx 
pool=.rgw.root, allow rwx pool=default.rgw.control, allow rwx 
pool=.rgw.gc, allow rwx pool=.rgw.buckets, allow rwx 
pool=.rgw.buckets.index, allow rwx pool=.rgw.buckets.extra, allow rwx 
pool=default.rgw.log, allow rwx pool=.intent-log, allow rwx pool=.usage, 
allow rwx pool=.users, allow rwx pool=.users.email, allow rwx 
pool=.users.swift, allow rwx pool=.users.uid'

mkdir -p /var/lib/ceph/radosgw/ceph-rgw1
chown ceph.ceph -R /var/lib/ceph/radosgw

service ceph-radosgw@rgw1 start

systemctl enable ceph-radosgw@rgw1

Adding the configuration to /etc/ceph/ceph.conf:

[client.rgw1]
rgw_frontends = civetweb port=80+7480s ssl_certificate=/etc/ceph/xxx.pem
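
(One step that may not be obvious from the above: the key created with
get-or-create also has to be written into the gateway's data directory before
the service will start. Assuming the same paths as above, something like:

ceph auth get client.rgw1 -o /var/lib/ceph/radosgw/ceph-rgw1/keyring
chown ceph.ceph /var/lib/ceph/radosgw/ceph-rgw1/keyring
)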


-Original Message-
From: Massimo Sgaravatto [mailto:massimo.sgarava...@gmail.com] 
Sent: Tuesday, 27 March 2018 16:03
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Instructions for manually adding a object gateway 
node ?

Hi

Are there somewhere some instructions on how to *MANUALLY* add an object 
gateway node on a Luminous cluster, that was manually installed (i.e. 
not using ceph-deploy) ?

In the official doc I can find instruction only referring to ceph-deploy 
...


Thanks, Massimo 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] remove big rbd image is very slow

2018-03-27 Thread shadow_lin
I have done that before, but most of the time I can't just delete the pool.
Is there any other way to speed up the rbd image deletion?
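
The object-level scripts mentioned in the quoted mail below boil down to
something like this; the pool, image name and data prefix are placeholders and
this is only a sketch, not a tested procedure:

rbd info rbd/<image> | grep block_name_prefix
rados -p rbd ls | grep '^rbd_data.<image-id>.' | xargs -n 200 -P 8 rados -p rbd rm
rbd rm rbd/<image>

rados ls still has to walk the whole pool, but the deletes run in parallel,
and the final rbd rm then only has the metadata objects left to clean up.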

2018-03-27 


shadowlin




From: Ilya Dryomov 
Sent: 2018-03-26 20:09
Subject: Re: [ceph-users] remove big rbd image is very slow
To: "shadow_lin"
Cc: "ceph-users"

On Sat, Mar 17, 2018 at 5:11 PM, shadow_lin  wrote: 
> Hi list, 
> My ceph version is jewel 10.2.10. 
> I tried to use rbd rm to remove a 50TB image (without object map because krbd 
> doesn't support it). It takes about 30 mins to just complete about 3%. Is this 
> expected? Is there a way to make it faster? 
> I know there are scripts to delete rados objects of the rbd image to make it 
> faster. But is the slowness expected for rbd rm command? 
> 
> PS: I also encountered very slow rbd export for a large rbd image (20TB image but 
> with only a few GB of data). It takes hours to complete the export. I guess both 
> are related to object map not enabled, but krbd doesn't support object map 
> feature. 

If you don't have any other images in that pool, you can simply delete 
the pool with "ceph osd pool delete".  It'll take a second ;) 
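
(For completeness, the full form needs the pool name twice plus a confirmation
flag, and on Luminous the mons may also need mon_allow_pool_delete = true first:

ceph osd pool delete <pool> <pool> --yes-i-really-really-mean-it
)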

Thanks, 

Ilya ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Instructions for manually adding a object gateway node ?

2018-03-27 Thread Massimo Sgaravatto
Hi

Are there somewhere some instructions on how to *MANUALLY* add an object
gateway node on a Luminous cluster, that was manually installed (i.e. not
using ceph-deploy) ?

In the official doc I can find instruction only referring to ceph-deploy ...


Thanks, Massimo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread Brad Hubbard
On Tue, Mar 27, 2018 at 9:46 PM, Brad Hubbard  wrote:

>
>
> On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins  wrote:
>
>> Hi Brad,
>>
>> that post was mine. I knew it quite well.
>>
> That post was about confirming that the minimum requirements written in
>> the documentation really didn't exist.
>>
>> However, I never asked whether there is somewhere a place where it is possible to
>> download the DEV or the RC of CentOS 7.5.
>> I was thinking about joining the community of testers and developers that
>> are already testing Ceph on that "*not ready*" environment.
>>
>> In that post these questions were not really asked, so no answers were
>> given.
>>
>
> From that thread.
>
> "The necessary kernel changes actually are included as part of 4.16-rc1
> which is available now. We also offer a pre-built test kernel with the
> necessary fixes here [1].
>
> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/;
> 
>
> I notice that URL is unavailable so maybe the real question should be why
> is that kernel no longer available?
>

Turns out this build got "garbage collected" and replacing it is being
worked on right now.


>
> There are plenty more available at https://shaman.ceph.com/repos/kernel/testing/
> but *I* can't tell you which is relevant, but perhaps someone else can.
>
> I see that you talked also about other distribution. Well, I read around
>> that Suse already implement iSCSI.
>> However as far as I know (which is not so much), this distribution use
>> modified kernel in order to let this work.
>> And in order to use it it's needed  a dashboard that can handle these
>> kind of differences (OpenAttic).
>> I knew already OpenAttic is contributing in developing the next
>> generation of the Ceph Dashboard (and this sound damn good!).
>> However this also means to me that the *official dashboard* should not
>> be talking about ISCSI at all (as every implementation of iSCSI are running
>> on mod version).
>>
>> So these are the things I cannot figure out:
>> Why is the iSCSI board on the CEPH official dashboard? (I could
>> understand on OpenAttic which run on SUSE but not on the official one).
>>
> Why do you believe it should not be?
>
>> And why, in the official documentation, is the minimum requirement to get
>> iSCSI working to install CentOS 7.5, which doesn't exist yet? Is there an RC
>> candidate which I can start to use?
>>
>
> But it doesn't say that, it says " RHEL/CentOS 7.5; Linux kernel v4.16 or
> newer; or the Ceph iSCSI client test kernel
> ". You seem to be
> ignoring the "Ceph iSCSI client test kernel
> " part?
>
>> And... if SUSE or even other distributions already work with iSCSI... why
>> doesn't the documentation just recommend those instead of RHEL or
>> CENTOS?
>>
> Because that would be odd, to say the least. If the documentation is
> incorrect for CentOS then it was, at least at some point, thought to be
> correct and it probably will be correct again in the near future and, if
> not, we can review and correct it as necessary.
>
>> There is something confused about what the documentation minimal
>> requirements, the dashboard suggest to be able to do, and what i read
>> around about modded Ceph for other linux distributions.
>> I create a new post to clarify all these points.
>>
>> Thanks for your answer! :)
>>
>>
>>
>> Il 27/03/2018 11:24, Brad Hubbard ha scritto:
>>
>> See the thread in this very ML titled "Ceph iSCSI is a prank?", last
>> update thirteen days ago.
>>
>> If your questions are not answered by that thread let us know.
>>
>> Please also remember that CentOS is not the only platform that ceph runs
>> on by a long shot and that not all distros lag as much as it (not a
>> criticism, just a fact. The reasons for lagging are valid and well
>> documented and should be accepted by those who choose to use them). if you
>> want the bleeding edge then rhel/centos should not be your platform of
>> choice.
>>
>>
>> On Tue, Mar 27, 2018 at 7:04 PM, Max Cuttins  wrote:
>>
>>> Thanks Jason,
>>>
>>> this is exactly what i read around and I supposed.
>>> The RHEL 7.5 is not yet released (neither is Kernel 4.16)
>>>
>>> So my doubts are 2:
>>>
>>> *1) If it's not released... why is this in the documentation?*
>>> Is the documentation talking about a Dev candidate already accessible
>>> somewhere?
>>>
>>> 2) why in the dashboard is there already a iSCSI board?
>>> I guess I miss something or is really just for future implementation
>>> and not usable yet?
>>> And if it is usable... where I can download the necessarie in order to
>>> start?
>>>
>>>
>>> Il 26/03/2018 14:10, Jason Dillaman ha scritto:
>>>
>>> RHEL 7.5 has not been released yet, but it should be released very
>>> soon. After it's released, it usually takes the CentOS team a little
>>> time to put 

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread John Spray
On Tue, Mar 27, 2018 at 12:12 PM, Max Cuttins  wrote:

> Hi Brad,
>
> that post was mine. I knew it quite well.
> That post was about confirming that the minimum requirements written in
> the documentation really didn't exist.
>
> However, I never asked whether there is somewhere a place where it is possible to
> download the DEV or the RC of CentOS 7.5.
> I was thinking about joining the community of testers and developers that
> are already testing Ceph on that "*not ready*" environment.
>
> In that post these questions were not really asked, so no answers were
> given.
>
> I see that you talked also about other distribution. Well, I read around
> that Suse already implement iSCSI.
> However as far as I know (which is not so much), this distribution use
> modified kernel in order to let this work.
> And in order to use it it's needed  a dashboard that can handle these kind
> of differences (OpenAttic).
> I knew already OpenAttic is contributing in developing the next generation
> of the Ceph Dashboard (and this sound damn good!).
> However this also means to me that the *official dashboard* should not be
> talking about ISCSI at all (as every implementation of iSCSI are running on
> mod version).
>
> So these are the things I cannot figure out:
> Why is the iSCSI board on the CEPH official dashboard? (I could understand
> on OpenAttic which run on SUSE but not on the official one).
>

We do not forbid features in Ceph just because they require a recent
kernel.  The iSCSI support in the dashboard works if you have an iSCSI
enabled system, and does no harm if you don't.


> And why, in the official documentation, is the minimum requirement to get
> iSCSI working to install CentOS 7.5, which doesn't exist yet? Is there an RC
> candidate which I can start to use?
>
I think this has already been explained to you, but to restate it for the
record: the supported kernel bits are expected to be part of CentOS 7.5,
which as you know is not out yet.  The same page in the documentation says
that the required kernel version is 4.16, so you are free to find any
distro that provides a >4.16 kernel and try that, or perhaps install a
newer kernel on an existing distro.  As for pre-releases of CentOS, I have
no idea, I don't follow their release process that closely.
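
For what it's worth, one common way to get a recent mainline kernel onto
CentOS 7 is ELRepo's kernel-ml package; treat the exact release RPM version
below as an example, it changes over time:

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-ml
grub2-set-default 0

Whether a given 4.16-rc kernel has all the iSCSI bits is a separate question,
of course.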

> And... if SUSE or even other distributions already work with iSCSI... why
> doesn't the documentation just recommend those instead of RHEL or
> CENTOS?
>
There are many linux distros, many Ceph developers, and we don't all have
up to date knowledge of every distro.  If you're having success with iSCSI
on a particular distribution, then by all means open a pull request to
record that on the requirements page.

John




> There is something confused about what the documentation minimal
> requirements, the dashboard suggest to be able to do, and what i read
> around about modded Ceph for other linux distributions.
> I create a new post to clarify all these points.
>
> Thanks for your answer! :)
>
>
>
> Il 27/03/2018 11:24, Brad Hubbard ha scritto:
>
> See the thread in this very ML titled "Ceph iSCSI is a prank?", last
> update thirteen days ago.
>
> If your questions are not answered by that thread let us know.
>
> Please also remember that CentOS is not the only platform that ceph runs
> on by a long shot and that not all distros lag as much as it (not a
> criticism, just a fact. The reasons for lagging are valid and well
> documented and should be accepted by those who choose to use them). if you
> want the bleeding edge then rhel/centos should not be your platform of
> choice.
>
>
> On Tue, Mar 27, 2018 at 7:04 PM, Max Cuttins  wrote:
>
>> Thanks Jason,
>>
>> this is exactly what i read around and I supposed.
>> The RHEL 7.5 is not yet released (neither is Kernel 4.16)
>>
>> So my doubts are 2:
>>
>> *1) If it's not released... why is this in the documentation?*
>> Is the documentation talking about a Dev candidate already accessible
>> somewhere?
>>
>> 2) why in the dashboard is there already a iSCSI board?
>> I guess I miss something or is really just for future implementation
>> and not usable yet?
>> And if it is usable... where I can download the necessarie in order to
>> start?
>>
>>
>> Il 26/03/2018 14:10, Jason Dillaman ha scritto:
>>
>> RHEL 7.5 has not been released yet, but it should be released very
>> soon. After it's released, it usually takes the CentOS team a little
>> time to put together their matching release. I also suspect that Linux
>> kernel 4.16 is going to be released in the next week or so as well.
>>
>> On Sat, Mar 24, 2018 at 7:36 AM, Max Cuttins  
>>  wrote:
>>
>> As stated in the documentation, in order to use iSCSI it's needed use
>> CentOS7.5.
>> Where can I download it?
>>
>>
>> Thanks
>>
>>
>> iSCSI Targets
>>
>> Traditionally, block-level access to a Ceph storage cluster has been limited
>> to QEMU and librbd, which is a key enabler for adoption 

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread Brad Hubbard
On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins  wrote:

> Hi Brad,
>
> that post was mine. I knew it quite well.
>
> That post was about confirming that the minimum requirements written in
> the documentation really didn't exist.
>
> However, I never asked whether there is somewhere a place where it is possible to
> download the DEV or the RC of CentOS 7.5.
> I was thinking about joining the community of testers and developers that
> are already testing Ceph on that "*not ready*" environment.
>
> In that post these questions were not really asked, so no answers were
> given.
>

From that thread.

"The necessary kernel changes actually are included as part of 4.16-rc1
which is available now. We also offer a pre-built test kernel with the
necessary fixes here [1].

[1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/;


I notice that URL is unavailable so maybe the real question should be why
is that kernel no longer available?

There are plenty more available at
https://shaman.ceph.com/repos/kernel/testing/ but *I* can't tell you which
is relevant but perhaps someone else can.

I see that you talked also about other distribution. Well, I read around
> that Suse already implement iSCSI.
> However as far as I know (which is not so much), this distribution use
> modified kernel in order to let this work.
> And in order to use it it's needed  a dashboard that can handle these kind
> of differences (OpenAttic).
> I knew already OpenAttic is contributing in developing the next generation
> of the Ceph Dashboard (and this sound damn good!).
> However this also means to me that the *official dashboard* should not be
> talking about ISCSI at all (as every implementation of iSCSI are running on
> mod version).
>
> So these are the things I cannot figure out:
> Why is the iSCSI board on the CEPH official dashboard? (I could understand
> on OpenAttic which run on SUSE but not on the official one).
>
Why do you believe it should not be?

> And why, in the official documentation, is the minimum requirement to get
> iSCSI working to install CentOS 7.5, which doesn't exist yet? Is there an RC
> candidate which I can start to use?
>

But it doesn't say that, it says " RHEL/CentOS 7.5; Linux kernel v4.16 or
newer; or the Ceph iSCSI client test kernel
". You seem to be
ignoring the "Ceph iSCSI client test kernel
" part?

> And... if SUSE or even other distributions already work with iSCSI... why
> doesn't the documentation just recommend those instead of RHEL or
> CENTOS?
>
Because that would be odd, to say the least. If the documentation is
incorrect for CentOS then it was, at least at some point, thought to be
correct and it probably will be correct again in the near future and, if
not, we can review and correct it as necessary.

> There is something confused about what the documentation minimal
> requirements, the dashboard suggest to be able to do, and what i read
> around about modded Ceph for other linux distributions.
> I create a new post to clarify all these points.
>
> Thanks for your answer! :)
>
>
>
> Il 27/03/2018 11:24, Brad Hubbard ha scritto:
>
> See the thread in this very ML titled "Ceph iSCSI is a prank?", last
> update thirteen days ago.
>
> If your questions are not answered by that thread let us know.
>
> Please also remember that CentOS is not the only platform that ceph runs
> on by a long shot and that not all distros lag as much as it (not a
> criticism, just a fact. The reasons for lagging are valid and well
> documented and should be accepted by those who choose to use them). if you
> want the bleeding edge then rhel/centos should not be your platform of
> choice.
>
>
> On Tue, Mar 27, 2018 at 7:04 PM, Max Cuttins  wrote:
>
>> Thanks Jason,
>>
>> this is exactly what i read around and I supposed.
>> The RHEL 7.5 is not yet released (neither is Kernel 4.16)
>>
>> So my doubts are 2:
>>
>> *1) If it's not released... why is this in the documentation?*
>> Is the documentation talking about a Dev candidate already accessible
>> somewhere?
>>
>> 2) why in the dashboard is there already a iSCSI board?
>> I guess I miss something or is really just for future implementation
>> and not usable yet?
>> And if it is usable... where I can download the necessarie in order to
>> start?
>>
>>
>> Il 26/03/2018 14:10, Jason Dillaman ha scritto:
>>
>> RHEL 7.5 has not been released yet, but it should be released very
>> soon. After it's released, it usually takes the CentOS team a little
>> time to put together their matching release. I also suspect that Linux
>> kernel 4.16 is going to be released in the next week or so as well.
>>
>> On Sat, Mar 24, 2018 at 7:36 AM, Max Cuttins  
>>  wrote:
>>
>> As stated in the documentation, in order to use iSCSI it's needed use
>> CentOS7.5.
>> 

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread Max Cuttins

Hi Brad,

    that post was mine. I knew it quite well.
That post was about confirming that the minimum requirements written 
in the documentation really didn't exist.


However, I never asked whether there is somewhere a place where it is possible to 
download the DEV or the RC of CentOS 7.5.
I was thinking about joining the community of testers and developers that 
are already testing Ceph on that "not ready" environment.


In that post these questions were not really asked, so no answers were given.

I see that you also talked about other distributions. Well, I read around 
that SUSE already implements iSCSI.
However, as far as I know (which is not so much), that distribution uses a 
modified kernel in order to make this work.
And in order to use it, a dashboard is needed that can handle these 
kinds of differences (OpenAttic).
I already knew OpenAttic is contributing to developing the next 
generation of the Ceph Dashboard (and this sounds damn good!).
However, this also means to me that the *official dashboard* should not 
be talking about iSCSI at all (as every implementation of iSCSI is 
running on a modified version).


So these are the things I cannot figure out:
Why is the iSCSI board on the official Ceph dashboard? (I could 
understand it on OpenAttic, which runs on SUSE, but not on the official one.)
And why, in the official documentation, is the minimum requirement to get 
iSCSI working to install CentOS 7.5, which doesn't exist yet? Is there an RC 
candidate which I can start to use?
And... if SUSE or even other distributions already work with iSCSI... 
why doesn't the documentation just recommend those instead of RHEL 
or CentOS?


There is something confusing about what the documentation lists as minimal 
requirements, what the dashboard suggests it is able to do, and what I read 
around about modded Ceph for other Linux distributions.

I will create a new post to clarify all these points.

Thanks for your answer! :)



Il 27/03/2018 11:24, Brad Hubbard ha scritto:
See the thread in this very ML titled "Ceph iSCSI is a prank?", last 
update thirteen days ago.


If your questions are not answered by that thread let us know.

Please also remember that CentOS is not the only platform that ceph 
runs on by a long shot and that not all distros lag as much as it (not 
a criticism, just a fact. The reasons for lagging are valid and well 
documented and should be accepted by those who choose to use them). if 
you want the bleeding edge then rhel/centos should not be your 
platform of choice.



On Tue, Mar 27, 2018 at 7:04 PM, Max Cuttins > wrote:


Thanks Jason,

this is exactly what i read around and I supposed.
The RHEL 7.5 is not yet released (neither is Kernel 4.16)

So my doubts are 2:

*1) If it's not released... why is this in the documentation?*
Is the documentation talking about a Dev candidate already
accessible somewhere?

2) why in the dashboard is there already a iSCSI board?
I guess I miss something or is really just for future
implementation and not usable yet?
And if it is usable... where I can download the necessarie in
order to start?


Il 26/03/2018 14:10, Jason Dillaman ha scritto:

RHEL 7.5 has not been released yet, but it should be released very
soon. After it's released, it usually takes the CentOS team a little
time to put together their matching release. I also suspect that Linux
kernel 4.16 is going to be released in the next week or so as well.

On Sat, Mar 24, 2018 at 7:36 AM, Max Cuttins 
  wrote:

As stated in the documentation, in order to use iSCSI you need to use
CentOS 7.5.
Where can I download it?


Thanks


iSCSI Targets

Traditionally, block-level access to a Ceph storage cluster has been limited
to QEMU and librbd, which is a key enabler for adoption within OpenStack
environments. Starting with the Ceph Luminous release, block-level access is
expanding to offer standard iSCSI support allowing wider platform usage, and
potentially opening new use cases.

RHEL/CentOS 7.5; Linux kernel v4.16 or newer; or the Ceph iSCSI client test
kernel
A working Ceph Storage cluster, deployed with ceph-ansible or using the
command-line interface
iSCSI gateways nodes, which can either be colocated with OSD nodes or on
dedicated nodes
Separate network subnets for iSCSI front-end traffic and Ceph back-end
traffic


___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-27 Thread Brad Hubbard
On Tue, Mar 27, 2018 at 9:04 PM, Dietmar Rieder
 wrote:
> Thanks Brad!

Hey Dietmar,

yw.

>
> I added some information to the ticket.
> Unfortunately I still could not grab a coredump, since there was no
> segfault lately.

OK. That may help to get us started. Getting late here for me so I'll
take a look at this tomorrow.

Thanks!

>
>  http://tracker.ceph.com/issues/23431
>
> Maybe Oliver has something to add as well.
>
>
> Dietmar
>
>
> On 03/27/2018 11:37 AM, Brad Hubbard wrote:
>> "NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to interpret this."
>>
>> Have you ever wondered what this means and why it's there? :)
>>
>> This is at least something you can try. it may provide useful
>> information, it may not.
>>
>> This stack looks like it is either corrupted, or possibly not in ceph
>> but in one of the linked libraries or glibc itself. If it's the
>> former, it probably won't tell us anything. If it's the latter you
>> will need the relevant debuginfo installed to get meaningful output
>> and note that it will probably take a while. '<executable>' in this
>> case is ceph-osd of course.
>>
>> Alternatively, if you can upload a coredump and an sosreport (so I can
>> validate exact versions of all packages installed) I can try and take
>> a look.
>>
>> On Fri, Mar 23, 2018 at 9:20 PM, Dietmar Rieder
>>  wrote:
>>> Hi,
>>>
>>>
>>> I encountered one more two days ago, and I opened a ticket:
>>>
>>> http://tracker.ceph.com/issues/23431
>>>
>>> In our case it is more like 1 every two weeks, for now...
>>> And it is affecting different OSDs on different hosts.
>>>
>>> Dietmar
>>>
>>> On 03/23/2018 11:50 AM, Oliver Freyermuth wrote:
 Hi together,

 I notice exactly the same, also the same addresses, Luminous 12.2.4, 
 CentOS 7.
 Sadly, logs are equally unhelpful.

 It happens randomly on an OSD about once per 2-3 days (of the 196 total 
 OSDs we have). It's also not a container environment.

 Cheers,
   Oliver

 Am 08.03.2018 um 15:00 schrieb Dietmar Rieder:
> Hi,
>
> I noticed in my client (using cephfs) logs that an osd was unexpectedly
> going down.
> While checking the osd logs for the affected OSD I found that the osd
> was seg faulting:
>
> []
> 2018-03-07 06:01:28.873049 7fd9af370700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7fd9af370700 thread_name:safe_timer
>
>   ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b)
> luminous (stable)
>1: (()+0xa3c611) [0x564585904611]
> 2: (()+0xf5e0) [0x7fd9b66305e0]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
> [...]
>
> Should I open a ticket for this? What additional information is needed?
>
>
> I put the relevant log entries for download under [1], so maybe someone
> with more
> experience can find some useful information therein.
>
> Thanks
>   Dietmar
>
>
> [1] https://expirebox.com/download/6473c34c80e8142e22032469a59df555.html
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>




 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
>>>
>>> --
>>> _
>>> D i e t m a r  R i e d e r, Mag.Dr.
>>> Innsbruck Medical University
>>> Biocenter - Division for Bioinformatics
>>> Innrain 80, 6020 Innsbruck
>>> Phone: +43 512 9003 71402
>>> Fax: +43 512 9003 73100
>>> Email: dietmar.rie...@i-med.ac.at
>>> Web:   http://www.icbi.at
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>
>
> --
> _
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Innrain 80, 6020 Innsbruck
> Phone: +43 512 9003 71402
> Fax: +43 512 9003 73100
> Email: dietmar.rie...@i-med.ac.at
> Web:   http://www.icbi.at
>
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-27 Thread Dietmar Rieder
Thanks Brad!

I added some information to the ticket.
Unfortunately I still could not grab a coredump, since there was no
segfault lately.

 http://tracker.ceph.com/issues/23431

Maybe Oliver has something to add as well.


Dietmar


On 03/27/2018 11:37 AM, Brad Hubbard wrote:
> "NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this."
> 
> Have you ever wondered what this means and why it's there? :)
> 
> This is at least something you can try. it may provide useful
> information, it may not.
> 
> This stack looks like it is either corrupted, or possibly not in ceph
> but in one of the linked libraries or glibc itself. If it's the
> former, it probably won't tell us anything. If it's the latter you
> will need the relevant debuginfo installed to get meaningful output
> and note that it will probably take a while. '' in this
> case is ceph-osd of course.
> 
> Alternatively, if you can upload a coredump and an sosreport (so I can
> validate exact versions of all packages installed) I can try and take
> a look.
> 
> On Fri, Mar 23, 2018 at 9:20 PM, Dietmar Rieder
>  wrote:
>> Hi,
>>
>>
>> I encountered one more two days ago, and I opened a ticket:
>>
>> http://tracker.ceph.com/issues/23431
>>
>> In our case it is more like 1 every two weeks, for now...
>> And it is affecting different OSDs on different hosts.
>>
>> Dietmar
>>
>> On 03/23/2018 11:50 AM, Oliver Freyermuth wrote:
>>> Hi together,
>>>
>>> I notice exactly the same, also the same addresses, Luminous 12.2.4, CentOS 
>>> 7.
>>> Sadly, logs are equally unhelpful.
>>>
>>> It happens randomly on an OSD about once per 2-3 days (of the 196 total 
>>> OSDs we have). It's also not a container environment.
>>>
>>> Cheers,
>>>   Oliver
>>>
>>> Am 08.03.2018 um 15:00 schrieb Dietmar Rieder:
 Hi,

 I noticed in my client (using cephfs) logs that an osd was unexpectedly
 going down.
 While checking the osd logs for the affected OSD I found that the osd
 was seg faulting:

 []
 2018-03-07 06:01:28.873049 7fd9af370700 -1 *** Caught signal
 (Segmentation fault) **
  in thread 7fd9af370700 thread_name:safe_timer

   ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b)
 luminous (stable)
1: (()+0xa3c611) [0x564585904611]
 2: (()+0xf5e0) [0x7fd9b66305e0]
  NOTE: a copy of the executable, or `objdump -rdS ` is
 needed to interpret this.
 [...]

 Should I open a ticket for this? What additional information is needed?


 I put the relevant log entries for download under [1], so maybe someone
 with more
 experience can find some useful information therein.

 Thanks
   Dietmar


 [1] https://expirebox.com/download/6473c34c80e8142e22032469a59df555.html



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>> --
>> _
>> D i e t m a r  R i e d e r, Mag.Dr.
>> Innsbruck Medical University
>> Biocenter - Division for Bioinformatics
>> Innrain 80, 6020 Innsbruck
>> Phone: +43 512 9003 71402
>> Fax: +43 512 9003 73100
>> Email: dietmar.rie...@i-med.ac.at
>> Web:   http://www.icbi.at
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> 
> 


-- 
_
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rie...@i-med.ac.at
Web:   http://www.icbi.at




signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-27 Thread Brad Hubbard
"NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this."

Have you ever wondered what this means and why it's there? :)

This is at least something you can try. It may provide useful
information, it may not.

This stack looks like it is either corrupted, or possibly not in ceph
but in one of the linked libraries or glibc itself. If it's the
former, it probably won't tell us anything. If it's the latter you
will need the relevant debuginfo installed to get meaningful output
and note that it will probably take a while. '<executable>' in this
case is ceph-osd of course.

Alternatively, if you can upload a coredump and an sosreport (so I can
validate exact versions of all packages installed) I can try and take
a look.
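
If a core does turn up, a rough sketch of what I mean on CentOS 7 (package
names and paths below are assumptions, adjust them to your install):

  debuginfo-install ceph-osd                      # pulls the matching ceph-debuginfo
  objdump -rdS /usr/bin/ceph-osd > ceph-osd.dis   # annotated disassembly of the binary
  gdb --batch -ex 'thread apply all bt full' \
      /usr/bin/ceph-osd /path/to/coredump > backtrace.txt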

On Fri, Mar 23, 2018 at 9:20 PM, Dietmar Rieder
 wrote:
> Hi,
>
>
> I encountered one more two days ago, and I opened a ticket:
>
> http://tracker.ceph.com/issues/23431
>
> In our case it is more like 1 every two weeks, for now...
> And it is affecting different OSDs on different hosts.
>
> Dietmar
>
> On 03/23/2018 11:50 AM, Oliver Freyermuth wrote:
>> Hi together,
>>
>> I notice exactly the same, also the same addresses, Luminous 12.2.4, CentOS 
>> 7.
>> Sadly, logs are equally unhelpful.
>>
>> It happens randomly on an OSD about once per 2-3 days (of the 196 total OSDs 
>> we have). It's also not a container environment.
>>
>> Cheers,
>>   Oliver
>>
>> On 08.03.2018 at 15:00, Dietmar Rieder wrote:
>>> Hi,
>>>
>>> I noticed in my client (using cephfs) logs that an OSD unexpectedly
>>> went down.
>>> While checking the logs for the affected OSD I found that it
>>> was segfaulting:
>>>
>>> []
>>> 2018-03-07 06:01:28.873049 7fd9af370700 -1 *** Caught signal
>>> (Segmentation fault) **
>>>  in thread 7fd9af370700 thread_name:safe_timer
>>>
>>>   ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b)
>>> luminous (stable)
>>>1: (()+0xa3c611) [0x564585904611]
>>> 2: (()+0xf5e0) [0x7fd9b66305e0]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>> [...]
>>>
>>> Should I open a ticket for this? What additional information is needed?
>>>
>>>
>>> I put the relevant log entries for download under [1], so maybe someone
>>> with more
>>> experience can find some useful information therein.
>>>
>>> Thanks
>>>   Dietmar
>>>
>>>
>>> [1] https://expirebox.com/download/6473c34c80e8142e22032469a59df555.html
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> _
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Innrain 80, 6020 Innsbruck
> Phone: +43 512 9003 71402
> Fax: +43 512 9003 73100
> Email: dietmar.rie...@i-med.ac.at
> Web:   http://www.icbi.at
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread Brad Hubbard
See the thread in this very ML titled "Ceph iSCSI is a prank?", last update
thirteen days ago.

If your questions are not answered by that thread let us know.

Please also remember that CentOS is not the only platform that ceph runs on
by a long shot, and that not all distros lag as much as it does (not a
criticism, just a fact: the reasons for lagging are valid, well documented,
and should be accepted by those who choose to use them). If you want the
bleeding edge, then RHEL/CentOS should not be your platform of choice.


On Tue, Mar 27, 2018 at 7:04 PM, Max Cuttins  wrote:

> Thanks Jason,
>
> this is exactly what I read around and what I supposed.
> The RHEL 7.5 is not yet released (neither is Kernel 4.16)
>
> So I have 2 doubts:
>
> *1) If it's not released... why is this in the documentation?*
> Is the documentation talking about a Dev candidate already accessible
> somewhere?
>
> 2) Why is there already an iSCSI board in the dashboard?
> I guess I'm missing something, or is it really just there for a future
> implementation and not usable yet?
> And if it is usable... where can I download what's necessary in order to
> start?
>
>
> On 26/03/2018 14:10, Jason Dillaman wrote:
>
> RHEL 7.5 has not been released yet, but it should be released very
> soon. After it's released, it usually takes the CentOS team a little
> time to put together their matching release. I also suspect that Linux
> kernel 4.16 is going to be released in the next week or so as well.
>
> On Sat, Mar 24, 2018 at 7:36 AM, Max Cuttins  
>  wrote:
>
> As stated in the documentation, in order to use iSCSI it's needed use
> CentOS7.5.
> Where can I download it?
>
>
> Thanks
>
>
> iSCSI Targets
>
> Traditionally, block-level access to a Ceph storage cluster has been limited
> to QEMU and librbd, which is a key enabler for adoption within OpenStack
> environments. Starting with the Ceph Luminous release, block-level access is
> expanding to offer standard iSCSI support allowing wider platform usage, and
> potentially opening new use cases.
>
> RHEL/CentOS 7.5; Linux kernel v4.16 or newer; or the Ceph iSCSI client test
> kernel
> A working Ceph Storage cluster, deployed with ceph-ansible or using the
> command-line interface
> iSCSI gateways nodes, which can either be colocated with OSD nodes or on
> dedicated nodes
> Separate network subnets for iSCSI front-end traffic and Ceph back-end
> traffic
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread Max Cuttins

Thanks Jason,

this is exactly what I read around and what I supposed.
The RHEL 7.5 is not yet released (neither is Kernel 4.16)

So I have 2 doubts:

*1) If it's not released... why is this in the documentation?*
Is the documentation talking about a Dev candidate already accessible 
somewhere?


2) Why is there already an iSCSI board in the dashboard?
I guess I'm missing something, or is it really just there for a future
implementation and not usable yet?
And if it is usable... where can I download what's necessary in order to
start?



On 26/03/2018 14:10, Jason Dillaman wrote:

RHEL 7.5 has not been released yet, but it should be released very
soon. After it's released, it usually takes the CentOS team a little
time to put together their matching release. I also suspect that Linux
kernel 4.16 is going to be released in the next week or so as well.

On Sat, Mar 24, 2018 at 7:36 AM, Max Cuttins  wrote:

As stated in the documentation, in order to use iSCSI it's needed use
CentOS7.5.
Where can I download it?


Thanks


iSCSI Targets

Traditionally, block-level access to a Ceph storage cluster has been limited
to QEMU and librbd, which is a key enabler for adoption within OpenStack
environments. Starting with the Ceph Luminous release, block-level access is
expanding to offer standard iSCSI support allowing wider platform usage, and
potentially opening new use cases.

RHEL/CentOS 7.5; Linux kernel v4.16 or newer; or the Ceph iSCSI client test
kernel
A working Ceph Storage cluster, deployed with ceph-ansible or using the
command-line interface
iSCSI gateways nodes, which can either be colocated with OSD nodes or on
dedicated nodes
Separate network subnets for iSCSI front-end traffic and Ceph back-end
traffic


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Requests blocked as cluster is unaware of dead OSDs for quite a long time

2018-03-27 Thread Wido den Hollander


On 03/27/2018 12:58 AM, Jared H wrote:
> I have three datacenters with three storage hosts in each, which house
> one OSD/MON per host. There are three replicas, one in each datacenter.
> I want the cluster to be able to survive a nuke dropped on 1/3
> datacenters, scaling up to 2/5 datacenters. I do not need realtime data
> replication (Ceph is already fast enough), but I do need decently
> realtime fault tolerance such that requests are blocked for ideally less
> than 10 seconds.
> 
> In testing, I kill networking on 3 hosts and the cluster becomes
> unresponsive for 1-5 minutes as requests are blocked. The monitors are
> detected as down within 15-20 seconds, but OSDs take a long time to
> change state to 'down'.
> 
> I have played with these timeout and heartbeat options but they don't
> seem to have any effect:
> [osd]
> osd_heartbeat=3
> osd_heartbeat_grace=9
> osd_mon_heartbeat_interval=3
> osd_mon_report_interval_min=3
> osd_mon_report_interval_max=9
> osd_mon_ack_timeout=9
> 
> Is it the nature of the networking failure? I can pkill ceph-osd to
> simulate a software failure and they are detected as down almost instantly.
> 

When you kill the OSD process, the other OSDs will get a 'connection refused'
and can declare the OSD down immediately. But when you kill the network,
things start to time out.

It's hard to judge from the outside what exactly happens, but keep in
mind, Ceph is designed with data consistency as the number 1 priority.
It will choose safety of data over availability. So if it's not sure
what is happening I/O will block.
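
As an aside, not every option name in the quoted config matches the
documented Luminous settings (osd_heartbeat, for instance, is usually
spelled osd_heartbeat_interval), which may be part of why the changes
seemed to have no effect. A hedged sketch of the options that usually
govern how fast an unreachable OSD gets marked down (values here are
illustrative only, not a recommendation):

[osd]
osd_heartbeat_interval = 3        # seconds between peer-to-peer heartbeats
osd_heartbeat_grace = 9           # seconds of silence before a peer is reported down

[mon]
mon_osd_min_down_reporters = 2    # distinct reporters required before marking an OSD down
mon_osd_adjust_heartbeat_grace = true   # set to false if the grace period should stay fixed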

Wido

> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is in the mon leveldb?

2018-03-27 Thread Wido den Hollander


On 03/27/2018 06:40 AM, Tracy Reed wrote:
> Hello all,
> 
> It seems I have underprovisioned storage space for my mons and my
> /var/lib/ceph/mon filesystem is getting full. When I first started using
> ceph this only took up tens of megabytes and I assumed it would stay
> that way and 5G for this filesystem seemed luxurious. Little did I know
> that mon was going to be storing multiple gigs of data! That's still a
> trivial amount of course but larger than what I expected and now I have
> to do some work to rebuild my monitors on bigger storage. 
> 
> I'm curious: Exactly what is being stored and is there any way to trim
> it down a bit? It has slowly grown over time. I've already run a compact
> on it which gained me only a few percent.
> 

The MONs keep a history of OSDMaps and other maps. Normally these maps
are trimmed from the database, but if one or more PGs are not
active+clean the MONs will keep a large history to get old OSDs up to
speed, which might be needed to bring those PGs to a clean state again.

What is the status of your Ceph cluster (ceph -s) and what version are
you running?

And yes, make sure your MONs do have tens of GBs available should they
need it for a very long recovery.

For example, I'm working on a 2200 OSD cluster which has been doing a
recovery operation for a week now and the MON DBs are about 50GB now.
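
Once everything is back to active+clean the old maps get trimmed again; to
check the on-disk size and reclaim space after that, a minimal sketch (the
mon id "a" below is just a placeholder for your own monitor name):

  du -sh /var/lib/ceph/mon/*/store.db   # current size of each local mon store
  ceph tell mon.a compact               # online compaction of one monitor
  # or, under [mon] in ceph.conf, compact at every monitor start:
  #   mon compact on start = true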

Wido

> Thanks!
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com