Re: [ceph-users] MDS segfaults on client connection -- brand new FS

2019-03-08 Thread Gregory Farnum
I don’t have any idea what’s going on here or why it’s not working, but you
are using v0.94.7. That release is:
1) out of date for the Hammer cycle, which reached at least .94.10
2) prior to the release where we declared CephFS stable (Jewel, v10.2.0)
3) way past its supported expiration date.

You will have a much better time deploying Luminous or Mimic, especially
since you want to use CephFS. :)
-Greg

On Fri, Mar 8, 2019 at 5:02 PM Kadiyska, Yana  wrote:

> Hi,
>
>
>
> I’m very much hoping someone can unblock me on this – we recently ran into
> a very odd issue – I sent an earlier email to the list
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033579.html
>
>
>
> After unsuccessfully trying to repair we decided to forsake the Filesystem
>
>
>
> I marked the cluster down, failed the MDSs, removed the FS and the
> metadata and data pools.
>
>
>
> Then created a new Filesystem from scratch.
>
>
>
> However, I am still observing the MDS segfaulting when a client tries to
> connect. This is quite urgent for me as we don’t have a functioning
> Filesystem – if someone can advise how I can remove any and all state,
> please do so – I just want to start fresh. I am very puzzled that a brand
> new FS doesn’t work.
>
>
>
> Here is the MDS log at level 20 – one odd thing I notice is that the
> client seems to start showing ? as the id well before the segfault…In any
> case, I’m just asking what needs to be done to remove all state from the
> MDS nodes:
>
>
>
> 2019-03-08 19:30:12.024535 7f25ec184700 20 mds.0.server get_session have
> 0x5477e00 client.2160819875 :0/945029522 state open
>
> 2019-03-08 19:30:12.024537 7f25ec184700 15 mds.0.server
> oldest_client_tid=1
>
> 2019-03-08 19:30:12.024564 7f25ec184700  7 mds.0.cache request_start
> request(client.?:1 cr=0x54a8680)
>
> 2019-03-08 19:30:12.024566 7f25ec184700  7 mds.0.server
> dispatch_client_request client_request(client.?:1 getattr pAsLsXsFs #1
> 2019-03-08 19:29:15.425510 RETRY=2) v2
>
> 2019-03-08 19:30:12.024576 7f25ec184700 10 mds.0.server
> rdlock_path_pin_ref request(client.?:1 cr=0x54a8680) #1
>
> 2019-03-08 19:30:12.024577 7f25ec184700  7 mds.0.cache traverse: opening
> base ino 1 snap head
>
> 2019-03-08 19:30:12.024579 7f25ec184700 10 mds.0.cache path_traverse
> finish on snapid head
>
> 2019-03-08 19:30:12.024580 7f25ec184700 10 mds.0.server ref is [inode 1
> [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) |
> dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024589 7f25ec184700 10 mds.0.locker acquire_locks
> request(client.?:1 cr=0x54a8680)
>
> 2019-03-08 19:30:12.024591 7f25ec184700 20 mds.0.locker  must rdlock
> (iauth sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024594 7f25ec184700 20 mds.0.locker  must rdlock
> (ilink sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024597 7f25ec184700 20 mds.0.locker  must rdlock
> (ifile sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024600 7f25ec184700 20 mds.0.locker  must rdlock
> (ixattr sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024602 7f25ec184700 20 mds.0.locker  must rdlock
> (isnap sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024605 7f25ec184700 10 mds.0.locker  must authpin
> [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1)
> (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024607 7f25ec184700 10 mds.0.locker  auth_pinning
> [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1)
> (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024610 7f25ec184700 10 mds.0.cache.ino(1) auth_pin by
> 0x51e5e00 on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f()
> n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968] now
> 1+0
>
> 2019-03-08 19:30:12.024614 7f25ec184700  7 mds.0.locker rdlock_start  on
> (isnap sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024618 7f25ec184700 10 mds.0.locker  got rdlock on
> (isnap sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1
> dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024621 7f25ec184700  7 mds.0.locker rdlock_start  on
> (ifile sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1
> dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 

[ceph-users] MDS segfaults on client connection -- brand new FS

2019-03-08 Thread Kadiyska, Yana
Hi,

I’m very much hoping someone can unblock me on this – we recently ran into a 
very odd issue – I sent an earlier email to the list
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033579.html

After unsuccessfully trying to repair we decided to forsake the Filesystem

I marked the cluster down, failed the MDSs, removed the FS and the metadata and 
data pools.

Then created a new Filesystem from scratch.
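
For reference, a teardown/rebuild like this typically boils down to something like the following sketch. Pool and FS names are illustrative, the syntax is Hammer-era, and newer releases spell some of these steps differently, so verify against the docs for your version:

# take the MDS cluster down and remove the old filesystem
ceph mds cluster_down
ceph mds fail 0
ceph fs rm cephfs --yes-i-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it

# recreate the pools and a fresh filesystem
ceph osd pool create cephfs_metadata 128
ceph osd pool create cephfs_data 128
ceph fs new cephfs cephfs_metadata cephfs_data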

However, I am still observing the MDS segfaulting when a client tries to connect.
This is quite urgent for me as we don’t have a functioning Filesystem – if
someone can advise how I can remove any and all state, please do so – I just
want to start fresh. I am very puzzled that a brand new FS doesn’t work.

Here is the MDS log at level 20 – one odd thing I notice is that the client 
seems to start showing ? as the id well before the segfault…In any case, I’m 
just asking what needs to be done to remove all state from the MDS nodes:


2019-03-08 19:30:12.024535 7f25ec184700 20 mds.0.server get_session have 
0x5477e00 client.2160819875 :0/945029522 state open

2019-03-08 19:30:12.024537 7f25ec184700 15 mds.0.server  oldest_client_tid=1

2019-03-08 19:30:12.024564 7f25ec184700  7 mds.0.cache request_start 
request(client.?:1 cr=0x54a8680)

2019-03-08 19:30:12.024566 7f25ec184700  7 mds.0.server dispatch_client_request 
client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510 
RETRY=2) v2

2019-03-08 19:30:12.024576 7f25ec184700 10 mds.0.server rdlock_path_pin_ref 
request(client.?:1 cr=0x54a8680) #1

2019-03-08 19:30:12.024577 7f25ec184700  7 mds.0.cache traverse: opening base 
ino 1 snap head

2019-03-08 19:30:12.024579 7f25ec184700 10 mds.0.cache path_traverse finish on 
snapid head

2019-03-08 19:30:12.024580 7f25ec184700 10 mds.0.server ref is [inode 1 
[...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | 
dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024589 7f25ec184700 10 mds.0.locker acquire_locks 
request(client.?:1 cr=0x54a8680)

2019-03-08 19:30:12.024591 7f25ec184700 20 mds.0.locker  must rdlock (iauth 
sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) 
(iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024594 7f25ec184700 20 mds.0.locker  must rdlock (ilink 
sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) 
(iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024597 7f25ec184700 20 mds.0.locker  must rdlock (ifile 
sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) 
(iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024600 7f25ec184700 20 mds.0.locker  must rdlock (ixattr 
sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) 
(iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024602 7f25ec184700 20 mds.0.locker  must rdlock (isnap 
sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) 
(iversion lock) | request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024605 7f25ec184700 10 mds.0.locker  must authpin [inode 1 
[...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | 
request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024607 7f25ec184700 10 mds.0.locker  auth_pinning [inode 1 
[...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) | 
request=1 dirfrag=1 0x53ca968]

2019-03-08 19:30:12.024610 7f25ec184700 10 mds.0.cache.ino(1) auth_pin by 
0x51e5e00 on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 
1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968] now 1+0

2019-03-08 19:30:12.024614 7f25ec184700  7 mds.0.locker rdlock_start  on (isnap 
sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 
1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024618 7f25ec184700 10 mds.0.locker  got rdlock on (isnap 
sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 
1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1 dirfrag=1 authpin=1 
0x53ca968]

2019-03-08 19:30:12.024621 7f25ec184700  7 mds.0.locker rdlock_start  on (ifile 
sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 
1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1 dirfrag=1 authpin=1 
0x53ca968]

2019-03-08 19:30:12.024625 7f25ec184700 10 mds.0.locker  got rdlock on (ifile 
sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 
1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=2 
dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024628 7f25ec184700  7 mds.0.locker rdlock_start  on (iauth 
sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f() n(v0 
1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) | request=1 lock=2 
dirfrag=1 authpin=1 0x53ca968]

2019-03-08 19:30:12.024631 7f25ec184700 10 mds.0.locker  got rdlock on (iauth 
sync r=1) [inode 1 

Re: [ceph-users] Failed to repair pg

2019-03-08 Thread Herbert Alexander Faleiros
Hi,

[...]
> Now I have:
>
> HEALTH_ERR 5 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 5 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 2.2bb is active+clean+inconsistent, acting [36,12,80]
>
> Jumped from 3 to 5 scrub errors now.

did the same on osd.36 and a repair worked.

Health OK again.

Thank you,

--
Herbert


Re: [ceph-users] radosgw sync falling behind regularly

2019-03-08 Thread Casey Bodley

(cc ceph-users)

Can you tell whether these sync errors are coming from metadata sync or 
data sync? Are they blocking sync from making progress according to your 
'sync status'?
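
That is, what do the standard status/error commands report on the affected
zone, e.g.:

radosgw-admin sync status
radosgw-admin sync error list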


On 3/8/19 10:23 AM, Trey Palmer wrote:

Casey,

Having done the 'reshard stale-instances delete' earlier on the advice 
of another list member, we have tons of sync errors on deleted 
buckets, as you mention.


After 'data sync init' we're still seeing all of these errors on 
deleted buckets.


Since buckets are metadata, it occurred to me this morning that a 'data sync
init' wouldn't refresh that info. But a 'metadata sync init' might get rid of
the stale bucket sync info and stop the sync errors. Would that be the way to go?
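
I assume that would look something like the following (just a sketch on my
side – please correct me if the sequence is wrong, since as I understand it
this restarts a full metadata sync on the zone):

# on the zone showing the stale-bucket sync errors
radosgw-admin metadata sync init
# then restart the gateways on that zone so the full sync kicks off
systemctl restart ceph-radosgw.target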


Thanks,

Trey



On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley wrote:


Hi Trey,

I think it's more likely that these stale metadata entries are from
deleted buckets, rather than accidental bucket reshards. When a
bucket
is deleted in a multisite configuration, we don't delete its bucket
instance because other zones may still need to sync the object
deletes -
and they can't make progress on sync if the bucket metadata
disappears.
These leftover bucket instances look the same to the 'reshard
stale-instances' commands, but I'd be cautious about using that to
remove them in multisite, as it may cause more sync errors and
potentially leak storage if they still contain objects.

Regarding 'datalog trim', that alone isn't safe because it could trim
entries that hadn't been applied on other zones yet, causing them to
miss some updates. What you can do is run 'data sync init' on each
zone,
and restart gateways. This will restart with a data full sync (which
will scan all buckets for changes), and skip past any datalog entries
from before the full sync. I was concerned that the bug in error
handling (ie "ERROR: init sync on...") would also affect full
sync, but
that doesn't appear to be the case - so I do think that's worth
trying.
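
Roughly, that sequence is (zone name and service unit below are placeholders –
adjust for your deployment):

# on each zone that syncs from the master zone
radosgw-admin data sync init --source-zone=<master-zone>
# then restart the gateways so the full sync starts
systemctl restart ceph-radosgw.target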

On 3/5/19 6:24 PM, Trey Palmer wrote:
> Casey,
>
> Thanks very much for the reply!
>
> We definitely have lots of errors on sync-disabled buckets and the
> workaround for that is obvious (most of them are empty anyway).
>
> Our second form of error is stale buckets.  We had dynamic
resharding
> enabled but have now disabled it (having discovered it was on by
> default, and not supported in multisite).
>
> We removed several hundred stale buckets via 'radosgw-admin
sharding
> stale-instances rm', but they are still giving us sync errors.
>
> I have found that these buckets do have entries in 'radosgw-admin
> datalog list', and my guess is this could be fixed by doing a
> 'radosgw-admin datalog trim' for each entry on the master zone.
>
> Does that sound right?  :-)
>
> Thanks again for the detailed explanation,
>
> Trey Palmer
>
> On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley <cbod...@redhat.com> wrote:
>
>     Hi Christian,
>
>     I think you've correctly intuited that the issues are related to
>     the use
>     of 'bucket sync disable'. There was a bug fix for that
feature in
> http://tracker.ceph.com/issues/26895, and I recently found that a
>     block
>     of code was missing from its luminous backport. That missing
code is
>     what handled those "ERROR: init sync on 
failed,
>     retcode=-2" errors.
>
>     I included a fix for that in a later backport
>     (https://github.com/ceph/ceph/pull/26549), which I'm still
working to
>     get through qa. I'm afraid I can't really recommend a workaround
>     for the
>     issue in the meantime.
>
>     Looking forward though, we do plan to support something like
s3's
>     cross
>     region replication so you can enable replication on a
specific bucket
>     without having to enable it globally.
>
>     Casey
>
>
>     On 3/5/19 2:32 PM, Christian Rice wrote:
>     >
>     > Much appreciated.  We’ll continue to poke around and
certainly will
>     > disable the dynamic resharding.
>     >
>     > We started with 12.2.8 in production.  We definitely did not
>     have it
>     > enabled in ceph.conf
>     >
>     > From: Matthew H <matthew.he...@hotmail.com>
>     > Date: Tuesday, March 5, 2019 at 11:22 AM
>     > To: Christian Rice <cr...@pandora.com>, ceph-users <ceph-users@lists.ceph.com>


Re: [ceph-users] 13.2.4 odd memory leak?

2019-03-08 Thread Mark Nelson


On 3/8/19 8:12 AM, Steffen Winther Sørensen wrote:



On 8 Mar 2019, at 14.30, Mark Nelson wrote:



On 3/8/19 5:56 AM, Steffen Winther Sørensen wrote:


On 5 Mar 2019, at 10.02, Paul Emmerich wrote:


Yeah, there's a bug in 13.2.4. You need to set it to at least ~1.2GB.

Yeap thanks, setting it at 1G+256M worked :)
Hope this won’t bloat memory during coming weekend VM backups 
through CephFS





FWIW, setting it to 1.2G will almost certainly result in the 
bluestore caches being stuck at cache_min, ie 128MB and the autotuner 
may not be able to keep the OSD memory that low.  I typically 
recommend a bare minimum of 2GB per OSD, and on SSD/NVMe backed OSDs 
3-4+ can improve performance significantly.
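
For reference, the target is specified in bytes, so the 1G+256M workaround
above is 1342177280, and an explicit 2-4 GiB target would look something like
this in ceph.conf (a sketch, not a tuned recommendation):

[osd]
  ; 4 GiB per OSD; 2147483648 (2 GiB) would be the bare minimum suggested above
  osd memory target = 4294967296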

This is a smaller dev cluster, not much IO, 4 nodes of 16GB & 6x HDD OSDs

Just want to avoid consuming swap, which bloated after patching from 13.2.2
to 13.2.4 and performing VM snapshots to CephFS. Otherwise the cluster has
been fine for ages…

/Steffen



Understood.  We struggled with whether we should have separate HDD and 
SSD defaults for osd_memory_target, but we were seeing other users 
having problems with setting the global default vs the ssd/hdd default 
and not seeing expected behavior.  We decided to have a single 
osd_memory_target to try to make the whole thing simpler with only a 
single parameter to set.  The 4GB/OSD is aggressive but can dramatically 
improve performance on NVMe and we figured that it sort of communicates 
to users where we think the sweet spot is (and as devices and data sets 
get larger, this is going to be even more important).



Mark







Mark




On Tue, Mar 5, 2019 at 9:00 AM Steffen Winther Sørensen <ste...@gmail.com> wrote:



On 4 Mar 2019, at 16.09, Paul Emmerich wrote:


Bloated to ~4 GB per OSD and you are on HDDs?

Something like that yes.


13.2.3 backported the cache auto-tuning which targets 4 GB memory
usage by default.


See https://ceph.com/releases/13-2-4-mimic-released/

Right, thanks…


The bluestore_cache_* options are no longer needed. They are replaced
by osd_memory_target, defaulting to 4GB. BlueStore will expand
and contract its cache to attempt to stay within this
limit. Users upgrading should note this is a higher default
than the previous bluestore_cache_size of 1GB, so OSDs using
BlueStore will use more memory by default.
For more details, see the BlueStore docs.

Adding a 'osd memory target’ value to our ceph.conf and restarting 
an OSD just makes the OSD dump like this:


[osd]
  ; this key makes 13.2.4 OSDs abort???
  osd memory target = 1073741824

  ; other OSD key settings
  osd pool default size = 2  # Write an object 2 times.
  osd pool default min size = 1 # Allow writing one copy in a 
degraded state.


  osd pool default pg num = 256
  osd pool default pgp num = 256

  client cache size = 131072
  osd client op priority = 40
  osd op threads = 8
  osd client message size cap = 512
  filestore min sync interval = 10
  filestore max sync interval = 60

  recovery max active = 2
  recovery op priority = 30
  osd max backfills = 2




osd log snippet:
 -472> 2019-03-05 08:36:02.233 7f2743a8c1c0  1 -- - start start
 -471> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 init 
/var/lib/ceph/osd/ceph-12 (looks like hdd)
 -470> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 journal 
/var/lib/ceph/osd/ceph-12/journal
 -469> 2019-03-05 08:36:02.234 7f2743a8c1c0  1 
bluestore(/var/lib/ceph/osd/ceph-12) _mount path 
/var/lib/ceph/osd/ceph-12
 -468> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev create path 
/var/lib/ceph/osd/ceph-12/block type kernel
 -467> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev(0x55b31af4a000 
/var/lib/ceph/osd/ceph-12/block) open path 
/var/lib/ceph/osd/ceph-12/block
 -466> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 bdev(0x55b31af4a000 
/var/lib/ceph/osd/ceph-12/block) open size 146775474176 
(0x222c80, 137 GiB) block_size 4096 (4 KiB) rotational
 -465> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 
bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 
1073741824 meta 0.4 kv 0.4 data 0.2
 -464> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev create path 
/var/lib/ceph/osd/ceph-12/block type kernel
 -463> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
/var/lib/ceph/osd/ceph-12/block) open path 
/var/lib/ceph/osd/ceph-12/block
 -462> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
/var/lib/ceph/osd/ceph-12/block) open size 146775474176 
(0x222c80, 137 GiB) block_size 4096 (4 KiB) rotational
 -461> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs 
add_block_device bdev 1 path /var/lib/ceph/osd/ceph-12/block size 
137 GiB

 -460> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs mount
 -459> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
compaction_readahead_size = 2097152
 -458> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
compression = kNoCompression
 -457> 

Re: [ceph-users] 13.2.4 odd memory leak?

2019-03-08 Thread Steffen Winther Sørensen


> On 8 Mar 2019, at 14.30, Mark Nelson  wrote:
> 
> 
> On 3/8/19 5:56 AM, Steffen Winther Sørensen wrote:
>> 
>>> On 5 Mar 2019, at 10.02, Paul Emmerich wrote:
>>> 
>>> Yeah, there's a bug in 13.2.4. You need to set it to at least ~1.2GB.
>> Yeap thanks, setting it at 1G+256M worked :)
>> Hope this won’t bloat memory during coming weekend VM backups through CephFS
>> 
> 
> 
> FWIW, setting it to 1.2G will almost certainly result in the bluestore caches 
> being stuck at cache_min, ie 128MB and the autotuner may not be able to keep 
> the OSD memory that low.  I typically recommend a bare minimum of 2GB per 
> OSD, and on SSD/NVMe backed OSDs 3-4+ can improve performance significantly.
This is a smaller dev cluster, not much IO, 4 nodes of 16GB & 6x HDD OSDs

Just want to avoid consuming swap, which bloated after patching from 13.2.2 to
13.2.4 and performing VM snapshots to CephFS. Otherwise the cluster has been
fine for ages…
/Steffen


> 
> 
> Mark
> 
> 
> 
>>> On Tue, Mar 5, 2019 at 9:00 AM Steffen Winther Sørensen
>>>  wrote:
 
 
 On 4 Mar 2019, at 16.09, Paul Emmerich  wrote:
 
 Bloated to ~4 GB per OSD and you are on HDDs?
 
 Something like that yes.
 
 
 13.2.3 backported the cache auto-tuning which targets 4 GB memory
 usage by default.
 
 
 See https://ceph.com/releases/13-2-4-mimic-released/
 
 Right, thanks…
 
 
 The bluestore_cache_* options are no longer needed. They are replaced
 by osd_memory_target, defaulting to 4GB. BlueStore will expand
 and contract its cache to attempt to stay within this
 limit. Users upgrading should note this is a higher default
 than the previous bluestore_cache_size of 1GB, so OSDs using
 BlueStore will use more memory by default.
 For more details, see the BlueStore docs.
 
 Adding a 'osd memory target’ value to our ceph.conf and restarting an OSD 
 just makes the OSD dump like this:
 
 [osd]
   ; this key makes 13.2.4 OSDs abort???
   osd memory target = 1073741824
 
   ; other OSD key settings
   osd pool default size = 2  # Write an object 2 times.
   osd pool default min size = 1 # Allow writing one copy in a degraded 
 state.
 
   osd pool default pg num = 256
   osd pool default pgp num = 256
 
   client cache size = 131072
   osd client op priority = 40
   osd op threads = 8
   osd client message size cap = 512
   filestore min sync interval = 10
   filestore max sync interval = 60
 
   recovery max active = 2
   recovery op priority = 30
   osd max backfills = 2
 
 
 
 
 osd log snippet:
  -472> 2019-03-05 08:36:02.233 7f2743a8c1c0  1 -- - start start
  -471> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 init 
 /var/lib/ceph/osd/ceph-12 (looks like hdd)
  -470> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 journal 
 /var/lib/ceph/osd/ceph-12/journal
  -469> 2019-03-05 08:36:02.234 7f2743a8c1c0  1 
 bluestore(/var/lib/ceph/osd/ceph-12) _mount path /var/lib/ceph/osd/ceph-12
  -468> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev create path 
 /var/lib/ceph/osd/ceph-12/block type kernel
  -467> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev(0x55b31af4a000 
 /var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
  -466> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 bdev(0x55b31af4a000 
 /var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c80, 137 
 GiB) block_size 4096 (4 KiB) rotational
  -465> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 
 bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 
 1073741824 meta 0.4 kv 0.4 data 0.2
  -464> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev create path 
 /var/lib/ceph/osd/ceph-12/block type kernel
  -463> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
 /var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
  -462> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
 /var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c80, 137 
 GiB) block_size 4096 (4 KiB) rotational
  -461> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs add_block_device 
 bdev 1 path /var/lib/ceph/osd/ceph-12/block size 137 GiB
  -460> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs mount
  -459> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
 compaction_readahead_size = 2097152
  -458> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
 compression = kNoCompression
  -457> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
 max_write_buffer_number = 4
  -456> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
 min_write_buffer_number_to_merge = 1
  -455> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
 

Re: [ceph-users] rbd cache limiting IOPS

2019-03-08 Thread Alexandre DERUMIER
>>(I think I see a PR about this on performance meeting pad some months ago) 

https://github.com/ceph/ceph/pull/25713


- Mail original -
De: "aderumier" 
À: "Engelmann Florian" 
Cc: "ceph-users" 
Envoyé: Vendredi 8 Mars 2019 15:03:23
Objet: Re: [ceph-users] rbd cache limiting IOPS

>>Which options do we have to increase IOPS while writeback cache is used? 

If I remember correctly, there is some kind of global lock/mutex with the rbd cache, 

and I think there is some work currently underway to improve it. 

(I think I see a PR about this on performance meeting pad some months ago) 

- Mail original - 
De: "Engelmann Florian"  
À: "ceph-users"  
Envoyé: Jeudi 7 Mars 2019 11:41:41 
Objet: [ceph-users] rbd cache limiting IOPS 

Hi, 

we are running an Openstack environment with Ceph block storage. There 
are six nodes in the current Ceph cluster (12.2.10) with NVMe SSDs and a 
P4800X Optane for rocksdb and WAL. 
The decision was made to use rbd writeback cache with KVM/QEMU. The 
write latency is incredibly good (~85 µs) and the read latency is still 
good (~0.6ms). But we are limited to ~23.000 IOPS in a KVM machine. So 
we did the same FIO benchmark after we disabled the rbd cache and got 
65.000 IOPS but of course the write latency (QD1) was increased to ~ 0.6ms. 
We tried to tune: 

rbd cache size -> 256MB 
rbd cache max dirty -> 192MB 
rbd cache target dirty -> 128MB 

but still we are locked at ~23.000 IOPS with enabled writeback cache. 

Right now we are not sure if the tuned settings have been honoured by 
libvirt. 
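
One way to check whether librbd actually picked the settings up is via a
client admin socket, assuming one is enabled in the [client] section on the
hypervisor (a sketch – the socket path below is the commonly documented
pattern, not necessarily what this setup uses):

[client]
  rbd cache = true
  rbd cache size = 268435456
  rbd cache max dirty = 201326592
  rbd cache target dirty = 134217728
  admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

# then, on the hypervisor, query the running client:
ceph --admin-daemon /var/run/ceph/<client-socket>.asok config show | grep rbd_cache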

Which options do we have to increase IOPS while writeback cache is used? 

All the best, 
Florian 



Re: [ceph-users] rbd cache limiting IOPS

2019-03-08 Thread Alexandre DERUMIER
>>Which options do we have to increase IOPS while writeback cache is used?

If I remember correctly, there is some kind of global lock/mutex with the rbd cache,

and I think there is some work currently underway to improve it.

(I think I see a PR about this on performance meeting pad some months ago)

- Mail original -
De: "Engelmann Florian" 
À: "ceph-users" 
Envoyé: Jeudi 7 Mars 2019 11:41:41
Objet: [ceph-users] rbd cache limiting IOPS

Hi, 

we are running an Openstack environment with Ceph block storage. There 
are six nodes in the current Ceph cluster (12.2.10) with NVMe SSDs and a 
P4800X Optane for rocksdb and WAL. 
The decision was made to use rbd writeback cache with KVM/QEMU. The 
write latency is incredibly good (~85 µs) and the read latency is still 
good (~0.6ms). But we are limited to ~23.000 IOPS in a KVM machine. So 
we did the same FIO benchmark after we disabled the rbd cache and got 
65.000 IOPS but of course the write latency (QD1) was increased to ~ 0.6ms. 
We tried to tune: 

rbd cache size -> 256MB 
rbd cache max dirty -> 192MB 
rbd cache target dirty -> 128MB 

but still we are locked at ~23.000 IOPS with enabled writeback cache. 

Right now we are not sure if the tuned settings have been honoured by 
libvirt. 

Which options do we have to increase IOPS while writeback cache is used? 

All the best, 
Florian 



Re: [ceph-users] 13.2.4 odd memory leak?

2019-03-08 Thread Mark Nelson


On 3/8/19 5:56 AM, Steffen Winther Sørensen wrote:



On 5 Mar 2019, at 10.02, Paul Emmerich  wrote:

Yeah, there's a bug in 13.2.4. You need to set it to at least ~1.2GB.

Yeap thanks, setting it at 1G+256M worked :)
Hope this won’t bloat memory during coming weekend VM backups through CephFS

/Steffen



FWIW, setting it to 1.2G will almost certainly result in the bluestore 
caches being stuck at cache_min, ie 128MB and the autotuner may not be 
able to keep the OSD memory that low.  I typically recommend a bare 
minimum of 2GB per OSD, and on SSD/NVMe backed OSDs 3-4+ can improve 
performance significantly.



Mark




On Tue, Mar 5, 2019 at 9:00 AM Steffen Winther Sørensen
 wrote:



On 4 Mar 2019, at 16.09, Paul Emmerich  wrote:

Bloated to ~4 GB per OSD and you are on HDDs?

Something like that yes.


13.2.3 backported the cache auto-tuning which targets 4 GB memory
usage by default.


See https://ceph.com/releases/13-2-4-mimic-released/

Right, thanks…


The bluestore_cache_* options are no longer needed. They are replaced
by osd_memory_target, defaulting to 4GB. BlueStore will expand
and contract its cache to attempt to stay within this
limit. Users upgrading should note this is a higher default
than the previous bluestore_cache_size of 1GB, so OSDs using
BlueStore will use more memory by default.
For more details, see the BlueStore docs.

Adding a 'osd memory target’ value to our ceph.conf and restarting an OSD just 
makes the OSD dump like this:

[osd]
   ; this key makes 13.2.4 OSDs abort???
   osd memory target = 1073741824

   ; other OSD key settings
   osd pool default size = 2  # Write an object 2 times.
   osd pool default min size = 1 # Allow writing one copy in a degraded state.

   osd pool default pg num = 256
   osd pool default pgp num = 256

   client cache size = 131072
   osd client op priority = 40
   osd op threads = 8
   osd client message size cap = 512
   filestore min sync interval = 10
   filestore max sync interval = 60

   recovery max active = 2
   recovery op priority = 30
   osd max backfills = 2




osd log snippet:
  -472> 2019-03-05 08:36:02.233 7f2743a8c1c0  1 -- - start start
  -471> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 init 
/var/lib/ceph/osd/ceph-12 (looks like hdd)
  -470> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 journal 
/var/lib/ceph/osd/ceph-12/journal
  -469> 2019-03-05 08:36:02.234 7f2743a8c1c0  1 
bluestore(/var/lib/ceph/osd/ceph-12) _mount path /var/lib/ceph/osd/ceph-12
  -468> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev create path 
/var/lib/ceph/osd/ceph-12/block type kernel
  -467> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev(0x55b31af4a000 
/var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
  -466> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 bdev(0x55b31af4a000 
/var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c80, 137 GiB) 
block_size 4096 (4 KiB) rotational
  -465> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 
bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 1073741824 meta 
0.4 kv 0.4 data 0.2
  -464> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev create path 
/var/lib/ceph/osd/ceph-12/block type kernel
  -463> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
/var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
  -462> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
/var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c80, 137 GiB) 
block_size 4096 (4 KiB) rotational
  -461> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs add_block_device bdev 1 
path /var/lib/ceph/osd/ceph-12/block size 137 GiB
  -460> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs mount
  -459> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
compaction_readahead_size = 2097152
  -458> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option compression 
= kNoCompression
  -457> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
max_write_buffer_number = 4
  -456> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
min_write_buffer_number_to_merge = 1
  -455> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
recycle_log_file_num = 4
  -454> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
writable_file_max_buffer_size = 0
  -453> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
write_buffer_size = 268435456
  -452> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
compaction_readahead_size = 2097152
  -451> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option compression 
= kNoCompression
  -450> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
max_write_buffer_number = 4
  -449> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
min_write_buffer_number_to_merge = 1
  -448> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
recycle_log_file_num = 4
  -447> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 

Re: [ceph-users] garbage in cephfs pool

2019-03-08 Thread Fyodor Ustinov
Hi!

And more:

# rados df
POOL_NAME USED    OBJECTS  CLONES COPIES   MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS    RD     WR_OPS    WR
fsd       0 B     11527769 278    69166614 0                  0       0        137451347 61 TiB 46171363  63 TiB
fsdtier   240 KiB 2048     0      6144     0                  0       0        234897999 53 TiB 199347641 8.5 TiB

rados -p fsdtier ls|wc -l
0

fsdtier pool is a tier for fsd pool.

What is it?

- Original Message -
From: "Fyodor Ustinov" 
To: "ceph-users" 
Sent: Thursday, 7 March, 2019 11:57:10
Subject: [ceph-users] garbage in cephfs pool

Hi!

After removing all files from CephFS I see this situation:
# ceph df
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
fsd  2  0 B  0     233 TiB   11527762

#rados df
POOL_NAME USED OBJECTS  CLONES COPIES   MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS    RD     WR_OPS   WR
fsd       0 B  11527761 270    69166566 0                  0       0        137451347 61 TiB 46169087 63 TiB

pool contain objects like that:
1bd3d0a.
1af4b02.
12a3b4a.
11a1876.
1bbda52.
1a09fcd.
1b54612.


Where did these objects come from and how to get rid of them?
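
For context: CephFS data objects are named <inode number in hex>.<object index
in hex>, so the prefix of each leftover object maps back to a file's inode. A
quick sketch (the names above look truncated, so substitute a full object name
from 'rados ls'):

# hex inode prefix -> decimal inode number
printf '%d\n' 0x1bd3d0a
# inspect one of the leftover objects directly
rados -p fsd stat <inode-hex>.00000000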


WBR,
Fyodor.


Re: [ceph-users] Failed to repair pg

2019-03-08 Thread Herbert Alexander Faleiros
Hi,

thanks for the answer.

On Thu, Mar 07, 2019 at 07:48:59PM -0800, David Zafman wrote:
> See what results you get from this command.
> 
> # rados list-inconsistent-snapset 2.2bb --format=json-pretty
> 
> You might see this, so nothing interesting.  If you don't get json, then 
> re-run a scrub again.
> 
> {
>      "epoch": ##,
>      "inconsistents": []
> }

# rados list-inconsistent-snapset 2.2bb --format=json-pretty
{
    "epoch": 485065,
    "inconsistents": [
        {
            "name": "rbd_data.dfd5e2235befd0.0001c299",
            "nspace": "",
            "locator": "",
            "snap": 326022,
            "errors": [
                "headless"
            ]
        },
        {
            "name": "rbd_data.dfd5e2235befd0.0001c299",
            "nspace": "",
            "locator": "",
            "snap": "head",
            "snapset": {
                "snap_context": {
                    "seq": 327360,
                    "snaps": []
                },
                "head_exists": 1,
                "clones": []
            },
            "errors": [
                "extra_clones"
            ],
            "extra clones": [
                326022
            ]
        }
    ]
}

> I don't think you need to do the remove-clone-metadata because you got 
> "unexpected clone" so I think you'd get "Clone 326022 not present"
> 
> I think you need to remove the clone object from osd.12 and osd.80.  For 
> example:
> 
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ 
> --journal-path /dev/sdXX --op list rbd_data.dfd5e2235befd0.0001c299
> 
> ["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.0001c299","key":"","snapid":-2,"hash":,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.0001c299","key":"","snapid":326022,"hash":#,"max":0,"pool":2,"namespace":"","max":0}]
> 
> Use the json for snapid 326022 to remove it.
> 
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ 
> --journal-path /dev/sdXX 
> '["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.0001c299","key":"","snapid":326022,"hash":#,"max":0,"pool":2,"namespace":"","max":0}]'
>  
> remove
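
As a reminder, ceph-objectstore-tool operates on a stopped OSD, so the
surrounding steps are roughly the following (a sketch – OSD id and pg are the
ones from this thread, service names assume systemd):

ceph osd set noout                  # keep data from rebalancing while the OSD is down
systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ --op list \
    rbd_data.dfd5e2235befd0.0001c299 --pgid 2.2bb
# ...remove the offending clone using the json from the listing, as above...
systemctl start ceph-osd@12
ceph osd unset noout
ceph pg repair 2.2bb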

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-80/ --journal-path 
/dev/sda1 --op list rbd_data.dfd5e2235befd0.0001c299 --pgid 2.2bb
["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.0001c299","key":"","snapid":326022,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]
["2.2bb",{"oid":"rbd_data.dfd5e2235befd

I added --pgid 2.2bb because it was taking too long to finish.

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-80/ --journal-path 
/dev/sda1 
'["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.0001c299","key":"","snapid":326022,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]'
 remove
remove #2:dd4a7bd3:::rbd_data.dfd5e2235befd0.0001c299:4f986#

osd.12 was slightly different because it is bluestore:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ --op list 
rbd_data.dfd5e2235befd0.0001c299 --pgid 2.2bb
["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.0001c299","key":"","snapid":326022,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]
["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.0001c299","key":"","snapid":-2,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12/ 
'["2.2bb",{"oid":"rbd_data.dfd5e2235befd0.0001c299","key":"","snapid":326022,"hash":3420345019,"max":0,"pool":2,"namespace":"","max":0}]'
 remove
remove #2:dd4a7bd3:::rbd_data.dfd5e2235befd0.0001c299:4f986#

But nothing changed, so I tried to repair the pg again, and from osd.36
I now got:

2019-03-08 09:09:11.786038 7f920c40d700 -1 log_channel(cluster) log [ERR] : 
2.2bb shard 36 soid 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.0001c299:4f986 
: candidate size 0 info size 4194304 mismatch
2019-03-08 09:09:11.786041 7f920c40d700 -1 log_channel(cluster) log [ERR] : 
2.2bb soid 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.0001c299:4f986 : failed 
to pick suitable object info
2019-03-08 09:09:11.786182 7f920c40d700 -1 log_channel(cluster) log [ERR] : 
repair 2.2bb 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.0001c299:4f986 : on 
disk size (0) does not match object info size (4194304) adjusted for ondisk to 
(4194304)
2019-03-08 09:09:11.786191 7f920c40d700 -1 log_channel(cluster) log [ERR] : 
repair 2.2bb 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.0001c299:4f986 : is 
an unexpected clone
2019-03-08 09:09:11.786213 7f920c40d700 -1 osd.36 pg_epoch: 485254 pg[2.2bb( v 
485253'15080921 (485236'15079373,485253'15080921] local-lis/les=485251/485252 
n=3836 ec=38/38 lis/c 485251/485251 les/c/f 485252/485252/0 
485251/485251/484996) [36,12,80] r=0 lpr=485251 crt=485253'15080921 lcod 
485252'15080920 mlcod 485252'15080920 
active+clean+scrubbing+deep+inconsistent+repair snaptrimq=[5022c~1,50230~1]] 

Re: [ceph-users] 13.2.4 odd memory leak?

2019-03-08 Thread Steffen Winther Sørensen


> On 5 Mar 2019, at 10.02, Paul Emmerich  wrote:
> 
> Yeah, there's a bug in 13.2.4. You need to set it to at least ~1.2GB.
Yeap thanks, setting it at 1G+256M worked :)
Hope this won’t bloat memory during coming weekend VM backups through CephFS

/Steffen

> 
> On Tue, Mar 5, 2019 at 9:00 AM Steffen Winther Sørensen
>  wrote:
>> 
>> 
>> 
>> On 4 Mar 2019, at 16.09, Paul Emmerich  wrote:
>> 
>> Bloated to ~4 GB per OSD and you are on HDDs?
>> 
>> Something like that yes.
>> 
>> 
>> 13.2.3 backported the cache auto-tuning which targets 4 GB memory
>> usage by default.
>> 
>> 
>> See https://ceph.com/releases/13-2-4-mimic-released/
>> 
>> Right, thanks…
>> 
>> 
>> The bluestore_cache_* options are no longer needed. They are replaced
>> by osd_memory_target, defaulting to 4GB. BlueStore will expand
>> and contract its cache to attempt to stay within this
>> limit. Users upgrading should note this is a higher default
>> than the previous bluestore_cache_size of 1GB, so OSDs using
>> BlueStore will use more memory by default.
>> For more details, see the BlueStore docs.
>> 
>> Adding a 'osd memory target’ value to our ceph.conf and restarting an OSD 
>> just makes the OSD dump like this:
>> 
>> [osd]
>>   ; this key makes 13.2.4 OSDs abort???
>>   osd memory target = 1073741824
>> 
>>   ; other OSD key settings
>>   osd pool default size = 2  # Write an object 2 times.
>>   osd pool default min size = 1 # Allow writing one copy in a degraded state.
>> 
>>   osd pool default pg num = 256
>>   osd pool default pgp num = 256
>> 
>>   client cache size = 131072
>>   osd client op priority = 40
>>   osd op threads = 8
>>   osd client message size cap = 512
>>   filestore min sync interval = 10
>>   filestore max sync interval = 60
>> 
>>   recovery max active = 2
>>   recovery op priority = 30
>>   osd max backfills = 2
>> 
>> 
>> 
>> 
>> osd log snippet:
>>  -472> 2019-03-05 08:36:02.233 7f2743a8c1c0  1 -- - start start
>>  -471> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 init 
>> /var/lib/ceph/osd/ceph-12 (looks like hdd)
>>  -470> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 journal 
>> /var/lib/ceph/osd/ceph-12/journal
>>  -469> 2019-03-05 08:36:02.234 7f2743a8c1c0  1 
>> bluestore(/var/lib/ceph/osd/ceph-12) _mount path /var/lib/ceph/osd/ceph-12
>>  -468> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev create path 
>> /var/lib/ceph/osd/ceph-12/block type kernel
>>  -467> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev(0x55b31af4a000 
>> /var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
>>  -466> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 bdev(0x55b31af4a000 
>> /var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c80, 137 
>> GiB) block_size 4096 (4 KiB) rotational
>>  -465> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 
>> bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 1073741824 
>> meta 0.4 kv 0.4 data 0.2
>>  -464> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev create path 
>> /var/lib/ceph/osd/ceph-12/block type kernel
>>  -463> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
>> /var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
>>  -462> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
>> /var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c80, 137 
>> GiB) block_size 4096 (4 KiB) rotational
>>  -461> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs add_block_device bdev 
>> 1 path /var/lib/ceph/osd/ceph-12/block size 137 GiB
>>  -460> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs mount
>>  -459> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>> compaction_readahead_size = 2097152
>>  -458> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>> compression = kNoCompression
>>  -457> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>> max_write_buffer_number = 4
>>  -456> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>> min_write_buffer_number_to_merge = 1
>>  -455> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>> recycle_log_file_num = 4
>>  -454> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>> writable_file_max_buffer_size = 0
>>  -453> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>> write_buffer_size = 268435456
>>  -452> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>> compaction_readahead_size = 2097152
>>  -451> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>> compression = kNoCompression
>>  -450> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>> max_write_buffer_number = 4
>>  -449> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>> min_write_buffer_number_to_merge = 1
>>  -448> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>> recycle_log_file_num = 4
>>  -447> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>> writable_file_max_buffer_size = 0
>>  -446> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb