Re: [ceph-users] ceph-mgr fails to restart after upgrade to mimic

2019-01-07 Thread Randall Smith
More follow-up because, obviously, this is a weird problem. I was able to
start up a luminous mgr and have it successfully join my 13.2.4 cluster. I
still can't get a 13.2.4 mgr to join; I still get the same error I've had
before (see earlier in the thread).

It definitely seems like something is screwy with the mimic mgr.
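A hedged sketch of how to capture more of that failing handshake (the mgr id "myhost" is a placeholder and the debug levels are only suggestions): raise messenger/auth logging on the mons and run the mgr in the foreground.

ceph tell mon.* injectargs '--debug_ms 5 --debug_auth 20'
ceph-mgr -f -i myhost --debug_mgr 20 --debug_ms 5 --debug_auth 20   # foreground, verbose
ceph tell mon.* injectargs '--debug_ms 0 --debug_auth 1'            # revert afterwards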

On Mon, Jan 7, 2019 at 9:57 AM Randall Smith  wrote:

> I upgraded to 13.2.4 and, unsurprisingly, it did not solve the problem.
> ceph-mgr still fails. What else do I need to look at to try to solve this?
>
> Thanks.
>
> On Fri, Jan 4, 2019 at 3:20 PM Randall Smith  wrote:
>
>> Some more info that may or may not matter. :-) First off, I am running
>> 13.2.3 on Ubuntu Xenial (ceph version 13.2.3
>> (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic (stable)).
>>
>> Next, when I try running ceph-mgr with --no-mon-config, the app core
>> dumps.
>>
>>  0> 2019-01-04 14:56:56.416 7fbcc71db380 -1
>> /build/ceph-13.2.3/src/common/Timer.cc: In function 'virtual
>> SafeTimer::~SafeTimer()' thread 7fbcc71db380 time 2019-01-04 14:56:56.419012
>> /build/ceph-13.2.3/src/common/Timer.cc: 50: FAILED assert(thread ==
>> __null)
>>
>>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>> (stable)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x7fbcbe5093c2]
>>  2: (()+0x2e5587) [0x7fbcbe509587]
>>  3: (()+0x2e12de) [0x7fbcbe5052de]
>>  4: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>  5: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>  6: (main()+0x24b) [0x49446b]
>>
>>  7: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>  8: (_start()+0x29) [0x497dc9]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
>> to interpret this.
>>
>> --- logging levels ---
>>
>>0/ 5 none
>>0/ 1 lockdep
>>0/ 1 context
>>1/ 1 crush
>>1/ 5 mds
>>1/ 5 mds_balancer
>>1/ 5 mds_locker
>>1/ 5 mds_log
>>1/ 5 mds_log_expire
>>1/ 5 mds_migrator
>>0/ 1 buffer
>>0/ 1 timer
>>0/ 1 filer
>>0/ 1 striper
>>0/ 1 objecter
>>0/ 5 rados
>>0/ 5 rbd
>>0/ 5 rbd_mirror
>>0/ 5 rbd_replay
>>0/ 5 journaler
>>0/ 5 objectcacher
>>0/ 5 client
>>1/ 5 osd
>>0/ 5 optracker
>>0/ 5 objclass
>>1/ 3 filestore
>>1/ 3 journal
>>   10/10 ms
>>1/ 5 mon
>>0/10 monc
>>1/ 5 paxos
>>0/ 5 tp
>>1/ 5 auth
>>1/ 5 crypto
>>1/ 1 finisher
>>1/ 1 reserver
>>1/ 5 heartbeatmap
>>1/ 5 perfcounter
>>1/ 5 rgw
>>1/ 5 rgw_sync
>>1/10 civetweb
>>1/ 5 javaclient
>>1/ 5 asok
>>1/ 1 throttle
>>0/ 0 refs
>>1/ 5 xio
>>1/ 5 compressor
>>1/ 5 bluestore
>>1/ 5 bluefs
>>1/ 3 bdev
>>1/ 5 kstore
>>4/ 5 rocksdb
>>4/ 5 leveldb
>>4/ 5 memdb
>>1/ 5 kinetic
>>1/ 5 fuse
>>1/ 5 mgr
>>1/ 5 mgrc
>>1/ 5 dpdk
>>1/ 5 eventtrace
>>   -2/-2 (syslog threshold)
>>   99/99 (stderr threshold)
>>   max_recent 1
>>   max_new 1000
>>   log_file
>> --- end dump of recent events ---
>> *** Caught signal (Aborted) **
>>  in thread 7fbcc71db380 thread_name:ceph-mgr
>>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>> (stable)
>>  1: /usr/bin/ceph-mgr() [0x63ebd0]
>>  2: (()+0x11390) [0x7fbcbd819390]
>>  3: (gsignal()+0x38) [0x7fbcbcf66428]
>>  4: (abort()+0x16a) [0x7fbcbcf6802a]
>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x250) [0x7fbcbe509510]
>>  6: (()+0x2e5587) [0x7fbcbe509587]
>>  7: (()+0x2e12de) [0x7fbcbe5052de]
>>  8: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>  9: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>  10: (main()+0x24b) [0x49446b]
>>  11: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>  12: (_start()+0x29) [0x497dc9]
>> 2019-01-04 14:56:56.420 7fbcc71db380 -1 *** Caught signal (Aborted) **
>>  in thread 7fbcc71db380 thread_name:ceph-mgr
>>
>>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>> (stable)
>>  1: /usr/bin/ceph-mgr() [0x63ebd0]
>>  2: (()+0x11390) [0x7fbcbd819390]
>>  3: (gsignal()+0x38) [0x7fbcbcf66428]
>>  4: (abort()+0x16a) [0x7fbcbcf6802a]
>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x250) [0x7fbcbe509510]
>>  6: (()+0x2e5587) [0x7fbcbe509587]
>>  7: (()+0x2e12de) [0x7fbcbe5052de]
>>  8: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>  9: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>  10: (main()+0x24b) [0x49446b]
>>  11: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>  12: (_start()+0x29) [0x497dc9]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
>> to interpret this.
>>
>>
>> On Fri, Jan 4, 2019 at 1:53 PM Randall Smith  wrote:
>>
>>> I think this is the relevant section of the debug log. There's no
>>> AUTH_NONE error which would make things easy. You can see the same "Invalid
>>> argument" error that I'm seeing in the mgr debug output. The malformed
>>> request feels like a compatibility or protocol communication issue.

Re: [ceph-users] Is it possible to increase Ceph Mon store?

2019-01-07 Thread Pardhiv Karri
Thank you, Bryan, for the information. We have 816 OSDs of 2TB each. The
"mon store is too big" warning popped up even though no rebalancing had
happened that month. The store was slightly above the 15360 MB threshold
(around 15900-16100 MB) and stayed there for more than a week. We ran "ceph
tell mon.[ID] compact" earlier this week to get it back down; currently the
mon store is around 12G on each monitor. If it doesn't grow I won't change
the value, but if it grows and triggers the warning again I will raise it
via "mon_data_size_warn".
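For reference, a minimal sketch of both forms of that change (the 32 GiB value is only illustrative and the store path assumes a default layout):

ceph tell mon.* injectargs '--mon_data_size_warn 34359738368'   # runtime change, 32 GiB
du -sh /var/lib/ceph/mon/ceph-*/store.db                        # watch actual store growth

# persistent form, in ceph.conf on the monitors:
[mon]
mon_data_size_warn = 34359738368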

Thanks,
Pardhiv Karri



On Mon, Jan 7, 2019 at 1:55 PM Bryan Stillwell 
wrote:

> I believe the option you're looking for is mon_data_size_warn.  The
> default is set to 16106127360.
>
>
>
> I've found that sometimes the mons need a little help getting started with
> trimming if you just completed a large expansion.  Earlier today I had a
> cluster where the mon's data directory was over 40GB on all the mons.  When
> I restarted them one at a time with 'mon_compact_on_start = true' set in
> the '[mon]' section of ceph.conf, they stayed around 40GB in size.
> However, when I was about to hit send on an email to the list about this
> very topic, the warning cleared up and the data directory is now
> between 1-3GB on each of the mons.  This was on a cluster with >1900 OSDs.
>
>
>
> Bryan
>
>
>
> *From: *ceph-users  on behalf of
> Pardhiv Karri 
> *Date: *Monday, January 7, 2019 at 11:08 AM
> *To: *ceph-users 
> *Subject: *[ceph-users] Is it possible to increase Ceph Mon store?
>
>
>
> Hi,
>
>
>
> We have a large Ceph cluster (Hammer version). We recently saw its mon
> store growing too big > 15GB on all 3 monitors without any rebalancing
> happening for quite some time. We have compacted the DB using "#ceph tell
> mon.[ID] compact" for now. But is there a way to increase the size of the
> mon store to 32GB or something to avoid getting the Ceph health to warning
> state due to Mon store growing too big?
>
>
>
> --
>
> Thanks,
>
> *P**ardhiv **K**arri*
>
>
>
>


-- 
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it possible to increase Ceph Mon store?

2019-01-07 Thread Bryan Stillwell
I believe the option you're looking for is mon_data_size_warn.  The default is 
set to 16106127360.

I've found that sometimes the mons need a little help getting started with 
trimming if you just completed a large expansion.  Earlier today I had a 
cluster where the mon's data directory was over 40GB on all the mons.  When I 
restarted them one at a time with 'mon_compact_on_start = true' set in the 
'[mon]' section of ceph.conf, they stayed around 40GB in size.   However, when 
I was about to hit send on an email to the list about this very topic, the 
warning cleared up and the data directory is now between 1-3GB on each of 
the mons.  This was on a cluster with >1900 OSDs.
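A sketch of that rolling-restart approach with placeholder host names; confirm the quorum is healthy before moving to the next mon.

# in ceph.conf on each mon:
[mon]
mon_compact_on_start = true

systemctl restart ceph-mon@$(hostname -s)   # one mon at a time
ceph -s                                     # wait for all mons back in quorum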

Bryan

From: ceph-users  on behalf of Pardhiv Karri 

Date: Monday, January 7, 2019 at 11:08 AM
To: ceph-users 
Subject: [ceph-users] Is it possible to increase Ceph Mon store?

Hi,

We have a large Ceph cluster (Hammer version). We recently saw its mon store 
growing too big > 15GB on all 3 monitors without any rebalancing happening for 
quite some time. We have compacted the DB using "#ceph tell mon.[ID] compact" 
for now. But is there a way to increase the size of the mon store to 32GB or 
something to avoid getting the Ceph health to warning state due to Mon store 
growing too big?

--
Thanks,
Pardhiv Karri



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osdmaps not being cleaned up in 12.2.8

2019-01-07 Thread Bryan Stillwell
I have a cluster with over 1900 OSDs running Luminous (12.2.8) that isn't 
cleaning up old osdmaps after doing an expansion.  This is even after the 
cluster became 100% active+clean:

# find /var/lib/ceph/osd/ceph-1754/current/meta -name 'osdmap*' | wc -l
46181

With the osdmaps being over 600KB in size this adds up:

# du -sh /var/lib/ceph/osd/ceph-1754/current/meta
31G /var/lib/ceph/osd/ceph-1754/current/meta
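One way to confirm whether trimming is happening at all (a hedged sketch; osd.1754 is just the example above, and the report field names can vary a little between releases):

ceph daemon osd.1754 status    # run on that OSD's host; shows oldest_map / newest_map
ceph report 2>/dev/null | grep -E 'osdmap_(first|last)_committed'

A large gap between first and last committed that never shrinks usually means the mons are not trimming old maps.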

I remember running into this during the hammer days:

http://tracker.ceph.com/issues/13990

Did something change recently that may have broken this fix?

Thanks,
Bryan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions re mon_osd_cache_size increase

2019-01-07 Thread Anthony D'Atri
Thanks, Greg.  This is as I suspected. Ceph is full of subtleties and I wanted 
to be sure.

-- aad


> 
> The osd_map_cache_size controls the OSD’s cache of maps; the change in 13.2.3 
> is to the default for the monitors’.
> On Mon, Jan 7, 2019 at 8:24 AM Anthony D'Atri  wrote:
> 
> 
> > * The default memory utilization for the mons has been increased
> >  somewhat.  Rocksdb now uses 512 MB of RAM by default, which should
> >  be sufficient for small to medium-sized clusters; large clusters
> >  should tune this up.  Also, the `mon_osd_cache_size` has been
> >  increased from 10 OSDMaps to 500, which will translate to an
> >  additional 500 MB to 1 GB of RAM for large clusters, and much less
> >  for small clusters.
> 
> 
> Just so I don't perseverate on this:   mon_osd_cache_size is a [mon] setting for 
> ceph-mon only?  Does it relate to osd_map_cache_size?  ISTR that in the past 
> the latter defaulted to 500; I had seen a presentation (I think from Dan) at 
> an OpenStack Summit advising its decrease and it defaults to 50 now.  
> 
> I like to be very clear about where additional memory is needed, especially 
> for dense systems.
> 
> -- Anthony
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions re mon_osd_cache_size increase

2019-01-07 Thread Gregory Farnum
The osd_map_cache_size controls the OSD’s cache of maps; the change in
13.2.3 is to the default for the monitors’.
On Mon, Jan 7, 2019 at 8:24 AM Anthony D'Atri  wrote:

>
>
> > * The default memory utilization for the mons has been increased
> >  somewhat.  Rocksdb now uses 512 MB of RAM by default, which should
> >  be sufficient for small to medium-sized clusters; large clusters
> >  should tune this up.  Also, the `mon_osd_cache_size` has been
> >  increased from 10 OSDMaps to 500, which will translate to an
> >  additional 500 MB to 1 GB of RAM for large clusters, and much less
> >  for small clusters.
>
>
> Just so I don't perseverate on this:   mon_osd_cache_size is a [mon] setting
> for ceph-mon only?  Does it relate to osd_map_cache_size?  ISTR that in the
> past the latter defaulted to 500; I had seen a presentation (I think from
> Dan) at an OpenStack Summit advising its decrease and it defaults to 50
> now.
>
> I like to be very clear about where additional memory is needed,
> especially for dense systems.
>
> -- Anthony
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw/s3: performance of range requests

2019-01-07 Thread Casey Bodley


On 1/7/19 3:15 PM, Giovani Rinaldi wrote:

Hello!

I've been wondering if range requests are more efficient than doing 
"whole" requests for relatively large objects (100MB-1GB).
More precisely, my question is about the use of OSD/RGW resources: 
is the entire object retrieved from the OSDs only to be sliced 
afterwards, or is only the requested portion read and sent from the 
OSDs to the RGW?


The reason is that, in my scenario, the entire object may be requested 
to ceph eventually, either via multiple range requests or a single 
request.
But, from my application point of view, it would be more efficient to 
retrieve such object partially as needed, although only if such range 
requests do not end up using more resources than necessary from my 
ceph cluster (such as retrieving the whole object for each range request).


I've searched the online documentation, as well as the mailing list, 
but failed to find any indication of how range requests are processed 
by ceph.


Thanks in advance.
Giovani.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Giovani,

RGW will only fetch the minimum amount of data from rados needed to 
satisfy the range request.
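One way to see that in practice (a hedged sketch; the bucket and object names are placeholders) is to look at the object's manifest, which shows how RGW stripes it across rados objects; a ranged GET only has to touch the stripes that overlap the requested range.

radosgw-admin object stat --bucket=mybucket --object=big.bin   # 'manifest' shows head/tail objects and stripe sizes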


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw/s3: performance of range requests

2019-01-07 Thread Giovani Rinaldi
Hello!

I've been wondering if range requests are more efficient than doing "whole"
requests for relatively large objects (100MB-1GB).
More precisely, my question is about the use of OSD/RGW resources: is the
entire object retrieved from the OSDs only to be sliced afterwards, or is
only the requested portion read and sent from the OSDs to the RGW?

The reason is that, in my scenario, the entire object may be requested to
ceph eventually, either via multiple range requests or a single request.
But, from my application point of view, it would be more efficient to
retrieve such object partially as needed, although only if such range
requests do not end up using more resources than necessary from my ceph
cluster (such as retrieving the whole object for each range request).

I've searched the online documentation, as well as the mailing list, but
failed to find any indication of how range requests are processed by ceph.

Thanks in advance.
Giovani.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS MDS optimal setup on Google Cloud

2019-01-07 Thread Patrick Donnelly
Hello Mahmoud,

On Fri, Dec 21, 2018 at 7:44 AM Mahmoud Ismail
 wrote:
> I'm doing benchmarks for metadata operations on CephFS, HDFS, and HopsFS on 
> Google Cloud. In my current setup, I'm using 32 vCPU machines with 29 GB 
> memory, and I have 1 MDS, 1 MON and 3 OSDs. The MDS and the MON nodes are 
> co-located on one vm, while each of the OSDs is on a separate vm with 1 SSD 
> disk attached. I'm using the default configuration for MDS, and OSDs.
>
> I'm running 300 clients on 10 machines (16 vCPU), each client creates a 
> CephFileSystem using the CephFS hadoop plugin, and then writes empty files 
> for 30 seconds followed by reading the empty files for another 30 seconds. 
> The aggregated throughput is around 2000 file create operations/sec and 1 
> file read operations/sec. However, the MDS is not fully utilizing the 32 
> cores on the machine, is there any configuration that I should consider to 
> fully utilize the machine?

The MDS is not yet very parallel; it can only utilize about 2.5 cores
in the best circumstances. Make sure you allocate plenty of RAM for
the MDS. 16GB or 32GB would be a good choice. See (and disregard the
warning on that page):
http://docs.ceph.com/docs/mimic/cephfs/cache-size-limits/

You may also try using multiple active metadata servers to increase
throughput. See: http://docs.ceph.com/docs/mimic/cephfs/multimds/
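A minimal sketch of those two knobs (the fs name "cephfs" and the 16 GiB figure are illustrative):

# in ceph.conf on the MDS host, per the cache-size-limits doc above:
[mds]
mds_cache_memory_limit = 17179869184   # ~16 GiB

ceph fs set cephfs max_mds 2           # add a second active MDS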

> Also, i noticed that running more than 20-30 clients (on different threads) 
> per machine degrade the aggregated throughput for read, is there a limitation 
> on CephFileSystem and libceph on the number of clients created per machine?

No. Can't give you any hints without more information about the test
setup. We also have not tested with the Hadoop plugin in years. There
may be limitations we're not presently aware of.

> Another issue: are the MDS operations single threaded, as pointed out here
> "https://www.slideshare.net/XiaoxiChen3/cephfs-jewel-mds-performance-benchmark"?

Yes, this is still the case.

> Regarding the MDS global lock, is it a single lock per MDS or is it a 
> global distributed lock for all MDSs?

per-MDS


-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is it possible to increase Ceph Mon store?

2019-01-07 Thread Pardhiv Karri
Hi,

We have a large Ceph cluster (Hammer version). We recently saw its mon
store growing too big > 15GB on all 3 monitors without any rebalancing
happening for quite some time. We have compacted the DB using "#ceph tell
mon.[ID] compact" for now. But is there a way to increase the size of the
mon store to 32GB or something to avoid getting the Ceph health to warning
state due to Mon store growing too big?

-- 
Thanks,
*Pardhiv Karri*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mgr fails to restart after upgrade to mimic

2019-01-07 Thread Randall Smith
I upgraded to 13.2.4 and, unsurprisingly, it did not solve the problem.
ceph-mgr still fails. What else do I need to look at to try to solve this?

Thanks.

On Fri, Jan 4, 2019 at 3:20 PM Randall Smith  wrote:

> Some more info that may or may not matter. :-) First off, I am running
> 13.2.3 on Ubuntu Xenial (ceph version 13.2.3
> (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic (stable)).
>
> Next, when I try running ceph-mgr with --no-mon-config, the app core dumps.
>
>  0> 2019-01-04 14:56:56.416 7fbcc71db380 -1
> /build/ceph-13.2.3/src/common/Timer.cc: In function 'virtual
> SafeTimer::~SafeTimer()' thread 7fbcc71db380 time 2019-01-04 14:56:56.419012
> /build/ceph-13.2.3/src/common/Timer.cc: 50: FAILED assert(thread == __null)
>
>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x7fbcbe5093c2]
>  2: (()+0x2e5587) [0x7fbcbe509587]
>  3: (()+0x2e12de) [0x7fbcbe5052de]
>  4: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>  5: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>  6: (main()+0x24b) [0x49446b]
>
>  7: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>  8: (_start()+0x29) [0x497dc9]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> --- logging levels ---
>
>0/ 5 none
>0/ 1 lockdep
>0/ 1 context
>1/ 1 crush
>1/ 5 mds
>1/ 5 mds_balancer
>1/ 5 mds_locker
>1/ 5 mds_log
>1/ 5 mds_log_expire
>1/ 5 mds_migrator
>0/ 1 buffer
>0/ 1 timer
>0/ 1 filer
>0/ 1 striper
>0/ 1 objecter
>0/ 5 rados
>0/ 5 rbd
>0/ 5 rbd_mirror
>0/ 5 rbd_replay
>0/ 5 journaler
>0/ 5 objectcacher
>0/ 5 client
>1/ 5 osd
>0/ 5 optracker
>0/ 5 objclass
>1/ 3 filestore
>1/ 3 journal
>   10/10 ms
>1/ 5 mon
>0/10 monc
>1/ 5 paxos
>0/ 5 tp
>1/ 5 auth
>1/ 5 crypto
>1/ 1 finisher
>1/ 1 reserver
>1/ 5 heartbeatmap
>1/ 5 perfcounter
>1/ 5 rgw
>1/ 5 rgw_sync
>1/10 civetweb
>1/ 5 javaclient
>1/ 5 asok
>1/ 1 throttle
>0/ 0 refs
>1/ 5 xio
>1/ 5 compressor
>1/ 5 bluestore
>1/ 5 bluefs
>1/ 3 bdev
>1/ 5 kstore
>4/ 5 rocksdb
>4/ 5 leveldb
>4/ 5 memdb
>1/ 5 kinetic
>1/ 5 fuse
>1/ 5 mgr
>1/ 5 mgrc
>1/ 5 dpdk
>1/ 5 eventtrace
>   -2/-2 (syslog threshold)
>   99/99 (stderr threshold)
>   max_recent 1
>   max_new 1000
>   log_file
> --- end dump of recent events ---
> *** Caught signal (Aborted) **
>  in thread 7fbcc71db380 thread_name:ceph-mgr
>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
> (stable)
>  1: /usr/bin/ceph-mgr() [0x63ebd0]
>  2: (()+0x11390) [0x7fbcbd819390]
>  3: (gsignal()+0x38) [0x7fbcbcf66428]
>  4: (abort()+0x16a) [0x7fbcbcf6802a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x250) [0x7fbcbe509510]
>  6: (()+0x2e5587) [0x7fbcbe509587]
>  7: (()+0x2e12de) [0x7fbcbe5052de]
>  8: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>  9: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>  10: (main()+0x24b) [0x49446b]
>  11: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>  12: (_start()+0x29) [0x497dc9]
> 2019-01-04 14:56:56.420 7fbcc71db380 -1 *** Caught signal (Aborted) **
>  in thread 7fbcc71db380 thread_name:ceph-mgr
>
>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
> (stable)
>  1: /usr/bin/ceph-mgr() [0x63ebd0]
>  2: (()+0x11390) [0x7fbcbd819390]
>  3: (gsignal()+0x38) [0x7fbcbcf66428]
>  4: (abort()+0x16a) [0x7fbcbcf6802a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x250) [0x7fbcbe509510]
>  6: (()+0x2e5587) [0x7fbcbe509587]
>  7: (()+0x2e12de) [0x7fbcbe5052de]
>  8: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>  9: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>  10: (main()+0x24b) [0x49446b]
>  11: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>  12: (_start()+0x29) [0x497dc9]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
>
> On Fri, Jan 4, 2019 at 1:53 PM Randall Smith  wrote:
>
>> I think this is the relevant section of the debug log. There's no
>> AUTH_NONE error which would make things easy. You can see the same "Invalid
>> argument" error that I'm seeing in the mgr debug output. The malformed
>> request feels like a compatibility or protocol communication issue.
>>
>> 2019-01-04 13:41:58.972 7f88950f5700 10 mon.07@1(peon) e27
>> ms_verify_authorizer 192.168.253.148:0/3301807723 client protocol 0
>>
>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27 _ms_dispatch
>> new session 0x40a58c0 MonSession(client.? 192.168.253.148:0/3301807723
>> is open , features 0x3ffddff8ffa4fffb (luminous)) features
>> 0x3ffddff8ffa4fffb
>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>> preprocess_query auth(proto 0 26 bytes epoch 0) v1 from client.?
>> 
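Given that the mon sees the mgr connect with auth protocol 0, it may be worth ruling out a missing or under-privileged mgr key (a hedged sketch; "myhost" is a placeholder for the mgr id):

ceph auth get mgr.myhost                      # expected caps: mon 'allow profile mgr', osd 'allow *', mds 'allow *'
ls -l /var/lib/ceph/mgr/ceph-myhost/keyring   # the daemon must be able to read this file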

[ceph-users] Questions re mon_osd_cache_size increase

2019-01-07 Thread Anthony D'Atri



> * The default memory utilization for the mons has been increased
>  somewhat.  Rocksdb now uses 512 MB of RAM by default, which should
>  be sufficient for small to medium-sized clusters; large clusters
>  should tune this up.  Also, the `mon_osd_cache_size` has been
>  increased from 10 OSDMaps to 500, which will translate to an
>  additional 500 MB to 1 GB of RAM for large clusters, and much less
>  for small clusters.


Just so I don't perseverate on this:   mon_osd_cache_size is a [mon] setting for 
ceph-mon only?  Does it relate to osd_map_cache_size?  ISTR that in the past 
the latter defaulted to 500; I had seen a presentation (I think from Dan) at an 
OpenStack Summit advising its decrease and it defaults to 50 now.  

I like to be very clear about where additional memory is needed, especially for 
dense systems.
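To see which daemon actually consumes each option on a given build, the admin socket is the quickest check (a sketch; run each command on a host with the matching daemon, and osd.0 is only an example):

ceph daemon mon.$(hostname -s) config get mon_osd_cache_size
ceph daemon osd.0 config get osd_map_cache_size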

-- Anthony

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v13.2.4 Mimic released

2019-01-07 Thread Alexandre DERUMIER
Hi,

>>* Ceph v13.2.2 includes a wrong backport, which may cause mds to go into 
>>'damaged' state when upgrading Ceph cluster from previous version. 
>>The bug is fixed in v13.2.3. If you are already running v13.2.2, 
>>upgrading to v13.2.3 does not require special action. 

Any special action for upgrading from 13.2.1 ?



- Original Message -
From: "Abhishek Lekshmanan" 
To: "ceph-announce" , "ceph-users" 
, ceph-maintain...@lists.ceph.com, "ceph-devel" 

Sent: Monday, 7 January 2019 11:37:05
Subject: v13.2.4 Mimic released

This is the fourth bugfix release of the Mimic v13.2.x long term stable 
release series. This release includes two security fixes atop of v13.2.3. 
We recommend all users upgrade to this version. If you've already 
upgraded to v13.2.3, the same restrictions from v13.2.2->v13.2.3 apply 
here as well. 

Notable Changes 
--- 

* CVE-2018-16846: rgw: enforce bounds on max-keys/max-uploads/max-parts 
(`issue#35994 `_) 
* CVE-2018-14662: mon: limit caps allowed to access the config store 

Notable Changes in v13.2.3 
--- 

* The default memory utilization for the mons has been increased 
somewhat. Rocksdb now uses 512 MB of RAM by default, which should 
be sufficient for small to medium-sized clusters; large clusters 
should tune this up. Also, the `mon_osd_cache_size` has been 
increased from 10 OSDMaps to 500, which will translate to an 
additional 500 MB to 1 GB of RAM for large clusters, and much less 
for small clusters. 

* Ceph v13.2.2 includes a wrong backport, which may cause mds to go into 
'damaged' state when upgrading Ceph cluster from previous version. 
The bug is fixed in v13.2.3. If you are already running v13.2.2, 
upgrading to v13.2.3 does not require special action. 

* The bluestore_cache_* options are no longer needed. They are replaced 
by osd_memory_target, defaulting to 4GB. BlueStore will expand 
and contract its cache to attempt to stay within this 
limit. Users upgrading should note this is a higher default 
than the previous bluestore_cache_size of 1GB, so OSDs using 
BlueStore will use more memory by default. 
For more details, see the `BlueStore docs 
`_.
 

* This version contains an upgrade bug, http://tracker.ceph.com/issues/36686, 
due to which upgrading during recovery/backfill can cause OSDs to fail. This 
bug can be worked around, either by restarting all the OSDs after the upgrade, 
or by upgrading when all PGs are in "active+clean" state. If you have already 
successfully upgraded to 13.2.2, this issue should not impact you. Going 
forward, we are working on a clean upgrade path for this feature. 


For more details please refer to the release blog at 
https://ceph.com/releases/13-2-4-mimic-released/ 

Getting ceph 
* Git at git://github.com/ceph/ceph.git 
* Tarball at http://download.ceph.com/tarballs/ceph-13.2.4.tar.gz 
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/ 
* Release git sha1: b10be4d44915a4d78a8e06aa31919e74927b142e 

-- 
Abhishek Lekshmanan 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, 
HRB 21284 (AG Nürnberg) 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Configure libvirt to 'see' already created snapshots of a vm rbd image

2019-01-07 Thread Jason Dillaman
I don't think libvirt has any facilities to list the snapshots of an
image for the purposes of display. It appears, after a quick scan of
the libvirt RBD backend [1] that it only internally lists image
snapshots for maintenance reasons.

[1] 
https://github.com/libvirt/libvirt/blob/master/src/storage/storage_backend_rbd.c
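So the practical route is to query RBD directly from the host; a minimal sketch, assuming placeholder VM, pool and image names:

virsh domblklist myvm        # find which rbd pool/image backs the guest disk
rbd snap ls rbd/vm-disk-1    # list its snapshots with rbd itself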

On Mon, Jan 7, 2019 at 7:16 AM Marc Roos  wrote:
>
>
>
> How do you configure libvirt so it sees the snapshots already created on
> the rbd image it is using for the vm?
>
> I have already a vm running connected to the rbd pool via
> protocol='rbd', and rbd snap ls is showing the snapshots.
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Configure libvirt to 'see' already created snapshots of a vm rbd image

2019-01-07 Thread Marc Roos



How do you configure libvirt so it sees the snapshots already created on 
the rbd image it is using for the vm?

I have already a vm running connected to the rbd pool via 
protocol='rbd', and rbd snap ls is showing the snapshots.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancer=on with crush-compat mode

2019-01-07 Thread Marc Roos
This is what I am seeing with the change from pg 8 to pg 16:

[@c01 ceph]# ceph osd df | egrep '^ID|^19|^20|^21|^30'
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
19   ssd 0.48000  1.0  447GiB  161GiB  286GiB 35.91 0.84  35
20   ssd 0.48000  1.0  447GiB  170GiB  277GiB 38.09 0.89  36
21   ssd 0.48000  1.0  447GiB  215GiB  232GiB 48.08 1.12  36
30   ssd 0.48000  1.0  447GiB  220GiB  227GiB 49.18 1.15  37

Coming from

[@c01 ~]# ceph osd df | egrep '^19|^20|^21|^30'
19   ssd 0.48000  1.0  447GiB  157GiB  290GiB 35.18 0.87  30
20   ssd 0.48000  1.0  447GiB  125GiB  322GiB 28.00 0.69  30
21   ssd 0.48000  1.0  447GiB  245GiB  202GiB 54.71 1.35  30
30   ssd 0.48000  1.0  447GiB  217GiB  230GiB 48.46 1.20  30

I guess I should know more about the technology behind this to appreciate
the result. (I assume the PGs stay this way until more data is added, since
the balancer reports "Error EDOM: Unable to find further optimization, ...")

 >>
 >>  >If I understand the balancer correct, it balances PGs not data.
 >>  >This worked perfectly fine in your case.
 >>  >
 >>  >I prefer a PG count of ~100 per OSD, you are at 30. Maybe it would
 >>  >help to bump the PGs.
 >>  >
 >
 >> I am not sure if I should increase from 8 to 16. Because that would just
 >> halve the data in the pg's and they probably end up on the same osd's in
 >> the same ratio as now? Or am I assuming this incorrectly?
 >> Is 16 advised?
 >>
 >
 >If you had only one PG (the most extreme use case) it would always be
 >optimally misplaced. If you have lots, there are many more chances of ceph
 >spreading them correctly. There is some hashing and pseudorandomness in
 >there to screw it up at times, but considering what you can do with few
 >-vs- many PGs, having many allows for better spread than few, up to some
 >point where the cpu handling all the PGs eats more resources than it's worth.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ERR scrub mismatch

2019-01-07 Thread Marco Aroldi
Hello,
the errors are not resolved. Here is what I tried so far, without luck:

I have added a sixth monitor (ceph-mon06), then deleted the first one
(ceph-mon01).
The mon IDs shifted down (mon02 was ID 1 and is now 0, and so on...)
This is the actual monmap:
0: 192.168.50.21:6789/0 mon.ceph-mon02
1: 192.168.50.22:6789/0 mon.ceph-mon03
2: 192.168.50.23:6789/0 mon.ceph-mon04
3: 192.168.50.24:6789/0 mon.ceph-mon05
4: 192.168.50.25:6789/0 mon.ceph-mon06

Since this is a multi-mds setup (1 active, 3 standby), I've issued the
command
ceph fs set cephfs01 max_mds 2
Now we have 2 mds active and 2 standby

This is the mon log excerpt:
https://paste.ubuntu.com/p/2CHyDzxfSk/

I'm tempted to delete mon.0 (ceph-mon02 in my setup)
Please, any thoughts?
Thanks
Marco
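
If you do end up rebuilding the suspect mon rather than just removing it, the usual pattern is to drop it from the monmap, wipe its store, and re-add it so it performs a full sync from the quorum. A hedged sketch, only with the other four mons healthy and in quorum:

ceph mon remove ceph-mon02
# on ceph-mon02:
systemctl stop ceph-mon@ceph-mon02
mv /var/lib/ceph/mon/ceph-ceph-mon02 /var/lib/ceph/mon/ceph-ceph-mon02.bak
# re-create the mon (ceph-mon --mkfs plus your usual provisioning) and start it;
# it will then resync its whole store from the surviving monitors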

On Thu, Nov 8, 2018 at 10:54 AM Marco Aroldi 
wrote:

> Hello,
> Since upgrade from Jewel to Luminous 12.2.8, in the logs are reported some
> errors related to "scrub mismatch", every day at the same time.
> I have 5 mon (from mon.0 to mon.4) and I need help to indentify and
> recover from this problem.
>
> This is the log:
> 2018-11-07 15:13:53.808128 [ERR]  mon.4 ScrubResult(keys
> {logm=46,mds_health=29,mds_metadata=1,mdsmap=24} crc
> {logm=1239992787,mds_health=3182263811,mds_metadata=3704185590,mdsmap=1114086003})
> 2018-11-07 15:13:53.808095 [ERR]  mon.0 ScrubResult(keys
> {logm=46,mds_health=30,mds_metadata=1,mdsmap=23} crc
> {logm=1239992787,mds_health=1194056063,mds_metadata=3704185590,mdsmap=3259702002})
> 2018-11-07 15:13:53.808061 [ERR]  scrub mismatch
> 2018-11-07 15:13:53.808026 [ERR]  mon.3 ScrubResult(keys
> {logm=46,mds_health=31,mds_metadata=1,mdsmap=22} crc
> {logm=1239992787,mds_health=807938287,mds_metadata=3704185590,mdsmap=662277977})
> 2018-11-07 15:13:53.807970 [ERR]  mon.0 ScrubResult(keys
> {logm=46,mds_health=30,mds_metadata=1,mdsmap=23} crc
> {logm=1239992787,mds_health=1194056063,mds_metadata=3704185590,mdsmap=3259702002})
> 2018-11-07 15:13:53.807939 [ERR]  scrub mismatch
> 2018-11-07 15:13:53.807916 [ERR]  mon.2 ScrubResult(keys
> {logm=46,mds_health=31,mds_metadata=1,mdsmap=22} crc
> {logm=1239992787,mds_health=807938287,mds_metadata=3704185590,mdsmap=662277977})
> 2018-11-07 15:13:53.807882 [ERR]  mon.0 ScrubResult(keys
> {logm=46,mds_health=30,mds_metadata=1,mdsmap=23} crc
> {logm=1239992787,mds_health=1194056063,mds_metadata=3704185590,mdsmap=3259702002})
> 2018-11-07 15:13:53.807844 [ERR]  scrub mismatch
>
> Any help will be appreciated
> Thanks
> Marco
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v13.2.4 Mimic released

2019-01-07 Thread Abhishek Lekshmanan

This is the fourth bugfix release of the Mimic v13.2.x long term stable
release series. This release includes two security fixes atop of v13.2.3.
We recommend all users upgrade to this version. If you've already
upgraded to v13.2.3, the same restrictions from v13.2.2->v13.2.3 apply
here as well.

Notable Changes
---

* CVE-2018-16846: rgw: enforce bounds on max-keys/max-uploads/max-parts 
(`issue#35994 `_)
* CVE-2018-14662: mon: limit caps allowed to access the config store

Notable Changes in v13.2.3
---

* The default memory utilization for the mons has been increased
  somewhat.  Rocksdb now uses 512 MB of RAM by default, which should
  be sufficient for small to medium-sized clusters; large clusters
  should tune this up.  Also, the `mon_osd_cache_size` has been
  increased from 10 OSDMaps to 500, which will translate to an
  additional 500 MB to 1 GB of RAM for large clusters, and much less
  for small clusters.

* Ceph v13.2.2 includes a wrong backport, which may cause mds to go into
  'damaged' state when upgrading Ceph cluster from previous version.
  The bug is fixed in v13.2.3. If you are already running v13.2.2,
  upgrading to v13.2.3 does not require special action.

* The bluestore_cache_* options are no longer needed. They are replaced
  by osd_memory_target, defaulting to 4GB. BlueStore will expand
  and contract its cache to attempt to stay within this
  limit. Users upgrading should note this is a higher default
  than the previous bluestore_cache_size of 1GB, so OSDs using
  BlueStore will use more memory by default.
  For more details, see the `BlueStore docs 
`_.

* This version contains an upgrade bug, http://tracker.ceph.com/issues/36686,
  due to which upgrading during recovery/backfill can cause OSDs to fail. This
  bug can be worked around, either by restarting all the OSDs after the upgrade,
  or by upgrading when all PGs are in "active+clean" state. If you have already
  successfully upgraded to 13.2.2, this issue should not impact you. Going
  forward, we are working on a clean upgrade path for this feature.
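For the second workaround above, a minimal sketch of the usual per-host sequence (noout keeps the restarts from triggering rebalancing):

ceph pg stat             # confirm everything is active+clean first
ceph osd set noout
# upgrade the packages, then restart the OSDs on each host in turn:
systemctl restart ceph-osd.target
ceph osd unset noout     # once all hosts are done and PGs are clean again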


For more details please refer to the release blog at
https://ceph.com/releases/13-2-4-mimic-released/

Getting ceph
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-13.2.4.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: b10be4d44915a4d78a8e06aa31919e74927b142e

-- 
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help Ceph Cluster Down

2019-01-07 Thread Caspar Smit
Arun,

This is what I already suggested in my first reply.

Kind regards,
Caspar

On Sat, Jan 5, 2019 at 06:52 Arun POONIA <
arun.poo...@nuagenetworks.net>:

> Hi Kevin,
>
> You are right. Increasing the max number of PGs per OSD resolved the issue. I will
> probably add this config in /etc/ceph/ceph.conf file of ceph mon and OSDs
> so it applies on host boot.
>
> Thanks
> Arun
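
A sketch of what that ceph.conf change might look like (the values below are illustrative, not taken from the original mail; size the limit to your real PG-per-OSD ratio):

[global]
mon_max_pg_per_osd = 1000
osd_max_pg_per_osd_hard_ratio = 3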
>
> On Fri, Jan 4, 2019 at 3:46 PM Kevin Olbrich  wrote:
>
>> Hi Arun,
>>
>> actually deleting was no good idea, thats why I wrote, that the OSDs
>> should be "out".
>> You have down PGs, that because the data is on OSDs that are
>> unavailable but known by the cluster.
>> This can be checked by using "ceph pg 0.5 query" (change PG name).
>>
>> Because your PG count is so much oversized, the overdose limits get
>> hit on every recovery on your cluster.
>> I had the same problem on a medium cluster when I added too many new
>> disks at once.
>> You already got this info from Caspar earlier in this thread.
>>
>>
>> https://ceph.com/planet/placement-groups-with-ceph-luminous-stay-in-activating-state/
>>
>> https://blog.widodh.nl/2018/01/placement-groups-with-ceph-luminous-stay-in-activating-state/
>>
>> The second link shows one of the config params you need to inject to
>> all your OSDs like this:
>> ceph tell osd.* injectargs --mon_max_pg_per_osd 1
>>
>> This might help you getting these PGs some sort of "active"
>> (+recovery/+degraded/+inconsistent/etc.).
>>
>> The down PGs will most likely never come back. I would bet you will
>> find OSD IDs that are invalid in the acting set, meaning that
>> non-existent OSDs hold your data.
>> I had a similar problem on a test cluster with erasure code pools
>> where too many disks failed at the same time, you will then see
>> negative values as OSD IDs.
>>
>> Maybe this helps a little bit.
>>
>> Kevin
>>
>> Am Sa., 5. Jan. 2019 um 00:20 Uhr schrieb Arun POONIA
>> :
>> >
>> > Hi Kevin,
>> >
>> > I tried deleting the newly added server from the Ceph cluster and it looks
>> like Ceph is not recovering. I agree about unfound data, but the status doesn't
>> report any unfound data; it says PGs are inactive/down and I can't bring them up.
>> >
>> >
>> > [root@fre101 ~]# ceph health detail
>> > 2019-01-04 15:17:05.711641 7f27b0f31700 -1 asok(0x7f27ac0017a0)
>> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
>> bind the UNIX domain socket to
>> '/var/run/ceph-guests/ceph-client.admin.129552.139808366139728.asok': (2)
>> No such file or directory
>> > HEALTH_ERR 3 pools have many more objects per pg than average;
>> 523656/12393978 objects misplaced (4.225%); 6517 PGs pending on creation;
>> Reduced data availability: 6585 pgs inactive, 1267 pgs down, 2 pgs peering,
>> 2703 pgs stale; Degraded data redundancy: 86858/12393978 objects degraded
>> (0.701%), 717 pgs degraded, 21 pgs undersized; 99059 slow requests are
>> blocked > 32 sec; 4834 stuck requests are blocked > 4096 sec; too many PGs
>> per OSD (3003 > max 200)
>> > MANY_OBJECTS_PER_PG 3 pools have many more objects per pg than average
>> > pool glance-images objects per pg (10478) is more than 92.7257
>> times cluster average (113)
>> > pool vms objects per pg (4722) is more than 41.7876 times cluster
>> average (113)
>> > pool volumes objects per pg (1220) is more than 10.7965 times
>> cluster average (113)
>> > OBJECT_MISPLACED 523656/12393978 objects misplaced (4.225%)
>> > PENDING_CREATING_PGS 6517 PGs pending on creation
>> > osds
>> [osd.0,osd.1,osd.10,osd.11,osd.12,osd.13,osd.14,osd.15,osd.16,osd.17,osd.18,osd.19,osd.2,osd.20,osd.21,osd.22,osd.23,osd.24,osd.25,osd.26,osd.27,osd.28,osd.29,osd.3,osd.30,osd.31,osd.32,osd.33,osd.34,osd.35,osd.4,osd.5,osd.6,osd.7,osd.8,osd.9]
>> have pending PGs.
>> > PG_AVAILABILITY Reduced data availability: 6585 pgs inactive, 1267 pgs
>> down, 2 pgs peering, 2703 pgs stale
>> > pg 10.90e is stuck inactive for 94928.999109, current state
>> activating, last acting [2,6]
>> > pg 10.913 is stuck inactive for 95094.175400, current state
>> activating, last acting [9,5]
>> > pg 10.915 is stuck inactive for 94929.184177, current state
>> activating, last acting [30,26]
>> > pg 11.907 is stuck stale for 9612.906582, current state
>> stale+active+clean, last acting [38,24]
>> > pg 11.910 is stuck stale for 11822.359237, current state
>> stale+down, last acting [21]
>> > pg 11.915 is stuck stale for 9612.906604, current state
>> stale+active+clean, last acting [38,31]
>> > pg 11.919 is stuck inactive for 95636.716568, current state
>> activating, last acting [25,12]
>> > pg 12.902 is stuck stale for 10810.497213, current state
>> stale+activating, last acting [36,14]
>> > pg 13.901 is stuck stale for 94889.512234, current state
>> stale+active+clean, last acting [1,31]
>> > pg 13.904 is stuck stale for 10745.279158, current state
>> stale+active+clean, last acting [37,8]
>> > pg 13.908 is stuck stale for 10745.279176, current state
>>