Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
On Fri, Feb 24, 2017 at 3:07 PM, Schlacta, Christ  wrote:
> So hopefully when the suse ceph team get 11.2 released it should fix this,
> yes?

Definitely not a question I can answer.

What I can tell you is the fix is only in master atm, not yet
backported to kraken http://tracker.ceph.com/issues/18842

>
> On Feb 23, 2017 21:06, "Brad Hubbard"  wrote:
>>
>> Kefu has just pointed out that this has the hallmarks of
>> https://github.com/ceph/ceph/pull/13275
>>
>> On Fri, Feb 24, 2017 at 3:00 PM, Brad Hubbard  wrote:
>> > Hmm,
>> >
>> > What's interesting is the feature set reported by the servers has only
>> > changed from
>> >
>> > e0106b84a846a42
>> >
>> > Bit 1 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14 set Bit 18
>> > set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set Bit 36 set
>> > Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48 set Bit 57 set Bit
>> > 58 set Bit 59 set
>> >
>> > to
>> >
>> > e0106b84a846a52
>> >
>> > Bit 1 set Bit 4 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14
>> > set Bit 18 set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set
>> > Bit 36 set Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48 set Bit
>> > 57 set Bit 58 set Bit 59 set
>> >
>> > So all it's done is *added* Bit 4 which is DEFINE_CEPH_FEATURE( 4, 1,
>> > SUBSCRIBE2)
>> >
>> >
>> > On Fri, Feb 24, 2017 at 1:40 PM, Schlacta, Christ 
>> > wrote:
>> >> # begin crush map
>> >> tunable choose_local_tries 0
>> >> tunable choose_local_fallback_tries 0
>> >> tunable choose_total_tries 50
>> >> tunable chooseleaf_descend_once 1
>> >> tunable chooseleaf_vary_r 1
>> >> tunable straw_calc_version 1
>> >> tunable allowed_bucket_algs 54
>> >>
>> >> # devices
>> >> device 0 osd.0
>> >> device 1 osd.1
>> >> device 2 osd.2
>> >>
>> >> # types
>> >> type 0 osd
>> >> type 1 host
>> >> type 2 chassis
>> >> type 3 rack
>> >> type 4 row
>> >> type 5 pdu
>> >> type 6 pod
>> >> type 7 room
>> >> type 8 datacenter
>> >> type 9 region
>> >> type 10 root
>> >>
>> >> # buckets
>> >> host densetsu {
>> >> id -2   # do not change unnecessarily
>> >> # weight 0.293
>> >> alg straw
>> >> hash 0  # rjenkins1
>> >> item osd.0 weight 0.146
>> >> item osd.1 weight 0.146
>> >> }
>> >> host density {
>> >> id -3   # do not change unnecessarily
>> >> # weight 0.145
>> >> alg straw
>> >> hash 0  # rjenkins1
>> >> item osd.2 weight 0.145
>> >> }
>> >> root default {
>> >> id -1   # do not change unnecessarily
>> >> # weight 0.438
>> >> alg straw
>> >> hash 0  # rjenkins1
>> >> item densetsu weight 0.293
>> >> item density weight 0.145
>> >> }
>> >>
>> >> # rules
>> >> rule replicated_ruleset {
>> >> ruleset 0
>> >> type replicated
>> >> min_size 1
>> >> max_size 10
>> >> step take default
>> >> step chooseleaf firstn 0 type host
>> >> step emit
>> >> }
>> >>
>> >> # end crush map
>> >>
>> >> On Thu, Feb 23, 2017 at 7:37 PM, Brad Hubbard 
>> >> wrote:
>> >>> Did you dump out the crushmap and look?
>> >>>
>> >>> On Fri, Feb 24, 2017 at 1:36 PM, Schlacta, Christ
>> >>>  wrote:
>>  insofar as I can tell, yes.  Everything indicates that they are in
>>  effect.
>> 
>>  On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard 
>>  wrote:
>> > Is your change reflected in the current crushmap?
>> >
>> > On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ
>> >  wrote:
>> >> -- Forwarded message --
>> >> From: Schlacta, Christ 
>> >> Date: Thu, Feb 23, 2017 at 6:06 PM
>> >> Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
>> >> To: Brad Hubbard 
>> >>
>> >>
>> >> So setting the above to 0 by sheer brute force didn't work, so it's
>> >> not crush or osd problem..  also, the errors still say mon0, so I
>> >> suspect it's related to communication between libceph in kernel and
>> >> the mon.
>> >>
>> >> aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush
>> >> tunables hammer
>> >> adjusted tunables profile to hammer
>> >> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush
>> >> show-tunables
>> >> {
>> >> "choose_local_tries": 0,
>> >> "choose_local_fallback_tries": 0,
>> >> "choose_total_tries": 50,
>> >> "chooseleaf_descend_once": 1,
>> >> "chooseleaf_vary_r": 1,
>> >> "chooseleaf_stable": 0,
>> >> "straw_calc_version": 1,
>> >> "allowed_bucket_algs": 54,
>> >> "profile": "hammer",
>> >> "optimal_tunables": 0,
>> >> "legacy_tunables": 0,
>> >> "minimum_required_version": 

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
So hopefully when the suse ceph team get 11.2 released it should fix this,
yes?

On Feb 23, 2017 21:06, "Brad Hubbard"  wrote:

> Kefu has just pointed out that this has the hallmarks of
> https://github.com/ceph/ceph/pull/13275
>
> On Fri, Feb 24, 2017 at 3:00 PM, Brad Hubbard  wrote:
> > Hmm,
> >
> > What's interesting is the feature set reported by the servers has only
> > changed from
> >
> > e0106b84a846a42
> >
> > Bit 1 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14 set Bit 18
> > set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set Bit 36 set
> > Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48 set Bit 57 set Bit
> > 58 set Bit 59 set
> >
> > to
> >
> > e0106b84a846a52
> >
> > Bit 1 set Bit 4 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14
> > set Bit 18 set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set
> > Bit 36 set Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48 set Bit
> > 57 set Bit 58 set Bit 59 set
> >
> > So all it's done is *added* Bit 4 which is DEFINE_CEPH_FEATURE( 4, 1,
> > SUBSCRIBE2)
> >
> >
> > On Fri, Feb 24, 2017 at 1:40 PM, Schlacta, Christ 
> wrote:
> >> # begin crush map
> >> tunable choose_local_tries 0
> >> tunable choose_local_fallback_tries 0
> >> tunable choose_total_tries 50
> >> tunable chooseleaf_descend_once 1
> >> tunable chooseleaf_vary_r 1
> >> tunable straw_calc_version 1
> >> tunable allowed_bucket_algs 54
> >>
> >> # devices
> >> device 0 osd.0
> >> device 1 osd.1
> >> device 2 osd.2
> >>
> >> # types
> >> type 0 osd
> >> type 1 host
> >> type 2 chassis
> >> type 3 rack
> >> type 4 row
> >> type 5 pdu
> >> type 6 pod
> >> type 7 room
> >> type 8 datacenter
> >> type 9 region
> >> type 10 root
> >>
> >> # buckets
> >> host densetsu {
> >> id -2   # do not change unnecessarily
> >> # weight 0.293
> >> alg straw
> >> hash 0  # rjenkins1
> >> item osd.0 weight 0.146
> >> item osd.1 weight 0.146
> >> }
> >> host density {
> >> id -3   # do not change unnecessarily
> >> # weight 0.145
> >> alg straw
> >> hash 0  # rjenkins1
> >> item osd.2 weight 0.145
> >> }
> >> root default {
> >> id -1   # do not change unnecessarily
> >> # weight 0.438
> >> alg straw
> >> hash 0  # rjenkins1
> >> item densetsu weight 0.293
> >> item density weight 0.145
> >> }
> >>
> >> # rules
> >> rule replicated_ruleset {
> >> ruleset 0
> >> type replicated
> >> min_size 1
> >> max_size 10
> >> step take default
> >> step chooseleaf firstn 0 type host
> >> step emit
> >> }
> >>
> >> # end crush map
> >>
> >> On Thu, Feb 23, 2017 at 7:37 PM, Brad Hubbard 
> wrote:
> >>> Did you dump out the crushmap and look?
> >>>
> >>> On Fri, Feb 24, 2017 at 1:36 PM, Schlacta, Christ 
> wrote:
>  insofar as I can tell, yes.  Everything indicates that they are in
> effect.
> 
>  On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard 
> wrote:
> > Is your change reflected in the current crushmap?
> >
> > On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ <
> aarc...@aarcane.org> wrote:
> >> -- Forwarded message --
> >> From: Schlacta, Christ 
> >> Date: Thu, Feb 23, 2017 at 6:06 PM
> >> Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
> >> To: Brad Hubbard 
> >>
> >>
> >> So setting the above to 0 by sheer brute force didn't work, so it's
> >> not crush or osd problem..  also, the errors still say mon0, so I
> >> suspect it's related to communication between libceph in kernel and
> >> the mon.
> >>
> >> aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush
> tunables hammer
> >> adjusted tunables profile to hammer
> >> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush
> show-tunables
> >> {
> >> "choose_local_tries": 0,
> >> "choose_local_fallback_tries": 0,
> >> "choose_total_tries": 50,
> >> "chooseleaf_descend_once": 1,
> >> "chooseleaf_vary_r": 1,
> >> "chooseleaf_stable": 0,
> >> "straw_calc_version": 1,
> >> "allowed_bucket_algs": 54,
> >> "profile": "hammer",
> >> "optimal_tunables": 0,
> >> "legacy_tunables": 0,
> >> "minimum_required_version": "firefly",
> >> "require_feature_tunables": 1,
> >> "require_feature_tunables2": 1,
> >> "has_v2_rules": 0,
> >> "require_feature_tunables3": 1,
> >> "has_v3_rules": 0,
> >> "has_v4_buckets": 0,
> >> "require_feature_tunables5": 0,
> >> "has_v5_rules": 0
> >> }
> >>
> >> aarcane@densetsu:/etc/target$ sudo rbd --cluster rk map rt1
> >> rbd: sysfs write 

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Kefu has just pointed out that this has the hallmarks of
https://github.com/ceph/ceph/pull/13275

On Fri, Feb 24, 2017 at 3:00 PM, Brad Hubbard  wrote:
> Hmm,
>
> What's interesting is the feature set reported by the servers has only
> changed from
>
> e0106b84a846a42
>
> Bit 1 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14 set Bit 18
> set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set Bit 36 set
> Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48 set Bit 57 set Bit
> 58 set Bit 59 set
>
> to
>
> e0106b84a846a52
>
> Bit 1 set Bit 4 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14
> set Bit 18 set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set
> Bit 36 set Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48 set Bit
> 57 set Bit 58 set Bit 59 set
>
> So all it's done is *added* Bit 4 which is DEFINE_CEPH_FEATURE( 4, 1,
> SUBSCRIBE2)
>
>
> On Fri, Feb 24, 2017 at 1:40 PM, Schlacta, Christ  wrote:
>> # begin crush map
>> tunable choose_local_tries 0
>> tunable choose_local_fallback_tries 0
>> tunable choose_total_tries 50
>> tunable chooseleaf_descend_once 1
>> tunable chooseleaf_vary_r 1
>> tunable straw_calc_version 1
>> tunable allowed_bucket_algs 54
>>
>> # devices
>> device 0 osd.0
>> device 1 osd.1
>> device 2 osd.2
>>
>> # types
>> type 0 osd
>> type 1 host
>> type 2 chassis
>> type 3 rack
>> type 4 row
>> type 5 pdu
>> type 6 pod
>> type 7 room
>> type 8 datacenter
>> type 9 region
>> type 10 root
>>
>> # buckets
>> host densetsu {
>> id -2   # do not change unnecessarily
>> # weight 0.293
>> alg straw
>> hash 0  # rjenkins1
>> item osd.0 weight 0.146
>> item osd.1 weight 0.146
>> }
>> host density {
>> id -3   # do not change unnecessarily
>> # weight 0.145
>> alg straw
>> hash 0  # rjenkins1
>> item osd.2 weight 0.145
>> }
>> root default {
>> id -1   # do not change unnecessarily
>> # weight 0.438
>> alg straw
>> hash 0  # rjenkins1
>> item densetsu weight 0.293
>> item density weight 0.145
>> }
>>
>> # rules
>> rule replicated_ruleset {
>> ruleset 0
>> type replicated
>> min_size 1
>> max_size 10
>> step take default
>> step chooseleaf firstn 0 type host
>> step emit
>> }
>>
>> # end crush map
>>
>> On Thu, Feb 23, 2017 at 7:37 PM, Brad Hubbard  wrote:
>>> Did you dump out the crushmap and look?
>>>
>>> On Fri, Feb 24, 2017 at 1:36 PM, Schlacta, Christ  
>>> wrote:
 insofar as I can tell, yes.  Everything indicates that they are in effect.

 On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard  wrote:
> Is your change reflected in the current crushmap?
>
> On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ  
> wrote:
>> -- Forwarded message --
>> From: Schlacta, Christ 
>> Date: Thu, Feb 23, 2017 at 6:06 PM
>> Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
>> To: Brad Hubbard 
>>
>>
>> So setting the above to 0 by sheer brute force didn't work, so it's
>> not crush or osd problem..  also, the errors still say mon0, so I
>> suspect it's related to communication between libceph in kernel and
>> the mon.
>>
>> aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush tunables 
>> hammer
>> adjusted tunables profile to hammer
>> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush show-tunables
>> {
>> "choose_local_tries": 0,
>> "choose_local_fallback_tries": 0,
>> "choose_total_tries": 50,
>> "chooseleaf_descend_once": 1,
>> "chooseleaf_vary_r": 1,
>> "chooseleaf_stable": 0,
>> "straw_calc_version": 1,
>> "allowed_bucket_algs": 54,
>> "profile": "hammer",
>> "optimal_tunables": 0,
>> "legacy_tunables": 0,
>> "minimum_required_version": "firefly",
>> "require_feature_tunables": 1,
>> "require_feature_tunables2": 1,
>> "has_v2_rules": 0,
>> "require_feature_tunables3": 1,
>> "has_v3_rules": 0,
>> "has_v4_buckets": 0,
>> "require_feature_tunables5": 0,
>> "has_v5_rules": 0
>> }
>>
>> aarcane@densetsu:/etc/target$ sudo rbd --cluster rk map rt1
>> rbd: sysfs write failed
>> In some cases useful info is found in syslog - try "dmesg | tail" or so.
>> rbd: map failed: (110) Connection timed out
>> aarcane@densetsu:~$ dmesg | tail
>> [10118.778868] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>> [10118.779597] libceph: mon0 10.0.0.67:6789 missing required protocol 
>> features
>> 

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Hmm,

What's interesting is the feature set reported by the servers has only
changed from

e0106b84a846a42

Bit 1 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14 set Bit 18
set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set Bit 36 set
Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48 set Bit 57 set Bit
58 set Bit 59 set

to

e0106b84a846a52

Bit 1 set Bit 4 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14
set Bit 18 set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set
Bit 36 set Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48 set Bit
57 set Bit 58 set Bit 59 set

So all it's done is *added* Bit 4 which is DEFINE_CEPH_FEATURE( 4, 1,
SUBSCRIBE2)
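
(The same conclusion can be reached by XORing the two masks; a minimal bash
sketch using the two server feature sets quoted above:)

    old=$((16#e0106b84a846a42))   # server feature set before
    new=$((16#e0106b84a846a52))   # server feature set after
    diff=$(( old ^ new ))
    for bit in $(seq 0 63); do
        (( (diff >> bit) & 1 )) && echo "Bit $bit differs"
    done
    # prints: Bit 4 differs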


On Fri, Feb 24, 2017 at 1:40 PM, Schlacta, Christ  wrote:
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host densetsu {
> id -2   # do not change unnecessarily
> # weight 0.293
> alg straw
> hash 0  # rjenkins1
> item osd.0 weight 0.146
> item osd.1 weight 0.146
> }
> host density {
> id -3   # do not change unnecessarily
> # weight 0.145
> alg straw
> hash 0  # rjenkins1
> item osd.2 weight 0.145
> }
> root default {
> id -1   # do not change unnecessarily
> # weight 0.438
> alg straw
> hash 0  # rjenkins1
> item densetsu weight 0.293
> item density weight 0.145
> }
>
> # rules
> rule replicated_ruleset {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
>
> # end crush map
>
> On Thu, Feb 23, 2017 at 7:37 PM, Brad Hubbard  wrote:
>> Did you dump out the crushmap and look?
>>
>> On Fri, Feb 24, 2017 at 1:36 PM, Schlacta, Christ  
>> wrote:
>>> insofar as I can tell, yes.  Everything indicates that they are in effect.
>>>
>>> On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard  wrote:
 Is your change reflected in the current crushmap?

 On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ  
 wrote:
> -- Forwarded message --
> From: Schlacta, Christ 
> Date: Thu, Feb 23, 2017 at 6:06 PM
> Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
> To: Brad Hubbard 
>
>
> So setting the above to 0 by sheer brute force didn't work, so it's
> not crush or osd problem..  also, the errors still say mon0, so I
> suspect it's related to communication between libceph in kernel and
> the mon.
>
> aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush tunables 
> hammer
> adjusted tunables profile to hammer
> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush show-tunables
> {
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 50,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 1,
> "chooseleaf_stable": 0,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 54,
> "profile": "hammer",
> "optimal_tunables": 0,
> "legacy_tunables": 0,
> "minimum_required_version": "firefly",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 0,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 0,
> "require_feature_tunables5": 0,
> "has_v5_rules": 0
> }
>
> aarcane@densetsu:/etc/target$ sudo rbd --cluster rk map rt1
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail" or so.
> rbd: map failed: (110) Connection timed out
> aarcane@densetsu:~$ dmesg | tail
> [10118.778868] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
> [10118.779597] libceph: mon0 10.0.0.67:6789 missing required protocol 
> features
> [10119.834634] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
> [10119.835174] libceph: mon0 10.0.0.67:6789 missing required protocol 
> features
> [10120.762983] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
> 40106b84a842a52 < server's 

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host densetsu {
id -2   # do not change unnecessarily
# weight 0.293
alg straw
hash 0  # rjenkins1
item osd.0 weight 0.146
item osd.1 weight 0.146
}
host density {
id -3   # do not change unnecessarily
# weight 0.145
alg straw
hash 0  # rjenkins1
item osd.2 weight 0.145
}
root default {
id -1   # do not change unnecessarily
# weight 0.438
alg straw
hash 0  # rjenkins1
item densetsu weight 0.293
item density weight 0.145
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map

On Thu, Feb 23, 2017 at 7:37 PM, Brad Hubbard  wrote:
> Did you dump out the crushmap and look?
>
> On Fri, Feb 24, 2017 at 1:36 PM, Schlacta, Christ  wrote:
>> insofar as I can tell, yes.  Everything indicates that they are in effect.
>>
>> On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard  wrote:
>>> Is your change reflected in the current crushmap?
>>>
>>> On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ  
>>> wrote:
 -- Forwarded message --
 From: Schlacta, Christ 
 Date: Thu, Feb 23, 2017 at 6:06 PM
 Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
 To: Brad Hubbard 


 So setting the above to 0 by sheer brute force didn't work, so it's
 not crush or osd problem..  also, the errors still say mon0, so I
 suspect it's related to communication between libceph in kernel and
 the mon.

 aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush tunables 
 hammer
 adjusted tunables profile to hammer
 aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush show-tunables
 {
 "choose_local_tries": 0,
 "choose_local_fallback_tries": 0,
 "choose_total_tries": 50,
 "chooseleaf_descend_once": 1,
 "chooseleaf_vary_r": 1,
 "chooseleaf_stable": 0,
 "straw_calc_version": 1,
 "allowed_bucket_algs": 54,
 "profile": "hammer",
 "optimal_tunables": 0,
 "legacy_tunables": 0,
 "minimum_required_version": "firefly",
 "require_feature_tunables": 1,
 "require_feature_tunables2": 1,
 "has_v2_rules": 0,
 "require_feature_tunables3": 1,
 "has_v3_rules": 0,
 "has_v4_buckets": 0,
 "require_feature_tunables5": 0,
 "has_v5_rules": 0
 }

 aarcane@densetsu:/etc/target$ sudo rbd --cluster rk map rt1
 rbd: sysfs write failed
 In some cases useful info is found in syslog - try "dmesg | tail" or so.
 rbd: map failed: (110) Connection timed out
 aarcane@densetsu:~$ dmesg | tail
 [10118.778868] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
 40106b84a842a52 < server's e0106b84a846a52, missing a004000
 [10118.779597] libceph: mon0 10.0.0.67:6789 missing required protocol 
 features
 [10119.834634] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
 40106b84a842a52 < server's e0106b84a846a52, missing a004000
 [10119.835174] libceph: mon0 10.0.0.67:6789 missing required protocol 
 features
 [10120.762983] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
 40106b84a842a52 < server's e0106b84a846a52, missing a004000
 [10120.763707] libceph: mon0 10.0.0.67:6789 missing required protocol 
 features
 [10121.787128] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
 40106b84a842a52 < server's e0106b84a846a52, missing a004000
 [10121.787847] libceph: mon0 10.0.0.67:6789 missing required protocol 
 features
 [10122.97] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
 40106b84a842a52 < server's e0106b84a846a52, missing a004000
 [10122.911872] libceph: mon0 10.0.0.67:6789 missing required protocol 
 features
 aarcane@densetsu:~$


 On Thu, Feb 23, 2017 at 5:56 PM, Schlacta, Christ  
 wrote:
> They're from the suse leap ceph team.  They maintain ceph, and build
> up to date versions for suse leap.  What I don't know is how to
> disable it.  When I try, I get the following mess:
>
> 

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Did you dump out the crushmap and look?
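
(For reference, a text dump like the ones shown in this thread is typically
produced along these lines; the non-default cluster name "rk" matches the
--cluster flag used elsewhere here:)

    # Pull the live crushmap and decompile it; the tunable lines at the
    # top of crush.txt show what is actually in effect.
    ceph --cluster rk osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt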

On Fri, Feb 24, 2017 at 1:36 PM, Schlacta, Christ  wrote:
> insofar as I can tell, yes.  Everything indicates that they are in effect.
>
> On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard  wrote:
>> Is your change reflected in the current crushmap?
>>
>> On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ  
>> wrote:
>>> -- Forwarded message --
>>> From: Schlacta, Christ 
>>> Date: Thu, Feb 23, 2017 at 6:06 PM
>>> Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
>>> To: Brad Hubbard 
>>>
>>>
>>> So setting the above to 0 by sheer brute force didn't work, so it's
>>> not crush or osd problem..  also, the errors still say mon0, so I
>>> suspect it's related to communication between libceph in kernel and
>>> the mon.
>>>
>>> aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush tunables 
>>> hammer
>>> adjusted tunables profile to hammer
>>> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush show-tunables
>>> {
>>> "choose_local_tries": 0,
>>> "choose_local_fallback_tries": 0,
>>> "choose_total_tries": 50,
>>> "chooseleaf_descend_once": 1,
>>> "chooseleaf_vary_r": 1,
>>> "chooseleaf_stable": 0,
>>> "straw_calc_version": 1,
>>> "allowed_bucket_algs": 54,
>>> "profile": "hammer",
>>> "optimal_tunables": 0,
>>> "legacy_tunables": 0,
>>> "minimum_required_version": "firefly",
>>> "require_feature_tunables": 1,
>>> "require_feature_tunables2": 1,
>>> "has_v2_rules": 0,
>>> "require_feature_tunables3": 1,
>>> "has_v3_rules": 0,
>>> "has_v4_buckets": 0,
>>> "require_feature_tunables5": 0,
>>> "has_v5_rules": 0
>>> }
>>>
>>> aarcane@densetsu:/etc/target$ sudo rbd --cluster rk map rt1
>>> rbd: sysfs write failed
>>> In some cases useful info is found in syslog - try "dmesg | tail" or so.
>>> rbd: map failed: (110) Connection timed out
>>> aarcane@densetsu:~$ dmesg | tail
>>> [10118.778868] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>>> [10118.779597] libceph: mon0 10.0.0.67:6789 missing required protocol 
>>> features
>>> [10119.834634] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>>> [10119.835174] libceph: mon0 10.0.0.67:6789 missing required protocol 
>>> features
>>> [10120.762983] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>>> [10120.763707] libceph: mon0 10.0.0.67:6789 missing required protocol 
>>> features
>>> [10121.787128] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>>> [10121.787847] libceph: mon0 10.0.0.67:6789 missing required protocol 
>>> features
>>> [10122.97] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>>> [10122.911872] libceph: mon0 10.0.0.67:6789 missing required protocol 
>>> features
>>> aarcane@densetsu:~$
>>>
>>>
>>> On Thu, Feb 23, 2017 at 5:56 PM, Schlacta, Christ  
>>> wrote:
 They're from the suse leap ceph team.  They maintain ceph, and build
 up to date versions for suse leap.  What I don't know is how to
 disable it.  When I try, I get the following mess:

 aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush set-tunable
 require_feature_tunables5 0
 Invalid command:  require_feature_tunables5 not in straw_calc_version
 osd crush set-tunable straw_calc_version  :  set crush tunable
  to 
 Error EINVAL: invalid command

 On Thu, Feb 23, 2017 at 5:54 PM, Brad Hubbard  wrote:
> On Fri, Feb 24, 2017 at 11:00 AM, Schlacta, Christ  
> wrote:
>> aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables
>> {
>> "choose_local_tries": 0,
>> "choose_local_fallback_tries": 0,
>> "choose_total_tries": 50,
>> "chooseleaf_descend_once": 1,
>> "chooseleaf_vary_r": 1,
>> "chooseleaf_stable": 1,
>> "straw_calc_version": 1,
>> "allowed_bucket_algs": 54,
>> "profile": "jewel",
>> "optimal_tunables": 1,
>> "legacy_tunables": 0,
>> "minimum_required_version": "jewel",
>> "require_feature_tunables": 1,
>> "require_feature_tunables2": 1,
>> "has_v2_rules": 0,
>> "require_feature_tunables3": 1,
>> "has_v3_rules": 0,
>> "has_v4_buckets": 0,
>> "require_feature_tunables5": 1,
>
> I suspect setting the above to 0 would resolve the issue with the
> client but there may be a reason why this is set?
>
> Where did those packages come from?

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
insofar as I can tell, yes.  Everything indicates that they are in effect.

On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard  wrote:
> Is your change reflected in the current crushmap?
>
> On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ  
> wrote:
>> -- Forwarded message --
>> From: Schlacta, Christ 
>> Date: Thu, Feb 23, 2017 at 6:06 PM
>> Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
>> To: Brad Hubbard 
>>
>>
>> So setting the above to 0 by sheer brute force didn't work, so it's
>> not crush or osd problem..  also, the errors still say mon0, so I
>> suspect it's related to communication between libceph in kernel and
>> the mon.
>>
>> aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush tunables 
>> hammer
>> adjusted tunables profile to hammer
>> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush show-tunables
>> {
>> "choose_local_tries": 0,
>> "choose_local_fallback_tries": 0,
>> "choose_total_tries": 50,
>> "chooseleaf_descend_once": 1,
>> "chooseleaf_vary_r": 1,
>> "chooseleaf_stable": 0,
>> "straw_calc_version": 1,
>> "allowed_bucket_algs": 54,
>> "profile": "hammer",
>> "optimal_tunables": 0,
>> "legacy_tunables": 0,
>> "minimum_required_version": "firefly",
>> "require_feature_tunables": 1,
>> "require_feature_tunables2": 1,
>> "has_v2_rules": 0,
>> "require_feature_tunables3": 1,
>> "has_v3_rules": 0,
>> "has_v4_buckets": 0,
>> "require_feature_tunables5": 0,
>> "has_v5_rules": 0
>> }
>>
>> aarcane@densetsu:/etc/target$ sudo rbd --cluster rk map rt1
>> rbd: sysfs write failed
>> In some cases useful info is found in syslog - try "dmesg | tail" or so.
>> rbd: map failed: (110) Connection timed out
>> aarcane@densetsu:~$ dmesg | tail
>> [10118.778868] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>> [10118.779597] libceph: mon0 10.0.0.67:6789 missing required protocol 
>> features
>> [10119.834634] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>> [10119.835174] libceph: mon0 10.0.0.67:6789 missing required protocol 
>> features
>> [10120.762983] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>> [10120.763707] libceph: mon0 10.0.0.67:6789 missing required protocol 
>> features
>> [10121.787128] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>> [10121.787847] libceph: mon0 10.0.0.67:6789 missing required protocol 
>> features
>> [10122.97] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
>> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
>> [10122.911872] libceph: mon0 10.0.0.67:6789 missing required protocol 
>> features
>> aarcane@densetsu:~$
>>
>>
>> On Thu, Feb 23, 2017 at 5:56 PM, Schlacta, Christ  
>> wrote:
>>> They're from the suse leap ceph team.  They maintain ceph, and build
>>> up to date versions for suse leap.  What I don't know is how to
>>> disable it.  When I try, I get the following mess:
>>>
>>> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush set-tunable
>>> require_feature_tunables5 0
>>> Invalid command:  require_feature_tunables5 not in straw_calc_version
>>> osd crush set-tunable straw_calc_version  :  set crush tunable
>>>  to 
>>> Error EINVAL: invalid command
>>>
>>> On Thu, Feb 23, 2017 at 5:54 PM, Brad Hubbard  wrote:
 On Fri, Feb 24, 2017 at 11:00 AM, Schlacta, Christ  
 wrote:
> aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables
> {
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 50,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 1,
> "chooseleaf_stable": 1,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 54,
> "profile": "jewel",
> "optimal_tunables": 1,
> "legacy_tunables": 0,
> "minimum_required_version": "jewel",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 0,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 0,
> "require_feature_tunables5": 1,

 I suspect setting the above to 0 would resolve the issue with the
 client but there may be a reason why this is set?

 Where did those packages come from?

> "has_v5_rules": 0
> }
>
> On Thu, Feb 23, 2017 at 4:45 PM, Brad Hubbard  wrote:
>> On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ  
>> wrote:
>>> So I 

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Is your change reflected in the current crushmap?

On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ  wrote:
> -- Forwarded message --
> From: Schlacta, Christ 
> Date: Thu, Feb 23, 2017 at 6:06 PM
> Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
> To: Brad Hubbard 
>
>
> So setting the above to 0 by sheer brute force didn't work, so it's
> not crush or osd problem..  also, the errors still say mon0, so I
> suspect it's related to communication between libceph in kernel and
> the mon.
>
> aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush tunables hammer
> adjusted tunables profile to hammer
> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush show-tunables
> {
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 50,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 1,
> "chooseleaf_stable": 0,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 54,
> "profile": "hammer",
> "optimal_tunables": 0,
> "legacy_tunables": 0,
> "minimum_required_version": "firefly",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 0,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 0,
> "require_feature_tunables5": 0,
> "has_v5_rules": 0
> }
>
> aarcane@densetsu:/etc/target$ sudo rbd --cluster rk map rt1
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail" or so.
> rbd: map failed: (110) Connection timed out
> aarcane@densetsu:~$ dmesg | tail
> [10118.778868] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
> [10118.779597] libceph: mon0 10.0.0.67:6789 missing required protocol features
> [10119.834634] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
> [10119.835174] libceph: mon0 10.0.0.67:6789 missing required protocol features
> [10120.762983] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
> [10120.763707] libceph: mon0 10.0.0.67:6789 missing required protocol features
> [10121.787128] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
> [10121.787847] libceph: mon0 10.0.0.67:6789 missing required protocol features
> [10122.97] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
> 40106b84a842a52 < server's e0106b84a846a52, missing a004000
> [10122.911872] libceph: mon0 10.0.0.67:6789 missing required protocol features
> aarcane@densetsu:~$
>
>
> On Thu, Feb 23, 2017 at 5:56 PM, Schlacta, Christ  wrote:
>> They're from the suse leap ceph team.  They maintain ceph, and build
>> up to date versions for suse leap.  What I don't know is how to
>> disable it.  When I try, I get the following mess:
>>
>> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush set-tunable
>> require_feature_tunables5 0
>> Invalid command:  require_feature_tunables5 not in straw_calc_version
>> osd crush set-tunable straw_calc_version  :  set crush tunable
>>  to 
>> Error EINVAL: invalid command
>>
>> On Thu, Feb 23, 2017 at 5:54 PM, Brad Hubbard  wrote:
>>> On Fri, Feb 24, 2017 at 11:00 AM, Schlacta, Christ  
>>> wrote:
 aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables
 {
 "choose_local_tries": 0,
 "choose_local_fallback_tries": 0,
 "choose_total_tries": 50,
 "chooseleaf_descend_once": 1,
 "chooseleaf_vary_r": 1,
 "chooseleaf_stable": 1,
 "straw_calc_version": 1,
 "allowed_bucket_algs": 54,
 "profile": "jewel",
 "optimal_tunables": 1,
 "legacy_tunables": 0,
 "minimum_required_version": "jewel",
 "require_feature_tunables": 1,
 "require_feature_tunables2": 1,
 "has_v2_rules": 0,
 "require_feature_tunables3": 1,
 "has_v3_rules": 0,
 "has_v4_buckets": 0,
 "require_feature_tunables5": 1,
>>>
>>> I suspect setting the above to 0 would resolve the issue with the
>>> client but there may be a reason why this is set?
>>>
>>> Where did those packages come from?
>>>
 "has_v5_rules": 0
 }

 On Thu, Feb 23, 2017 at 4:45 PM, Brad Hubbard  wrote:
> On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ  
> wrote:
>> So I updated suse leap, and now I'm getting the following error from
>> ceph.  I know I need to disable some features, but I'm not sure what
>> they are..  Looks like 14, 57, and 59, but I can't figure out what
>> they correspond to, nor therefore, how to turn them off.
>>
>> libceph: 

[ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
-- Forwarded message --
From: Schlacta, Christ 
Date: Thu, Feb 23, 2017 at 6:06 PM
Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
To: Brad Hubbard 


So setting the above to 0 by sheer brute force didn't work, so it's
not a crush or OSD problem. Also, the errors still say mon0, so I
suspect it's related to communication between the in-kernel libceph
client and the mon.

aarcane@densetsu:/etc/target$ sudo ceph --cluster rk osd crush tunables hammer
adjusted tunables profile to hammer
aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "hammer",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "firefly",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 0,
"require_feature_tunables5": 0,
"has_v5_rules": 0
}

aarcane@densetsu:/etc/target$ sudo rbd --cluster rk map rt1
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (110) Connection timed out
aarcane@densetsu:~$ dmesg | tail
[10118.778868] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
40106b84a842a52 < server's e0106b84a846a52, missing a004000
[10118.779597] libceph: mon0 10.0.0.67:6789 missing required protocol features
[10119.834634] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
40106b84a842a52 < server's e0106b84a846a52, missing a004000
[10119.835174] libceph: mon0 10.0.0.67:6789 missing required protocol features
[10120.762983] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
40106b84a842a52 < server's e0106b84a846a52, missing a004000
[10120.763707] libceph: mon0 10.0.0.67:6789 missing required protocol features
[10121.787128] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
40106b84a842a52 < server's e0106b84a846a52, missing a004000
[10121.787847] libceph: mon0 10.0.0.67:6789 missing required protocol features
[10122.97] libceph: mon0 10.0.0.67:6789 feature set mismatch, my
40106b84a842a52 < server's e0106b84a846a52, missing a004000
[10122.911872] libceph: mon0 10.0.0.67:6789 missing required protocol features
aarcane@densetsu:~$


On Thu, Feb 23, 2017 at 5:56 PM, Schlacta, Christ  wrote:
> They're from the suse leap ceph team.  They maintain ceph, and build
> up to date versions for suse leap.  What I don't know is how to
> disable it.  When I try, I get the following mess:
>
> aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush set-tunable
> require_feature_tunables5 0
> Invalid command:  require_feature_tunables5 not in straw_calc_version
> osd crush set-tunable straw_calc_version  :  set crush tunable
>  to 
> Error EINVAL: invalid command
>
> On Thu, Feb 23, 2017 at 5:54 PM, Brad Hubbard  wrote:
>> On Fri, Feb 24, 2017 at 11:00 AM, Schlacta, Christ  
>> wrote:
>>> aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables
>>> {
>>> "choose_local_tries": 0,
>>> "choose_local_fallback_tries": 0,
>>> "choose_total_tries": 50,
>>> "chooseleaf_descend_once": 1,
>>> "chooseleaf_vary_r": 1,
>>> "chooseleaf_stable": 1,
>>> "straw_calc_version": 1,
>>> "allowed_bucket_algs": 54,
>>> "profile": "jewel",
>>> "optimal_tunables": 1,
>>> "legacy_tunables": 0,
>>> "minimum_required_version": "jewel",
>>> "require_feature_tunables": 1,
>>> "require_feature_tunables2": 1,
>>> "has_v2_rules": 0,
>>> "require_feature_tunables3": 1,
>>> "has_v3_rules": 0,
>>> "has_v4_buckets": 0,
>>> "require_feature_tunables5": 1,
>>
>> I suspect setting the above to 0 would resolve the issue with the
>> client but there may be a reason why this is set?
>>
>> Where did those packages come from?
>>
>>> "has_v5_rules": 0
>>> }
>>>
>>> On Thu, Feb 23, 2017 at 4:45 PM, Brad Hubbard  wrote:
 On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ  
 wrote:
> So I updated suse leap, and now I'm getting the following error from
> ceph.  I know I need to disable some features, but I'm not sure what
> they are..  Looks like 14, 57, and 59, but I can't figure out what
> they correspond to, nor therefore, how to turn them off.
>
> libceph: mon0 10.0.0.67:6789 feature set mismatch, my 40106b84a842a42
> < server's e0106b84a846a42, missing a00000000004000

 http://cpp.sh/2rfy says...

 Bit 14 set
 Bit 57 set
 Bit 59 set

 Comparing this to
 

[ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
-- Forwarded message --
From: Schlacta, Christ 
Date: Thu, Feb 23, 2017 at 5:56 PM
Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.
To: Brad Hubbard 


They're from the suse leap ceph team.  They maintain ceph, and build
up to date versions for suse leap.  What I don't know is how to
disable it.  When I try, I get the following mess:

aarcane@densetsu:/etc/target$ ceph --cluster rk osd crush set-tunable
require_feature_tunables5 0
Invalid command:  require_feature_tunables5 not in straw_calc_version
osd crush set-tunable straw_calc_version  :  set crush tunable
 to 
Error EINVAL: invalid command
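
(Since osd crush set-tunable only accepts straw_calc_version here, the usual
fallback for flipping an individual tunable such as chooseleaf_stable is to
round-trip the crushmap; a sketch, assuming the --cluster rk name used above,
with illustrative file names:)

    ceph --cluster rk osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit crush.txt: change "tunable chooseleaf_stable 1" to 0, or drop the line
    crushtool -c crush.txt -o crush.new
    ceph --cluster rk osd setcrushmap -i crush.new

(Switching the whole profile with "ceph osd crush tunables hammer", as done
later in the thread, clears chooseleaf_stable the same way.)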

On Thu, Feb 23, 2017 at 5:54 PM, Brad Hubbard  wrote:
> On Fri, Feb 24, 2017 at 11:00 AM, Schlacta, Christ  
> wrote:
>> aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables
>> {
>> "choose_local_tries": 0,
>> "choose_local_fallback_tries": 0,
>> "choose_total_tries": 50,
>> "chooseleaf_descend_once": 1,
>> "chooseleaf_vary_r": 1,
>> "chooseleaf_stable": 1,
>> "straw_calc_version": 1,
>> "allowed_bucket_algs": 54,
>> "profile": "jewel",
>> "optimal_tunables": 1,
>> "legacy_tunables": 0,
>> "minimum_required_version": "jewel",
>> "require_feature_tunables": 1,
>> "require_feature_tunables2": 1,
>> "has_v2_rules": 0,
>> "require_feature_tunables3": 1,
>> "has_v3_rules": 0,
>> "has_v4_buckets": 0,
>> "require_feature_tunables5": 1,
>
> I suspect setting the above to 0 would resolve the issue with the
> client but there may be a reason why this is set?
>
> Where did those packages come from?
>
>> "has_v5_rules": 0
>> }
>>
>> On Thu, Feb 23, 2017 at 4:45 PM, Brad Hubbard  wrote:
>>> On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ  
>>> wrote:
 So I updated suse leap, and now I'm getting the following error from
 ceph.  I know I need to disable some features, but I'm not sure what
 they are..  Looks like 14, 57, and 59, but I can't figure out what
 they correspond to, nor therefore, how to turn them off.

 libceph: mon0 10.0.0.67:6789 feature set mismatch, my 40106b84a842a42
< server's e0106b84a846a42, missing a00000000004000
>>>
>>> http://cpp.sh/2rfy says...
>>>
>>> Bit 14 set
>>> Bit 57 set
>>> Bit 59 set
>>>
>>> Comparing this to
>>> https://github.com/ceph/ceph/blob/master/src/include/ceph_features.h
>>> shows...
>>>
>>> DEFINE_CEPH_FEATURE(14, 2, SERVER_KRAKEN)
>>> DEFINE_CEPH_FEATURE(57, 1, MON_STATEFUL_SUB)
>>> DEFINE_CEPH_FEATURE(57, 1, MON_ROUTE_OSDMAP) // overlap
>>> DEFINE_CEPH_FEATURE(57, 1, OSDSUBOP_NO_SNAPCONTEXT) // overlap
>>> DEFINE_CEPH_FEATURE(57, 1, SERVER_JEWEL) // overlap
>>> DEFINE_CEPH_FEATURE(59, 1, FS_BTIME)
>>> DEFINE_CEPH_FEATURE(59, 1, FS_CHANGE_ATTR) // overlap
>>> DEFINE_CEPH_FEATURE(59, 1, MSG_ADDR2) // overlap
>>>
>>> $ echo "obase=16;ibase=16;$(echo e0106b84a846a42-a00000000004000|tr
>>> '[a-z]' '[A-Z]')"|bc -qi
>>> obase=16;ibase=16;E0106B84A846A42-A00000000004000
>>> 40106B84A842A42
>>>
>>> So "me" (the client kernel) does not have the above features that are
>>> present on the servers.
>>>
>>> Can you post the output of "ceph osd crush show-tunables"?
>>>

 SuSE Leap 42.2 is Up to date as of tonight, no package updates available.
 All the ceph packages have the following version:

 11.1.0+git.1486588482.ba197ae-72.1

 And the kernel has version:

 4.4.49-16.1

 It was working perfectly before the upgrade.

 Thank you very much
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Brad
>
>
>
> --
> Cheers,
> Brad


Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
On Fri, Feb 24, 2017 at 11:00 AM, Schlacta, Christ  wrote:
> aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables
> {
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 50,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 1,
> "chooseleaf_stable": 1,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 54,
> "profile": "jewel",
> "optimal_tunables": 1,
> "legacy_tunables": 0,
> "minimum_required_version": "jewel",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 0,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 0,
> "require_feature_tunables5": 1,

I suspect setting the above to 0 would resolve the issue with the
client but there may be a reason why this is set?

Where did those packages come from?

> "has_v5_rules": 0
> }
>
> On Thu, Feb 23, 2017 at 4:45 PM, Brad Hubbard  wrote:
>> On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ  
>> wrote:
>>> So I updated suse leap, and now I'm getting the following error from
>>> ceph.  I know I need to disable some features, but I'm not sure what
>>> they are..  Looks like 14, 57, and 59, but I can't figure out what
>>> they correspond to, nor therefore, how to turn them off.
>>>
>>> libceph: mon0 10.0.0.67:6789 feature set mismatch, my 40106b84a842a42
>>> < server's e0106b84a846a42, missing a00000000004000
>>
>> http://cpp.sh/2rfy says...
>>
>> Bit 14 set
>> Bit 57 set
>> Bit 59 set
>>
>> Comparing this to
>> https://github.com/ceph/ceph/blob/master/src/include/ceph_features.h
>> shows...
>>
>> DEFINE_CEPH_FEATURE(14, 2, SERVER_KRAKEN)
>> DEFINE_CEPH_FEATURE(57, 1, MON_STATEFUL_SUB)
>> DEFINE_CEPH_FEATURE(57, 1, MON_ROUTE_OSDMAP) // overlap
>> DEFINE_CEPH_FEATURE(57, 1, OSDSUBOP_NO_SNAPCONTEXT) // overlap
>> DEFINE_CEPH_FEATURE(57, 1, SERVER_JEWEL) // overlap
>> DEFINE_CEPH_FEATURE(59, 1, FS_BTIME)
>> DEFINE_CEPH_FEATURE(59, 1, FS_CHANGE_ATTR) // overlap
>> DEFINE_CEPH_FEATURE(59, 1, MSG_ADDR2) // overlap
>>
>> $ echo "obase=16;ibase=16;$(echo e0106b84a846a42-a00000000004000|tr
>> '[a-z]' '[A-Z]')"|bc -qi
>> obase=16;ibase=16;E0106B84A846A42-A00000000004000
>> 40106B84A842A42
>>
>> So "me" (the client kernel) does not have the above features that are
>> present on the servers.
>>
>> Can you post the output of "ceph osd crush show-tunables"?
>>
>>>
>>> SuSE Leap 42.2 is Up to date as of tonight, no package updates available.
>>> All the ceph packages have the following version:
>>>
>>> 11.1.0+git.1486588482.ba197ae-72.1
>>>
>>> And the kernel has version:
>>>
>>> 4.4.49-16.1
>>>
>>> It was working perfectly before the upgrade.
>>>
>>> Thank you very much
>>
>>
>>
>> --
>> Cheers,
>> Brad



-- 
Cheers,
Brad


Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 1,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "jewel",
"optimal_tunables": 1,
"legacy_tunables": 0,
"minimum_required_version": "jewel",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 0,
"require_feature_tunables5": 1,
"has_v5_rules": 0
}

On Thu, Feb 23, 2017 at 4:45 PM, Brad Hubbard  wrote:
> On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ  wrote:
>> So I updated suse leap, and now I'm getting the following error from
>> ceph.  I know I need to disable some features, but I'm not sure what
>> they are..  Looks like 14, 57, and 59, but I can't figure out what
>> they correspond to, nor therefore, how to turn them off.
>>
>> libceph: mon0 10.0.0.67:6789 feature set mismatch, my 40106b84a842a42
>> < server's e0106b84a846a42, missing a00000000004000
>
> http://cpp.sh/2rfy says...
>
> Bit 14 set
> Bit 57 set
> Bit 59 set
>
> Comparing this to
> https://github.com/ceph/ceph/blob/master/src/include/ceph_features.h
> shows...
>
> DEFINE_CEPH_FEATURE(14, 2, SERVER_KRAKEN)
> DEFINE_CEPH_FEATURE(57, 1, MON_STATEFUL_SUB)
> DEFINE_CEPH_FEATURE(57, 1, MON_ROUTE_OSDMAP) // overlap
> DEFINE_CEPH_FEATURE(57, 1, OSDSUBOP_NO_SNAPCONTEXT) // overlap
> DEFINE_CEPH_FEATURE(57, 1, SERVER_JEWEL) // overlap
> DEFINE_CEPH_FEATURE(59, 1, FS_BTIME)
> DEFINE_CEPH_FEATURE(59, 1, FS_CHANGE_ATTR) // overlap
> DEFINE_CEPH_FEATURE(59, 1, MSG_ADDR2) // overlap
>
> $ echo "obase=16;ibase=16;$(echo e0106b84a846a42-a00000000004000|tr
> '[a-z]' '[A-Z]')"|bc -qi
> obase=16;ibase=16;E0106B84A846A42-A00000000004000
> 40106B84A842A42
>
> So "me" (the client kernel) does not have the above features that are
> present on the servers.
>
> Can you post the output of "ceph osd crush show-tunables"?
>
>>
>> SuSE Leap 42.2 is Up to date as of tonight, no package updates available.
>> All the ceph packages have the following version:
>>
>> 11.1.0+git.1486588482.ba197ae-72.1
>>
>> And the kernel has version:
>>
>> 4.4.49-16.1
>>
>> It was working perfectly before the upgrade.
>>
>> Thank you very much
>
>
>
> --
> Cheers,
> Brad


Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ  wrote:
> So I updated suse leap, and now I'm getting the following error from
> ceph.  I know I need to disable some features, but I'm not sure what
> they are..  Looks like 14, 57, and 59, but I can't figure out what
> they correspond to, nor therefore, how to turn them off.
>
> libceph: mon0 10.0.0.67:6789 feature set mismatch, my 40106b84a842a42
> < server's e0106b84a846a42, missing a00000000004000

http://cpp.sh/2rfy says...

Bit 14 set
Bit 57 set
Bit 59 set

Comparing this to
https://github.com/ceph/ceph/blob/master/src/include/ceph_features.h
shows...

DEFINE_CEPH_FEATURE(14, 2, SERVER_KRAKEN)
DEFINE_CEPH_FEATURE(57, 1, MON_STATEFUL_SUB)
DEFINE_CEPH_FEATURE(57, 1, MON_ROUTE_OSDMAP) // overlap
DEFINE_CEPH_FEATURE(57, 1, OSDSUBOP_NO_SNAPCONTEXT) // overlap
DEFINE_CEPH_FEATURE(57, 1, SERVER_JEWEL) // overlap
DEFINE_CEPH_FEATURE(59, 1, FS_BTIME)
DEFINE_CEPH_FEATURE(59, 1, FS_CHANGE_ATTR) // overlap
DEFINE_CEPH_FEATURE(59, 1, MSG_ADDR2) // overlap

$ echo "obase=16;ibase=16;$(echo e0106b84a846a42-a00000000004000|tr
'[a-z]' '[A-Z]')"|bc -qi
obase=16;ibase=16;E0106B84A846A42-A00000000004000
40106B84A842A42

So "me" (the client kernel) does not have the above features that are
present on the servers.
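
(A minimal bash sketch of that bit check, using the missing-feature
difference computed above:)

    mask=$((16#a00000000004000))   # server features the client lacks
    for bit in $(seq 0 63); do
        (( (mask >> bit) & 1 )) && echo "Bit $bit set"
    done
    # prints: Bit 14 set, Bit 57 set, Bit 59 set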

Can you post the output of "ceph osd crush show-tunables"?

>
> SuSE Leap 42.2 is Up to date as of tonight, no package updates available.
> All the ceph packages have the following version:
>
> 11.1.0+git.1486588482.ba197ae-72.1
>
> And the kernel has version:
>
> 4.4.49-16.1
>
> It was working perfectly before the upgrade.
>
> Thank you very much



-- 
Cheers,
Brad


Re: [ceph-users] Random Health_warn

2017-02-23 Thread Scottix
That sounds about right, I do see blocked requests sometimes when it is
under really heavy load.

Looking at some examples I think summary should list the issues.
"summary": [],
"overall_status": "HEALTH_OK",

I'll try logging that too.

Scott

On Thu, Feb 23, 2017 at 3:00 PM David Turner 
wrote:

> There are multiple approaches to give you more information about the
> Health state.  CLI has these 2 options:
> ceph health detail
> ceph status
>
> I also like using ceph-dash.  ( https://github.com/Crapworks/ceph-dash )
>  It has an associated nagios check to scrape the ceph-dash page.
>
> I personally do `watch ceph status` when I'm monitoring the cluster
> closely.  It will show you things like blocked requests, osds flapping, mon
> clock skew, or whatever your problem is causing the health_warn state.  The
> most likely cause for health_warn off and on is blocked requests.  Those
> are caused by any number of things that you would need to diagnose further
> if that is what is causing the health_warn state.
>
> --
>
>  David Turner | Cloud Operations Engineer | 
> StorageCraft
> Technology Corporation 
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 | Mobile: 385.224.2943
>
> --
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
> --
>
> 
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of John
> Spray [jsp...@redhat.com]
> Sent: Thursday, February 23, 2017 3:47 PM
> To: Scottix
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Random Health_warn
>
>
> On Thu, Feb 23, 2017 at 9:49 PM, Scottix  wrote:
> > ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> >
> > We are seeing a weird behavior or not sure how to diagnose what could be
> > going on. We started monitoring the overall_status from the json query
> and
> > every once in a while we would get a HEALTH_WARN for a minute or two.
> >
> > Monitoring logs.
> > 02/23/2017 07:25:54 AM HEALTH_OK
> > 02/23/2017 07:24:54 AM HEALTH_WARN
> > 02/23/2017 07:23:55 AM HEALTH_OK
> > 02/23/2017 07:22:54 AM HEALTH_OK
> > ...
> > 02/23/2017 05:13:55 AM HEALTH_OK
> > 02/23/2017 05:12:54 AM HEALTH_WARN
> > 02/23/2017 05:11:54 AM HEALTH_WARN
> > 02/23/2017 05:10:54 AM HEALTH_OK
> > 02/23/2017 05:09:54 AM HEALTH_OK
> >
> > When I check the mon leader logs there is no indication of an error or
> > issues that could be occuring. Is there a way to find what is causing the
> > HEALTH_WARN?
>
> Possibly not without grabbing more than just the overall status at the
> same time as you're grabbing the OK/WARN status.
>
> Internally, the OK/WARN/ERROR health state is generated on-demand by
> applying a bunch of checks to the state of the system when the user
> runs the health command -- the system doesn't know it's in a warning
> state until it's asked.  Often you will see a corresponding log
> message, but not necessarily.
>
> John
>
> > Best,
> > Scott
> >


Re: [ceph-users] Random Health_warn

2017-02-23 Thread David Turner
There are multiple approaches to give you more information about the Health 
state.  CLI has these 2 options:
ceph health detail
ceph status

I also like using ceph-dash.  ( https://github.com/Crapworks/ceph-dash )  It 
has an associated nagios check to scrape the ceph-dash page.

I personally do `watch ceph status` when I'm monitoring the cluster closely.  
It will show you things like blocked requests, osds flapping, mon clock skew, 
or whatever your problem is causing the health_warn state.  The most likely 
cause for health_warn off and on is blocked requests.  Those are caused by any 
number of things that you would need to diagnose further if that is what is 
causing the health_warn state.



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of John Spray 
[jsp...@redhat.com]
Sent: Thursday, February 23, 2017 3:47 PM
To: Scottix
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Random Health_warn

On Thu, Feb 23, 2017 at 9:49 PM, Scottix  wrote:
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> We are seeing a weird behavior or not sure how to diagnose what could be
> going on. We started monitoring the overall_status from the json query and
> every once in a while we would get a HEALTH_WARN for a minute or two.
>
> Monitoring logs.
> 02/23/2017 07:25:54 AM HEALTH_OK
> 02/23/2017 07:24:54 AM HEALTH_WARN
> 02/23/2017 07:23:55 AM HEALTH_OK
> 02/23/2017 07:22:54 AM HEALTH_OK
> ...
> 02/23/2017 05:13:55 AM HEALTH_OK
> 02/23/2017 05:12:54 AM HEALTH_WARN
> 02/23/2017 05:11:54 AM HEALTH_WARN
> 02/23/2017 05:10:54 AM HEALTH_OK
> 02/23/2017 05:09:54 AM HEALTH_OK
>
> When I check the mon leader logs there is no indication of an error or
> issues that could be occurring. Is there a way to find what is causing the
> HEALTH_WARN?

Possibly not without grabbing more than just the overall status at the
same time as you're grabbing the OK/WARN status.

Internally, the OK/WARN/ERROR health state is generated on-demand by
applying a bunch of checks to the state of the system when the user
runs the health command -- the system doesn't know it's in a warning
state until it's asked.  Often you will see a corresponding log
message, but not necessarily.

John

> Best,
> Scott
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random Health_warn

2017-02-23 Thread Robin H. Johnson
On Thu, Feb 23, 2017 at 10:40:31PM +, Scottix wrote:
> Ya the ceph-mon.$ID.log
> 
> I was running ceph -w when one of them occurred too and it never output
> anything.
> 
> Here is a snippet for the 5:11AM occurrence.
Yep, I don't see anything in there that should have triggered
HEALTH_WARN.

All I can suggest is dumping the JSON health blob when it occurs again,
and seeing if anything stands out in it.
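
A minimal cron-style sketch for that, assuming admin access from the monitoring 
host (file names are only examples):

# Keep the raw JSON health/status blobs only for the moments the cluster
# is not HEALTH_OK, so they can be inspected later.
if ! ceph health | grep -q '^HEALTH_OK'; then
    ts=$(date +%Y%m%d-%H%M%S)
    ceph health --format json-pretty > /tmp/ceph-health-$ts.json
    ceph status --format json-pretty > /tmp/ceph-status-$ts.json
fi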

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random Health_warn

2017-02-23 Thread John Spray
On Thu, Feb 23, 2017 at 9:49 PM, Scottix  wrote:
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> We are seeing a weird behavior or not sure how to diagnose what could be
> going on. We started monitoring the overall_status from the json query and
> every once in a while we would get a HEALTH_WARN for a minute or two.
>
> Monitoring logs.
> 02/23/2017 07:25:54 AM HEALTH_OK
> 02/23/2017 07:24:54 AM HEALTH_WARN
> 02/23/2017 07:23:55 AM HEALTH_OK
> 02/23/2017 07:22:54 AM HEALTH_OK
> ...
> 02/23/2017 05:13:55 AM HEALTH_OK
> 02/23/2017 05:12:54 AM HEALTH_WARN
> 02/23/2017 05:11:54 AM HEALTH_WARN
> 02/23/2017 05:10:54 AM HEALTH_OK
> 02/23/2017 05:09:54 AM HEALTH_OK
>
> When I check the mon leader logs there is no indication of an error or
> issues that could be occurring. Is there a way to find what is causing the
> HEALTH_WARN?

Possibly not without grabbing more than just the overall status at the
same time as you're grabbing the OK/WARN status.

Internally, the OK/WARN/ERROR health state is generated on-demand by
applying a bunch of checks to the state of the system when the user
runs the health command -- the system doesn't know it's in a warning
state until it's asked.  Often you will see a corresponding log
message, but not necessarily.

John

> Best,
> Scott
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random Health_warn

2017-02-23 Thread Scottix
Ya the ceph-mon.$ID.log

I was running ceph -w when one of them occurred too and it never output
anything.

Here is a snippet for the 5:11AM occurrence.

On Thu, Feb 23, 2017 at 1:56 PM Robin H. Johnson  wrote:

> On Thu, Feb 23, 2017 at 09:49:21PM +, Scottix wrote:
> > ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> >
> > We are seeing a weird behavior or not sure how to diagnose what could be
> > going on. We started monitoring the overall_status from the json query
> and
> > every once in a while we would get a HEALTH_WARN for a minute or two.
> >
> > Monitoring logs.
> > 02/23/2017 07:25:54 AM HEALTH_OK
> > 02/23/2017 07:24:54 AM HEALTH_WARN
> > 02/23/2017 07:23:55 AM HEALTH_OK
> > 02/23/2017 07:22:54 AM HEALTH_OK
> > ...
> > 02/23/2017 05:13:55 AM HEALTH_OK
> > 02/23/2017 05:12:54 AM HEALTH_WARN
> > 02/23/2017 05:11:54 AM HEALTH_WARN
> > 02/23/2017 05:10:54 AM HEALTH_OK
> > 02/23/2017 05:09:54 AM HEALTH_OK
> >
> > When I check the mon leader logs there is no indication of an error or
> > issues that could be occurring. Is there a way to find what is causing the
> > HEALTH_WARN?
> By leader logs, do you mean the cluster log (mon_cluster_log_file), or
> the mon log (log_file)? Eg /var/log/ceph/ceph.log vs
> /var/log/ceph/ceph-mon.$ID.log.
>
> Could you post the log entries for a time period between two HEALTH_OK
> states with a HEALTH_WARN in the middle?
>
> The reason for WARN _should_ be included on the logged status line.
>
> Alternatively, you should be able to just log the output of 'ceph -w'
> for a while, and find the WARN status as well.
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
> E-Mail   : robb...@gentoo.org
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
2017-02-23 05:10:54.139358 7f5c17894700  0 mon.CephMon200@0(leader) e7 handle_command mon_command({"prefix": "status", "format": "json"} v 0) v1
2017-02-23 05:10:54.139549 7f5c17894700  0 log_channel(audit) log [DBG] : from='client.? 10.10.1.30:0/1031767' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2017-02-23 05:10:54.535319 7f5c1a25c700  0 log_channel(cluster) log [INF] : pgmap v77496604: 5120 pgs: 2 active+clean+scrubbing, 5111 active+clean, 7 active+clean+scrubbing+deep; 58071 GB data, 114 TB used, 113 TB / 227 TB avail; 16681 kB/s rd, 11886 kB/s wr, 705 op/s
2017-02-23 05:10:55.600104 7f5c1a25c700  0 log_channel(cluster) log [INF] : pgmap v77496605: 5120 pgs: 2 active+clean+scrubbing, 5111 active+clean, 7 active+clean+scrubbing+deep; 58071 GB data, 114 TB used, 113 TB / 227 TB avail; 14716 kB/s rd, 6627 kB/s wr, 408 op/s
2017-02-23 05:10:56.170435 7f5c17894700  0 mon.CephMon200@0(leader) e7 handle_command mon_command({"prefix": "status", "format": "json"} v 0) v1
2017-02-23 05:10:56.170502 7f5c17894700  0 log_channel(audit) log [DBG] : from='client.? 10.10.1.30:0/1031899' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2017-02-23 05:10:56.642040 7f5c1a25c700  0 log_channel(cluster) log [INF] : pgmap v77496606: 5120 pgs: 2 active+clean+scrubbing, 5111 active+clean, 7 active+clean+scrubbing+deep; 58071 GB data, 114 TB used, 113 TB / 227 TB avail; 14617 kB/s rd, 6580 kB/s wr, 537 op/s
2017-02-23 05:10:57.667496 7f5c1a25c700  0 log_channel(cluster) log [INF] : pgmap v77496607: 5120 pgs: 2 active+clean+scrubbing, 5110 active+clean, 8 active+clean+scrubbing+deep; 58071 GB data, 114 TB used, 113 TB / 227 TB avail; 8862 kB/s rd, 7126 kB/s wr, 552 op/s
2017-02-23 05:10:58.736114 7f5c1a25c700  0 log_channel(cluster) log [INF] : pgmap v77496608: 5120 pgs: 2 active+clean+scrubbing, 5110 active+clean, 8 active+clean+scrubbing+deep; 58071 GB data, 114 TB used, 113 TB / 227 TB avail; 14126 kB/s rd, 11254 kB/s wr, 974 op/s
2017-02-23 05:10:59.451884 7f5c17894700  0 mon.CephMon200@0(leader) e7 handle_command mon_command({"prefix": "status", "format": "json"} v 0) v1
2017-02-23 05:10:59.451903 7f5c17894700  0 log_channel(audit) log [DBG] : from='client.? 10.10.1.30:0/1031932' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
2017-02-23 05:10:59.812909 7f5c1a25c700  0 log_channel(cluster) log [INF] : pgmap v77496609: 5120 pgs: 2 active+clean+scrubbing, 5110 active+clean, 8 active+clean+scrubbing+deep; 58071 GB data, 114 TB used, 113 TB / 227 TB avail; 11238 kB/s rd, 8236 kB/s wr, 785 op/s
2017-02-23 05:11:00.829329 7f5c1a25c700  0 log_channel(cluster) log [INF] : pgmap v77496610: 5120 pgs: 2 active+clean+scrubbing, 5110 active+clean, 8 active+clean+scrubbing+deep; 58071 GB data, 114 TB used, 113 TB / 227 TB avail; 6193 kB/s rd, 7345 kB/s wr, 186 op/s
2017-02-23 05:11:01.850120 7f5c1a25c700  0 log_channel(cluster) log [INF] : pgmap v77496611: 5120 

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Gregory Farnum
On Thu, Feb 23, 2017 at 2:34 PM, Benjeman Meekhof  wrote:
> Hi Greg,
>
> Appreciate you looking into it.  I'm concerned about CPU power per
> daemon as well...though we never had this issue when restarting our
> dense nodes under Jewel.  Is the rapid rate of OSDmap generation a
> one-time condition particular to post-update processing or to Kraken
> in general?

I'm not aware of anything that would have made this change in Kraken,
but it's possible. Sorry I don't have more detail on this.
-Greg

>
> We did eventually get all the OSD back up either by doing so in small
> batches or setting nodown and waiting for the host to churn
> through...a day or so later all the OSD pop up.  Now that we're in a
> stable non-degraded state I have to do more tests to see what happens
> under Kraken when we kill a node or several nodes.
>
> I have to give ceph a lot of credit here.  Following my email the 16th
> while we were in a marginal state with kraken OSD churning to come up
> we lost a data center for a minute.  Subsequently we had our remaining
> 2 mons refuse to stay in quorom long enough to serve cluster sessions
> (constant back and forth elections).  I believe the issue was timeouts
> caused by explosive leveldb growth in combination with other activity
> but eventually we got them to come back by increasing db lease time in
> ceph settings.  We had some unfound objects at this point but after
> waiting out all the OSD coming online with nodown/noout set everything
> was fine.  I should have been more careful in applying the update but
> as one of our team put it we definitely found out that Ceph is
> resilient to admins as well as other disasters.
>
> thanks,
> Ben
>
> On Thu, Feb 23, 2017 at 5:10 PM, Gregory Farnum  wrote:
>> On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof  wrote:
>>> I tried starting up just a couple OSD with debug_osd = 20 and
>>> debug_filestore = 20.
>>>
>>> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
>>> unusual but maybe someone else sees something in here that is a
>>> problem:  http://pastebin.com/uy8S7hps
>>>
>>> As this log is rolling on, our OSD has still not been marked up and is
>>> occupying 100% of a CPU core.  I've done this a couple times and in a
>>> matter of some hours it will be marked up and CPU will drop.  If more
>>> kraken OSD on another host are brought up the existing kraken OSD go
>>> back into max CPU usage again while pg recover.  The trend scales
>>> upward as OSD are started until the system is completely saturated.
>>>
>>> I was reading the docs on async messenger settings at
>>> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
>>> that under 'ms async max op threads' there is a note about one or more
>>> CPUs constantly on 100% load.  As an experiment I set max op threads
>>> to 20 and that is the setting during the period of the pasted log.  It
>>> seems to make no difference.
>>>
>>> Appreciate any thoughts on troubleshooting this.  For the time being
>>> I've aborted our kraken update and will probably re-initialize any
>>> already updated OSD to revert to Jewel except perhaps one host to
>>> continue testing.
>>
>> Ah, that log looks like you're just generating OSDMaps so quickly that
>> rebooting 60 at a time leaves you with a ludicrous number to churn
>> through, and that takes a while. It would have been exacerbated by
>> having 60 daemons fight for the CPU to process them, leading to
>> flapping.
>>
>> You might try restarting daemons sequentially on the node instead of
>> all at once. Depending on your needs it would be even cheaper if you
>> set the nodown flag, though obviously that will impede IO while it
>> happens.
>>
>> I'd be concerned that this demonstrates you don't have enough CPU
>> power per daemon, though.
>> -Greg
>>
>>>
>>> thanks,
>>> Ben
>>>
>>> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
 On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
 wrote:
> Hi all,
>
> We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
> (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
> mons to Kraken.
>
> After updating ceph packages I restarted the 60 OSD on the box with
> 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
> load flat-lines at 100% with top showing all of that being system load
> from ceph-osd processes.  Not long after we get OSD flapping due to
> the load on the system (noout was set to start this, but perhaps
> too-quickly unset post restart).
>
> This is causing problems in the cluster, and we reboot the box.  The
> OSD don't start up/mount automatically - not a new problem on this
> setup.  We run 'ceph-disk activate $disk' on a list of all the
> /dev/dm-X devices as output by ceph-disk list.  Everything activates
> and the CPU 

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Benjeman Meekhof
Hi Greg,

Appreciate you looking into it.  I'm concerned about CPU power per
daemon as well...though we never had this issue when restarting our
dense nodes under Jewel.  Is the rapid rate of OSDmap generation a
one-time condition particular to post-update processing or to Kraken
in general?

We did eventually get all the OSD back up either by doing so in small
batches or setting nodown and waiting for the host to churn
through...a day or so later all the OSD pop up.  Now that we're in a
stable non-degraded state I have to do more tests to see what happens
under Kraken when we kill a node or several nodes.

I have to give ceph a lot of credit here.  Following my email on the 16th,
while we were in a marginal state with Kraken OSDs still churning to come up,
we lost a data center for a minute.  Subsequently our remaining
2 mons refused to stay in quorum long enough to serve cluster sessions
(constant back-and-forth elections).  I believe the issue was timeouts
caused by explosive leveldb growth in combination with other activity
but eventually we got them to come back by increasing db lease time in
ceph settings.  We had some unfound objects at this point but after
waiting out all the OSD coming online with nodown/noout set everything
was fine.  I should have been more careful in applying the update but
as one of our team put it we definitely found out that Ceph is
resilient to admins as well as other disasters.

thanks,
Ben

On Thu, Feb 23, 2017 at 5:10 PM, Gregory Farnum  wrote:
> On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof  wrote:
>> I tried starting up just a couple OSD with debug_osd = 20 and
>> debug_filestore = 20.
>>
>> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
>> unusual but maybe someone else sees something in here that is a
>> problem:  http://pastebin.com/uy8S7hps
>>
>> As this log is rolling on, our OSD has still not been marked up and is
>> occupying 100% of a CPU core.  I've done this a couple times and in a
>> matter of some hours it will be marked up and CPU will drop.  If more
>> kraken OSD on another host are brought up the existing kraken OSD go
>> back into max CPU usage again while pg recover.  The trend scales
>> upward as OSD are started until the system is completely saturated.
>>
>> I was reading the docs on async messenger settings at
>> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
>> that under 'ms async max op threads' there is a note about one or more
>> CPUs constantly on 100% load.  As an experiment I set max op threads
>> to 20 and that is the setting during the period of the pasted log.  It
>> seems to make no difference.
>>
>> Appreciate any thoughts on troubleshooting this.  For the time being
>> I've aborted our kraken update and will probably re-initialize any
>> already updated OSD to revert to Jewel except perhaps one host to
>> continue testing.
>
> Ah, that log looks like you're just generating OSDMaps so quickly that
> rebooting 60 at a time leaves you with a ludicrous number to churn
> through, and that takes a while. It would have been exacerbated by
> having 60 daemons fight for the CPU to process them, leading to
> flapping.
>
> You might try restarting daemons sequentially on the node instead of
> all at once. Depending on your needs it would be even cheaper if you
> set the nodown flag, though obviously that will impede IO while it
> happens.
>
> I'd be concerned that this demonstrates you don't have enough CPU
> power per daemon, though.
> -Greg
>
>>
>> thanks,
>> Ben
>>
>> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
>>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
>>> wrote:
 Hi all,

 We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
 (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
 mons to Kraken.

 After updating ceph packages I restarted the 60 OSD on the box with
 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
 load flat-lines at 100% with top showing all of that being system load
 from ceph-osd processes.  Not long after we get OSD flapping due to
 the load on the system (noout was set to start this, but perhaps
 too-quickly unset post restart).

 This is causing problems in the cluster, and we reboot the box.  The
 OSD don't start up/mount automatically - not a new problem on this
 setup.  We run 'ceph-disk activate $disk' on a list of all the
 /dev/dm-X devices as output by ceph-disk list.  Everything activates
 and the CPU gradually climbs to once again be a solid 100%.  No OSD
 have joined cluster so it isn't causing issues.

 I leave the box overnight...by the time I leave I see that 1-2 OSD on
 this box are marked up/in.   By morning all are in, CPU is fine,
 cluster is still fine.

 This is not a show-stopping issue now that I know 

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Gregory Farnum
On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof  wrote:
> I tried starting up just a couple OSD with debug_osd = 20 and
> debug_filestore = 20.
>
> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
> unusual but maybe someone else sees something in here that is a
> problem:  http://pastebin.com/uy8S7hps
>
> As this log is rolling on, our OSD has still not been marked up and is
> occupying 100% of a CPU core.  I've done this a couple times and in a
> matter of some hours it will be marked up and CPU will drop.  If more
> kraken OSD on another host are brought up the existing kraken OSD go
> back into max CPU usage again while pg recover.  The trend scales
> upward as OSD are started until the system is completely saturated.
>
> I was reading the docs on async messenger settings at
> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
> that under 'ms async max op threads' there is a note about one or more
> CPUs constantly on 100% load.  As an experiment I set max op threads
> to 20 and that is the setting during the period of the pasted log.  It
> seems to make no difference.
>
> Appreciate any thoughts on troubleshooting this.  For the time being
> I've aborted our kraken update and will probably re-initialize any
> already updated OSD to revert to Jewel except perhaps one host to
> continue testing.

Ah, that log looks like you're just generating OSDMaps so quickly that
rebooting 60 at a time leaves you with a ludicrous number to churn
through, and that takes a while. It would have been exacerbated by
having 60 daemons fight for the CPU to process them, leading to
flapping.

You might try restarting daemons sequentially on the node instead of
all at once. Depending on your needs it would be even cheaper if you
set the nodown flag, though obviously that will impede IO while it
happens.

I'd be concerned that this demonstrates you don't have enough CPU
power per daemon, though.
-Greg
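
As a sketch of the sequential approach on one storage node (OSD ids are taken from 
/var/lib/ceph/osd here, assuming the default cluster name; the flags and the fixed 
sleep are only illustrative, and watching 'ceph -s' between restarts is better):

ceph osd set noout                 # don't rebalance while daemons bounce
ceph osd set nodown                # optional: avoid flapping during the restarts

# Restart the OSDs on this host one at a time instead of all 60 at once.
for id in $(ls /var/lib/ceph/osd | sed 's/^ceph-//'); do
    systemctl restart ceph-osd@"$id"
    sleep 60                       # crude pacing; ideally wait for the OSD to be up/in
done

ceph osd unset nodown
ceph osd unset noout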

>
> thanks,
> Ben
>
> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
>> wrote:
>>> Hi all,
>>>
>>> We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
>>> (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
>>> mons to Kraken.
>>>
>>> After updating ceph packages I restarted the 60 OSD on the box with
>>> 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
>>> load flat-lines at 100% with top showing all of that being system load
>>> from ceph-osd processes.  Not long after we get OSD flapping due to
>>> the load on the system (noout was set to start this, but perhaps
>>> too-quickly unset post restart).
>>>
>>> This is causing problems in the cluster, and we reboot the box.  The
>>> OSD don't start up/mount automatically - not a new problem on this
>>> setup.  We run 'ceph-disk activate $disk' on a list of all the
>>> /dev/dm-X devices as output by ceph-disk list.  Everything activates
>>> and the CPU gradually climbs to once again be a solid 100%.  No OSD
>>> have joined cluster so it isn't causing issues.
>>>
>>> I leave the box overnight...by the time I leave I see that 1-2 OSD on
>>> this box are marked up/in.   By morning all are in, CPU is fine,
>>> cluster is still fine.
>>>
>>> This is not a show-stopping issue now that I know what happens though
>>> it means upgrades are a several hour or overnight affair.  Next box I
>>> will just mark all the OSD out before updating and restarting them or
>>> try leaving them up but being sure to set noout to avoid flapping
>>> while they churn.
>>>
>>> Here's a log snippet from one currently spinning in the startup
>>> process since 11am.  This is the second box we did, the first
>>> experience being as detailed above.  Could this have anything to do
>>> with the 'PGs are upgrading' message?
>>
>> It doesn't seem likely — there's a fixed per-PG overhead that doesn't
>> scale with the object count. I could be missing something but I don't
>> see anything in the upgrade notes that should be doing this either.
>> Try running an upgrade with "debug osd = 20" and "debug filestore =
>> 20" set and see what the log spits out.
>> -Greg
>>
>>>
>>> 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to load 
>>> lua
>>> 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514119978713088, adjusting msgr requires for clients
>>> 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514394856620032 was 8705, adjusting msgr requires for
>>> mons
>>> 2017-02-14 11:04:07.077324 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514394856620032, adjusting msgr requires for osds
>>> 2017-02-14 11:04:09.446832 7fd7a0372940  0 osd.585 135493 load_pgs
>>> 2017-02-14 11:04:09.522249 7fd7a0372940 -1 osd.585 135493 PGs are upgrading
>>> 2017-02-14 11:04:10.246166 7fd7a0372940  0 osd.585 135493 

Re: [ceph-users] Random Health_warn

2017-02-23 Thread Robin H. Johnson
On Thu, Feb 23, 2017 at 09:49:21PM +, Scottix wrote:
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> 
> We are seeing a weird behavior or not sure how to diagnose what could be
> going on. We started monitoring the overall_status from the json query and
> every once in a while we would get a HEALTH_WARN for a minute or two.
> 
> Monitoring logs.
> 02/23/2017 07:25:54 AM HEALTH_OK
> 02/23/2017 07:24:54 AM HEALTH_WARN
> 02/23/2017 07:23:55 AM HEALTH_OK
> 02/23/2017 07:22:54 AM HEALTH_OK
> ...
> 02/23/2017 05:13:55 AM HEALTH_OK
> 02/23/2017 05:12:54 AM HEALTH_WARN
> 02/23/2017 05:11:54 AM HEALTH_WARN
> 02/23/2017 05:10:54 AM HEALTH_OK
> 02/23/2017 05:09:54 AM HEALTH_OK
> 
> When I check the mon leader logs there is no indication of an error or
> issues that could be occurring. Is there a way to find what is causing the
> HEALTH_WARN?
By leader logs, do you mean the cluster log (mon_cluster_log_file), or
the mon log (log_file)? Eg /var/log/ceph/ceph.log vs 
/var/log/ceph/ceph-mon.$ID.log.

Could you post the log entries for a time period between two HEALTH_OK
states with a HEALTH_WARN in the middle?

The reason for WARN _should_ be included on the logged status line.

Alternatively, you should be able to just log the output of 'ceph -w'
for a while, and find the WARN status as well.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Random Health_warn

2017-02-23 Thread Scottix
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

We are seeing some weird behavior and are not sure how to diagnose what could be
going on. We started monitoring overall_status from the JSON query, and
every once in a while we would get a HEALTH_WARN for a minute or two.

Monitoring logs.
02/23/2017 07:25:54 AM HEALTH_OK
02/23/2017 07:24:54 AM HEALTH_WARN
02/23/2017 07:23:55 AM HEALTH_OK
02/23/2017 07:22:54 AM HEALTH_OK
...
02/23/2017 05:13:55 AM HEALTH_OK
02/23/2017 05:12:54 AM HEALTH_WARN
02/23/2017 05:11:54 AM HEALTH_WARN
02/23/2017 05:10:54 AM HEALTH_OK
02/23/2017 05:09:54 AM HEALTH_OK

When I check the mon leader logs there is no indication of an error or
issue that could be occurring. Is there a way to find what is causing the
HEALTH_WARN?

Best,
Scott
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg

On 02/23/2017 03:13 PM, Gregory Farnum wrote:
If your PG count isn't a power of two, some of them will have double 
the number of objects of the others. It mostly doesn't matter, though 
at low counts it can improve balance. There's no breakage that Ceph 
cares about.

-Greg


Good to know. This stuff is making more sense everyday.

Thank you very much!

-kb
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Gregory Farnum
On Thu, Feb 23, 2017 at 12:11 PM Kent Borg  wrote:

> On 02/23/2017 02:51 PM, Gregory Farnum wrote:
> > Yeah, that's why. It'll fix itself once all the newly-split PGs have
> > scrubbed, but in order to keep the splitting operation constant-time
> > it has to estimate how many objects ended up in each of the new ones.
>
> That makes some sense. Thanks!
>
> While I have you... Why should the number of PGs be a power of two?
> (What do we break if it is not? Does later on bumping it up to a power
> of two fix all we broke?)
>
> -kb
>
> If your PG count isn't a power of two, some of them will have double the
number of objects of the others. It mostly doesn't matter, though at low
counts it can improve balance. There's no breakage that Ceph cares about.
-Greg
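
A quick way to check an existing pool against that, as a sketch (pool name 
'mypool' is made up):

pg_num=$(ceph osd pool get mypool pg_num | awk '{print $2}')
# A power of two ANDed with itself minus one is zero.
if [ $(( pg_num & (pg_num - 1) )) -eq 0 ]; then
    echo "pg_num=$pg_num is a power of two"
else
    next=1
    while [ "$next" -lt "$pg_num" ]; do next=$(( next * 2 )); done
    echo "pg_num=$pg_num is not a power of two; next power of two is $next"
fi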
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg

On 02/23/2017 02:51 PM, Gregory Farnum wrote:
Yeah, that's why. It'll fix itself once all the newly-split PGs have 
scrubbed, but in order to keep the splitting operation constant-time 
it has to estimate how many objects ended up in each of the new ones.


That makes some sense. Thanks!

While I have you... Why should the number of PGs be a power of two? 
(What do we break if it is not? Does later on bumping it up to a power 
of two fix all we broke?)


-kb

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Gregory Farnum
Yeah, that's why. It'll fix itself once all the newly-split PGs have
scrubbed, but in order to keep the splitting operation constant-time it has
to estimate how many objects ended up in each of the new ones.
-Greg
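
If waiting isn't attractive, the scrubs can be nudged along manually, roughly like 
this (pool name is an example, and the exact 'pg ls-by-pool' output layout may 
differ between releases; this does add scrub load):

# Ask every PG in the pool to scrub so the per-PG object counts get corrected.
for pg in $(ceph pg ls-by-pool mypool | awk '$1 ~ /^[0-9]+\./ {print $1}'); do
    ceph pg scrub "$pg"
done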

On Thu, Feb 23, 2017 at 11:26 AM Kent Borg  wrote:

> On 02/23/2017 02:13 PM, Gregory Farnum wrote:
> > Did you run a pg split or something? That's the only off-hand way I
> > can think of the number of objects going over, though I don't recall
> > how snapshots impact those numbers and obviously it's very wonky if
> > you were to use a cache tier.
> >
>
> We did increase the number of PG/PGPs. I also noticed that the number of
> PGs should be a power of two, and our first bump up was not a power of
> two, so we increased it again so it would be. Would that break things?
>
> Thanks,
>
> -kb
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg

On 02/23/2017 02:13 PM, Gregory Farnum wrote:

Did you run a pg split or something? That's the only off-hand way I
can think of the number of objects going over, though I don't recall
how snapshots impact those numbers and obviously it's very wonky if
you were to use a cache tier.



We did increase the number of PG/PGPs. I also noticed that the number of 
PGs should be a power of two, and our first bump up was not a power of 
two, so we increased it again so it would be. Would that break things?


Thanks,

-kb

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Gregory Farnum
On Thu, Feb 23, 2017 at 6:55 AM, Kent Borg  wrote:
> On 02/23/2017 07:43 AM, Kent Borg wrote:
>>
>> I ran a long list_objects() overnight and, at first glance this morning,
>> the output looks good, but it is thousands of objects fewer than get_stats()
>> said are there.
>
>
> Update: I scripted up a quick check and every object name I would expect to
> be in my pool is in the list_objects() output, no extra names are there, and
> the reported sizes are all reasonable.
>
> Yet get_stats() says there are thousands additional objects in that pool.
>
> It took some time to put all those objects in the pool, and for most of that
> time get_stats() did report the number I expected. But then, it differed...
>
> Is my cluster corrupted, is get_stats() sometimes (!) just a +/- 1%-ish
> estimate, or is my logic and use of Ceph wrong?

Did you run a pg split or something? That's the only off-hand way I
can think of the number of objects going over, though I don't recall
how snapshots impact those numbers and obviously it's very wonky if
you were to use a cache tier.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck peering after host reboot

2017-02-23 Thread george.vasilakakos
Since we need this pool to work again, we decided to take the data loss and try 
to move on.

So far, no luck. We tried a force create but, as expected, with a PG that is 
not peering this did absolutely nothing.
We also tried rm-past-intervals and remove with ceph-objectstore-tool, and 
manually deleted the data directories on the disks. The PG remains 
down+remapped, with two OSDs failing to join the acting set. These have been 
restarted multiple times to no avail.

# ceph pg map 1.323
osdmap e23122 pg 1.323 (1.323) -> up 
[595,1391,240,127,937,362,267,320,986,634,716] acting 
[595,1391,240,127,937,362,267,320,986,2147483647,2147483647]

We have also seen some very odd behaviour. 
# ceph pg map 1.323
osdmap e22909 pg 1.323 (1.323) -> up 
[595,1391,240,127,937,362,267,320,986,634,716] acting 
[595,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647]

Straight after a restart of all OSDs in the PG and after everything else has 
settled down. From that state restarting 595 results in:

# ceph pg map 1.323
osdmap e22921 pg 1.323 (1.323) -> up 
[595,1391,240,127,937,362,267,320,986,634,716] acting 
[2147483647,1391,240,127,937,362,267,320,986,634,716]

Restarting 595 doesn't change this. Another restart of all OSDs in the PG 
results in the state seen above with the last two replaced by ITEM_NONE.

Another strange thing is that on osd.7 (the one originally at rank 8 that was 
restarted and caused this problem) the objectstore tool fails to remove the PG 
and crashes out:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op remove --pgid 
1.323s8
 marking collection for removal
setting '_remove' omap key
finish_remove_pgs 1.323s8_head removing 1.323s8
 *** Caught signal (Aborted) **
 in thread 7fa713782700 thread_name:tp_fstore_op
 ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
 1: (()+0x97463a) [0x7fa71c47563a]
 2: (()+0xf370) [0x7fa71935a370]
 3: (snappy::RawUncompress(snappy::Source*, char*)+0x374) [0x7fa71abd0cd4]
 4: (snappy::RawUncompress(char const*, unsigned long, char*)+0x3d) 
[0x7fa71abd0e2d]
 5: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions 
const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x35e) 
[0x7fa71b08007e]
 6: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, 
leveldb::Slice const&)+0x276) [0x7fa71b081196]
 7: (()+0x3c820) [0x7fa71b083820]
 8: (()+0x3c9cd) [0x7fa71b0839cd]
 9: (()+0x3ca3e) [0x7fa71b083a3e]
 10: (()+0x39c75) [0x7fa71b080c75]
 11: (()+0x21e20) [0x7fa71b068e20]
 12: (()+0x223c5) [0x7fa71b0693c5]
 13: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::seek_to_first(std::string 
const&)+0x3d) [0x7fa71c3ecb1d]
 14: (LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(std::string 
const&)+0x138) [0x7fa71c3ec028]
 15: (DBObjectMap::clear_header(std::shared_ptr, 
std::shared_ptr)+0x1d0) [0x7fa71c400a40]
 16: (DBObjectMap::_clear(std::shared_ptr, 
std::shared_ptr)+0xa1) [0x7fa71c401171]
 17: (DBObjectMap::clear(ghobject_t const&, SequencerPosition const*)+0x1ff) 
[0x7fa71c4075bf]
 18: (FileStore::lfn_unlink(coll_t const&, ghobject_t const&, SequencerPosition 
const&, bool)+0x241) [0x7fa71c2c0d41]
 19: (FileStore::_remove(coll_t const&, ghobject_t const&, SequencerPosition 
const&)+0x8e) [0x7fa71c2c171e]
 20: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0x433e) [0x7fa71c2d8c6e]
 21: (FileStore::_do_transactions(std::vector&, unsigned long, 
ThreadPool::TPHandle*)+0x3b) [0x7fa71c2db75b]
 22: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x2cd) 
[0x7fa71c2dba5d]
 23: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb59) [0x7fa71c63e189]
 24: (ThreadPool::WorkThread::entry()+0x10) [0x7fa71c63f160]
 25: (()+0x7dc5) [0x7fa719352dc5]
 26: (clone()+0x6d) [0x7fa71843e73d]
Aborted

At this point all we want to achieve is for the PG to peer again (and soon) 
without us having to delete the pool.

Any help would be appreciated...
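
If it helps whoever picks this up, the PG's own view of the peering process can be 
captured like this (a sketch; output file name is arbitrary):

ceph pg 1.323 query > pg-1.323-query.json   # peering state, past intervals, "blocked by" info
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean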

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of 
george.vasilaka...@stfc.ac.uk [george.vasilaka...@stfc.ac.uk]
Sent: 22 February 2017 14:35
To: w...@42on.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG stuck peering after host reboot

So what I see there is this for osd.307:

"empty": 1,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 0,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}

last_epoch_started is 0 and empty is 1. The other OSDs are reporting 
last_epoch_started 16806 and empty 0.

I noticed that too and was wondering why it never completed recovery and joined

> If you stop osd.307 and maybe mark it as out, does that help?

No, I see the same thing I saw when I took 595 out:

[root@ceph-mon1 ~]# ceph pg map 1.323
osdmap e22392 pg 1.323 (1.323) -> up 

Re: [ceph-users] ceph upgrade from hammer to jewel

2017-02-23 Thread gjprabu
Hi zhong, 

Yes, one of the clients had not had its ceph-fuse version upgraded; it's working now. 
Thank you.
Regards,
Prabu GJ 

 On Thu, 23 Feb 2017 15:08:42 +0530 zhong2p...@gmail.com wrote 

are you sure you have ceph-fuse upgraded? 
#ceph-fuse --version


2017-02-23 16:07 GMT+08:00 gjprabu :
Hi Team,

            We upgraded Ceph from version 0.94.9 (hammer) to 10.2.5 (jewel). Some 
clients are still showing the older version when mounting in debug mode. Does this 
cause any issue with the OSDs and MONs, and how can we find a solution?


New version and properly working client

root@172.20.25.162/home/sas#ceph-fuse  --client-quota -d -m 
xxx.20.24.128,xxx.20.24.160,xxx.20.23.169:6789 /home/data
2017-02-22 12:05:55.282295 7fceee137ec0  0 ceph version 10.2.5 
(c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-fuse, pid 5762
ceph-fuse[5762]: starting ceph client2017-02-22 12:05:55.345005 7fceee137ec0 -1 
init, newargv = 0x7fcef81ce7e0 newargc=11

ceph-fuse[5762]: starting fuse



Older Version still showing
root@     /home/sas#ceph-fuse  --client-quota -d -m 
xxx.20.24.128,xxx.20.24.160,xxx.20.23.169:6789 /home/data
2017-02-22 11:56:41.078822 7f51dd4e2780  0 ceph version 0.94.9 
(fe6d859066244b97b24f09d46552afc2071e6f90), process ceph-fuse, pid 12790
2017-02-22 11:56:41.105082 7f51dd4e2780 -1 ceph-fuse[12790]: starting ceph 
clientinit, newargv = 0x2aa0fd0 newargc=11

ceph-fuse[12790]: starting fuse


Regards
PRabu GJ

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg

On 02/23/2017 07:43 AM, Kent Borg wrote:
I ran a long list_objects() overnight and, at first glance this 
morning, the output looks good, but it is thousands of objects fewer 
than get_stats() said are there.


Update: I scripted up a quick check and every object name I would expect 
to be in my pool is in the list_objects() output, no extra names are 
there, and the reported sizes are all reasonable.


Yet get_stats() says there are thousands additional objects in that pool.

It took some time to put all those objects in the pool, and for most of 
that time get_stats() did report the number I expected. But then, it 
differed...


Is my cluster corrupted, is get_stats() sometimes (!) just a +/- 1%-ish 
estimate, or is my logic and use of Ceph wrong?


Thanks,

-kb

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug maybe: osdmap failed undecoded

2017-02-23 Thread huang jun
You can copy the good osdmap file from osd.1 over the corrupt one on osd.0 and
then restart the OSD; we hit this before, and that worked for us.
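
As a rough sketch of that procedure on this cluster (osd.0 and osd.1 sit on the 
same host here; the file name comes from the report below, the exact location 
under current/meta/ can differ if the collection has split into hashed 
subdirectories, and this assumes the OSD runs under systemd):

systemctl stop ceph-osd@0

map=current/meta/osdmap.76__0_64173F9C__none
cp /var/lib/ceph/osd/ceph-0/$map /root/osdmap.76.corrupt          # keep the bad copy
cp /var/lib/ceph/osd/ceph-1/$map /var/lib/ceph/osd/ceph-0/$map    # compare paths with 'ls' first
chown ceph:ceph /var/lib/ceph/osd/ceph-0/$map

systemctl start ceph-osd@0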

2017-02-23 22:33 GMT+08:00 tao chang :
> Hi,
>
> I have a ceph cluster (ceph 10.2.5) with 3 nodes, each with two OSDs.
>
> There was a power outage last night and all the servers were restarted
> this morning.
> All OSDs work well except osd.0.
>
> ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 0.04500 root volumes
> -2 0.01500 host zk25-02
>  0 0.01500 osd.0   down0  1.0
>  1 0.01500 osd.1 up  1.0  1.0
> -3 0.01500 host zk25-03
>  2 0.01500 osd.2 up  1.0  1.0
>  3 0.01500 osd.3 up  1.0  1.0
> -4 0.01500 host zk25-01
>  4 0.01500 osd.4 up  1.0  1.0
>  5 0.01500 osd.5 up  1.0  1.0
>
> I tried to run it again with gdb, it turned it like this:
>
> (gdb) bt
> #0  0x74cfd5f7 in raise () from /lib64/libc.so.6
> #1  0x74cfece8 in abort () from /lib64/libc.so.6
> #2  0x756019d5 in __gnu_cxx::__verbose_terminate_handler() ()
> from /lib64/libstdc++.so.6
> #3  0x755ff946 in ?? () from /lib64/libstdc++.so.6
> #4  0x755ff973 in std::terminate() () from /lib64/libstdc++.so.6
> #5  0x755ffb93 in __cxa_throw () from /lib64/libstdc++.so.6
> #6  0x55b93b7f in pg_pool_t::decode (this=,
> bl=...) at osd/osd_types.cc:1569
> #7  0x55f3a53f in decode (p=..., c=...) at osd/osd_types.h:1487
> #8  decode (m=Python Exception  'exceptions.IndexError'> list index out of range:
> std::map with 1 elements, p=...) at include/encoding.h:648
> #9  0x55f2fa8d in OSDMap::decode_classic
> (this=this@entry=0x5fdf6480, p=...) at osd/OSDMap.cc:2026
> #10 0x55f2fe8c in OSDMap::decode
> (this=this@entry=0x5fdf6480, bl=...) at osd/OSDMap.cc:2116
> #11 0x55f3116e in OSDMap::decode (this=0x5fdf6480, bl=...)
> at osd/OSDMap.cc:1985
> #12 0x558e51fc in OSDService::try_get_map
> (this=0x5ff51860, epoch=76) at osd/OSD.cc:1340
> #13 0x55947ece in OSDService::get_map (this=,
> e=, this=) at osd/OSD.h:884
> #14 0x558fb0f2 in OSD::init (this=0x5ff5) at osd/OSD.h:1917
> #15 0x5585eea5 in main (argc=, argv= out>) at ceph_osd.cc:605
>
> it was caused by failed undecoded of osdmap structure from osdmap
> file(/var/lib/ceph/osd/ceph-0/current/meta/osdmap.76__0_64173F9C__none)
> .
> And by comparing it with the same file on osd.1, I made sure the osdmap file
> has been corrupted.
>
>
> Anyone know how to fix it? Thanks in advance!
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin bucket check kills SSD disks

2017-02-23 Thread Marius Vaitiekunas
On Wed, Feb 22, 2017 at 4:06 PM, Marius Vaitiekunas <
mariusvaitieku...@gmail.com> wrote:

> Hi Cephers,
>
> We are running latest jewel (10.2.5). Bucket index sharding is set to 8.
> rgw pools except data are placed on SSD.
> Today I've done some testing and run bucket index check on a bucket with
> ~120k objects:
>
> # radosgw-admin bucket check -b mybucket --fix --check-objects
> --rgw-realm=myrealm
>
> In a minute or two, three SSD disks were down and flapping. My guess is
> that these disks host a PG with the index of this bucket.
>
> Should we expect this with the --check-objects flag and avoid using it?
>
> --
> Marius Vaitiekūnas
>

Hi,

In case somebody else hits this issue :) The bucket index shard count in our
cluster is actually 1. We didn't know about the
'rgw_override_bucket_index_max_shards' setting; we thought that
'bucket_index_max_shards' alone was enough.
Keep in mind that you need to update existing zonegroups to make sharding
work after the correct settings are set.
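
For reference, the rough shape of such a change (the section name, zonegroup name 
and shard count are only examples; check the radosgw-admin docs for your release):

# In ceph.conf on the RGW hosts; this only affects buckets created afterwards:
#   [client.rgw.gateway1]
#   rgw_override_bucket_index_max_shards = 8

# For an existing multisite setup, raise the zonegroup default as well:
radosgw-admin zonegroup get --rgw-zonegroup=default > zg.json
#   ...edit the bucket_index_max_shards value(s) in zg.json...
radosgw-admin zonegroup set --rgw-zonegroup=default < zg.json
radosgw-admin period update --commit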

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg
I have a RADOS pool with nearly a million objects in it--but I don't 
exactly know how many, and that's the point.


I ran a long list_objects() overnight and, at first glance this morning, 
the output looks good, but it is thousands of objects fewer than 
get_stats() said are there. I am just doing early development and this 
cluster isn't deployed in any production, so I seriously doubt there is 
any other activity going on changing the total. (I know *my* code isn't 
doing anything--it is broken by this mismatch.)


A few weeks ago I learned that get_stats() can be out-of-date and report 
low, that it takes time for news of new objects to arrive, and that 
makes sense. But can get_stats() also report high (by over 1%)? Is 
get_stats() just a ballpark estimate?


Thanks,

-kb


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Authentication error CEPH installation

2017-02-23 Thread Brad Hubbard
You need ceph.client.admin.keyring in /etc/ceph/
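
A sketch of getting it there (the host name and the choice of method are examples):

# Copy it from a node that already has it, e.g. a monitor host:
scp mon1:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
chmod 600 /etc/ceph/ceph.client.admin.keyring

# Or, from any node that already has working admin access:
ceph auth get client.admin -o /etc/ceph/ceph.client.admin.keyring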

On Thu, Feb 23, 2017 at 8:13 PM, Chaitanya Ravuri
 wrote:
> Hi Team,
>
> I have recently deployed a new Ceph cluster on OEL6 boxes for my testing. I
> am getting the below error on the admin host and am not sure how I can fix it.
>
> 2017-02-23 02:13:04.166366 7f9c85efb700  0 librados: client.admin
> authentication error (1) Operation not permitted
> Error connecting to cluster: PermissionError
>
>
> I have reviewed a few blogs and tried copying as below:
>
>  scp /etc/ceph/ceph.client.radosgw.keyring host1:/etc/ceph/
>
> It didn't help.
>
> Can anyone please suggest further.
>
> Thanks,
> RC
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Authentication error CEPH installation

2017-02-23 Thread Chaitanya Ravuri
Hi Team,

I have recently deployed a new Ceph cluster on OEL6 boxes for my testing.
I am getting the below error on the admin host and am not sure how I can fix it.

2017-02-23 02:13:04.166366 7f9c85efb700  0 librados: client.admin
authentication error (1) Operation not permitted
Error connecting to cluster: PermissionError


I have reviewed a few blogs and tried copying as below:

 scp /etc/ceph/ceph.client.radosgw.keyring host1:/etc/ceph/

It didn't help.

Can anyone please suggest further.

Thanks,
RC
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph upgrade from hammer to jewel

2017-02-23 Thread jiajia zhong
are you sure you have ceph-fuse upgraded?
#ceph-fuse --version


2017-02-23 16:07 GMT+08:00 gjprabu :

> Hi Team,
>
> We upgraded Ceph from version 0.94.9 (hammer) to 10.2.5 (jewel).
> Some clients are still showing the older version when mounting in debug
> mode. Does this cause any issue with the OSDs and MONs, and how can we find a solution?
>
>
> *New version and properly working client*
>
> root@172.20.25.162/home/sas#ceph-fuse  --client-quota -d -m
> xxx.20.24.128,xxx.20.24.160,xxx.20.23.169:6789 /home/data
> 2017-02-22 12:05:55.282295 7fceee137ec0  0 *ceph version 10.2.5* (
> c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-fuse, pid 5762
> ceph-fuse[5762]: starting ceph client2017-02-22 12:05:55.345005
> 7fceee137ec0 -1 init, newargv = 0x7fcef81ce7e0 newargc=11
>
> ceph-fuse[5762]: starting fuse
>
>
>
> *Older Version still showing*
> root@ /home/sas#ceph-fuse  --client-quota -d -m
> xxx.20.24.128,xxx.20.24.160,xxx.20.23.169:6789 /home/data
> 2017-02-22 11:56:41.078822 7f51dd4e2780  0 *ceph version 0.94.9* (
> fe6d859066244b97b24f09d46552afc2071e6f90), process ceph-fuse, pid 12790
> 2017-02-22 11:56:41.105082 7f51dd4e2780 -1 ceph-fuse[12790]: starting ceph
> clientinit, newargv = 0x2aa0fd0 newargc=11
>
> ceph-fuse[12790]: starting fuse
>
>
> Regards
> PRabu GJ
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph upgrade from hammer to jewel

2017-02-23 Thread gjprabu
Hi Team,



We upgraded Ceph from version 0.94.9 (hammer) to 10.2.5 (jewel). Some 
clients are still showing the older version when mounting in debug mode. Does this 
cause any issue with the OSDs and MONs, and how can we find a solution?





New version and properly working client


root@172.20.25.162/home/sas#ceph-fuse  --client-quota -d -m 
xxx.20.24.128,xxx.20.24.160,xxx.20.23.169:6789 /home/data

2017-02-22 12:05:55.282295 7fceee137ec0  0 ceph version 10.2.5 
(c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-fuse, pid 5762

ceph-fuse[5762]: starting ceph client2017-02-22 12:05:55.345005 7fceee137ec0 -1 
init, newargv = 0x7fcef81ce7e0 newargc=11



ceph-fuse[5762]: starting fuse







Older Version still showing
root@ /home/sas#ceph-fuse  --client-quota -d -m 
xxx.20.24.128,xxx.20.24.160,xxx.20.23.169:6789 /home/data

2017-02-22 11:56:41.078822 7f51dd4e2780  0 ceph version 0.94.9 
(fe6d859066244b97b24f09d46552afc2071e6f90), process ceph-fuse, pid 12790

2017-02-22 11:56:41.105082 7f51dd4e2780 -1 ceph-fuse[12790]: starting ceph 
clientinit, newargv = 0x2aa0fd0 newargc=11



ceph-fuse[12790]: starting fuse





Regards

PRabu GJ

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com