Do those logs have a higher debugging level than the default? If not,
never mind, as they will not have enough information. If they do, however,
we'd be interested in the portion around the moment you set the tunables.
Say, before the upgrade and a bit after you set the tunable. If you want to
be finer-grained, then ideally it would be the moment when those maps were
created, but you'd have to grep the logs for that.
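For instance, something along these lines, assuming the default log location
under /var/log/ceph and that the divergent epoch is 13258 (the exact patterns
may need tweaking):
# look for the tunables being changed in the mon logs
grep -n -i "tunable" /var/log/ceph/ceph-mon.*.log
# or look for activity around the divergent osdmap epoch
grep -n "e13258" /var/log/ceph/ceph-mon.*.log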
Or drop the logs somewhere and I'll take a look.
-Joao
On Jul 3, 2014 5:48 PM, Pierre BLONDEAU pierre.blond...@unicaen.fr
wrote:
On 03/07/2014 13:49, Joao Eduardo Luis wrote:
On 07/03/2014 12:15 AM, Pierre BLONDEAU wrote:
On 03/07/2014 00:55, Samuel Just wrote:
Ah,
~/logs » for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*; ../ceph/src/crushtool -d /tmp/crush$i -o /tmp/crush$i.d; done; diff /tmp/crush20.d /tmp/crush23.d
../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
../ceph/src/osdmaptool: exported crush map to /tmp/crush20
../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
../ceph/src/osdmaptool: exported crush map to /tmp/crush23
6d5
< tunable chooseleaf_vary_r 1
Looks like the chooseleaf_vary_r tunable somehow ended up divergent?
The only thing that comes to mind that could cause this is if we changed
the leader's in-memory map, proposed it, it failed, and somehow only the
leader got to write the map to disk. This happened once on a totally
different issue (although I can't pinpoint which one right now).
In such a scenario, the leader would serve the incorrect osdmap to
whoever asked it for osdmaps, while the rest of the quorum would serve the
correct osdmaps to everyone else. That could cause this divergence. Or
it could be something else.
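As a quick sanity check (it won't show what happened back then, but it tells
us which mon is currently the leader and whose store to look at first), you
can run something like:
# show the current quorum and which monitor is the leader
ceph quorum_status --format json-pretty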
Are there logs for the monitors for the timeframe this may have happened
in?
Exactly which timeframe do you want? I have 7 days of logs; I should have
information about the upgrade from firefly to 0.82.
Which mon's logs do you want? All three?
Regards
-Joao
Pierre: do you recall how and when that got set?
I am not sure I understand, but if I remember correctly, after the update to
firefly I was in the state: HEALTH_WARN crush map has legacy tunables, and
I saw "feature set mismatch" in the logs.
So, if I remember correctly, I ran: ceph osd crush tunables optimal for the
crush map problem, and I updated my client and server kernels to
3.16-rc.
Could that be it?
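For reference, the tunables currently set in the crush map can be checked
with something like this (assuming a firefly-or-later ceph CLI):
# dump the crush tunables the cluster currently has set
ceph osd crush show-tunables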
Pierre
-Sam
On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just sam.j...@inktank.com
wrote:
Yeah, divergent osdmaps:
555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none
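(For context, a sketch of how these checksums can be reproduced from the
attached files; the filenames assume they were saved as-is in the current
directory:)
# checksum the osdmap epoch 13258 object from each osd
md5sum osd-20_osdmap.13258__0_4E62BB79__none osd-23_osdmap.13258__0_4E62BB79__none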
Joao: thoughts?
-Sam
On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU
pierre.blond...@unicaen.fr wrote:
The files
When I upgraded:
ceph-deploy install --stable firefly servers...
on each server: service ceph restart mon
on each server: service ceph restart osd
on each server: service ceph restart mds
I upgraded from emperor to firefly. After repair, remap, replace,
etc., I had some PGs that got stuck in the peering state.
I thought, why not try version 0.82, it might solve my problem
(that was my mistake). So I upgraded from firefly to 0.83 with:
ceph-deploy install --testing servers...
Now, all programs are in version 0.82.
I have 3 mons, 36 OSD and 3 mds.
Pierre
PS: I also found inc\uosdmap.13258__0_469271DE__none in each meta
directory.
On 03/07/2014 00:10, Samuel Just wrote:
Also, what version did you upgrade from, and how did you upgrade?
-Sam
On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just sam.j...@inktank.com
wrote:
Ok, in current/meta on osd 20 and osd 23, please attach all files matching
^osdmap.13258.*
There should be one such file on each osd. (It should look something like
osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory; you'll
want to use find.)
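For example, something along these lines (assuming the default filestore
layout under /var/lib/ceph; adjust the paths to your setup):
# locate the osdmap.13258 object in each osd's meta directory
find /var/lib/ceph/osd/ceph-20/current/meta -name 'osdmap.13258*'
find /var/lib/ceph/osd/ceph-23/current/meta -name 'osdmap.13258*'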
What version of ceph is running on your mons? How many mons do
you have?
-Sam
On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU
pierre.blond...@unicaen.fr wrote:
Hi,
I did it; the log files are available here:
https://blondeau.users.greyc.fr/cephlog/debug20/
The OSD log files are really big, +/- 80 MB.
After starting osd.20, some other OSDs crashed. The number of OSDs up went
from 31 down to 16.
I noticed that after this the number of down+peering PGs decreased from 367
to 248. Is that normal? Maybe it's temporary, the time it takes the cluster
to verify all the PGs?
Regards
Pierre
On 02/07/2014 19:16, Samuel Just wrote:
You should add
debug osd = 20
debug filestore = 20
debug ms = 1
to the [osd] section of the ceph.conf and restart the osds. I'd
like
all three logs if possible.
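That is, something like this in ceph.conf on each OSD host (a sketch; keep
whatever is already in your [osd] section):
[osd]
    debug osd = 20
    debug filestore = 20
    debug ms = 1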
Thanks
-Sam
On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU
pierre.blond...@unicaen.fr wrote:
Yes, but how do I do that?
With a command like this?
ceph tell osd.20 injectargs '--debug-osd 20