On 01/22/2014 11:34 AM, Guang wrote:
Thanks Sage.

If we set debug_mon and debug_paxos to 20, the log file grows too fast, so I
set the log level to 10 and then: 1) ran the 'ceph osd set noin' command,
2) grepped the log for the keyword 'noin'. The monitor log is attached.
Please help check it. Thanks very much!
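Concretely, the steps were roughly as follows (the monitor name matches my
setup; the log path is an assumption about the default location):

  # lower the log levels on the running monitor via its admin socket
  sudo ceph daemon mon.osd151 config set debug_mon 10
  sudo ceph daemon mon.osd151 config set debug_paxos 10
  # run the command under investigation
  sudo ceph osd set noin
  # keep only the lines mentioning 'noin'
  grep noin /var/log/ceph/ceph-mon.osd151.log > mon-noin.log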




The log doesn't show the relevant part, because it only contains the log messages mentioning the 'noin' keyword.

We need the portion of the log between a line containing

'(leader).*handle_command mon_command({"prefix": "osd set", "key": "noin"}.*'

and the first line (after that) containing 'won leader election'.

Otherwise we are missing what is causing the election to be triggered.
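For example, something along these lines should pull that span out of the
leader's log (the log path is just a guess at the default; point it at the
actual leader's monitor log):

  sed -n '/(leader).*handle_command mon_command({"prefix": "osd set", "key": "noin"}/,/won leader election/p' \
      /var/log/ceph/ceph-mon.osd152.log > mon-noin-span.log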

  -Joao



Thanks,
Guang

On Jan 21, 2014, at 12:35 AM, Sage Weil <s...@inktank.com> wrote:

On Sun, 19 Jan 2014, Guang wrote:
Thanks Sage.

I just captured part of the log (it was growing fast); the process did
not hang, but I saw the same pattern repeatedly. Should I increase the
log level and send it over email (it reproduces consistently)?

Sure!  A representative fragment of the repeating pattern should be
enough.

s

Thanks,
Guang

On Jan 18, 2014, at 12:05 AM, Sage Weil <s...@inktank.com> wrote:

On Fri, 17 Jan 2014, Guang wrote:
Thanks Sage.

I further narrowed the problem down to: any command using the paxos service
hangs. Details follow:

1. I am able to run ceph status / osd dump, etc.; however, the results are out 
of date (though I stopped all OSDs, this is not reflected in the ceph status report).

-bash-4.1$ sudo ceph -s
cluster b9cb3ea9-e1de-48b4-9e86-6921e2c537d2
  health HEALTH_WARN 2797 pgs degraded; 107 pgs down; 7503 pgs peering; 917 pgs 
recovering; 6079 pgs recovery_wait; 2957 pgs stale; 7771 pgs stuck inactive; 
2957 pgs stuck stale; 16567 pgs stuck unclean; recovery 54346804/779462977 
degraded (6.972%); 9/259724199 unfound (0.000%); 2 near full osd(s); 57/751 in 
osds are down; noout,nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
  monmap e1: 3 mons at 
{osd151=10.194.0.68:6789/0,osd152=10.193.207.130:6789/0,osd153=10.193.207.131:6789/0},
 election epoch 123278, quorum 0,1,2 osd151,osd152,osd153
  osdmap e134893: 781 osds: 694 up, 751 in
   pgmap v2388518: 22203 pgs: 26 inactive, 14 active, 79 
stale+active+recovering, 5020 active+clean, 242 stale, 4352 
active+recovery_wait, 616 stale+active+clean, 177 active+recovering+degraded, 
6714 peering, 925 stale+active+recovery_wait, 86 down+peering, 1547 
active+degraded, 32 stale+active+recovering+degraded, 648 stale+peering, 21 
stale+down+peering, 239 stale+active+degraded, 651 
active+recovery_wait+degraded, 30 remapped+peering, 151 
stale+active+recovery_wait+degraded, 4 stale+remapped+peering, 629 
active+recovering; 79656 GB data, 363 TB used, 697 TB / 1061 TB avail; 
54346804/779462977 degraded (6.972%); 9/259724199 unfound (0.000%)
  mdsmap e1: 0/0/1 up

2. If I run a command which uses paxos, the command hangs forever. This 
includes 'ceph osd set noup' (and also the commands an OSD sends to the 
monitor when being started, e.g. crush create-or-move).

I attached the corresponding monitor log (it looks like a bug).

I see the osd set command coming through, but it arrives while paxos is
converging, and the log seems to end before the mon would normally process
the delayed messages.  Is there a reason why the log fragment you attached
ends there, or did the process hang or something?

Thanks-
sage


On Jan 17, 2014, at 1:35 AM, Sage Weil <s...@inktank.com> wrote:

Hi Guang,

On Thu, 16 Jan 2014, Guang wrote:
I still haven't had any luck figuring out what is causing the 
authentication failure, so in order to get the cluster back, I tried:
1. stop all daemons (mon & osd)
2. change the configuration to disable cephx (see the sketch after this list)
3. start mon daemons (3 in total)
4. start osd daemon one by one
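For step 2, the change was along these lines in ceph.conf on every node (a
rough sketch of the standard cephx switches; adjust to your config layout):

  [global]
      auth cluster required = none
      auth service required = none
      auth client required = none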

After finishing step 3, the cluster is reachable ('ceph -s' gives results):
-bash-4.1$ sudo ceph -s
cluster b9cb3ea9-e1de-48b4-9e86-6921e2c537d2
health HEALTH_WARN 2797 pgs degraded; 107 pgs down; 7503 pgs peering; 917 pgs 
recovering; 6079 pgs recovery_wait; 2957 pgs stale; 7771 pgs stuck inactive; 
2957 pgs stuck stale; 16567 pgs stuck unclean; recovery 54346804/779462977 
degraded (6.972%); 9/259724199 unfound (0.000%); 2 near full osd(s); 57/751 in 
osds are down; noout,nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
monmap e1: 3 mons at 
{osd151=10.194.0.68:6789/0,osd152=10.193.207.130:6789/0,osd153=10.193.207.131:6789/0},
 election epoch 106022, quorum 0,1,2 osd151,osd152,osd153
osdmap e134893: 781 osds: 694 up, 751 in
  pgmap v2388518: 22203 pgs: 26 inactive, 14 active, 79 
stale+active+recovering, 5020 active+clean, 242 stale, 4352 
active+recovery_wait, 616 stale+active+clean, 177 active+recovering+degraded, 
6714 peering, 925 stale+active+recovery_wait, 86 down+peering, 1547 
active+degraded, 32 stale+active+recovering+degraded, 648 stale+peering, 21 
stale+down+peering, 239 stale+active+degraded, 651 
active+recovery_wait+degraded, 30 remapped+peering, 151 
stale+active+recovery_wait+degraded, 4 stale+remapped+peering, 629 
active+recovering; 79656 GB data, 363 TB used, 697 TB / 1061 TB avail; 
54346804/779462977 degraded (6.972%); 9/259724199 unfound (0.000%)
mdsmap e1: 0/0/1 up
(at this point, all OSDs should be down).

When I tried to start an OSD daemon, the startup script hung; the hanging 
process is:
root      80497  80496  0 08:18 pts/0    00:00:00 python /usr/bin/ceph 
--name=osd.22 --keyring=/var/lib/ceph/osd/ceph-22/keyring osd crush 
create-or-move -- 22 0.40 root=default host=osd173

When I strace the startup script, I get the following trace (process 75873 is 
the above process); it waits on a futex and then loops forever on select:
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
Any idea what might trigger this?

It is hard to tell what is going on from the strace.  Do you see
that the OSDs are booting in ceph.log (or ceph -w output)?  If not, I
would look at the osd daemon log for clues.  You may need to turn up
debugging to see (ceph daemon osd.NNN config set debug_osd 20 to adjust
the level on the running daemon).
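Concretely, something like this (osd.NNN stands in for the actual id; run the
daemon command on that OSD's host):

  # watch the cluster log for osd boot messages
  ceph -w
  # bump the debug level on the running daemon via its admin socket
  sudo ceph daemon osd.NNN config set debug_osd 20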

If they are booting, it is mostly a matter of letting it recover and come
up.  We have seen patterns where configuration or network issues have let
the system bury itself under a series of osdmap updates.  If you see that
in the log when you turn up debugging, or see the osds going up and down
when you try to bring the cluster up, that could be what is going on.  A
strategy that has worked there is to let all the osds catch up on their
maps before trying to peer and join the cluster.  To do that, 'ceph osd
set noup' (which prevents the osds from joining), wait for the ceph-osd
processes to stop chewing on maps (watch the cpu utilization in top), and
once they are all ready 'ceph osd unset noup' and let them join and peer
all at once.
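Roughly:

  # keep osds from being marked up while they catch up on osdmaps
  ceph osd set noup
  # (re)start the osds, watch cpu in top on each host until the ceph-osd
  # processes stop chewing on maps, then let them join and peer all at once
  ceph osd unset noup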

sage


======= STRACE (PARTIAL) ==========
[pid 75873] futex(0xf707a0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 75878] mmap(NULL, 134217728, PROT_NONE, 
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5da6529000
[pid 75878] munmap(0x7f5da6529000, 28143616) = 0
[pid 75878] munmap(0x7f5dac000000, 38965248) = 0
[pid 75878] mprotect(0x7f5da8000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 75878] futex(0xf707a0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 75873] <... futex resumed> )       = 0
[pid 75873] futex(0xdd3cb0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 75878] futex(0xdd3cb0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 75873] <... futex resumed> )       = -1 EAGAIN (Resource temporarily 
unavailable)
[pid 75878] <... futex resumed> )       = 0
[pid 75873] select(0, NULL, NULL, NULL, {0, 1000} <unfinished ...>
[pid 75878] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
[pid 75878] mmap(NULL, 10489856, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f5dadb28000
[pid 75878] mprotect(0x7f5dadb28000, 4096, PROT_NONE) = 0
[ some entries omitted ]
[pid 75873] select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
[pid 75873] select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
[pid 75873] select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
[pid 75873] select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
[pid 75873] select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
[pid 75873] select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
[pid 75873] select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)


Thanks,
Guang

On Jan 15, 2014, at 5:54 AM, Guang <yguan...@outlook.com> wrote:

Thanks Sage.

-bash-4.1$ sudo ceph --admin-daemon /var/run/ceph/ceph-mon.osd151.asok 
mon_status
{ "name": "osd151",
"rank": 2,
"state": "electing",
"election_epoch": 85469,
"quorum": [],
"outside_quorum": [],
"extra_probe_peers": [],
"sync_provider": [],
"monmap": { "epoch": 1,
   "fsid": "b9cb3ea9-e1de-48b4-9e86-6921e2c537d2",
   "modified": "0.000000",
   "created": "0.000000",
   "mons": [
         { "rank": 0,
           "name": "osd152",
           "addr": "10.193.207.130:6789\/0"},
         { "rank": 1,
           "name": "osd153",
           "addr": "10.193.207.131:6789\/0"},
         { "rank": 2,
           "name": "osd151",
           "addr": "10.194.0.68:6789\/0"}]}}

And:

-bash-4.1$ sudo ceph --admin-daemon /var/run/ceph/ceph-mon.osd151.asok 
quorum_status
{ "election_epoch": 85480,
"quorum": [
     0,
     1,
     2],
"quorum_names": [
     "osd151",
     "osd152",
     "osd153"],
"quorum_leader_name": "osd152",
"monmap": { "epoch": 1,
   "fsid": "b9cb3ea9-e1de-48b4-9e86-6921e2c537d2",
   "modified": "0.000000",
   "created": "0.000000",
   "mons": [
         { "rank": 0,
           "name": "osd152",
           "addr": "10.193.207.130:6789\/0"},
         { "rank": 1,
           "name": "osd153",
           "addr": "10.193.207.131:6789\/0"},
         { "rank": 2,
           "name": "osd151",
           "addr": "10.194.0.68:6789\/0"}]}}


According to the above status, the election has finished and a leader has been selected.

Thanks,
Guang

On Jan 14, 2014, at 10:55 PM, Sage Weil <s...@inktank.com> wrote:

On Tue, 14 Jan 2014, GuangYang wrote:
Hi ceph-users and ceph-devel,
I came across an issue after restarting the monitors of the cluster: 
authentication fails, which prevents running any ceph command.

After we did some maintenance work, I restarted an OSD; however, I found that the 
OSD would not join the cluster automatically after being restarted, though a TCP 
dump showed it had already sent a message to the monitor asking to be added into 
the cluster.

So I suspected there might be some issue with the monitors and restarted the 
monitors one by one (3 in total); however, after restarting the monitors, all ceph 
commands fail with an authentication timeout:

2014-01-14 12:00:30.499397 7fc7f195e700  0 monclient(hunting): authenticate 
timed out after 300
2014-01-14 12:00:30.499440 7fc7f195e700  0 librados: client.admin 
authentication error (110) Connection timed out
Error connecting to cluster: Error

Any idea why such an error happens (restarting an OSD results in the same 
error)?

I am thinking the authentication information is persisted on the mon's local disk; 
is there a chance that data got corrupted?

That sounds unlikely, but you're right that the core problem is with the
mons.  What does

ceph daemon mon.`hostname` mon_status

say?  Perhaps they are not forming a quorum and that is what is preventing
authentication.
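For example (the socket path is the default; substitute your mon's name if it
differs):

  sudo ceph daemon mon.`hostname` mon_status
  # or, equivalently, via the admin socket path directly
  sudo ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname).asok mon_status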

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
