Re: [ceph-users] MDS crash when client goes to sleep
On Sun, Mar 23, 2014 at 11:47 AM, Sage Weil wrote: > Hi, > > I looked at this a bit earlier and wasn't sure why we would be getting a > remote_reset event after a sleep/wake cycle. The patch should fix the > crash, but I'm a bit worried something is not quite right on the client > side, too... > When client wakes up, it first tries reconnecting the old session. MDS refuses the reconnect request and sends a session close message to the client. After receiving the session close message, client closes the old session, then sends a session open message to the MDS. The MDS receives the open request and triggers a remote reset (Pipe.cc:466) > sage > > On Sun, 23 Mar 2014, Yan, Zheng wrote: > >> thank you for reporting this. Below patch should fix this issue >> >> --- >> diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc >> index 57c7f4a..6b53c14 100644 >> --- a/src/mds/MDS.cc >> +++ b/src/mds/MDS.cc >> @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con) >>if (session->is_closed()) { >> dout(3) << "ms_handle_reset closing connection for session " << >> session->info.inst << dendl; >> messenger->mark_down(con); >> + con->set_priv(NULL); >> sessionmap.remove_session(session); >>} >>session->put(); >> @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con) >>if (session->is_closed()) { >> dout(3) << "ms_handle_remote_reset closing connection for session " >> << session->info.inst << dendl; >> messenger->mark_down(con); >> + con->set_priv(NULL); >> sessionmap.remove_session(session); >>} >>session->put(); >> >> On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim >> wrote: >> > Hi Hong, >> > >> > >> > >> > How's the client now? Would it able to mount to the filesystem now? It >> > looks >> > similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html >> > >> > However, you need to collect some logs to confirm this. >> > >> > >> > >> > Thanks. 
>> > >> > >> > >> > >> > >> > From: hjcho616 [mailto:hjcho...@yahoo.com] >> > Sent: Friday, March 21, 2014 2:30 PM >> > >> > >> > To: Luke Jing Yuan >> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com >> > Subject: Re: [ceph-users] MDS crash when client goes to sleep >> > >> > >> > >> > Luke, >> > >> > >> > >> > Not sure what flapping ceph-mds daemon mean, but when I connected to MDS >> > when this happened there no longer was any process with ceph-mds when I ran >> > one daemon. When I ran three there were one left but wasn't doing much. I >> > didn't record the logs but behavior was very similar in 0.72 emperor. I am >> > using debian packages. >> > >> > >> > >> > Client went to sleep for a while (like 8+ hours). There was no I/O prior >> > to >> > the sleep other than the fact that cephfs was still mounted. >> > >> > >> > >> > Regards, >> > >> > Hong >> > >> > >> > >> > >> > >> > From: Luke Jing Yuan >> > >> > >> > To: hjcho616 >> > Cc: Mohd Bazli Ab Karim ; >> > "ceph-users@lists.ceph.com" >> > Sent: Friday, March 21, 2014 1:17 AM >> > >> > Subject: RE: [ceph-users] MDS crash when client goes to sleep >> > >> > >> > Hi Hong, >> > >> > That's interesting, for Mr. Bazli and I, we ended with MDS stuck in >> > (up:replay) and a flapping ceph-mds daemon, but then again we are using >> > version 0.72.2. Having said so the triggering point seem similar to us as >> > well, which is the following line: >> > >> > -38> 2014-03-20 20:08:44.495565 7fee3d7c4700 0 -- >> > 192.168.1.20:6801/17079 >> >>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 >> > c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION >> > >> > So how long did your client go into sleep? Was there any I/O prior to the >> > sleep? 
>> > >> > Regards, >> > Luke >> > >> > From: hjcho616 [mailto:hjcho...@yahoo.com] >> > Sent: Friday, 21 March, 2014 12:09 PM >> > To: Luke Jing Yuan >> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com >> > Subject: Re: [ceph-users] MDS crash when client goes to sleep >> > >> > Nope just these segfaults. >> > >> > [149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp >> > 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000] >> > [211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp >> > 7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000] >> > [214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp >> > 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000] >> > [289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp >> > 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000] >> > [373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp >> > 7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000] >> > >> > Regards, >> > Hong >> > >> > >> > From: Luke Jing Yuan >> > To: hjcho616 >> > Cc: Mohd Bazli Ab Karim ; >> > "ceph-users@lists.ceph.
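Why clearing the connection's private pointer matters can be sketched in miniature. The following is a hypothetical Python model, not the real Ceph classes — the names only loosely mirror the C++ code — but it shows the failure mode: if the connection keeps its reference to a session that has already been removed from the session map, the next reset event on that connection operates on stale state.

```python
# Hypothetical miniature of the MDS session/connection lifecycle.
# Class and method names only loosely mirror the real C++ code.

class Session:
    def __init__(self, inst):
        self.inst = inst
        self.closed = True          # a closed session pending removal

class Connection:
    def __init__(self):
        self.priv = None            # like Connection::set_priv()/get_priv()

class SessionMap:
    def __init__(self):
        self.sessions = {}
    def add(self, session):
        self.sessions[session.inst] = session
    def remove_session(self, session):
        del self.sessions[session.inst]   # raises if already removed

def ms_handle_reset(con, sessionmap, apply_fix):
    session = con.priv
    if session is not None and session.closed:
        if apply_fix:
            con.priv = None          # the patch: drop the stale reference
        sessionmap.remove_session(session)

# With the fix, a second reset on the same connection is a harmless no-op.
smap = SessionMap()
ses = Session("client.4242")
smap.add(ses)
con = Connection()
con.priv = ses
ms_handle_reset(con, smap, apply_fix=True)
ms_handle_reset(con, smap, apply_fix=True)

# Without the fix, the second reset still sees the removed session and
# blows up -- the analogue of the MDS use-after-free segfault.
smap2 = SessionMap()
ses2 = Session("client.4243")
smap2.add(ses2)
con2 = Connection()
con2.priv = ses2
ms_handle_reset(con2, smap2, apply_fix=False)
try:
    ms_handle_reset(con2, smap2, apply_fix=False)
    crashed = False
except KeyError:
    crashed = True
assert crashed
```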
Re: [ceph-users] Error initializing cluster client: Error
> I have two nodes with 8 OSDs on each. First node running 2 monitors on
> different virtual machines (mon.1 and mon.2), second node running mon.3.
> After several reboots (I have tested power failure scenarios) "ceph -w" on
> node 2 always fails with message:
>
> root@bes-mon3:~# ceph --verbose -w
> Error initializing cluster client: Error

The cluster is simply protecting itself from a split-brain situation. Say you have:

mon.1 mon.2 mon.3

If mon.1 fails, no big deal: you still have 2/3, so no problem. Now instead, say mon.1 is separated from mon.2 and mon.3 because of a network partition (trunk failure, whatever). If one monitor of the three could elect itself as leader, then you might have divergence between your monitors: self-elected mon.1 thinks it's the leader, while mon.{2,3} have elected a leader amongst themselves.

The harsh reality is that you really need to have monitors on 3 distinct physical hosts to protect against the failure of a physical host.

-- Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
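Kyle's majority argument can be sketched numerically. This is an illustrative toy, not the actual Paxos election code:

```python
def has_quorum(visible, total):
    """A partition can elect a leader only if it sees a strict majority
    of the monitor population (toy model of the quorum rule)."""
    return visible > total // 2

# mon.1 fails outright: mon.2 and mon.3 still see 2 of 3 -> quorum holds.
assert has_quorum(2, 3)

# Network partition: mon.1 alone on one side cannot self-elect,
# while mon.{2,3} on the other side can.
assert not has_quorum(1, 3)
assert has_quorum(2, 3)

# Two mons on one physical host, one on another: losing the first host
# leaves 1 of 3 visible -> no quorum, the cluster refuses service.
assert not has_quorum(1, 3)
```

This is why putting two of the three monitors on the same physical node, as in the setup above, buys nothing: losing that node takes out the majority.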
Re: [ceph-users] MDS crash when client goes to sleep
Hi, I looked at this a bit earlier and wasn't sure why we would be getting a remote_reset event after a sleep/wake cycle. The patch should fix the crash, but I'm a bit worried something is not quite right on the client side, too... sage On Sun, 23 Mar 2014, Yan, Zheng wrote: > thank you for reporting this. Below patch should fix this issue > > --- > diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc > index 57c7f4a..6b53c14 100644 > --- a/src/mds/MDS.cc > +++ b/src/mds/MDS.cc > @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con) >if (session->is_closed()) { > dout(3) << "ms_handle_reset closing connection for session " << > session->info.inst << dendl; > messenger->mark_down(con); > + con->set_priv(NULL); > sessionmap.remove_session(session); >} >session->put(); > @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con) >if (session->is_closed()) { > dout(3) << "ms_handle_remote_reset closing connection for session " > << session->info.inst << dendl; > messenger->mark_down(con); > + con->set_priv(NULL); > sessionmap.remove_session(session); >} >session->put(); > > On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim > wrote: > > Hi Hong, > > > > > > > > How's the client now? Would it able to mount to the filesystem now? It looks > > similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html > > > > However, you need to collect some logs to confirm this. > > > > > > > > Thanks. > > > > > > > > > > > > From: hjcho616 [mailto:hjcho...@yahoo.com] > > Sent: Friday, March 21, 2014 2:30 PM > > > > > > To: Luke Jing Yuan > > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] MDS crash when client goes to sleep > > > > > > > > Luke, > > > > > > > > Not sure what flapping ceph-mds daemon mean, but when I connected to MDS > > when this happened there no longer was any process with ceph-mds when I ran > > one daemon. When I ran three there were one left but wasn't doing much. 
I > > didn't record the logs but behavior was very similar in 0.72 emperor. I am > > using debian packages. > > > > > > > > Client went to sleep for a while (like 8+ hours). There was no I/O prior to > > the sleep other than the fact that cephfs was still mounted. > > > > > > > > Regards, > > > > Hong > > > > > > > > > > > > From: Luke Jing Yuan > > > > > > To: hjcho616 > > Cc: Mohd Bazli Ab Karim ; > > "ceph-users@lists.ceph.com" > > Sent: Friday, March 21, 2014 1:17 AM > > > > Subject: RE: [ceph-users] MDS crash when client goes to sleep > > > > > > Hi Hong, > > > > That's interesting, for Mr. Bazli and I, we ended with MDS stuck in > > (up:replay) and a flapping ceph-mds daemon, but then again we are using > > version 0.72.2. Having said so the triggering point seem similar to us as > > well, which is the following line: > > > > -38> 2014-03-20 20:08:44.495565 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 > >>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 > > c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION > > > > So how long did your client go into sleep? Was there any I/O prior to the > > sleep? > > > > Regards, > > Luke > > > > From: hjcho616 [mailto:hjcho...@yahoo.com] > > Sent: Friday, 21 March, 2014 12:09 PM > > To: Luke Jing Yuan > > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] MDS crash when client goes to sleep > > > > Nope just these segfaults. 
> > > > [149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp > > 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000] > > [211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp > > 7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000] > > [214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp > > 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000] > > [289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp > > 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000] > > [373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp > > 7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000] > > > > Regards, > > Hong > > > > > > From: Luke Jing Yuan > > To: hjcho616 > > Cc: Mohd Bazli Ab Karim ; > > "ceph-users@lists.ceph.com" > > Sent: Thursday, March 20, 2014 10:53 PM > > Subject: Re: [ceph-users] MDS crash when client goes to sleep > > > > Did you see any messages in dmesg saying ceph-mds respawnning or stuffs like > > that? > > > > Regards, > > Luke > > > > On Mar 21, 2014, at 11:09 AM, "hjcho616" wrote: > > On client, I was no longer able to access the filesystem. It would hang. > > Makes sense since MDS has crashed. I tried running 3 MDS demon on the same > > machine. Two crashes and one appears to be hung up(?). ceph health says MDS > > is in degraded state when that
Re: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"
Hi Kyle,

Thanks. I turned on "debug ms = 1" and "debug osd = 10" and restarted osd.54; here's the log for that one:
ceph-osd.54.log.bz2 http://www67.zippyshare.com/v/99704627/file.html

Strace of osd.53, strace.zip: http://www43.zippyshare.com/v/17581165/file.html

Thanks,
Quenten

-Original Message-
From: Kyle Bader [mailto:kyle.ba...@gmail.com]
Sent: Sunday, 23 March 2014 12:10 PM
To: Quenten Grasso
Subject: Re: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"

> Any ideas on why the load average goes so crazy & starts to block IO?

Could you turn on "debug ms = 1" and "debug osd = 10" prior to restarting the OSDs on one of your hosts and share the logs so we can take a look? It might also be worthwhile to strace one of the OSDs to try to determine what it's working so hard on, maybe:

strace -fc -p <osd-pid> > strace.osd1.log

Thanks!

-- Kyle
Re: [ceph-users] MDS crash when client goes to sleep
Thank you for reporting this. The patch below should fix this issue.

---
diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
index 57c7f4a..6b53c14 100644
--- a/src/mds/MDS.cc
+++ b/src/mds/MDS.cc
@@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
   if (session->is_closed()) {
     dout(3) << "ms_handle_reset closing connection for session " << session->info.inst << dendl;
     messenger->mark_down(con);
+    con->set_priv(NULL);
     sessionmap.remove_session(session);
   }
   session->put();
@@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
   if (session->is_closed()) {
     dout(3) << "ms_handle_remote_reset closing connection for session " << session->info.inst << dendl;
     messenger->mark_down(con);
+    con->set_priv(NULL);
     sessionmap.remove_session(session);
   }
   session->put();

On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim wrote:
> Hi Hong,
>
> How's the client now? Is it able to mount the filesystem now? It looks
> similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
>
> However, you need to collect some logs to confirm this.
>
> Thanks.
>
> From: hjcho616 [mailto:hjcho...@yahoo.com]
> Sent: Friday, March 21, 2014 2:30 PM
> To: Luke Jing Yuan
> Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] MDS crash when client goes to sleep
>
> Luke,
>
> Not sure what a flapping ceph-mds daemon means, but when I connected to the MDS
> when this happened, there was no longer any ceph-mds process when I ran
> one daemon. When I ran three, there was one left, but it wasn't doing much. I
> didn't record the logs, but the behavior was very similar in 0.72 emperor. I am
> using debian packages.
>
> Client went to sleep for a while (like 8+ hours). There was no I/O prior to
> the sleep other than the fact that cephfs was still mounted.
> > > > Regards, > > Hong > > > > > > From: Luke Jing Yuan > > > To: hjcho616 > Cc: Mohd Bazli Ab Karim ; > "ceph-users@lists.ceph.com" > Sent: Friday, March 21, 2014 1:17 AM > > Subject: RE: [ceph-users] MDS crash when client goes to sleep > > > Hi Hong, > > That's interesting, for Mr. Bazli and I, we ended with MDS stuck in > (up:replay) and a flapping ceph-mds daemon, but then again we are using > version 0.72.2. Having said so the triggering point seem similar to us as > well, which is the following line: > > -38> 2014-03-20 20:08:44.495565 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 > c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION > > So how long did your client go into sleep? Was there any I/O prior to the > sleep? > > Regards, > Luke > > From: hjcho616 [mailto:hjcho...@yahoo.com] > Sent: Friday, 21 March, 2014 12:09 PM > To: Luke Jing Yuan > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] MDS crash when client goes to sleep > > Nope just these segfaults. 
> > [149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp > 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000] > [211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp > 7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000] > [214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp > 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000] > [289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp > 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000] > [373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp > 7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000] > > Regards, > Hong > > > From: Luke Jing Yuan > To: hjcho616 > Cc: Mohd Bazli Ab Karim ; > "ceph-users@lists.ceph.com" > Sent: Thursday, March 20, 2014 10:53 PM > Subject: Re: [ceph-users] MDS crash when client goes to sleep > > Did you see any messages in dmesg saying ceph-mds respawnning or stuffs like > that? > > Regards, > Luke > > On Mar 21, 2014, at 11:09 AM, "hjcho616" wrote: > On client, I was no longer able to access the filesystem. It would hang. > Makes sense since MDS has crashed. I tried running 3 MDS demon on the same > machine. Two crashes and one appears to be hung up(?). ceph health says MDS > is in degraded state when that happened. > > I was able to recover by restarting every node. I currently have three > machine, one with MDS and MON, and two with OSDs. > > It is failing everytime my client machine goes to sleep. If you need me to > run something let me know what and how. > > Regards, > Hong > > > From: Mohd Bazli Ab Karim > To: hjcho616 ; "ceph-users@lists.ceph.com" > > Sent: Thursday, March 20, 2014 9:40 PM > Subject: RE: [ceph-users] MDS crash when client goes to sleep > > Hi Hong, > May I know what has happened to your MDS once it crashed? W
Re: [ceph-users] why object can't be recovered when delete one replica
> I uploaded a file through the swift API, then deleted it manually in the
> "current" directory on the secondary OSD. Why can't the object be recovered?
>
> If I delete it on the primary OSD, the object is deleted directly in the
> pool .rgw.bucket and it can't be recovered from the secondary OSD.
>
> Does anyone know why it behaves this way?

This is because the placement group containing that object likely needs to scrub (just a light scrub should do). The scrub will compare the two replicas, notice the replica is missing from the secondary, and trigger recovery/backfill. Can you try scrubbing the placement group containing the removed object and let us know if it triggers recovery?

-- Kyle
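Conceptually, what the light scrub does here is compare the object listings of the replicas and flag whatever is missing — roughly like this toy model (not the actual scrub code, which works on whole placement groups with checksums):

```python
def scrub(primary_objects, secondary_objects):
    """Toy model of a light scrub: diff the replica listings and
    return what needs recovery on each side."""
    missing_on_secondary = primary_objects - secondary_objects
    missing_on_primary = secondary_objects - primary_objects
    return missing_on_secondary, missing_on_primary

primary = {"obj_a", "obj_b", "obj_c"}
secondary = {"obj_a", "obj_c"}          # obj_b's replica deleted by hand

to_secondary, to_primary = scrub(primary, secondary)
assert to_secondary == {"obj_b"}        # scrub would trigger recovery of obj_b
assert to_primary == set()
```

Until a scrub runs, nothing compares the replicas, which matches the observed behavior. To actually trigger one, something like `ceph osd map <pool> <object>` to find the pgid followed by `ceph pg scrub <pgid>` should do (commands as they existed in that era's Ceph).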
Re: [ceph-users] Mounting with dmcrypt still fails
> ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir
> /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb
> ceph-disk: Error: Device /dev/sdb2 is in use by a device-mapper mapping
> (dm-crypt?): dm-0

It sounds like device-mapper still thinks it's using the volume; you might be able to track it down with this:

for i in `ls -1 /sys/block/ | grep sd`; do echo $i: `ls /sys/block/$i/${i}1/holders/`; done

Then it's a matter of making sure there are no open file handles on the encrypted volume and unmounting it. You will still need to completely clear out the partition table on that disk, which can be tricky with GPT because it's not as simple as dd'ing over the start of the volume. This is what the zapdisk parameter is for in ceph-disk-prepare; I don't know enough about ceph-deploy to know if you can somehow pass it. After you know the device/dm mapping, you can use udevadm to find out where it should map to (uuids replaced with x's):

udevadm test /block/sdc/sdc1
run: '/sbin/cryptsetup --key-file /etc/ceph/dmcrypt-keys/x --key-size 256 create /dev/sdc1'
run: '/bin/bash -c 'while [ ! -e /dev/mapper/x ];do sleep 1; done''
run: '/usr/sbin/ceph-disk-activate /dev/mapper/x'

-- Kyle
Re: [ceph-users] osd rebalance question
> I need to add an additional server, hosting several OSDs, to a
> running ceph cluster. When adding OSDs, ceph does not automatically modify
> ceph.conf, so I modified ceph.conf manually and restarted the whole ceph
> cluster with the command 'service ceph -a restart'.
> I am just confused: if I restart the ceph cluster, will ceph rebalance
> (redistribute) all the data among the OSDs, or just move some
> data from the existing OSDs to the new OSDs? Anybody know?

It depends on how you added the OSDs. If the initial crush weight is set to 0, then no data will be moved to an OSD when it joins the cluster; only once its weight has been increased to match the rest of the OSD population will data start to move to the new OSD(s). If you add new OSD(s) with an initial weight > 0, then they will start accepting data from their peers as soon as they are up/in.

-- Kyle
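The effect of the initial crush weight can be mimicked with a toy weighted-placement function. This is not real CRUSH — just an illustration of why a weight-0 OSD receives no data until it is reweighted:

```python
import random

def place(obj, osds, weights):
    """Pick an OSD with probability proportional to its weight
    (toy stand-in for CRUSH placement; deterministic per object)."""
    total = sum(weights[o] for o in osds)
    r = random.Random(obj).random() * total
    for o in osds:
        r -= weights[o]
        if r < 0:
            return o
    return osds[-1]

osds = ["osd.0", "osd.1", "osd.2"]

# New osd.2 joins with an initial crush weight of 0: nothing maps to it,
# so no data moves when it comes up/in.
weights = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 0.0}
placement = {n: place(n, osds, weights) for n in range(1000)}
assert "osd.2" not in placement.values()

# Once osd.2 is reweighted above 0, a share of objects maps to it,
# and exactly that data migrates from its peers.
weights["osd.2"] = 1.0
placement = {n: place(n, osds, weights) for n in range(1000)}
assert "osd.2" in placement.values()
```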
Re: [ceph-users] OSD + FlashCache vs. Cache Pool for RBD...
> One downside of the above arrangement: I read that support for mapping
> newer-format RBDs is only present in fairly recent kernels. I'm running
> Ubuntu 12.04 on the cluster at present with its stock 3.2 kernel. There
> is a PPA for the 3.11 kernel used in Ubuntu 13.10, but if you're looking
> at a new deployment it might be better to wait until 14.04: then you'll
> get kernel 3.13.
>
> Anyone else have any ideas on the above?

I don't think there are any hairy udev issues or similar that will make using a newer kernel on precise problematic. The only caveat I can think of with this kind of setup is that if you lose a hypervisor, the cache goes with it, and you likely won't be able to migrate the guest to another host. The alternative is to use flashcache on top of the OSD partition, but then you introduce network hops; that is closer to what the tiering feature will offer, except that the flashcache-on-OSD method is more particular about the disk:ssd ratio, whereas in a tier the flash could be on completely separate hosts (possibly dedicated flash machines).

-- Kyle
Re: [ceph-users] What's the difference between using /dev/sdb and /dev/sdb1 as osd?
> If I want to use a disk dedicated for an osd, can I just use something like
> /dev/sdb instead of /dev/sdb1? Is there any negative impact on performance?

You can pass /dev/sdb to ceph-disk-prepare and it will create two partitions: one for the journal (a raw partition) and one for the data volume (formatted xfs by default). This is known as a single-device OSD, in contrast with a multi-device OSD, where the journal is on a completely different device (like a partition on a shared journaling SSD).

-- Kyle
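On the performance part of the question, a back-of-envelope model (an illustration, not a benchmark): with a single-device OSD every write hits the same spindle twice — once in the journal partition and once in the data partition — so sustained write throughput is roughly halved compared to putting the journal on a separate device.

```python
def sustained_write_bw(disk_bw_mb, journal_on_same_disk):
    """Toy model: a colocated journal makes the disk absorb each byte
    twice (journal write + data write), roughly halving throughput."""
    return disk_bw_mb / 2 if journal_on_same_disk else disk_bw_mb

# A hypothetical 120 MB/s spinning disk:
assert sustained_write_bw(120, journal_on_same_disk=True) == 60
assert sustained_write_bw(120, journal_on_same_disk=False) == 120
```

In practice seek overhead between the two partitions can make the colocated case somewhat worse still; the model above is the optimistic bound.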
Re: [ceph-users] Error initializing cluster client: Error
> Are your config files in sync?

ceph.conf is the same on all servers, and the keys do not differ either. I have checked the problem now and ceph -w is working fine on all hosts. Mysterious :-/

Pavel.

> On 22 March 2014 at 16:11, "Pavel V. Kaygorodov" wrote:
> Hi!
>
> I have two nodes with 8 OSDs on each. First node running 2 monitors on
> different virtual machines (mon.1 and mon.2), second node running mon.3.
> After several reboots (I have tested power failure scenarios) "ceph -w" on
> node 2 always fails with message:
>
> root@bes-mon3:~# ceph --verbose -w
> Error initializing cluster client: Error
>
> Log files do not show any errors:
>
> 2014-03-22 16:05:51.288526 osd.3 10.92.8.103:6800/7492 3510 : [INF] 0.262 deep-scrub ok
> 2014-03-22 16:05:54.997444 osd.1 10.92.8.101:6800/7688 3288 : [INF] 1.22b deep-scrub ok
> 2014-03-22 16:06:09.350377 mon.0 10.92.8.80:6789/0 11104 : [INF] pgmap v28682: 12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail
>
> 2014-03-22 16:07:24.795144 7f7bf42b4700 1 mon.3@2(peon).paxos(paxos active c 67771..68517) is_readable now=2014-03-22 16:07:24.795145 lease_expire=2014-03-22 16:07:29.791889 has v0 lc 68517
> 2014-03-22 16:07:27.795042 7f7bf42b4700 1 mon.3@2(peon).paxos(paxos active c 67771..68517) is_readable now=2014-03-22 16:07:27.795044 lease_expire=2014-03-22 16:07:32.792003 has v0 lc 68517
>
> On node 1 I got the same error just after the reboots, but now everything seems to be ok:
>
> root@bastet-mon2:/# ceph -w
>     cluster fffeafa2-a664-48a7-979a-517e3ffa0da1
>      health HEALTH_OK
>      monmap e3: 3 mons at {1=10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election epoch 62, quorum 0,1,2 1,2,3
>      osdmap e680: 16 osds: 16 up, 16 in
>       pgmap v28692: 12288 pgs, 6 pools, 246 MB data, 36 objects
>             18131 MB used, 12928 GB / 12945 GB avail
>                12288 active+clean
>
> 2014-03-22 16:08:10.611578 mon.0 [INF] pgmap v28692: 12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail
>
> How to debug and fix the "Error initializing cluster client: Error" problem?
>
> With best regards,
> Pavel.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Error initializing cluster client: Error
Are your config files in sync?

On 22 March 2014 at 16:11, "Pavel V. Kaygorodov" wrote:
> Hi!
>
> I have two nodes with 8 OSDs on each. First node running 2 monitors on
> different virtual machines (mon.1 and mon.2), second node running mon.3.
> After several reboots (I have tested power failure scenarios) "ceph -w" on
> node 2 always fails with message:
>
> root@bes-mon3:~# ceph --verbose -w
> Error initializing cluster client: Error
>
> Log files do not show any errors:
>
> 2014-03-22 16:05:51.288526 osd.3 10.92.8.103:6800/7492 3510 : [INF] 0.262 deep-scrub ok
> 2014-03-22 16:05:54.997444 osd.1 10.92.8.101:6800/7688 3288 : [INF] 1.22b deep-scrub ok
> 2014-03-22 16:06:09.350377 mon.0 10.92.8.80:6789/0 11104 : [INF] pgmap v28682: 12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail
>
> 2014-03-22 16:07:24.795144 7f7bf42b4700 1 mon.3@2(peon).paxos(paxos active c 67771..68517) is_readable now=2014-03-22 16:07:24.795145 lease_expire=2014-03-22 16:07:29.791889 has v0 lc 68517
> 2014-03-22 16:07:27.795042 7f7bf42b4700 1 mon.3@2(peon).paxos(paxos active c 67771..68517) is_readable now=2014-03-22 16:07:27.795044 lease_expire=2014-03-22 16:07:32.792003 has v0 lc 68517
>
> On node 1 I got the same error just after the reboots, but now everything seems to be ok:
>
> root@bastet-mon2:/# ceph -w
>     cluster fffeafa2-a664-48a7-979a-517e3ffa0da1
>      health HEALTH_OK
>      monmap e3: 3 mons at {1=10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election epoch 62, quorum 0,1,2 1,2,3
>      osdmap e680: 16 osds: 16 up, 16 in
>       pgmap v28692: 12288 pgs, 6 pools, 246 MB data, 36 objects
>             18131 MB used, 12928 GB / 12945 GB avail
>                12288 active+clean
>
> 2014-03-22 16:08:10.611578 mon.0 [INF] pgmap v28692: 12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail
>
> How to debug and fix the "Error initializing cluster client: Error" problem?
>
> With best regards,
> Pavel.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Error initializing cluster client: Error
Hi!

I have two nodes with 8 OSDs on each. The first node is running 2 monitors on different virtual machines (mon.1 and mon.2); the second node is running mon.3. After several reboots (I have tested power failure scenarios) "ceph -w" on node 2 always fails with this message:

root@bes-mon3:~# ceph --verbose -w
Error initializing cluster client: Error

The log files do not show any errors:

2014-03-22 16:05:51.288526 osd.3 10.92.8.103:6800/7492 3510 : [INF] 0.262 deep-scrub ok
2014-03-22 16:05:54.997444 osd.1 10.92.8.101:6800/7688 3288 : [INF] 1.22b deep-scrub ok
2014-03-22 16:06:09.350377 mon.0 10.92.8.80:6789/0 11104 : [INF] pgmap v28682: 12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail

2014-03-22 16:07:24.795144 7f7bf42b4700 1 mon.3@2(peon).paxos(paxos active c 67771..68517) is_readable now=2014-03-22 16:07:24.795145 lease_expire=2014-03-22 16:07:29.791889 has v0 lc 68517
2014-03-22 16:07:27.795042 7f7bf42b4700 1 mon.3@2(peon).paxos(paxos active c 67771..68517) is_readable now=2014-03-22 16:07:27.795044 lease_expire=2014-03-22 16:07:32.792003 has v0 lc 68517

On node 1 I got the same error just after the reboots, but now everything seems to be ok:

root@bastet-mon2:/# ceph -w
    cluster fffeafa2-a664-48a7-979a-517e3ffa0da1
     health HEALTH_OK
     monmap e3: 3 mons at {1=10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election epoch 62, quorum 0,1,2 1,2,3
     osdmap e680: 16 osds: 16 up, 16 in
      pgmap v28692: 12288 pgs, 6 pools, 246 MB data, 36 objects
            18131 MB used, 12928 GB / 12945 GB avail
               12288 active+clean

2014-03-22 16:08:10.611578 mon.0 [INF] pgmap v28692: 12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail

How can I debug and fix the "Error initializing cluster client: Error" problem?

With best regards,
Pavel.