Re: [ceph-users] MDS crash when client goes to sleep

2014-03-22 Thread Yan, Zheng
On Sun, Mar 23, 2014 at 11:47 AM, Sage Weil  wrote:
> Hi,
>
> I looked at this a bit earlier and wasn't sure why we would be getting a
> remote_reset event after a sleep/wake cycle.  The patch should fix the
> crash, but I'm a bit worried something is not quite right on the client
> side, too...
>

When the client wakes up, it first tries to reconnect using the old session.
The MDS refuses the reconnect request and sends a session-close message to the
client. After receiving the session-close message, the client closes the
old session, then sends a session-open message to the MDS.  When the MDS
receives the open request, it triggers a remote reset
(Pipe.cc:466)

> sage
>
> On Sun, 23 Mar 2014, Yan, Zheng wrote:
>
>> thank you for reporting this. Below patch should fix this issue
>>
>> ---
>> diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
>> index 57c7f4a..6b53c14 100644
>> --- a/src/mds/MDS.cc
>> +++ b/src/mds/MDS.cc
>> @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
>>if (session->is_closed()) {
>>   dout(3) << "ms_handle_reset closing connection for session " <<
>> session->info.inst << dendl;
>>   messenger->mark_down(con);
>> + con->set_priv(NULL);
>>   sessionmap.remove_session(session);
>>}
>>session->put();
>> @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
>>if (session->is_closed()) {
>>   dout(3) << "ms_handle_remote_reset closing connection for session "
>> << session->info.inst << dendl;
>>   messenger->mark_down(con);
>> + con->set_priv(NULL);
>>   sessionmap.remove_session(session);
>>}
>>session->put();
>>
>> On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim
>>  wrote:
>> > Hi Hong,
>> >
>> >
>> >
>> > How's the client now? Is it able to mount the filesystem now? It looks
>> > similar to our case: http://www.spinics.net/lists/ceph-devel/msg18395.html
>> >
>> > However, you need to collect some logs to confirm this.
>> >
>> >
>> >
>> > Thanks.
>> >
>> >
>> >
>> >
>> >
>> > From: hjcho616 [mailto:hjcho...@yahoo.com]
>> > Sent: Friday, March 21, 2014 2:30 PM
>> >
>> >
>> > To: Luke Jing Yuan
>> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
>> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
>> >
>> >
>> >
>> > Luke,
>> >
>> >
>> >
>> > Not sure what a flapping ceph-mds daemon means, but when I connected to the MDS
>> > host after this happened there was no longer any ceph-mds process when I ran
>> > one daemon.  When I ran three, one was left but wasn't doing much.  I didn't
>> > record the logs, but the behavior was very similar in 0.72 Emperor.  I am
>> > using Debian packages.
>> >
>> >
>> >
>> > Client went to sleep for a while (like 8+ hours).  There was no I/O prior to
>> > the sleep other than the fact that cephfs was still mounted.
>> >
>> >
>> >
>> > Regards,
>> >
>> > Hong
>> >
>> >
>> >
>> > 
>> >
>> > From: Luke Jing Yuan 
>> >
>> >
>> > To: hjcho616 
>> > Cc: Mohd Bazli Ab Karim ;
>> > "ceph-users@lists.ceph.com" 
>> > Sent: Friday, March 21, 2014 1:17 AM
>> >
>> > Subject: RE: [ceph-users] MDS crash when client goes to sleep
>> >
>> >
>> > Hi Hong,
>> >
>> > That's interesting. For Mr. Bazli and me, we ended up with the MDS stuck in
>> > (up:replay) and a flapping ceph-mds daemon, but then again we are using
>> > version 0.72.2. Having said that, the triggering point seems similar for us as
>> > well, which is the following line:
>> >
>> >   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079
>> >>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0
>> > c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
>> >
>> > So how long did your client go into sleep? Was there any I/O prior to the
>> > sleep?
>> >
>> > Regards,
>> > Luke
>> >
>> > From: hjcho616 [mailto:hjcho...@yahoo.com]
>> > Sent: Friday, 21 March, 2014 12:09 PM
>> > To: Luke Jing Yuan
>> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
>> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
>> >
>> > Nope just these segfaults.
>> >
>> > [149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp
>> > 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
>> > [211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp
>> > 7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
>> > [214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp
>> > 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
>> > [289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp
>> > 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
>> > [373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp
>> > 7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]
>> >
>> > Regards,
>> > Hong
>> >
>> > 
>> > From: Luke Jing Yuan 
>> > To: hjcho616 
>> > Cc: Mohd Bazli Ab Karim ;
>> > "ceph-users@lists.ceph.

Re: [ceph-users] Error initializing cluster client: Error

2014-03-22 Thread Kyle Bader
> I have two nodes with 8 OSDs on each. The first node runs 2 monitors on
> different virtual machines (mon.1 and mon.2); the second node runs mon.3.
> After several reboots (I have been testing power failure scenarios), "ceph -w" on
> node 2 always fails with the message:
>
> root@bes-mon3:~# ceph --verbose -w
> Error initializing cluster client: Error

The cluster is simply protecting itself from a split brain situation.
Say you have:

mon.1  mon.2  mon.3

If mon.1 fails, no big deal: you still have 2 of 3 monitors, so quorum holds.

Now instead, say mon.1 is separated from mon.2 and mon.3 because of a
network partition (trunk failure, whatever). If one monitor of the
three could elect itself as leader then you might have divergence
between your monitors. Self-elected mon.1 thinks it's the leader and
mon.{2,3} have elected a leader amongst themselves. The harsh reality
is you really need to have monitors on 3 distinct physical hosts to
protect against the failure of a physical host.
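
If you want to check which monitors currently form the quorum, something like
the following should work; the monitor id and socket path below are examples
based on this thread, so adjust them for your setup:

# Show which monitors are currently in the quorum.
ceph quorum_status --format json-pretty

# Ask one monitor directly over its admin socket; this works even when
# that monitor is out of quorum.
ceph --admin-daemon /var/run/ceph/ceph-mon.3.asok mon_status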

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-22 Thread Sage Weil
Hi,

I looked at this a bit earlier and wasn't sure why we would be getting a 
remote_reset event after a sleep/wake cycle.  The patch should fix the 
crash, but I'm a bit worried something is not quite right on the client 
side, too...

sage

On Sun, 23 Mar 2014, Yan, Zheng wrote:

> thank you for reporting this. Below patch should fix this issue
> 
> ---
> diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
> index 57c7f4a..6b53c14 100644
> --- a/src/mds/MDS.cc
> +++ b/src/mds/MDS.cc
> @@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
>if (session->is_closed()) {
>   dout(3) << "ms_handle_reset closing connection for session " <<
> session->info.inst << dendl;
>   messenger->mark_down(con);
> + con->set_priv(NULL);
>   sessionmap.remove_session(session);
>}
>session->put();
> @@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
>if (session->is_closed()) {
>   dout(3) << "ms_handle_remote_reset closing connection for session "
> << session->info.inst << dendl;
>   messenger->mark_down(con);
> + con->set_priv(NULL);
>   sessionmap.remove_session(session);
>}
>session->put();
> 
> On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim
>  wrote:
> > Hi Hong,
> >
> >
> >
> > How's the client now? Is it able to mount the filesystem now? It looks
> > similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
> >
> > However, you need to collect some logs to confirm this.
> >
> >
> >
> > Thanks.
> >
> >
> >
> >
> >
> > From: hjcho616 [mailto:hjcho...@yahoo.com]
> > Sent: Friday, March 21, 2014 2:30 PM
> >
> >
> > To: Luke Jing Yuan
> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
> >
> >
> >
> > Luke,
> >
> >
> >
> > Not sure what a flapping ceph-mds daemon means, but when I connected to the MDS
> > host after this happened there was no longer any ceph-mds process when I ran
> > one daemon.  When I ran three, one was left but wasn't doing much.  I didn't
> > record the logs, but the behavior was very similar in 0.72 Emperor.  I am
> > using Debian packages.
> >
> >
> >
> > Client went to sleep for a while (like 8+ hours).  There was no I/O prior to
> > the sleep other than the fact that cephfs was still mounted.
> >
> >
> >
> > Regards,
> >
> > Hong
> >
> >
> >
> > 
> >
> > From: Luke Jing Yuan 
> >
> >
> > To: hjcho616 
> > Cc: Mohd Bazli Ab Karim ;
> > "ceph-users@lists.ceph.com" 
> > Sent: Friday, March 21, 2014 1:17 AM
> >
> > Subject: RE: [ceph-users] MDS crash when client goes to sleep
> >
> >
> > Hi Hong,
> >
> > That's interesting. For Mr. Bazli and me, we ended up with the MDS stuck in
> > (up:replay) and a flapping ceph-mds daemon, but then again we are using
> > version 0.72.2. Having said that, the triggering point seems similar for us as
> > well, which is the following line:
> >
> >   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079
> >>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0
> > c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
> >
> > So how long did your client go into sleep? Was there any I/O prior to the
> > sleep?
> >
> > Regards,
> > Luke
> >
> > From: hjcho616 [mailto:hjcho...@yahoo.com]
> > Sent: Friday, 21 March, 2014 12:09 PM
> > To: Luke Jing Yuan
> > Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
> >
> > Nope just these segfaults.
> >
> > [149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp
> > 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
> > [211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp
> > 7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
> > [214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp
> > 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
> > [289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp
> > 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
> > [373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp
> > 7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]
> >
> > Regards,
> > Hong
> >
> > 
> > From: Luke Jing Yuan 
> > To: hjcho616 
> > Cc: Mohd Bazli Ab Karim ;
> > "ceph-users@lists.ceph.com" 
> > Sent: Thursday, March 20, 2014 10:53 PM
> > Subject: Re: [ceph-users] MDS crash when client goes to sleep
> >
> > Did you see any messages in dmesg saying ceph-mds is respawning or anything like
> > that?
> >
> > Regards,
> > Luke
> >
> > On Mar 21, 2014, at 11:09 AM, "hjcho616"  wrote:
> > On the client, I was no longer able to access the filesystem.  It would hang.
> > Makes sense since the MDS had crashed.  I tried running 3 MDS daemons on the same
> > machine.  Two crashed and one appears to be hung(?). ceph health says the MDS
> > is in degraded state when that

Re: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"

2014-03-22 Thread Quenten Grasso
Hi Kyle,

Thanks, I turned on debug ms = 1 and debug osd = 10 and restarted osd.54;
here's the log for that one.

ceph-osd.54.log.bz2
http://www67.zippyshare.com/v/99704627/file.html


Strace of osd.53:
strace.zip
http://www43.zippyshare.com/v/17581165/file.html


Thanks,
Quenten
-----Original Message-----
From: Kyle Bader [mailto:kyle.ba...@gmail.com] 
Sent: Sunday, 23 March 2014 12:10 PM
To: Quenten Grasso
Subject: Re: [ceph-users] OSD Restarts cause excessively high load average and 
"requests are blocked > 32 sec"

> Any ideas on why the load average goes so crazy & starts to block IO?

Could you turn on "debug ms = 1" and "debug osd = 10" prior to restarting the
OSDs on one of your hosts, and share the logs so we can take a look?

It also might be worthwhile to strace one of the OSDs to try to determine what
it's working so hard on, maybe:

strace -fc -p   > strace.osd1.log
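
For anyone following along, a minimal sketch of how those two steps might look
in practice, assuming a sysvinit-style setup as in this thread; the daemon id
osd.54, the pgrep pattern, file names and paths are examples, so adjust them
for your cluster:

# Persist the debug settings so they are still active after the restart
# (settings injected at runtime would be lost when the daemon restarts).
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd.54]
    debug ms = 1
    debug osd = 10
EOF

# Restart just that OSD, then grab its log from /var/log/ceph/.
service ceph restart osd.54

# Attach strace to the running daemon; -f follows threads, -c prints a
# per-syscall summary when you stop it with Ctrl-C. Double-check that
# the pgrep pattern matched the intended ceph-osd process.
strace -fc -o strace.osd54.log -p "$(pgrep -f 'ceph-osd -i 54')"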

Thanks!

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS crash when client goes to sleep

2014-03-22 Thread Yan, Zheng
thank you for reporting this. Below patch should fix this issue

---
diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
index 57c7f4a..6b53c14 100644
--- a/src/mds/MDS.cc
+++ b/src/mds/MDS.cc
@@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
   if (session->is_closed()) {
  dout(3) << "ms_handle_reset closing connection for session " <<
session->info.inst << dendl;
  messenger->mark_down(con);
+ con->set_priv(NULL);
  sessionmap.remove_session(session);
   }
   session->put();
@@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
   if (session->is_closed()) {
  dout(3) << "ms_handle_remote_reset closing connection for session "
<< session->info.inst << dendl;
  messenger->mark_down(con);
+ con->set_priv(NULL);
  sessionmap.remove_session(session);
   }
   session->put();

On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim
 wrote:
> Hi Hong,
>
>
>
> How's the client now? Is it able to mount the filesystem now? It looks
> similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
>
> However, you need to collect some logs to confirm this.
>
>
>
> Thanks.
>
>
>
>
>
> From: hjcho616 [mailto:hjcho...@yahoo.com]
> Sent: Friday, March 21, 2014 2:30 PM
>
>
> To: Luke Jing Yuan
> Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] MDS crash when client goes to sleep
>
>
>
> Luke,
>
>
>
> Not sure what a flapping ceph-mds daemon means, but when I connected to the MDS
> host after this happened there was no longer any ceph-mds process when I ran
> one daemon.  When I ran three, one was left but wasn't doing much.  I didn't
> record the logs, but the behavior was very similar in 0.72 Emperor.  I am
> using Debian packages.
>
>
>
> Client went to sleep for a while (like 8+ hours).  There was no I/O prior to
> the sleep other than the fact that cephfs was still mounted.
>
>
>
> Regards,
>
> Hong
>
>
>
> 
>
> From: Luke Jing Yuan 
>
>
> To: hjcho616 
> Cc: Mohd Bazli Ab Karim ;
> "ceph-users@lists.ceph.com" 
> Sent: Friday, March 21, 2014 1:17 AM
>
> Subject: RE: [ceph-users] MDS crash when client goes to sleep
>
>
> Hi Hong,
>
> That's interesting. For Mr. Bazli and me, we ended up with the MDS stuck in
> (up:replay) and a flapping ceph-mds daemon, but then again we are using
> version 0.72.2. Having said that, the triggering point seems similar for us as
> well, which is the following line:
>
>   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079
>>> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0
> c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
>
> So how long did your client go into sleep? Was there any I/O prior to the
> sleep?
>
> Regards,
> Luke
>
> From: hjcho616 [mailto:hjcho...@yahoo.com]
> Sent: Friday, 21 March, 2014 12:09 PM
> To: Luke Jing Yuan
> Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] MDS crash when client goes to sleep
>
> Nope just these segfaults.
>
> [149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp
> 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
> [211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp
> 7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
> [214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp
> 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
> [289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp
> 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
> [373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp
> 7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]
>
> Regards,
> Hong
>
> 
> From: Luke Jing Yuan 
> To: hjcho616 
> Cc: Mohd Bazli Ab Karim ;
> "ceph-users@lists.ceph.com" 
> Sent: Thursday, March 20, 2014 10:53 PM
> Subject: Re: [ceph-users] MDS crash when client goes to sleep
>
> Did you see any messages in dmesg saying ceph-mds is respawning or anything like
> that?
>
> Regards,
> Luke
>
> On Mar 21, 2014, at 11:09 AM, "hjcho616"  wrote:
> On the client, I was no longer able to access the filesystem.  It would hang.
> Makes sense since the MDS had crashed.  I tried running 3 MDS daemons on the same
> machine.  Two crashed and one appears to be hung(?). ceph health says the MDS
> is in degraded state when that happened.
>
> I was able to recover by restarting every node.  I currently have three
> machine, one with MDS and MON, and two with OSDs.
>
> It is failing every time my client machine goes to sleep.  If you need me to
> run something, let me know what and how.
>
> Regards,
> Hong
>
> 
> From: Mohd Bazli Ab Karim 
> To: hjcho616 ; "ceph-users@lists.ceph.com"
> 
> Sent: Thursday, March 20, 2014 9:40 PM
> Subject: RE: [ceph-users] MDS crash when client goes to sleep
>
> Hi Hong,
> May I know what has happened to your MDS once it crashed? W

Re: [ceph-users] why object can't be recovered when delete one replica

2014-03-22 Thread Kyle Bader
> I uploaded a file through the swift API, then manually deleted it from the
> "current" directory on the secondary OSD. Why isn't the object recovered?
>
> If I delete it on the primary OSD, the object is deleted from the
> .rgw.bucket pool outright and can't be recovered from the secondary OSD.
>
> Does anyone know why it behaves this way?

This is because the placement group containing that object likely
needs to scrub (just a light scrub should do). The scrub will compare
the two replicas, notice that the replica is missing from the secondary,
and trigger recovery/backfill. Can you try scrubbing the placement group
containing the removed object and let us know if it triggers recovery?
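
In case it's useful, a minimal sketch of how to do that; the pool name, object
name and pgid below are just examples, not taken from your cluster:

# Find which placement group holds the object (prints the pgid and the
# up/acting OSD set).
ceph osd map .rgw.bucket my-object

# Kick off a light scrub of that placement group, then watch for recovery.
ceph pg scrub 3.7b
ceph -w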

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting with dmcrypt still fails

2014-03-22 Thread Kyle Bader
> ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir 
> /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb
> ceph-disk: Error: Device /dev/sdb2 is in use by a device-mapper mapping 
> (dm-crypt?): dm-0

It sounds like device-mapper still thinks it's using the volume;
you might be able to track it down with this:

for i in `ls -1 /sys/block/ | grep sd`; do echo $i: `ls
/sys/block/$i/${i}1/holders/`; done

Then it's a matter of making sure there are no open file handles on
the encrypted volume and unmounting it. You will still need to
completely clear out the partition table on that disk, which can be
tricky with GPT because it's not as simple as dd'ing over the start of
the volume. This is what the zapdisk parameter is for in
ceph-disk-prepare; I don't know enough about ceph-deploy to know
whether you can somehow pass it through.
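
For what it's worth, a rough sketch of that cleanup; the device and mapping
names are examples based on the error above, so double-check you have the
right disk before zapping anything:

# List device-mapper mappings and find the one holding /dev/sdb2
# (dm-0 in the error above).
dmsetup ls

# Tear the mapping down once nothing has it open or mounted.
dmsetup remove <mapping-name>

# Wipe both the primary and backup GPT headers instead of just dd'ing
# over the start of the disk.
sgdisk --zap-all /dev/sdb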

After you know the device/dm mapping you can use udevadm to find out
where it should map to (uuids replaced with xxx's):

udevadm test /block/sdc/sdc1

run: '/sbin/cryptsetup --key-file /etc/ceph/dmcrypt-keys/x
--key-size 256 create  /dev/sdc1'
run: '/bin/bash -c 'while [ ! -e /dev/mapper/x ];do sleep 1; done''
run: '/usr/sbin/ceph-disk-activate /dev/mapper/x'

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd rebalance question

2014-03-22 Thread Kyle Bader
> I need to add an additional server, which hosts several OSDs, to a
> running ceph cluster. When adding OSDs, ceph does not automatically modify
> ceph.conf, so I modified ceph.conf manually
>
> and restarted the whole ceph cluster with the command 'service ceph -a restart'.
> I am just confused: if I restart the ceph cluster, will ceph rebalance
> (redistribute) all of the data among the OSDs, or just move some
>
> data from the existing OSDs to the new OSDs? Does anybody know?

It depends on how you added the OSDs: if the initial crush weight is
set to 0, then no data will be moved to the OSD when it joins the
cluster. Only once the weight has been raised relative to the rest of
the OSD population will data start to move to the new OSD(s). If you add
new OSD(s) with an initial weight > 0, then they will start accepting
data from peers as soon as they are up/in.
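
A minimal sketch of the gradual approach; the OSD id, host name and weights
below are just examples:

# Add the new OSD to the CRUSH map with weight 0 so no data moves yet.
ceph osd crush add osd.16 0 host=new-node

# Raise the weight in steps, letting the cluster settle in between.
ceph osd crush reweight osd.16 0.5
ceph osd crush reweight osd.16 1.0

# Watch recovery/backfill progress.
ceph -w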

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD + FlashCache vs. Cache Pool for RBD...

2014-03-22 Thread Kyle Bader
> One downside of the above arrangement: I read that support for mapping
> newer-format RBDs is only present in fairly recent kernels.  I'm running
> Ubuntu 12.04 on the cluster at present with its stock 3.2 kernel.  There
> is a PPA for the 3.11 kernel used in Ubuntu 13.10, but if you're looking
> at a new deployment it might be better to wait until 14.04: then you'll
> get kernel 3.13.
>
> Anyone else have any ideas on the above?

I don't think there are any hairy udev issues or similar that will
make using a newer kernel on precise problematic. The only caveat of
this kind of setup I can think of is that if you lose a hypervisor,
the cache goes with it and you likely won't be able to migrate the
guest to another host. The alternative is to use flashcache on top of
the OSD partition, but then you introduce network hops; that is closer
to what the tiering feature will offer, except that the flashcache-on-OSD
method is more particular about the disk:SSD ratio, whereas in a tier
the flash could be on completely separate hosts (possibly dedicated
flash machines).
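
For comparison, a rough sketch of what the cache-tiering approach is expected
to look like once that feature is released; the pool names are placeholders
and the exact commands may differ by version:

# Put a flash-backed pool in front of a spinning-disk pool as a
# writeback cache tier, and route client I/O through it.
ceph osd tier add rbd rbd-cache
ceph osd tier cache-mode rbd-cache writeback
ceph osd tier set-overlay rbd rbd-cache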

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What's the difference between using /dev/sdb and /dev/sdb1 as osd?

2014-03-22 Thread Kyle Bader
> If I want to use a disk dedicated for osd, can I just use something like
> /dev/sdb instead of /dev/sdb1? Is there any negative impact on performance?

You can pass /dev/sdb to ceph-disk-prepare and it will create two
partitions, one for the journal (raw partition) and one for the data
volume (defaults to formatting xfs). This is known as a single device
OSD, in contrast with a multi-device OSD where the journal is on a
completely different device (like a partition on a shared journaling
SSD).
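
A quick sketch of both invocations, using the ceph-disk-prepare flags seen
elsewhere on this list; the device names are examples:

# Single-device OSD: journal partition and xfs data partition both on sdb.
ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb

# Multi-device OSD: data on sdb, journal on a partition of a shared SSD.
ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb /dev/sdc1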

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error initializing cluster client: Error

2014-03-22 Thread Pavel V. Kaygorodov
> Do you have the config file in sync?
> 
ceph.conf is the same on all servers, and the keys don't differ either.
I have just checked again and ceph -w is working fine on all hosts.
Mysterious :-/
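
For reference, two commands that can help pin down which side is unhappy the
next time "ceph -w" errors out like this; the monitor id, socket path and
debug options are examples and may need adjusting:

# Ask a monitor directly over its admin socket; this works even when the
# cluster-wide command fails.
ceph --admin-daemon /var/run/ceph/ceph-mon.3.asok mon_status

# Re-run the failing command with client-side debugging to see where it
# gives up (monitor selection, authentication, ...).
ceph -w --debug-ms 1 --debug-monc 10 --debug-auth 10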

Pavel.



> On 22 March 2014 at 16:11, "Pavel V. Kaygorodov"  wrote:
> Hi!
> 
> I have two nodes with 8 OSDs on each. The first node runs 2 monitors on
> different virtual machines (mon.1 and mon.2); the second node runs mon.3.
> After several reboots (I have been testing power failure scenarios), "ceph -w" on
> node 2 always fails with the message:
> 
> root@bes-mon3:~# ceph --verbose -w
> Error initializing cluster client: Error
> 
> The log files do not show any error:
> 
> 2014-03-22 16:05:51.288526 osd.3 10.92.8.103:6800/7492 3510 : [INF] 0.262 
> deep-scrub ok
> 2014-03-22 16:05:54.997444 osd.1 10.92.8.101:6800/7688 3288 : [INF] 1.22b 
> deep-scrub ok
> 2014-03-22 16:06:09.350377 mon.0 10.92.8.80:6789/0 11104 : [INF] pgmap 
> v28682: 12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB / 
> 12945 GB avail
> 
> 2014-03-22 16:07:24.795144 7f7bf42b4700  1 mon.3@2(peon).paxos(paxos active c 
> 67771..68517) is_readable now=2014-03-22 16:07:24.795145 
> lease_expire=2014-03-22 16:07:29.791889 has v0 lc 68517
> 2014-03-22 16:07:27.795042 7f7bf42b4700  1 mon.3@2(peon).paxos(paxos active c 
> 67771..68517) is_readable now=2014-03-22 16:07:27.795044 
> lease_expire=2014-03-22 16:07:32.792003 has v0 lc 68517
> 
> On node 1 I got the same error just after the reboots, but now
> everything seems to be OK:
> 
> root@bastet-mon2:/# ceph -w
> cluster fffeafa2-a664-48a7-979a-517e3ffa0da1
>  health HEALTH_OK
>  monmap e3: 3 mons at 
> {1=10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election epoch 
> 62, quorum 0,1,2 1,2,3
>  osdmap e680: 16 osds: 16 up, 16 in
>   pgmap v28692: 12288 pgs, 6 pools, 246 MB data, 36 objects
> 18131 MB used, 12928 GB / 12945 GB avail
>12288 active+clean
> 
> 
> 2014-03-22 16:08:10.611578 mon.0 [INF] pgmap v28692: 12288 pgs: 12288 
> active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail
> 
> 
> 
> How can I debug and fix the "Error initializing cluster client: Error" problem?
> 
> With best regards,
>   Pavel.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error initializing cluster client: Error

2014-03-22 Thread Ирек Фасихов
Do you have the config file in sync?
On 22 March 2014 at 16:11, "Pavel V. Kaygorodov"  wrote:

> Hi!
>
> I have two nodes with 8 OSDs on each. The first node runs 2 monitors on
> different virtual machines (mon.1 and mon.2); the second node runs mon.3.
> After several reboots (I have been testing power failure scenarios), "ceph -w" on
> node 2 always fails with the message:
>
> root@bes-mon3:~# ceph --verbose -w
> Error initializing cluster client: Error
>
> The log files do not show any error:
>
> 2014-03-22 16:05:51.288526 osd.3 10.92.8.103:6800/7492 3510 : [INF] 0.262
> deep-scrub ok
> 2014-03-22 16:05:54.997444 osd.1 10.92.8.101:6800/7688 3288 : [INF] 1.22b
> deep-scrub ok
> 2014-03-22 16:06:09.350377 mon.0 10.92.8.80:6789/0 11104 : [INF] pgmap
> v28682: 12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB
> / 12945 GB avail
>
> 2014-03-22 16:07:24.795144 7f7bf42b4700  1 mon.3@2(peon).paxos(paxos
> active c 67771..68517) is_readable now=2014-03-22 16:07:24.795145
> lease_expire=2014-03-22 16:07:29.791889 has v0 lc 68517
> 2014-03-22 16:07:27.795042 7f7bf42b4700  1 mon.3@2(peon).paxos(paxos
> active c 67771..68517) is_readable now=2014-03-22 16:07:27.795044
> lease_expire=2014-03-22 16:07:32.792003 has v0 lc 68517
>
> On node 1 I got the same error just after the reboots, but now
> everything seems to be OK:
>
> root@bastet-mon2:/# ceph -w
> cluster fffeafa2-a664-48a7-979a-517e3ffa0da1
>  health HEALTH_OK
>  monmap e3: 3 mons at {1=
> 10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election
> epoch 62, quorum 0,1,2 1,2,3
>  osdmap e680: 16 osds: 16 up, 16 in
>   pgmap v28692: 12288 pgs, 6 pools, 246 MB data, 36 objects
> 18131 MB used, 12928 GB / 12945 GB avail
>12288 active+clean
>
>
> 2014-03-22 16:08:10.611578 mon.0 [INF] pgmap v28692: 12288 pgs: 12288
> active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail
>
> 
>
> How can I debug and fix the "Error initializing cluster client: Error" problem?
>
> With best regards,
>   Pavel.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error initializing cluster client: Error

2014-03-22 Thread Pavel V. Kaygorodov
Hi!

I have two nodes with 8 OSDs on each. The first node runs 2 monitors on
different virtual machines (mon.1 and mon.2); the second node runs mon.3.
After several reboots (I have been testing power failure scenarios), "ceph -w" on
node 2 always fails with the message:

root@bes-mon3:~# ceph --verbose -w
Error initializing cluster client: Error

The log files do not show any error:

2014-03-22 16:05:51.288526 osd.3 10.92.8.103:6800/7492 3510 : [INF] 0.262 
deep-scrub ok
2014-03-22 16:05:54.997444 osd.1 10.92.8.101:6800/7688 3288 : [INF] 1.22b 
deep-scrub ok
2014-03-22 16:06:09.350377 mon.0 10.92.8.80:6789/0 11104 : [INF] pgmap v28682: 
12288 pgs: 12288 active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB 
avail

2014-03-22 16:07:24.795144 7f7bf42b4700  1 mon.3@2(peon).paxos(paxos active c 
67771..68517) is_readable now=2014-03-22 16:07:24.795145 
lease_expire=2014-03-22 16:07:29.791889 has v0 lc 68517
2014-03-22 16:07:27.795042 7f7bf42b4700  1 mon.3@2(peon).paxos(paxos active c 
67771..68517) is_readable now=2014-03-22 16:07:27.795044 
lease_expire=2014-03-22 16:07:32.792003 has v0 lc 68517

On node 1 I got the same error just after the reboots, but now everything
seems to be OK:

root@bastet-mon2:/# ceph -w
cluster fffeafa2-a664-48a7-979a-517e3ffa0da1
 health HEALTH_OK
 monmap e3: 3 mons at 
{1=10.92.8.80:6789/0,2=10.92.8.81:6789/0,3=10.92.8.82:6789/0}, election epoch 
62, quorum 0,1,2 1,2,3
 osdmap e680: 16 osds: 16 up, 16 in
  pgmap v28692: 12288 pgs, 6 pools, 246 MB data, 36 objects
18131 MB used, 12928 GB / 12945 GB avail
   12288 active+clean


2014-03-22 16:08:10.611578 mon.0 [INF] pgmap v28692: 12288 pgs: 12288 
active+clean; 246 MB data, 18131 MB used, 12928 GB / 12945 GB avail



How can I debug and fix the "Error initializing cluster client: Error" problem?

With best regards,
  Pavel.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com