Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-09 Thread Matthew Anderson
So I've had a chance to revisit this since Bécholey Alexandre was kind
enough to let me know how to compile Ceph with the RDMACM library (thank you
again!).

At this stage it compiles and runs, but there appears to be a problem with
calling rshutdown in Pipe: it seems to wait forever for the pipe to
close, which causes commands like 'ceph osd tree' to hang indefinitely after
they complete successfully. Debug MS is here - http://pastebin.com/WzMJNKZY

I also tried RADOS bench but it appears to be doing something similar.
Debug MS is here - http://pastebin.com/3aXbjzqS

It seems like it's very close to working... I must be missing something
small that's causing some grief. You can see the OSD coming up in the ceph
monitor and the PGs all become active+clean. When shutting down the
monitor I get the output below, which shows it waiting for the pipes to close -

2013-08-09 15:08:31.339394 7f4643cfd700 20 accepter.accepter closing
2013-08-09 15:08:31.382075 7f4643cfd700 10 accepter.accepter stopping
2013-08-09 15:08:31.382115 7f464bd397c0 20 -- 172.16.0.1:6789/0 wait:
stopped accepter thread
2013-08-09 15:08:31.382127 7f464bd397c0 20 -- 172.16.0.1:6789/0 wait:
stopping reaper thread
2013-08-09 15:08:31.382146 7f4645500700 10 -- 172.16.0.1:6789/0 reaper_entry done
2013-08-09 15:08:31.382182 7f464bd397c0 20 -- 172.16.0.1:6789/0 wait:
stopped reaper thread
2013-08-09 15:08:31.382194 7f464bd397c0 10 -- 172.16.0.1:6789/0 wait:
closing pipes
2013-08-09 15:08:31.382200 7f464bd397c0 10 -- 172.16.0.1:6789/0 reaper
2013-08-09 15:08:31.382205 7f464bd397c0 10 -- 172.16.0.1:6789/0 reaper done
2013-08-09 15:08:31.382210 7f464bd397c0 10 -- 172.16.0.1:6789/0 wait:
waiting for pipes 0x3014c80,0x3015180,0x3015400 to close

The git repo has been updated if anyone has a few spare minutes to take a
look - https://github.com/funkBuild/ceph-rsockets

Thanks again
-Matt





On Thu, Jun 20, 2013 at 5:09 PM, Matthew Anderson
manderson8...@gmail.com wrote:

 Hi All,

 I've had a few conversations on IRC about getting RDMA support into Ceph
 and thought I would give it a quick attempt to hopefully spur some
 interest. What I would like to accomplish is an RSockets only
 implementation so I'm able to use Ceph, RBD and QEMU at full speed over an
 Infiniband fabric.

 What I've tried to do is port Pipe.cc and Accepter.cc to rsockets by
 replacing the regular socket calls with the rsocket equivalents.
 Unfortunately it fails to link, and I get errors of -

  CXXLD  ceph-osd
 ./.libs/libglobal.a(libcommon_la-Accepter.o): In function
 `Accepter::stop()':
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:243: undefined
 reference to `rshutdown'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:251: undefined
 reference to `rclose'
 ./.libs/libglobal.a(libcommon_la-Accepter.o): In function
 `Accepter::entry()':
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:213: undefined
 reference to `raccept'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:230: undefined
 reference to `rclose'
 ./.libs/libglobal.a(libcommon_la-Accepter.o): In function
 `Accepter::bind(entity_addr_t const&, int, int)':
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:61: undefined
 reference to `rsocket'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:80: undefined
 reference to `rsetsockopt'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:87: undefined
 reference to `rbind'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:118: undefined
 reference to `rgetsockname'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:128: undefined
 reference to `rlisten'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:100: undefined
 reference to `rbind'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:87: undefined
 reference to `rbind'
 ./.libs/libglobal.a(libcommon_la-Pipe.o): In function
 `Pipe::tcp_write(char const*, int)':
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Pipe.cc:2175: undefined
 reference to `rsend'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Pipe.cc:2162: undefined
 reference to `rshutdown'
 ./.libs/libglobal.a(libcommon_la-Pipe.o): In function
 `Pipe::do_sendmsg(msghdr*, int, bool)':
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Pipe.cc:1867: undefined
 reference to `rsendmsg'
 ./.libs/libglobal.a(libcommon_la-Pipe.o): In function
 `Pipe::tcp_read_nonblocking(char*, int)':
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Pipe.cc:2129: undefined
 reference to `rrecv'
 ./.libs/libglobal.a(libcommon_la-Pipe.o): In function
 `Pipe::tcp_read(char*, int)':
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Pipe.cc:2079: undefined
 reference to `rshutdown'
 ./.libs/libglobal.a(libcommon_la-Pipe.o): In function `Pipe::connect()':
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Pipe.cc:768: undefined
 reference to `rclose'
 /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Pipe.cc:773: undefined
 reference to `rsocket'
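
For reference, the rsocket/rbind/rclose symbols above live in librdmacm, so the
undefined references disappear once that library is added to the link. A minimal
sketch, assuming an autotools build like the ceph-0.61.3 tree referenced above
(exact variable names may differ between releases):

./autogen.sh
./configure LIBS="-lrdmacm"   # pulls in the rsocket() family at link time
make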
 

Re: [ceph-users] Openstack glance ceph rbd_store_user authentification problem

2013-08-09 Thread Steffen Thorhauer

Hi,
thanks for your answers. It was my fault. I configured everything at the
beginning of the [DEFAULT] section of glance-api.conf and
overlooked the default settings later in the file (the default Ubuntu
glance-api.conf has a default RBD Store Options part further down).



On 08/08/2013 05:04 PM, Josh Durgin wrote:

On 08/08/2013 06:01 AM, Steffen Thorhauer wrote:

Hi,
recently I had a problem with openstack glance and ceph.
I used the
http://ceph.com/docs/master/rbd/rbd-openstack/#configuring-glance
documentation and
http://docs.openstack.org/developer/glance/configuring.html 
documentation

I'm using Ubuntu 12.04 LTS with Grizzly from the Ubuntu Cloud Archive and
ceph 0.61.7.

glance-api.conf had following config options

default_store = rbd
rbd_store_user=images
rbd_store_pool = images
rbd_store_ceph_conf = /etc/ceph/ceph.conf


Every time I do a glance image-create I get errors. In the glance
api log I only found errors like

2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images Traceback (most
recent call last):
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File
/usr/lib/python2.7/dist-packages/glance/api/v1/images.py, line 444, in
_upload
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images 
image_meta['size'])

2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File
/usr/lib/python2.7/dist-packages/glance/store/rbd.py, line 241, in add
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images with
rados.Rados(conffile=self.conf_file, rados_id=self.user) as conn:
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File
/usr/lib/python2.7/dist-packages/rados.py, line 134, in __enter__
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images self.connect()
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File
/usr/lib/python2.7/dist-packages/rados.py, line 192, in connect
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images raise
make_ex(ret, "error calling connect")
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images ObjectNotFound:
error calling connect

This trace message didn't help me very much :-(
My Google search for "glance.api.v1.images ObjectNotFound: error calling
connect" only found
http://irclogs.ceph.widodh.nl/index.php?date=2012-10-26
This pointed me to a Ceph authentication problem. But the Ceph tools
worked fine for me.
Then I tried the debug option in glance-api.conf and I found the following
entries:

DEBUG glance.common.config [-] rbd_store_pool = images
log_opt_values /usr/lib/python2.7/dist-packages/oslo/config/cfg.py:1485
DEBUG glance.common.config [-] rbd_store_user = glance
log_opt_values /usr/lib/python2.7/dist-packages/oslo/config/cfg.py:1485

The glance-api service  did not use my rbd_store_user = images option!!
Then I configured a client.glance auth and it worked with the
implicit glance user!!!

Now my question: Am I the only one with this problem??


I've seen people have this issue before due to the way the 
glance-api.conf can have multiple sections.


Make sure those rbd settings are in the [DEFAULT] section, not just
at the bottom of the file (which may be a different section).
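
For anyone hitting the same thing, a minimal sketch of a working layout using the
names from this thread; the cephx command below is an assumption about how the
'images' user would be created, so adjust the caps to your setup:

# glance-api.conf -- keep the rbd options in [DEFAULT], not in a later section
[DEFAULT]
default_store = rbd
rbd_store_user = images
rbd_store_pool = images
rbd_store_ceph_conf = /etc/ceph/ceph.conf

# the rbd_store_user must exist as a cephx client with access to the pool, e.g.:
#   ceph auth get-or-create client.images mon 'allow r' osd 'allow rwx pool=images'
# and its keyring must be readable by the glance service on that host.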


Regards,
   Steffen Thorhauer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] ceph-deploy behind corporate firewalls

2013-08-09 Thread Luc Dumaine
Hi,

I was able to use ceph-deploy behind a proxy, by defining the appropriate
environment variables used by wget.

I.e., on Ubuntu just add to /etc/environment:

http_proxy=http://host:port
ftp_proxy=http://host:port
https_proxy=http://host:port
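
A quick, hedged sanity check that the variables are actually picked up by wget
(which is what ceph-deploy shells out to for the release key):

# log in again after editing /etc/environment (or export the variables), then:
env | grep -i _proxy
wget -S --spider http://ceph.com 2>&1 | head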


Regards, Luc.


- Original Message -
From: Harvey Skinner hpmpe...@gmail.com
To: ceph-users@lists.ceph.com
Cc: Harvey Skinner hpmpe...@gmail.com
Sent: Friday, 9 August 2013 05:48:35
Subject: [ceph-users] ceph-deploy behind corporate firewalls

 Hi all,

I am not sure if I am the only one having issues with ceph-deploy
behind a firewall or not.  I haven't seen any other reports of similar
issues yet.  With http proxies I am able to get apt-get working, but
wget is still an issue.

Working to use the newer ceph-deploy mechanism to deploy my next POC
set up on four storage nodes.   The ceph-deploy install process
unfortunately uses wget to retrieve the Ceph release key, and that fails
the install.   To get around this I can manually add the Ceph release
key on all my nodes and apt-get install all the Ceph packages.
The question though is whether there is anything else that ceph-deploy
does that I would need to do manually to have everything in a state
where ceph-deploy would work correctly for the rest of the cluster
setup and deployment, i.e. ceph-deploy new  -and- ceph-deploy mon
create, etc.?
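
For what it's worth, a hedged sketch of that manual workaround; the key and repo
URLs are the ones the install docs used at the time, and the proxy host/port are
placeholders:

export http_proxy=http://proxy.example.com:3128 https_proxy=http://proxy.example.com:3128
wget -qO- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
echo deb http://ceph.com/debian-cuttlefish/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get update && sudo apt-get install ceph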

thank you,
Harvey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
SITIV disclaims any responsibility for the content of this message. This
message and the files attached to it are confidential and intended
exclusively for their addressee. If you believe you have received it in
error, please notify the sender immediately, do not use it in any form, and
destroy it immediately. Any disclosure, use, distribution or reproduction of
the message or of the information it contains must be authorized in advance
by the sender.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Jeff Moskow
Hi,

I have a 5 node ceph cluster that is running well (no problems using 
any of the
rbd images and that's really all we use).  

I have replication set to 3 on all three pools (data, metadata and rbd).

ceph -s reports:
health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; 
recovery 5746/384795 degraded (1.493%)

I have tried everything I could think of to clear/fix those errors and 
they persist.

Most of them appear to be a problem with not having 3 copies

0.2a0   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:40:07.874427  0'0 21920'388   [4,7]   [4,7,8] 0'0 
2013-08-04 08:59:34.035198  0'0 2013-07-29 01:49:40.018625
4.1d9   260 0   238 0   1021055488  0   0   
active+remapped 2013-08-06 05:56:20.447612  21920'12710 21920'53408 
[6,13]  [6,13,4]0'0 2013-08-05 06:59:44.717555  0'0 2013-08-05 
06:59:44.717555
1.1dc   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:55:44.687830  0'0 21920'3003  [6,13]  [6,13,4]
0'0 2013-08-04 10:56:51.226012  0'0 2013-07-28 23:47:13.404512
0.1dd   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:55:44.687525  0'0 21920'3003  [6,13]  [6,13,4]
0'0 2013-08-04 10:56:45.258459  0'0 2013-08-01 05:58:17.141625
1.29f   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:40:07.882865  0'0 21920'388   [4,7]   [4,7,8] 0'0 
2013-08-04 09:01:40.075441  0'0 2013-07-29 01:53:10.068503
1.118   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:50:34.081067  0'0 21920'208   [8,15]  [8,15,5]
0'0 2034-02-12 23:20:03.933842  0'0 2034-02-12 23:20:03.933842
0.119   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:50:34.095446  0'0 21920'208   [8,15]  [8,15,5]
0'0 2034-02-12 23:18:07.310080  0'0 2034-02-12 23:18:07.310080
4.115   248 0   226 0   987364352   0   0   
active+remapped 2013-08-06 05:50:34.112139  21920'6840  21920'42982 
[8,15]  [8,15,5]0'0 2013-08-05 06:59:18.303823  0'0 2013-08-05 
06:59:18.303823
4.4a241 0   286 0   941573120   0   0   
active+degraded 2013-08-06 12:00:47.758742  21920'85238 21920'206648
[4,6]   [4,6]   0'0 2013-08-05 06:58:36.681726  0'0 2013-08-05 
06:58:36.681726
0.4e0   0   0   0   0   0   0   active+remapped 
2013-08-06 12:00:47.765391  0'0 21920'489   [4,6]   [4,6,1] 0'0 
2013-08-04 08:58:12.783265  0'0 2013-07-28 14:21:38.227970


Can anyone suggest a way to clear this up?

Thanks!
Jeff


-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Wido den Hollander

On 08/09/2013 10:58 AM, Jeff Moskow wrote:

Hi,

I have a 5 node ceph cluster that is running well (no problems using 
any of the
rbd images and that's really all we use).

I have replication set to 3 on all three pools (data, metadata and rbd).

ceph -s reports:
health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; 
recovery 5746/384795 degraded (1.493%)

I have tried everything I could think of to clear/fix those errors and 
they persist.



Did you restart the primary OSD for those PGs?

Wido
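
For reference, a hedged sketch of how to find and bounce the primary for one of
the stuck PGs (the init-script syntax assumes the sysvinit setup typical of 0.61
clusters):

ceph pg dump_stuck unclean          # list the stuck PGs
ceph pg 4.4a query | less           # first OSD in the acting set is the primary, e.g. osd.4
service ceph restart osd.4          # run on the node that hosts osd.4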


Most of them appear to be a problem with not having 3 copies

0.2a0   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:40:07.874427  0'0 21920'388   [4,7]   [4,7,8] 0'0 
2013-08-04 08:59:34.035198  0'0 2013-07-29 01:49:40.018625
4.1d9   260 0   238 0   1021055488  0   0   
active+remapped 2013-08-06 05:56:20.447612  21920'12710 21920'53408 
[6,13]  [6,13,4]0'0 2013-08-05 06:59:44.717555  0'0 2013-08-05 
06:59:44.717555
1.1dc   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:55:44.687830  0'0 21920'3003  [6,13]  [6,13,4]
0'0 2013-08-04 10:56:51.226012  0'0 2013-07-28 23:47:13.404512
0.1dd   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:55:44.687525  0'0 21920'3003  [6,13]  [6,13,4]
0'0 2013-08-04 10:56:45.258459  0'0 2013-08-01 05:58:17.141625
1.29f   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:40:07.882865  0'0 21920'388   [4,7]   [4,7,8] 0'0 
2013-08-04 09:01:40.075441  0'0 2013-07-29 01:53:10.068503
1.118   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:50:34.081067  0'0 21920'208   [8,15]  [8,15,5]
0'0 2034-02-12 23:20:03.933842  0'0 2034-02-12 23:20:03.933842
0.119   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:50:34.095446  0'0 21920'208   [8,15]  [8,15,5]
0'0 2034-02-12 23:18:07.310080  0'0 2034-02-12 23:18:07.310080
4.115   248 0   226 0   987364352   0   0   
active+remapped 2013-08-06 05:50:34.112139  21920'6840  21920'42982 
[8,15]  [8,15,5]0'0 2013-08-05 06:59:18.303823  0'0 2013-08-05 
06:59:18.303823
4.4a241 0   286 0   941573120   0   0   
active+degraded 2013-08-06 12:00:47.758742  21920'85238 21920'206648
[4,6]   [4,6]   0'0 2013-08-05 06:58:36.681726  0'0 2013-08-05 
06:58:36.681726
0.4e0   0   0   0   0   0   0   active+remapped 
2013-08-06 12:00:47.765391  0'0 21920'489   [4,6]   [4,6,1] 0'0 
2013-08-04 08:58:12.783265  0'0 2013-07-28 14:21:38.227970


Can anyone suggest a way to clear this up?

Thanks!
Jeff





--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-09 Thread Oliver Francke

Hi Josh,

just opened

http://tracker.ceph.com/issues/5919

with all collected information incl. debug-log.

Hope it helps,

Oliver.
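
For anyone who wants to capture an equivalent client-side log, a hedged sketch of
the debug settings mentioned below, placed in the [client] section of ceph.conf on
the hypervisor (the log path is only an example):

[client]
    debug ms = 1
    debug rbd = 20
    debug objectcacher = 30
    log file = /var/log/ceph/client.$name.$pid.log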

On 08/08/2013 07:01 PM, Josh Durgin wrote:

On 08/08/2013 05:40 AM, Oliver Francke wrote:

Hi Josh,

I have a session logged with:

 debug_ms=1:debug_rbd=20:debug_objectcacher=30

as you requested from Mike, even if I think, we do have another story
here, anyway.

Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
3.2.0-51-amd...

Do you want me to open a ticket for that stuff? I have about 5MB
compressed logfile waiting for you ;)


Yes, that'd be great. If you could include the time when you saw the 
guest hang that'd be ideal. I'm not sure if this is one or two bugs,

but it seems likely it's a bug in rbd and not qemu.

Thanks!
Josh


Thnx in advance,

Oliver.

On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:

On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:

Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:

We can un-wedge the guest by opening a NoVNC session or running a
'virsh screenshot' command. After that, the guest resumes and runs
as expected. At that point we can examine the guest. Each time we'll
see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the 
guest no

longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but 
also

on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver 
and

Mike, are a result of the same bug.  At least I hope they are :).

Stefan








--

Oliver Francke

filoo GmbH
Moltkestraße 25a
0 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: J.Rehpöhler | C.Kunz

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] All old pgs in stale after recreating all osds

2013-08-09 Thread Da Chun
On Centos 6.4, Ceph 0.61.7.
I had a ceph cluster of 9 OSDs. Today I destroyed all of the OSDs and
recreated 6 new ones.
Then I found all the old PGs are stale.
[root@ceph0 ceph]# ceph -s
   health HEALTH_WARN 192 pgs stale; 192 pgs stuck inactive; 192 pgs stuck 
stale; 192 pgs stuck unclean
   monmap e1: 3 mons at 
{ceph0=172.18.11.60:6789/0,ceph1=172.18.11.61:6789/0,ceph2=172.18.11.62:6789/0},
 election epoch 24, quorum 0,1,2 ceph0,ceph1,ceph2
   osdmap e166: 6 osds: 6 up, 6 in
pgmap v837: 192 pgs: 192 stale; 9526 bytes data, 221 MB used, 5586 GB / 
5586 GB avail
   mdsmap e114: 0/0/1 up



[root@ceph0 ~]# ceph health detail
...
pg 2.3 is stuck stale for 10249.230667, current state stale, last acting [5]
...
[root@ceph0 ~]# ceph pg 2.3 query
i don't have pgid 2.3



How can I get all the pgs back or recreated?
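
Since the OSDs that held those PGs are gone, they can only come back empty. A
hedged sketch of the usual approach from that era -- any data in the stale PGs is
lost:

ceph health detail | awk '/stuck stale/ {print $2}' > stale_pgs
while read pg; do ceph pg force_create_pg $pg; done < stale_pgs

Alternatively, deleting and recreating the affected pools achieves the same result.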


Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Jeff Moskow
Thanks for the suggestion.  I had tried stopping each OSD for 30
seconds, then restarting it, waiting 2 minutes and then doing the next
one (all OSDs eventually restarted).  I tried this twice.


--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mounting a pool via fuse

2013-08-09 Thread Georg Höllrigl

Hi,

I'm using ceph 0.61.7.

When using ceph-fuse, I couldn't find a way to only mount one pool.

Is there a way to mount a pool - or is it simply not supported?
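
For context, a hedged note: ceph-fuse mounts the CephFS filesystem rather than a
pool, so the closest equivalent is mounting only a subdirectory; the monitor
address and paths below are placeholders:

ceph-fuse -m mon-host:6789 -r /some/subdir /mnt/ceph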



Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] do we need to install ceph on KVM hypervisor for cloudstack-ceph intergration

2013-08-09 Thread Suresh Sadhu
HI,

To access the storage cluster from the KVM hypervisor, what packages need to
be installed on the KVM hypervisor (do we need to install qemu and ceph on the KVM
host for CloudStack-Ceph integration)?

My hypervisor version is RHEL 6.3.

Regards
Sadhu





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] do we need to install ceph on KVM hypervisor for cloudstack-ceph intergration

2013-08-09 Thread Wido den Hollander

On 08/09/2013 01:51 PM, Suresh Sadhu wrote:

HI,

To access the storage cluster from kvm hypervisor what are the packages
need to install on kvm hypervisor(do  we need to install qemu,ceph on
KVM host? For cloudstack-ceph integration).



You only need librbd and librados

The Ceph CLI tools and such are not mandatory, but won't hurt anything.

Both libvirt and Qemu will link against librbd which links to librados.
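
A hedged sketch for RHEL 6.3, assuming the ceph.com el6 repository is already
configured on the hypervisor:

yum install librados2 librbd1
# a qemu/libvirt build with rbd support then picks these up at runtime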

Wido


MY hypervisor version  is rhel6.3.

Regards

Sadhu



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-09 Thread Andrei Mikhailovsky
I can confirm that I am having similar issues with Ubuntu VM guests using fio 
with bs=4k direct=1 numjobs=4 iodepth=16. Occasionally I see hung tasks, 
occasionally the guest VM stops responding without leaving anything in the logs, and 
sometimes I see a kernel panic on the console. I typically leave the runtime of 
the fio test at 60 minutes and it tends to stop responding after about 10-30 
mins. 
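
For reproducibility, a hedged reconstruction of that fio job -- the I/O pattern,
target device and engine are assumptions; only bs/direct/numjobs/iodepth come from
the report above:

fio --name=hang-test --filename=/dev/vdb --ioengine=libaio \
    --rw=randwrite --bs=4k --direct=1 --numjobs=4 --iodepth=16 \
    --runtime=3600 --time_based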

I am on ubuntu 12.04 with 3.5 kernel backport and using ceph 0.61.7 with qemu 
1.5.0 and libvirt 1.0.2 

Andrei 
- Original Message -

From: Oliver Francke oliver.fran...@filoo.de 
To: Josh Durgin josh.dur...@inktank.com 
Cc: ceph-users@lists.ceph.com, Mike Dawson mike.daw...@cloudapt.com, 
Stefan Hajnoczi stefa...@redhat.com, qemu-de...@nongnu.org 
Sent: Friday, 9 August, 2013 10:22:00 AM 
Subject: Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, 
heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive 
qemu-process, [Qemu-devel] [Bug 1207686] 

Hi Josh, 

just opened 

http://tracker.ceph.com/issues/5919 

with all collected information incl. debug-log. 

Hope it helps, 

Oliver. 

On 08/08/2013 07:01 PM, Josh Durgin wrote: 
 On 08/08/2013 05:40 AM, Oliver Francke wrote: 
 Hi Josh, 
 
 I have a session logged with: 
 
 debug_ms=1:debug_rbd=20:debug_objectcacher=30 
 
 as you requested from Mike, even if I think, we do have another story 
 here, anyway. 
 
 Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is 
 3.2.0-51-amd... 
 
 Do you want me to open a ticket for that stuff? I have about 5MB 
 compressed logfile waiting for you ;) 
 
 Yes, that'd be great. If you could include the time when you saw the 
 guest hang that'd be ideal. I'm not sure if this is one or two bugs, 
 but it seems likely it's a bug in rbd and not qemu. 
 
 Thanks! 
 Josh 
 
 Thnx in advance, 
 
 Oliver. 
 
 On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote: 
 On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote: 
 Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com: 
 We can un-wedge the guest by opening a NoVNC session or running a 
 'virsh screenshot' command. After that, the guest resumes and runs 
 as expected. At that point we can examine the guest. Each time we'll 
 see: 
 If virsh screenshot works then this confirms that QEMU itself is still 
 responding. Its main loop cannot be blocked since it was able to 
 process the screendump command. 
 
 This supports Josh's theory that a callback is not being invoked. The 
 virtio-blk I/O request would be left in a pending state. 
 
 Now here is where the behavior varies between configurations: 
 
 On a Windows guest with 1 vCPU, you may see the symptom that the 
 guest no 
 longer responds to ping. 
 
 On a Linux guest with multiple vCPUs, you may see the hung task message 
 from the guest kernel because other vCPUs are still making progress. 
 Just the vCPU that issued the I/O request and whose task is in 
 UNINTERRUPTIBLE state would really be stuck. 
 
 Basically, the symptoms depend not just on how QEMU is behaving but 
 also 
 on the guest kernel and how many vCPUs you have configured. 
 
 I think this can explain how both problems you are observing, Oliver 
 and 
 Mike, are a result of the same bug. At least I hope they are :). 
 
 Stefan 
 
 
 


-- 

Oliver Francke 

filoo GmbH 
Moltkestraße 25a 
0 Gütersloh 
HRB4355 AG Gütersloh 

Geschäftsführer: J.Rehpöhler | C.Kunz 

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-09 Thread Stefan Hajnoczi
On Fri, Aug 09, 2013 at 03:05:22PM +0100, Andrei Mikhailovsky wrote:
 I can confirm that I am having similar issues with ubuntu vm guests using fio 
 with bs=4k direct=1 numjobs=4 iodepth=16. Occasionally i see hang tasks, 
 occasionally guest vm stops responding without leaving anything in the logs 
 and sometimes i see kernel panic on the console. I typically leave the 
 runtime of the fio test for 60 minutes and it tends to stop responding after 
 about 10-30 mins. 
 
 I am on ubuntu 12.04 with 3.5 kernel backport and using ceph 0.61.7 with qemu 
 1.5.0 and libvirt 1.0.2 

Josh,
In addition to the Ceph logs you can also use QEMU tracing with the
following events enabled:
virtio_blk_handle_write
virtio_blk_handle_read
virtio_blk_rw_complete

See docs/tracing.txt for details on usage.
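
A hedged sketch of one way to enable those events (per docs/tracing.txt; this
assumes QEMU was built with a trace backend such as stderr or simpletrace):

cat > /tmp/virtio-blk-events <<EOF
virtio_blk_handle_write
virtio_blk_handle_read
virtio_blk_rw_complete
EOF
qemu-system-x86_64 -trace events=/tmp/virtio-blk-events ...   # plus the usual guest options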

Inspecting the trace output will let you observe the I/O request
submission/completion from the virtio-blk device perspective.  You'll be
able to see whether requests are never being completed in some cases.

This bug seems like a corner case or race condition since most requests
seem to complete just fine.  The problem is that eventually the
virtio-blk device becomes unusable when it runs out of descriptors (it
has 128).  And before that limit is reached the guest may become
unusable due to the hung I/O requests.

Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH-DEPLOY TRIALS/EVALUATION RESULT ON CEPH VERSION 61.7

2013-08-09 Thread Aquino, BenX O
CEPH-DEPLOY EVALUATION ON CEPH VERSION 61.7
ADMINNODE:
root@ubuntuceph900athf1:~# ceph -v
ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
root@ubuntuceph900athf1:~#

SERVERNODE:
root@ubuntuceph700athf1:/etc/ceph# ceph -v
ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
root@ubuntuceph700athf1:/etc/ceph#

===:
Trial-1 of using ceph-deploy results: 
(http://ceph.com/docs/next/start/quick-ceph-deploy/)

My trial-1 scenario is using ceph-deploy to replace 2 OSDs (osd.2 and osd.11) 
of a ceph node.

Observation:
ceph-deploy created symbolic links between the ceph-0 and ceph-2 dirs and 
between the ceph-1 and ceph-11 dirs.
I did not run into any errors or issues in this trial.

One concern:
ceph-deploy did not update the Linux fstab with the mount points of the OSD data.

===

Trial 2: (http://ceph.com/docs/next/start/quick-ceph-deploy/)

I noticed my node did not have any contents in 
/var/lib/ceph/bootstrap-{osd}|{mds}.
Result: FAILURE TO MOVE FORWARD BEYOND THIS STEP

Tip from http://ceph.com/docs/next/start/quick-ceph-deploy/
If you don't have these keyrings, you may not have created a monitor 
successfully,
or you may have a problem with your network connection.
Ensure that you complete this step such that you have the foregoing keyrings 
before proceeding further.

Tip from http://ceph.com/docs/next/start/quick-ceph-deploy/:
You may repeat this procedure. If it fails, check to see if the 
/var/lib/ceph/bootstrap-{osd}|{mds} directories on the server node have keyrings.
If they do not have keyrings, try adding the monitor again; then, return to 
this step.

My WORKAROUND1:
COPIED CONTENTS OF /var/lib/ceph/bootstrap-{osd}|{mds} FROM ANOTHER NODE

My WORKAROUND2:
USED THE CREATE A NEW CLUSTER PROCEDURE with CEPH-DEPLOY to create the missing keyrings.

=:
TRIAL-3: Attempt to build a new cluster/1-Node using ceph-deploy:

RESULT: FAILED TO GET BEYOND THE ERROR LOGS BELOW:

root@ubuntuceph900athf1:~/my-cluster# ceph-deploy osd prepare 
ubuntuceph700athf1:sde1:/var/lib/ceph/journal/osd.0.journal
ceph-disk-prepare -- /dev/sde1 /var/lib/ceph/journal/osd.0.journal returned 1
meta-data=/dev/sde1  isize=2048   agcount=4, agsize=30524098 blks
 =   sectsz=512   attr=2, projid32bit=0
data =   bsize=4096   blocks=122096390, imaxpct=25
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0
log  =internal log   bsize=4096   blocks=59617, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same 
device as the osd data
umount: /var/lib/ceph/tmp/mnt.iMsc1G: device is busy.
(In some cases useful info about processes that use
 the device is found by lsof(8) or fuser(1))
ceph-disk: Unmounting filesystem failed: Command '['/bin/umount', '--', 
'/var/lib/ceph/tmp/mnt.iMsc1G']' returned non-zero exit status 1

ceph-deploy: Failed to create 1 OSDs

root@ubuntuceph900athf1:~/my-cluster# ceph-deploy osd prepare 
ubuntuceph700athf1:sde1
ceph-disk-prepare -- /dev/sde1 returned 1
meta-data=/dev/sde1  isize=2048   agcount=4, agsize=30524098 blks
 =   sectsz=512   attr=2, projid32bit=0
data =   bsize=4096   blocks=122096390, imaxpct=25
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0
log  =internal log   bsize=4096   blocks=59617, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

umount: /var/lib/ceph/tmp/mnt.0JxBp1: device is busy.
(In some cases useful info about processes that use
 the device is found by lsof(8) or fuser(1))
ceph-disk: Unmounting filesystem failed: Command '['/bin/umount', '--', 
'/var/lib/ceph/tmp/mnt.0JxBp1']' returned non-zero exit status 1

ceph-deploy: Failed to create 1 OSDs
root@ubuntuceph900athf1:~/my-cluster# ceph-deploy osd prepare 
ubuntuceph700athf1:sde1
ceph-disk-prepare -- /dev/sde1 returned 1

ceph-disk: Error: Device is mounted: /dev/sde1

ceph-deploy: Failed to create 1 OSDs


Attempted on the local node:
root@ubuntuceph700athf1:/etc/ceph# ceph-deploy osd prepare 
ubuntuceph700athf1:sde1:/var/lib/ceph/journal/osd.0.journal
ceph-disk-prepare -- /dev/sde1 /var/lib/ceph/journal/osd.0.journal returned 1
ceph-disk: Error: Device is mounted: /dev/sde1

/dev/sde1 on /var/lib/ceph/tmp/mnt.GzZLAr type xfs (rw,noatime)

RESULT:
ceph-deploy complains that the OSD drive is mounted; it was not mounted prior to 
running the command. ceph-deploy mounted it and then complains that it is mounted.
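
A hedged follow-up for anyone who hits the same thing: see what is holding the
temporary mount, release it, and re-run prepare against a zapped disk (device and
host names are the ones from the log above):

fuser -vm /var/lib/ceph/tmp/mnt.GzZLAr     # or: lsof +D /var/lib/ceph/tmp/mnt.GzZLAr
umount /var/lib/ceph/tmp/mnt.GzZLAr
ceph-deploy disk zap ubuntuceph700athf1:sde
ceph-deploy osd prepare ubuntuceph700athf1:sde1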



Re: [ceph-users] ceph-deploy behind corporate firewalls

2013-08-09 Thread Alfredo Deza
On Fri, Aug 9, 2013 at 1:34 AM, Luc Dumaine lduma...@sitiv.fr wrote:

 Hi,

 I was able to use ceph-deploy behind a proxy, by defining the appropriate
 environment variables used by wget..

 I.e., on Ubuntu just add to /etc/environment:

 http_proxy=http://host:port
 ftp_proxy=http://host:port
 https_proxy=http://host:port

 Thanks for letting us know, this definitely sounds useful, I will add it
to the docs so someone having a similar issue can have a workaround for now.




 Regard, Luc.


 - Original Message -
 From: Harvey Skinner hpmpe...@gmail.com
 To: ceph-users@lists.ceph.com
 Cc: Harvey Skinner hpmpe...@gmail.com
 Sent: Friday, 9 August 2013 05:48:35
 Subject: [ceph-users] ceph-deploy behind corporate firewalls

  hi all,

 I am not sure if I am the only one having issues with ceph-deploy
 behind a firewall or not.  I haven't seen any other reports of similar
 issues yet.  With http proxies I am able to have apt-get working, but
 wget is still an issue.

 Working to use the newer ceph-deploy mechanism to deploy my next POC
 set up on four storage nodes.   The ceph-deploy install process
 unfortunately uses wget to retrieve the Ceph release key and failing
 the install.   To get around this i can manually add the Ceph release
 key on all my nodes and apt-get install all the Ceph packages.
 Question though is whether there is anything else that ceph-deploy
 does that I would need to do manually to have everything in state
 where ceph-deploy would work correctly for the rest of the cluster
 setup and deployment, i.e. ceph-deploy new  -and- ceph-deploy mon
 create, etc.?

 thank you,
 Harvey
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] STGT targets.conf example

2013-08-09 Thread Dan Mick
Awesome. Thanks Darryl. Do you want to propose a fix to stgt, or shall I?
On Aug 8, 2013 7:21 PM, Darryl Bond db...@nrggos.com.au wrote:

 Dan,
 I found that the tgt-admin perl script looks for a local file

 if (-e $backing_store && ! -d $backing_store && $can_alloc == 1) {

  A bit nasty, but I created some empty files relative to / of the same
 path as the RBD backing store which worked around the problem.

 mkdir /iscsi-spin
 touch /iscsi-spin/test

 Lets me restart tgtd and have the LUN created properly.
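
An alternative, hedged workaround is to create the rbd-backed LUN directly with
tgtadm, which bypasses tgt-admin's file-existence check entirely (target and
backing-store names follow the config quoted below):

tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2008-09.com.ceph:test
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
       --bstype rbd --backing-store iscsi-spin/test
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL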

 tgt-admin --dump is also not that useful, doesn't output the backing
 store type.

 # tgt-admin --dump
 default-driver iscsi

 <target iqn.2013.com.ceph:test>
 backing-store iscsi-spin/test
 initiator-address 192.168.6.100
 </target>


 Darryl

 On 08/09/13 07:23, Dan Mick wrote:

 On 08/04/2013 10:15 PM, Darryl Bond wrote:

 I am testing scsi-target-utils tgtd with RBD support.
 I have successfully created an iscsi target using RBD as an iscsi target
 and tested it.
 It backs onto a rados pool iscsi-spin with a RBD called test.
 Now I want it to survive a reboot. I have created a conf file

 <target iqn.2008-09.com.ceph:test>
   <backing-store iscsi-spin/test>
     bs-type rbd
     path iscsi-spin/test
   </backing-store>
 </target>

 When I restart tgtd It creates the target but doesn't connect the
 backing store.
 The tool tgt-admin has a test mode for the configuration file

 [root@cephgw conf.d]# tgt-admin -p -e
 # Adding target: iqn.2008-09.com.ceph:test
 tgtadm -C 0 --lld iscsi --op new --mode target --tid 1 -T
 iqn.2008-09.com.ceph:test
 # Skipping device: iscsi-spin/test
 # iscsi-spin/bashful-spin does not exist - please check the
 configuration file
 tgtadm -C 0 --lld iscsi --op bind --mode target --tid 1 -I ALL

 It looks to me like tgtd support RBD backing stores but the
 configuration utilities don't.

 I have not tried config files or tgt-admin to any great extent, but it
 doesn't look to me like there are backend dependencies in those tools
 (or I would have modified them at the time :)), but, that said, there
 may be some weird problem.  tgt-admin is a Perl script that could be
 instrumented to figure out what's going on.

 I do know that the syntax of the config file is dicey.

  Anyone tried this?
 What have I missed?

 Regards
 Darryl


 The contents of this electronic message and any attachments are intended
 only for the addressee and may contain legally privileged, personal,
 sensitive or confidential information. If you are not the intended
 addressee, and have received this email, any transmission, distribution,
 downloading, printing or photocopying of the contents of this message or
 attachments is strictly prohibited. Any legal privilege or
 confidentiality attached to this message and attachments is not waived,
 lost or destroyed by reason of delivery to any person other than
 intended addressee. If you have received this message and are not the
 intended addressee you should notify the sender by return email and
 destroy all copies of the message and any attachments. Unless expressly
 attributed, the views expressed in this email do not necessarily
 represent the views of the company.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-09 Thread Joao Eduardo Luis

On 07/08/13 15:14, Jeppesen, Nelson wrote:

Joao,

Have you had a chance to look at my monitor issues? I ran 'ceph-mon -i FOO 
--compact' last week but it did not improve disk usage.

Let me know if there's anything else I can dig up. The monitor is still at 0.67-rc2 
with the OSDs at 0.61.7.


Hi Nelson,

It's been a crazy week, and I haven't had the opportunity to dive into the 
compaction issues -- we've been tying up the last loose ends for the 
dumpling release.


Btw, just noticed that you mentioned on your previous email that the 
'mon compact on start = true' flag made your monitor hang.  Well, that 
was not a hang per se.  If you try that again and take a look at IO on 
the mon store, you should see the monitor doing loads of it.  That's 
leveldb compacting.  It should take a while.  A considerable while.  As 
I previously mentioned, 10G stores can take a while to compact -- a 
220GB store will take even longer.


However, regardless of how we eventually fix this whole thing, you'll 
need to compact your store.  I seriously doubt there's a way out of it. 
 Well, there may be another way out of it, but that would involve a bit 
of trickery to get the leveldb contents out of the store and into a new, 
fresh store, which would seem a lot like a last resort.


But feel free to ping me on IRC and we'll try to figure something out.

  -Joao





On 08/02/2013 12:15 AM, Jeppesen, Nelson wrote:

Thanks for the reply, but how can I fix this without an outage?

I tried adding 'mon compact on start = true' but the monitor just hung. 
Unfortunately this is a production cluster and can't take the outage (I'm 
assuming the cluster will fail without a monitor). I had three monitors, but I was 
hit with the store.db bug and lost two of the three.

I have tried running with 0.61.5, 0.61.7 and 0.67-rc2. None of them seem to 
shrink the DB.


My guess is that the compaction policies we are enforcing won't cover
the portions of the store that haven't been compacted *prior* to the
upgrade.

Even today we still know of users with stores growing over dozens of
GBs, requiring occasional restarts to compact (which is far from an
acceptable fix).  Some of these stores can take several minutes to
compact when the monitors are restarted, although these guys can often
mitigate any down time by restarting monitors one at a time while
maintaining quorum.  Unfortunately you don't have that luxury. :-\

If however you are willing to manually force a compaction, you should be
able to do so with 'ceph-mon -i FOO --compact'.
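
A hedged sketch of what that looks like for a single-monitor cluster -- note the
monitor is down for the whole compaction, which on a 220GB store can take a long
time:

service ceph stop mon.FOO        # FOO being the mon id, as in the command above
ceph-mon -i FOO --compact        # heavy leveldb IO; wait for it to finish
service ceph start mon.FOO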

Now, there is a possibility this is why you've been unable to add other
monitors to the cluster.  Chances are that the iterators used to
synchronize the store get stuck, or move slowly enough to make all sorts
of funny timeouts to be triggered.

I intend to look into your issue (especially the problems with adding
new monitors) in the morning to better assess what's happening.

-Joao



-Original Message-
From: Mike Dawson [mailto:mike.dawson at cloudapt.com]
Sent: Thursday, August 01, 2013 4:10 PM
To: Jeppesen, Nelson
Cc: ceph-users at lists.ceph.com
Subject: Re: [ceph-users] Why is my mon store.db is 220GB?

220GB is way, way too big. I suspect your monitors need to go through a 
successful leveldb compaction. The early releases of Cuttlefish suffered 
several issues with store.db growing unbounded. Most were fixed by 0.61.5, I 
believe.

You may have luck stopping all Ceph daemons, then starting the monitor by itself. 
When there were bugs, leveldb compaction tended to work better without OSD traffic 
hitting the monitors. Also, there are some settings to force a compact on startup 
like 'mon compact on start = true' and 'mon compact on trim = true'. I don't 
think either is required anymore though. See some history here:

http://tracker.ceph.com/issues/4895


Thanks,

Mike Dawson
Co-Founder  Director of Cloud Architecture Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:

My Mon store.db has been at 220GB for a few months now. Why is this
and how can I fix it? I have one monitor in this cluster and I suspect
that I can't  add monitors to the cluster because it is too big. Thank you.



___
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com







--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [ANN] ceph-deploy v1.2 has been released!

2013-08-09 Thread Alfredo Deza
I am very pleased to announce the release of ceph-deploy to the Python
Package Index.

The OS packages are yet to come, I will make sure to update this thread
when they do.

For now, if you are familiar with Python install tools, you can install
directly from PyPI with pip or easy_install:

pip install ceph-deploy

or

easy_install ceph-deploy

This release includes a massive effort for better error reporting and
granular information in remote hosts (for `install` and `mon create`
commands for now).

There were about 18 bug fixes and improvements too, including upstream
libraries that are used by ceph-deploy.

If you find any issues with ceph-deploy, please make sure you let me know
via this list or on irc at #ceph!

Enjoy!

-Alfredo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [ANN] ceph-deploy v1.2 has been released!

2013-08-09 Thread Sébastien RICCIO

Hi!

Awesome :)) Thanks for such a great work!

Cheers,
Sébastien

On 10.08.2013 02:52, Alfredo Deza wrote:
I am very pleased to announce the release of ceph-deploy to the Python 
Package Index.


The OS packages are yet to come, I will make sure to update this 
thread when they do.


For now, if you are familiar with Python install tools, you can 
install directly from PyPI with pip or easy_install:


pip install ceph-deploy

or

easy_install ceph-deploy

This release includes a massive effort for better error reporting and 
granular information in remote hosts (for `install` and `mon create` 
commands for now).


There were about 18 bug fixes and improvements too, including upstream 
libraries that are used by ceph-deploy.


If you find any issues with ceph-deploy, please make sure you let me 
know via this list or on irc at #ceph!


Enjoy!

-Alfredo


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com