Re: [ceph-users] Intel 520/530 SSD for ceph

2013-11-19 Thread Stefan Priebe

Hi Marcus,

On 18.11.2013 23:51, m...@linuxbox.com wrote:

On Mon, Nov 18, 2013 at 02:38:42PM +0100, Stefan Priebe - Profihost AG wrote:
You may actually be doing O_SYNC - recent kernels implement O_DSYNC,
but glibc maps O_DSYNC into O_SYNC.  But since you're writing to the
block device this won't matter much.


There is no difference between O_DSYNC and O_SYNC; the values are the same. Also 
I'm using kernel 3.10.19, so it is recent enough.



I believe the effect of O_DIRECT by itself is just to bypass the buffer
cache, which is not going to make much difference for your dd case.
(It will mainly affect other applications that are also using the
buffer cache...)

 O_SYNC should be causing the writes to block until a response
 is received from the disk.  Without O_SYNC, the writes will
 just queue operations and return - potentially very fast.
 Your dd is probably writing enough data that there is some
 throttling by the system as it runs out of disk buffers and
 has to wait for some previous data to be written to the drive,
 but the delay for any individual block is not likely to matter.
 With O_SYNC, you are measuring the delay for each block directly,
 and you have absolutely removed the ability for the disk to
 perform any sort of parallelism.

That's correct, but Ceph uses O_DSYNC for its journal and maybe other 
things, so it is important to have devices that perform well with O_DSYNC.
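(For reference, the kind of latency test being discussed is along these lines -- the
device path is a placeholder and the command overwrites data, so only run it against
a scratch disk or partition:

  dd if=/dev/zero of=/dev/sdX bs=4k count=10000 oflag=direct,dsync

With oflag=dsync every 4k block has to be acknowledged by the drive before the next
one is issued, which is roughly how the journal write path behaves.)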



Sounds like the Intel 530 has a much larger block write latency,
but can make up for it by performing more overlapped operations.

You might be able to vary this behavior by experimenting with sdparm,
smartctl or other tools, or possibly with different microcode in the drive.

Which values or settings do you have in mind?

Greets
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping

2013-11-19 Thread Mihály Árva-Tóth
Hello David and Chris,

Thank you for your replies in this thread.

 The automatic repair should handle getting an EIO during read of the
object replica.

I think that when the osd tries to read an object from the primary disk that sits
on a bad sector, the controller does not respond with EIO but with something else.

If you can help me figure out how to debug the response code, I will try to find out.
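(One way to see exactly what the OSD gets back from the disk is to bump the OSD's
debug level temporarily and watch its log while triggering the read -- osd.0 and the
log path are just the defaults, adjust to your setup:

  ceph tell osd.0 injectargs '--debug-osd 20 --debug-filestore 20'
  tail -f /var/log/ceph/ceph-osd.0.log

Remember to lower the levels again afterwards, e.g. '--debug-osd 0/5'.)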

Thank you,
Mihaly


2013/11/19 David Zafman david.zaf...@inktank.com


 I looked at the code.  The automatic repair should handle getting an EIO
 during read of the object replica.  It does NOT require removing the object
 as I said before, so it doesn’t matter which copy has bad sectors.  It will
 copy from a good replica to the primary, if necessary.  By default a
 deep-scrub which would catch this case is performed weekly.  A repair must
 be initiated by administrative action.
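(For reference, the commands involved are roughly the following -- the pg id 2.30 is
only an example; ceph health detail shows which pgs are actually inconsistent:

  ceph health detail
  ceph pg deep-scrub 2.30
  ceph pg repair 2.30)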

 When replicas differ due to comparison of checksums, we currently don’t
 have a way to determine which copy(s) are corrupt.  This is where a manual
 intervention may be necessary if the administrator can determine which
 copy(s) are bad.

 David Zafman
 Senior Developer
 http://www.inktank.com




 On Nov 18, 2013, at 1:11 PM, Chris Dunlop ch...@onthe.net.au wrote:

  OK, that's good (as far is it goes, being a manual process).
 
  So then, back to what I think was Mihály's original issue:
 
   pg repair or deep-scrub can not fix this issue. But if I
   understand correctly, the osd has to know it can not retrieve
   the object from osd.0 and that it needs to be replicated to another
   osd, because there are no longer 3 working replicas.
 
  Given a bad checksum and/or read error tells ceph that an object
  is corrupt, it would seem to be a natural step to then have ceph
  automatically use another good-checksum copy, and even rewrite
  the corrupt object, either in normal operation or under a scrub
  or repair.
 
  Is there a reason this isn't done, apart from lack of tuits?
 
  Cheers,
 
  Chris
 
 
  On Mon, Nov 18, 2013 at 11:43:46AM -0800, David Zafman wrote:
 
  No, you wouldn’t need to re-replicate the whole disk for a single bad
 sector.  The way to deal with that if the object is on the primary is to
 remove the file manually from the OSD’s filesystem and perform a repair of
 the PG that holds that object.  This will copy the object back from one of
 the replicas.
 
  David
 
  On Nov 17, 2013, at 10:46 PM, Chris Dunlop ch...@onthe.net.au wrote:
 
  Hi David,
 
  On Fri, Nov 15, 2013 at 10:00:37AM -0800, David Zafman wrote:
 
   Replication does not occur until the OSD is "out."  This creates a
  new mapping in the cluster of where the PGs should be and thus data begins
  to move and/or create sufficient copies.  This scheme lets you control how
  and when you want the replication to occur.  If you have plenty of space
  and you aren't going to replace the drive immediately, just mark the OSD
  "down" AND "out."  If you are going to replace the drive immediately, set
  the "noout" flag.  Take the OSD "down" and replace the drive.  Assuming it is
  mounted in the same place as the bad drive, bring the OSD back up.  This
  will replicate exactly the same PGs the bad drive held back to the
  replacement drive.  As was stated before, don't forget to "ceph osd unset
  noout".
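(A sketch of that drive-swap sequence -- the osd id is only an example and the service
syntax depends on your init system (sysvinit shown; upstart uses stop/start ceph-osd id=12):

  ceph osd set noout
  service ceph stop osd.12
  # ...replace the drive and mount the new filesystem in the same place...
  service ceph start osd.12
  ceph osd unset noout)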
 
   Keep in mind that in the case of a machine that has a hardware
  failure and takes OSD(s) down, there is an automatic timeout which will mark
  them "out" for unattended operation.  Unless you are monitoring the cluster
  24/7 you should have enough disk space available to handle failures.
 
  Related info in:
 
 
 http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
 
  David Zafman
  Senior Developer
  http://www.inktank.com
 
 
  Are you saying, if a disk suffers from a bad sector in an object
  for which it's primary, and for which good data exists on other
  replica PGs, there's no way for ceph to recover other than by
  (re-)replicating the whole disk?
 
  I.e., even if the disk is able to remap the bad sector using a
  spare, so the disk is ok (albeit missing a sector's worth of
  object data), the only way to recover is to basically blow away
  all the data on that disk and start again, replicating
  everything back to the disk (or to other disks)?
 
  Cheers,
 
  Chris.




-- 

Best regards,

Mihály Árva-Tóth
System Engineer

Virtual Call Center GmbH
Address: 23-33 Csalogány Street, Budapest 1027, Hungary
Tel: +36 1 999 7400
Mobile: +36 30 473 9256
Fax: +36 1 999 7401
E-mail: mihaly.arva-t...@virtual-call-center.eu
Web: www.virtual-call-center.eu (http://www.virtual-call-center.hu/)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] alternative approaches to CEPH-FS

2013-11-19 Thread James Pearce



2) Can't grow once you reach the hard limit of 14TB, and if you have
multiple of such machines, then fragmentation becomes a problem

3) might have the risk of 14TB partition corruption wiping out all 
your shares




14TB limit is due to EXT(3/4) recommendation(/implementation)?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] about start a osd daemon

2013-11-19 Thread Alfredo Deza
On Tue, Nov 19, 2013 at 1:29 AM, Dnsbed Ops o...@dnsbed.com wrote:
 Hi,

 When an osd node server restarted, I found the osd daemon doesn't get
 started.

 I must run these two commands from the deploy node to restart them:

 ceph-deploy osd prepare ceph3.anycast.net:/tmp/osd2
 ceph-deploy osd activate ceph3.anycast.net:/tmp/osd2


That really doesn't restart them, it creates them. I think ceph-deploy
is not destroying anything here, so it appears to restart them when in
fact it is re-doing the whole process.


 My questions are,
 #1, can it be set up to start automatically after a system reboot?

I would think they should come up on boot. Could you share some log output?

 #2, each time we activate it, must we prepare it first?

This is only required for paths, not for devices. For example, if you
had your OSD on
the sdb device you could just call `create` directly.
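(i.e. something along the lines of the following, where the host and device are
placeholders and sdb holds the OSD data:

  ceph-deploy osd create ceph3.anycast.net:sdb

With a whole device, ceph-deploy partitions and labels it so that udev/upstart can
activate the OSD again on boot.)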

 Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Librados Error Codes

2013-11-19 Thread Behar Veliqi
Hi,

when using the librados C library, the documentation of the different functions 
just says that they return a negative error code on failure,
e.g. the rados_read function 
(http://ceph.com/docs/master/rados/api/librados/#rados_read).

Is there any further documentation on which error code is returned under 
which condition, and on how to know _why_ an operation has failed?

Thanks!

Regards,
Behar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Size of RBD images

2013-11-19 Thread nicolasc

Hi every one,

In the course of playing with RBD, I noticed a few things:

* The RBD images are so thin-provisioned, you can create arbitrarily 
large ones.
  On my 0.72.1 freshly-installed empty 200TB cluster, I was able to 
create a 1PB image:


  $ rbd create --image-format 2 --size 1073741824 test_img

  This command is successful, and I can check the image status:

  $ rbd info test
  rbd image 'test':
size 1024 TB in 268435456 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.19f76b8b4567
format: 2
features: layering

* Such an oversized image seems unmountable on my 3.2.46 kernel.
  However the error message is not very explicit:

  $ rbd map test_img
  rbd: add failed: (2) No such file or directory

  There is no error or explanation to be seen anywhere in the logs.
  dmesg reports the connection to the cluster through RBD as usual, and 
that's it.
  Using the exact same commands with image size 32GB will successfully 
map the device.
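(In case it helps reproduce this: --size is given in megabytes, so 1073741824 above
really is 1 PB. A bisection with hypothetical image names narrows down the mappable
limit, e.g.:

  rbd create --image-format 2 --size 268435456 test_256t   # 256 TB
  rbd map test_256t)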


* Such an oversize image takes an awfully long time to shrink or remove.
  However, the image has just been created and is empty.
  In RADOS, I only see the corresponding rbd_id and rbd_header, but no 
data object at all.

  Still, removing the 1PB image takes roughly 8 hours.

Cluster config:
3 mons, 8 nodes * 72 OSDs, about 4800 PGs (2400 PGs in pool rbd)
cluster and public network are 10GbE, each node has 8 cores and 64GB mem

Oh, so my questions:
- why is it possible to create an image five times the size of the 
cluster without warning?

- where could this No such file error come from?
- why does it take long to shrink/delete a 
large-but-empty-and-thin-provisioned image?


I know that 1PB is oversized (No such file when trying to map), and 
32GB is not, so I am currently looking for the oversize threshold. More 
info coming soon.


Best regards,

Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Size of RBD images

2013-11-19 Thread Bernhard Glomm
Hi Nicolas
just fyi
rbd format 2 is not supported yet by the linux kernel (module);
it can only be used as a target for virtual machines using librbd.
see: man rbd -- --image-format 

shrinking time: the same happened to me.
an rbd (v1) device
took about a week to shrink from 1PB to 10TB.
the good news: I already had about 5TB of data on it
and ongoing processes using the device, and 
there was neither any data loss nor
a significant performance issue.
(3 mons + 4 machines with a different number of OSDs each)

Bernhard

 EDIT: sorry about the No such file error
 
 Now, it seems this is a separate issue: the system I was using was 
 apparently unable to map devices to images in format 2. I will be 
 investigating that further before mentioning it again.
 
 I would still appreciate answers about the 1PB image and the time to 
 shrink it.
 
 Best regards,
 
 Nicolas Canceill
 Scalable Storage Systems
 SURFsara (Amsterdam, NL)
 
 
 On 11/19/2013 03:20 PM, nicolasc wrote:
  Hi every one,
  
  In the course of playing with RBD, I noticed a few things:
  
  * The RBD images are so thin-provisioned, you can create arbitrarily 
  large ones.
  On my 0.72.1 freshly-installed empty 200TB cluster, I was able to 
  create a 1PB image:
  
  $ rbd create --image-format 2 --size 1073741824 test_img
  
  This command is successful, and I can check the image status:
  
  $ rbd info test
  rbd image 'test':
  size 1024 TB in 268435456 objects
  order 22 (4096 kB objects)
  block_name_prefix: rbd_data.19f76b8b4567
  format: 2
  features: layering
  
  * Such an oversized image seems unmountable on my 3.2.46 kernel.
  However the error message is not very explicit:
  
  $ rbd map test_img
  rbd: add failed: (2) No such file or directory
  
  There is no error or explanation to be seen anywhere in the logs.
  dmesg reports the connection to the cluster through RBD as usual, 
  and that's it.
  Using the exact same commands with image size 32GB will successfully 
  map the device.
  
  * Such an oversize image takes an awfully long time to shrink or remove.
  However, the image has just been created and is empty.
  In RADOS, I only see the corresponding rbd_id and rbd_header, but no 
  data object at all.
  Still, removing the 1PB image takes roughly 8 hours.
  
  Cluster config:
  3 mons, 8nodes * 72osds, about 4800pgs (2400pgs in pool rbd)
  cluster and public network are 10GbE, each node has 8 cores and 64GB mem
  
  Oh, so my questions:
  - why is it possible to create an image five times the size of the 
  cluster without warning?
  - where could this No such file error come from?
  - why does it take long to shrink/delete a 
  large-but-empty-and-thin-provisioned image?
  
  I know that 1PB is oversized (No such file when trying to map), and 
  32GB is not, so I am currently looking for the oversize threshold. 
  More info coming soon.
  
  Best regards,
  
  Nicolas Canceill
  Scalable Storage Systems
  SURFsara (Amsterdam, NL)
  
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Bernhard Glomm
IT Administration

Phone: +49 (30) 86880 134
Fax:   +49 (30) 86880 100
Skype: bernhard.glomm.ecologic

Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] about start a osd daemon

2013-11-19 Thread LaSalle, Jurvis
On 11/19/13, 1:29 AM, Dnsbed Ops o...@dnsbed.com wrote:

Hi,

When an osd node server restarted, I found the osd daemon doesn't get
started.

I must run these two commands from the deploy node to restart them:

ceph-deploy osd prepare ceph3.anycast.net:/tmp/osd2
ceph-deploy osd activate ceph3.anycast.net:/tmp/osd2


My questions are,
#1, can it be set up to start automatically after a system reboot?
#2, each time we activate it, must we prepare it first?

Most *nix boxes wipe /tmp on boot.  If you're just doing a quick and dirty
POC and want to test cluster behavior across reboots, place your osd in
/var/tmp or anywhere else really.


JL

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is Ceph a provider of block device too ?

2013-11-19 Thread Listas
Thank you! I am studying and testing CEPH. I think it will be very good 
for my needs.


On 18-11-2013 20:31, Timofey root wrote:

On Nov 9, 2013, at 1:46, Gregory Farnum g...@inktank.com wrote:


On Fri, Nov 8, 2013 at 8:49 AM, Listas lis...@adminlinux.com.br wrote:

Hi !

I have clusters (IMAP service) with 2 members configured with Ubuntu + Drbd
+ Ext4. I intend to migrate to Ceph and begin to allow distributed
access to the data.

Does Ceph provides the distributed filesystem and block device?

Ceph's RBD is a distributed block device and works very well; you
could use it to replace DRBD. The CephFS distributed filesystem is in
more of a preview mode and is not supported for general use at this
time.


Does Ceph work fine in clusters of two members?

It should work fine in a cluster of that size, but you're not getting
as many advantages over other solutions at such small scales as you do
from larger ones.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph (deploy?) and drive paths / mounting / best practice.

2013-11-19 Thread Udo Lembke
On 19.11.2013 06:56, Robert van Leeuwen wrote:
 Hi,

 ...
 It looks like it is just using /dev/sdX for this instead of the 
 /dev/disk/by-id /by-path given by ceph-deploy.

 ...
Hi Robert,
I'm using the disk-label:

fstab:
LABEL=osd.0   /var/lib/ceph/osd/ceph-0   xfs noatime,nodiratime 
0   0

Creating with mkfs.xfs -L osd.0 /dev/sdX1

With this config the path can change. Works well for me.
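(If the filesystem already exists, the label can also be set after the fact, with
the filesystem unmounted -- device path is a placeholder:

  xfs_admin -L osd.0 /dev/sdX1)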


Udo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Size of RBD images

2013-11-19 Thread LaSalle, Jurvis
On 11/19/13, 2:10 PM, Wolfgang Hennerbichler wo...@wogri.com wrote:


On Nov 19, 2013, at 3:47 PM, Bernhard Glomm bernhard.gl...@ecologic.eu
wrote:

 Hi Nicolas
 just fyi
 rbd format 2 is not supported yet by the linux kernel (module)

I believe this is wrong. I think linux supports rbd format 2 images since
3.10. 

One more reason to cross our fingers for official Saucy Salamander support
soon...

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Size of RBD images

2013-11-19 Thread Gruher, Joseph R
So is there any size limit on RBD images?  I had a failure this morning 
mounting 1TB RBD.  Deleting now (why does it take so long to delete if it was 
never even mapped, much less written to?) and will retry with smaller images.  
See output below.  This is 0.72 on Ubuntu 13.04 with 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1

ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied

ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rados df
pool name   category KB  objects   clones 
degraded  unfound   rdrd KB   wrwr KB
data-  000  
  0   00000
metadata-  000  
  0   00000
rbd -  120  
  0   0   10788
testpool01  -  000  
  0   00000
testpool02  -  000  
  0   00000
testpool03  -  000  
  0   00000
testpool04  -  000  
  0   00000
  total used  23287851602
  total avail 9218978040
  total space11547763200

ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd

ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Size of RBD images

2013-11-19 Thread Gruher, Joseph R


-Original Message-
From: Gruher, Joseph R
Sent: Tuesday, November 19, 2013 12:24 PM
To: 'Wolfgang Hennerbichler'; Bernhard Glomm
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Size of RBD images

So is there any size limit on RBD images?  I had a failure this morning 
mounting
1TB RBD.  Deleting now (why does it take so long to delete if it was never even
mapped, much less written to?) and will retry with smaller images.  See
output below.  This is 0.72 on Ubuntu 13.04 with 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1

ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied

ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rados df
pool name   category KB  objects   clones 
degraded  unfound
rdrd KB   wrwr KB
data-  000 
   0   0000
0
metadata-  000 
   0   0000
0
rbd -  120 
   0   0   1078
8
testpool01  -  000 
   0   0000
0
testpool02  -  000 
   0   0000
0
testpool03  -  000 
   0   0000
0
testpool04  -  000 
   0   0000
0
  total used  23287851602
  total avail 9218978040
  total space11547763200

ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd

ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1


I think I figured out where I went wrong here.  I had thought that if you didn't 
specify the pool on the 'rbd create' command line you could then later map it in 
any pool.  In retrospect that probably doesn't make a lot of sense, and it 
appears that if you don't specify the pool at the create step it just defaults to 
the rbd pool.  See example below.

ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage5 
--pool testpool01
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage5 --pool testpool01

ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage6
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool rbd
ceph@joceph-client01:/etc/ceph$
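(A quick way to check which pool an image actually landed in is to list each pool,
e.g.:

  rbd ls --pool rbd
  rbd ls --pool testpool01)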
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help v.72configure federate gateway failed

2013-11-19 Thread Josh Durgin

Sorry for the delay, I'm still catching up since the openstack
conference.

Does the system user for the destination zone exist with the same
access and secret keys in the source zone?

If you enable debug rgw = 30 on the destination you can see why the
copy_obj from the source zone is failing.
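(That could look roughly like this in ceph.conf on the destination gateway -- the
section name is hypothetical and should match your rgw instance -- followed by a
restart of the radosgw process:

  [client.radosgw.us-west-1]
      debug rgw = 30
      debug ms = 1)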

Josh

On 11/11/2013 12:52 AM, maoqi1982 wrote:

Hi list
The ceph version is the latest v0.72 emperor. Following the
http://ceph.com/docs/master/radosgw/federated-config/ doc, I deployed two
ceph clusters (one per data site) to form a region (a master zone and
a slave zone). The metadata seems to sync OK, but objects fail
to sync.

The error is as follows:
INFO:radosgw_agent.worker:6053 is processing shard number 47
INFO:radosgw_agent.worker:finished processing shard 47
INFO:radosgw_agent.sync:48/128 items processed
INFO:radosgw_agent.worker:6053 is processing shard number 48
INFO:radosgw_agent.worker:bucket instance east-bucket:us-east.4139.1 has 5 entries after 002.2.3
INFO:radosgw_agent.worker:syncing bucket east-bucket
ERROR:radosgw_agent.worker:failed to sync object east-bucket/驽??docx:
ERROR:radosgw_agent.worker:failed to sync object east-bucket/sss.py: state is error
ERROR:radosgw_agent.worker:failed to sync object east-bucket/Nfg.docx: state is error
INFO:radosgw_agent.worker:finished processing shard 48
INFO:radosgw_agent.worker:6053 is processing shard number 49
INFO:radosgw_agent.sync:49/128 items processed
INFO:radosgw_agent.sync:50/128 items processed
INFO:radosgw_agent.worker:finished processing shard 49
INFO:radosgw_agent.worker:6053 is processing shard number 50
INFO:radosgw_agent.worker:finished processing shard 50
INFO:radosgw_agent.sync:51/128 items processed
INFO:radosgw_agent.worker:6053 is processing shard number 51
INFO:radosgw_agent.worker:finished processing shard 51
INFO:radosgw_agent.sync:52/128 items processed
INFO:radosgw_agent.worker:6053 is processing shard number 52
INFO:radosgw_agent.sync:53/128 items processed
INFO:radosgw_agent.worker:finished processing shard 52

thanks


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-agent AccessDenied 403

2013-11-19 Thread Josh Durgin

On 11/13/2013 09:06 PM, lixuehui wrote:

And on the slave zone gateway instence ,the info is like this :

2013-11-14 12:54:24.516840 7f51e7fef700  1 == starting new 
request req=0xb1e3b0 =
2013-11-14 12:54:24.526640 7f51e7fef700  1 == req done 
req=0xb1e3b0 http_status=200 ==
2013-11-14 12:54:24.545440 7f51e4fe9700  1 == starting new 
request req=0xb1c690 =
2013-11-14 12:54:24.551696 7f51e4fe9700  0 WARNING: couldn't find 
acl header for bucket, generating default
2013-11-14 12:54:24.566005 7f51e4fe9700  0  HTTP_DATE - Thu Nov 
14 04:54:24 2013
2013-11-14 12:54:24.566046 7f51e4fe9700  0  HTTP_X_AMZ_COPY_SOURCE 
- sss%2Frgwconf
2013-11-14 12:54:24.607998 7f51e4fe9700  1 == req done 
req=0xb1c690 http_status=403 ==
2013-11-14 12:54:24.626466 7f51e27e4700  1 == starting new 
request req=0xb24260 =

Could anyone help find the problem? Does it mean we should set an
acl for the bucket? In fact, the information stays the same as before,
even after setting an acl for the bucket.
bucket-name sss
object-name rgwconf
Or is there something wrong with either the HTTP_DATE or the
HTTP_X_AMZ_COPY_SOURCE?


Those headers are fine, and it's unrelated to acls since the gateway is
using a system user for cross-zone copies, which has full access.

Does the system user for the destination zone exist with the same
access and secret keys in the source zone?

Josh



lixuehui
From: lixuehui mailto:lixue...@chinacloud.com.cn
Sent: 2013-11-13 16:16
To: ceph-users mailto:ceph-users@lists.ceph.com
Subject: radosgw-agent AccessDenied 403
Hi list,
We have reported before that radosgw-agent failed to sync data all the
time. We paste the relevant log here to seek help now.

  application/json; charset=UTF-8
Wed, 13 Nov 2013 07:24:45 GMT
x-amz-copy-source:sss%2Frgwconf
/sss/rgwconf
2013-11-13T15:24:45.510 11171:DEBUG:boto:Signature:
AWS CQHH7O4XULLINBNQQSPB:9ktSGas0/iuekklDmHRuU+OItek=
2013-11-13T15:24:45.511 11171:DEBUG:boto:url = 
'http://ceph-rgw41.com/sss/rgwconf'
params={'rgwx-op-id': 'ceph-rgw41:11160:2', 
'rgwx-source-zone': u'us-east', 'rgwx-client-id': 'radosgw-agent'}
headers={'Content-Length': '0', 'User-Agent': 
'Boto/2.16.0 Python/2.7.3 Linux/3.5.0-23-generic', 'x-amz-copy-source': 
'sss%2Frgwconf', 'Date': 'Wed, 13 Nov 2013 07:24:45 GMT', 'Content-Type': 
'application/json; charset=UTF-8', 'Authorization': 'AWS 
CQHH7O4XULLINBNQQSPB:9ktSGas0/iuekklDmHRuU+OItek='}
data=None
2013-11-13T15:24:45.519 
11171:INFO:requests.packages.urllib3.connectionpool:Starting new HTTP 
connection (1): ceph-rgw41.com
2013-11-13T15:24:45.580 
11171:DEBUG:requests.packages.urllib3.connectionpool:PUT 
/sss/rgwconf?rgwx-op-id=ceph-rgw41%3A11160%3A2rgwx-source-zone=us-eastrgwx-client-id=radosgw-agent
 HTTP/1.1 403 78
2013-11-13T15:24:45.584 11171:DEBUG:radosgw_agent.worker:exception during sync: Http error code 403 
content ?xml version=1.0 
encoding=UTF-8?ErrorCodeAccessDenied/Code/Error
2013-11-13T15:24:45.587 11171:DEBUG:boto:StringToSign:
GET
Wed, 13 Nov 2013 07:24:45 GMT
/admin/opstate
2013-11-13T15:24:45.589 11171:DEBUG:boto:Signature:
AWS CQHH7O4XULLINBNQQSPB:JZwaFKhZEsQUj50jLxjNzni8n5Q=
2013-11-13T15:24:45.590 11171:DEBUG:boto:url = 
'http://ceph-rgw41.com/admin/opstate'
params={'client-id': 'radosgw-agent', 'object': 
'sss/rgwconf', 'op-id': 'ceph-rgw41:11160:2'}
headers={'Date': 'Wed, 13 Nov 2013 07:24:45 GMT', 
'Content-Length': '0', 'Authorization': 'AWS 
CQHH7O4XULLINBNQQSPB:JZwaFKhZEsQUj50jLxjNzni8n5Q=', 'User-Agent': 'Boto/2.16.0 
Python/2.7.3 Linux/3.5.0-23-generic'}
data=None
2013-11-13T15:24:45.594 
11171:INFO:requests.packages.urllib3.connectionpool:Starting new HTTP 
connection (1): ceph-rgw41.com
2013-11-13T15:24:45.607 
11171:DEBUG:requests.packages.urllib3.connectionpool:GET 
/admin/opstate?client-id=radosgw-agentobject=sss%2Frgwconfop-id=ceph-rgw41%3A11160%3A2 
HTTP/1.1 200 None
2013-11-13T15:24:45.612 
11171:DEBUG:radosgw_agent.worker:op state is [{u'timestamp': u'2013-11-13 
07:24:45.561401Z', u'op_id': u'ceph-rgw41:11160:2', u'object': u'sss/rgwconf', 
u'state': u'error', u'client_id': u'radosgw-agent'}]
2013-11-13T15:24:45.614 
11171:ERROR:radosgw_agent.worker:failed to 

Re: [ceph-users] Ephemeral RBD with Havana and Dumpling

2013-11-19 Thread Josh Durgin

On 11/14/2013 09:54 AM, Dmitry Borodaenko wrote:

On Thu, Nov 14, 2013 at 6:00 AM, Haomai Wang haomaiw...@gmail.com wrote:

We are using the nova fork by Josh Durgin
https://github.com/jdurgin/nova/commits/havana-ephemeral-rbd - are there
more patches that need to be integrated?

I hope I can release or push commits to this branch containing live-migration,
the incorrect filesystem size fix and ceph snapshot support in a few days.


Can't wait to see this patch! Are you getting rid of the shared
storage requirement for live-migration?


Yes, that's what Haomai's patch will fix for rbd-based ephemeral
volumes (bug https://bugs.launchpad.net/nova/+bug/1250751).

Note that volume-backed instances work with live migration just fine
without a shared fs for ephemeral disks since Grizzly.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] alternative approaches to CEPH-FS

2013-11-19 Thread Gautam Saxena
Hi Yip,

Thanks for the code. With respect to "can't grow", I think I can (with some
difficulty perhaps?) resize the vm if I needed to, but I'm really just
trying to buy myself time till CEPH-FS is production ready. Point #3
scares me, so I'll have to think about that one. Most likely I'd use a
completely different technology to back up this VM (e.g. rsync the key
folders to some external, encrypted, cheap RAID storage, such as iomega).
Point 4 is probably not that big a deal for my needs, since CEPH itself
should more-or-less ensure high availability against disk crashes -- and as
for the VM crashing, I can tolerate a few minutes/hours of downtime if needed, and
so far I have yet to have a Linux VM crash over 3 years of running them.

On Mon, Nov 18, 2013 at 1:26 AM, YIP Wai Peng yi...@comp.nus.edu.sg wrote:

 Hi all,

 I've uploaded it via github - https://github.com/waipeng/nfsceph.
 Standard disclaimer applies. :)

 Actually #3 is a novel idea, I have not thought of it. Thinking about the
 difference just off the top of my head though, comparatively, #3 will have

 1) more overheads (because of the additional VM)

 2) Can't grow once you reach the hard limit of 14TB, and if you have
 multiple of such machines, then fragmentation becomes a problem

 3) might have the risk of 14TB partition corruption wiping out all your
 shares

 4) not as easy to make HA. Although I have not worked HA into NFSCEPH yet, it
 should be doable by drbd-ing the NFS data directory, or any other
 technique that people use for redundant NFS servers.

 - WP


 On Fri, Nov 15, 2013 at 10:26 PM, Gautam Saxena gsax...@i-a-inc.comwrote:

 Yip,

 I went to the link. Where can the script (nfsceph) be downloaded? How's
 the robustness and performance of this technique? (That is, is there
 any reason to believe that it would be more/less robust and/or performant than
 option #3 mentioned in the original thread?)


 On Fri, Nov 15, 2013 at 1:57 AM, YIP Wai Peng yi...@comp.nus.edu.sgwrote:

 On Fri, Nov 15, 2013 at 12:08 AM, Gautam Saxena gsax...@i-a-inc.comwrote:


 1) nfs over rbd (
 http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/)


 We are now running this - basically an intermediate/gateway node that
 mounts ceph rbd objects and exports them as NFS.
 http://waipeng.wordpress.com/2013/11/12/nfsceph/
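(Conceptually the gateway does something along these lines -- image name, size,
device and export path are placeholders; the actual script is at the URL above:

  rbd create --size 102400 share1
  rbd map share1
  mkfs.xfs /dev/rbd0
  mount /dev/rbd0 /export/share1
  echo '/export/share1 *(rw,sync,no_root_squash)' >> /etc/exports
  exportfs -ra)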

 - WP




 --
 *Gautam Saxena *
 President  CEO
 Integrated Analysis Inc.

 Making Sense of Data.™
 Biomarker Discovery Software | Bioinformatics Services | Data Warehouse
 Consulting | Data Migration Consulting
 www.i-a-inc.com  http://www.i-a-inc.com/
 gsax...@i-a-inc.com
 (301) 760-3077  office
 (240) 479-4272  direct
 (301) 560-3463  fax





-- 
*Gautam Saxena *
President  CEO
Integrated Analysis Inc.

Making Sense of Data.™
Biomarker Discovery Software | Bioinformatics Services | Data Warehouse
Consulting | Data Migration Consulting
www.i-a-inc.com  http://www.i-a-inc.com/
gsax...@i-a-inc.com
(301) 760-3077  office
(240) 479-4272  direct
(301) 560-3463  fax
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Librados Error Codes

2013-11-19 Thread Josh Durgin

On 11/19/2013 05:28 AM, Behar Veliqi wrote:

Hi,

when using the librados C library, the documentation of the different functions 
just says that they return a negative error code on failure,
e.g. the rados_read function 
(http://ceph.com/docs/master/rados/api/librados/#rados_read).

Is there any further documentation on which error code is returned under 
which condition, and on how to know _why_ an operation has failed?


For some functions there is, but for most of them there are many
common errors that aren't listed, and some errors depend on
the OSD backend being used.

The error codes are all negative POSIX errno values, so many of them
should be self-explanatory (i.e. -ENOENT when an object doesn't exist,
-EPERM when you don't have access to a pool, -EROFS if you try to write
to a snapshot, etc.). It would be good to document these though.
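As a small illustration (a hedged sketch of typical usage, not taken from the
official docs), the return value can be compared directly against negated errno
constants:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <rados/librados.h>

    /* Assumes ioctx is an already-opened rados_ioctx_t and oid an object name. */
    void read_example(rados_ioctx_t ioctx, const char *oid)
    {
        char buf[4096];
        int ret = rados_read(ioctx, oid, buf, sizeof(buf), 0);
        if (ret < 0) {
            if (ret == -ENOENT)
                fprintf(stderr, "object %s does not exist\n", oid);
            else
                fprintf(stderr, "read failed: %s\n", strerror(-ret));
        } else {
            /* rados_read returns the number of bytes read on success */
            printf("read %d bytes\n", ret);
        }
    }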

If you're looking into librados more, the C header has some more detail
in @defgroup blocks that aren't parsed into the web docs:

https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h#L279

Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] alternative approaches to CEPH-FS

2013-11-19 Thread YIP Wai Peng
On Wednesday, 20 November 2013, Gautam Saxena wrote:

 Hi Yip,

 Thanks for the code. With respect to can't grow, I think I can (with
 some difficulty perhaps?) resize the vm if I needed to, but I'm really just
 trying to buy myself time till CEPH-FS is production ready. Point #3
 scares me, so I'll have to think about that one. Most likely I'd use a
 completely different technology to back-up this VM (eg rsync the key
 folders to some external, encrypted, cheap RAID storage, such as iomega.)
 Point 4 is probably not that big a deal for my needs, since CEPH itself
 should more-or-less ensure high availability due to disk crashes -- and as
 for VM crashing, I can tolerate a few minutes/hours downtime if needed, and
 so far I have yet to have a Linux VM crash over 3 years of running them


Sorry, I don't really mean "can't". It should be easy to expand the rbd and
resize2fs the partition, but a drawback is that it will incur downtime. All
in all, it sounds like a feasible solution too! :)
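(For reference, the grow path under that setup is roughly the following -- image
name, size and rbd device are placeholders, and the filesystem is assumed to be ext4:

  rbd resize --size 20480 share1      # new size in MB
  resize2fs /dev/rbd0                 # on the gateway, once the kernel sees the new size

On older kernels the device may need to be unmapped and remapped before the new size
is visible, which is where the downtime comes from.)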

- WP
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] alternative approaches to CEPH-FS

2013-11-19 Thread YIP Wai Peng
On Wednesday, 20 November 2013, Dimitri Maziuk wrote:

 On 11/18/2013 01:19 AM, YIP Wai Peng wrote:
  Hi Dima,
 
  Benchmark FYI.
 
  $ /usr/sbin/bonnie++ -s 0 -n 5:1m:4k
  Version  1.97  --Sequential Create-- Random
 Create
  altair  -Create-- --Read--- -Delete-- -Create-- --Read---
 -Delete--
  files:max:min/sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  /sec %CP
   5:1048576:409618   1   609  11   604   918   1   436  10
 686  9
  Latency  1187ms   70907us 261ms2352ms 205ms
  111ms

 Is this on an FS on RBD on the NFS server, mounted and exported?



  Yes it is. And benchmarked on the client nfs machine.


 I get

  Version  1.96   --Sequential Create-- Random
 Create
  nautilus-Create-- --Read--- -Delete-- -Create-- --Read---
 -Delete--
  files:max:min/sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  /sec %CP
   5:1048576:409619   1  5422  99   123   241   3  4525  99
 127   2
  Latency 86381us 746us   44125us 303ms 746us
 60720us

 on an ext4 filesystem mounted on top of DRBD device backed by an LSI
 raid (10, battery-backed) on the main node and an mdadm raid-0 on the
 passive node. This is on the server with he FS exported and mounted (but
 not getting much use -- I ran it after hours).


Hm, so maybe this nfsceph is not _that_ bad after all! :) Your read clearly
wins, so I'm guessing the drbd write is the slow one. Which drbd mode are
you using?


 On a client  during
 working hours I get

  Version  1.96   --Sequential Create-- Random
 Create
  stingray-Create-- --Read--- -Delete-- -Create-- --Read---
 -Delete--
  files:max:min/sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  /sec %CP
   5:1048576:4096 3   0  1952  29  1385  13 2   0  1553  18
 575   5
  Latency  16383ms   19662us153ms8482ms1935us
  4467ms

 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] install three mon-nodes, two successed, one failed

2013-11-19 Thread Dnsbed Ops

Hello,

I follow the doc there:
http://ceph.com/docs/master/start/quick-ceph-deploy/

I just installed three mon-nodes, but one failed.

The command and output:
ceph@ceph1:~/my-cluster$ ceph-deploy --overwrite-conf mon create  
ceph3.geocast.net
[ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy 
--overwrite-conf mon create ceph3.geocast.net
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts 
ceph3.geocast.net

[ceph_deploy.mon][DEBUG ] detecting platform for host ceph3 ...
[ceph3.geocast.net][DEBUG ] connected to host: ceph3.geocast.net
[ceph3.geocast.net][DEBUG ] detect platform information from remote 
host

[ceph3.geocast.net][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 12.04 precise
[ceph3][DEBUG ] determining if provided host has same hostname in 
remote

[ceph3.geocast.net][DEBUG ] get remote short hostname
[ceph3][DEBUG ] deploying mon to ceph3
[ceph3.geocast.net][DEBUG ] get remote short hostname
[ceph3.geocast.net][DEBUG ] remote hostname: ceph3
[ceph3.geocast.net][DEBUG ] write cluster configuration to 
/etc/ceph/{cluster}.conf

[ceph3.geocast.net][DEBUG ] create the mon path if it does not exist
[ceph3.geocast.net][DEBUG ] checking for done path: 
/var/lib/ceph/mon/ceph-ceph3/done
[ceph3.geocast.net][DEBUG ] create a done file to avoid re-doing the 
mon deployment

[ceph3.geocast.net][DEBUG ] create the init path if it does not exist
[ceph3.geocast.net][DEBUG ] locating the `service` executable...
[ceph3.geocast.net][INFO  ] Running command: sudo initctl emit ceph-mon 
cluster=ceph id=ceph3
[ceph3.geocast.net][INFO  ] Running command: sudo ceph --cluster=ceph 
--admin-daemon /var/run/ceph/ceph-mon.ceph3.asok mon_status
[ceph3][ERROR ] admin_socket: exception getting command descriptions: 
[Errno 2] No such file or directory

[ceph3][WARNIN] monitor: mon.ceph3, might not be running yet
[ceph3.geocast.net][INFO  ] Running command: sudo ceph --cluster=ceph 
--admin-daemon /var/run/ceph/ceph-mon.ceph3.asok mon_status
[ceph3][ERROR ] admin_socket: exception getting command descriptions: 
[Errno 2] No such file or directory

[ceph3][WARNIN] monitor ceph3 does not exist in monmap
[ceph3][WARNIN] neither `public_addr` nor `public_network` keys are 
defined for monitors

[ceph3][WARNIN] monitors may not be able to form quorum


The mon log for ceph3:
2013-11-20 12:32:07.622385 7f0c24b63780  0 ceph version 0.72.1 
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2267
2013-11-20 12:32:07.711105 7f0c24b63780  0 mon.ceph3 does not exist in 
monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.711318 7f0c24b63780 -1 no public_addr or 
public_network specified, and mon.ceph3 not present in monmap or 
ceph.conf
2013-11-20 12:32:07.730717 7f6b94ed6780  0 ceph version 0.72.1 
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2277
2013-11-20 12:32:07.820180 7f6b94ed6780  0 mon.ceph3 does not exist in 
monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.820402 7f6b94ed6780 -1 no public_addr or 
public_network specified, and mon.ceph3 not present in monmap or 
ceph.conf
2013-11-20 12:32:07.839424 7f62fa747780  0 ceph version 0.72.1 
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2287
2013-11-20 12:32:07.929246 7f62fa747780  0 mon.ceph3 does not exist in 
monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.929533 7f62fa747780 -1 no public_addr or 
public_network specified, and mon.ceph3 not present in monmap or 
ceph.conf
2013-11-20 12:32:07.952320 7f32d060e780  0 ceph version 0.72.1 
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2297
2013-11-20 12:32:08.022799 7f32d060e780  0 mon.ceph3 does not exist in 
monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.023155 7f32d060e780 -1 no public_addr or 
public_network specified, and mon.ceph3 not present in monmap or 
ceph.conf
2013-11-20 12:32:08.042415 7fa5e81dc780  0 ceph version 0.72.1 
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2307
2013-11-20 12:32:08.115243 7fa5e81dc780  0 mon.ceph3 does not exist in 
monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.115528 7fa5e81dc780 -1 no public_addr or 
public_network specified, and mon.ceph3 not present in monmap or 
ceph.conf
2013-11-20 12:32:08.134854 7fd9929c4780  0 ceph version 0.72.1 
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 2317
2013-11-20 12:32:08.224308 7fd9929c4780  0 mon.ceph3 does not exist in 
monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.224541 7fd9929c4780 -1 no public_addr or 
public_network specified, and mon.ceph3 not present in monmap or 
ceph.conf



I have tried many times with no luck.
Can you help? Thanks in advance.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] install three mon-nodes, two successed, one failed

2013-11-19 Thread Dnsbed Ops

And this is the ceph.conf:

[global]
fsid = 0615ddc1-abff-4fe2-8919-68448b9f6faa
mon_initial_members = ceph2, ceph3, ceph4
mon_host = 172.17.6.66,172.17.6.67,172.17.6.68
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true
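(Side note: the mon log above complains that neither public_addr nor public_network
is set. Assuming the monitors all sit on the 172.17.6.0/24 subnet -- a guess based on
mon_host -- adding a line like the following to [global] and re-pushing the config
before re-creating the mon should let it pick an address:

  public_network = 172.17.6.0/24)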

Thanks.


On 2013-11-20 12:47, Dnsbed Ops wrote:

Hello,

I follow the doc there:
http://ceph.com/docs/master/start/quick-ceph-deploy/

Just installed three mon-nodes, but one got failed.

The command and output:
ceph@ceph1:~/my-cluster$ ceph-deploy --overwrite-conf mon create
ceph3.geocast.net
[ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy
--overwrite-conf mon create ceph3.geocast.net
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts 
ceph3.geocast.net

[ceph_deploy.mon][DEBUG ] detecting platform for host ceph3 ...
[ceph3.geocast.net][DEBUG ] connected to host: ceph3.geocast.net
[ceph3.geocast.net][DEBUG ] detect platform information from remote 
host

[ceph3.geocast.net][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 12.04 precise
[ceph3][DEBUG ] determining if provided host has same hostname in 
remote

[ceph3.geocast.net][DEBUG ] get remote short hostname
[ceph3][DEBUG ] deploying mon to ceph3
[ceph3.geocast.net][DEBUG ] get remote short hostname
[ceph3.geocast.net][DEBUG ] remote hostname: ceph3
[ceph3.geocast.net][DEBUG ] write cluster configuration to
/etc/ceph/{cluster}.conf
[ceph3.geocast.net][DEBUG ] create the mon path if it does not exist
[ceph3.geocast.net][DEBUG ] checking for done path:
/var/lib/ceph/mon/ceph-ceph3/done
[ceph3.geocast.net][DEBUG ] create a done file to avoid re-doing the
mon deployment
[ceph3.geocast.net][DEBUG ] create the init path if it does not exist
[ceph3.geocast.net][DEBUG ] locating the `service` executable...
[ceph3.geocast.net][INFO  ] Running command: sudo initctl emit
ceph-mon cluster=ceph id=ceph3
[ceph3.geocast.net][INFO  ] Running command: sudo ceph --cluster=ceph
--admin-daemon /var/run/ceph/ceph-mon.ceph3.asok mon_status
[ceph3][ERROR ] admin_socket: exception getting command descriptions:
[Errno 2] No such file or directory
[ceph3][WARNIN] monitor: mon.ceph3, might not be running yet
[ceph3.geocast.net][INFO  ] Running command: sudo ceph --cluster=ceph
--admin-daemon /var/run/ceph/ceph-mon.ceph3.asok mon_status
[ceph3][ERROR ] admin_socket: exception getting command descriptions:
[Errno 2] No such file or directory
[ceph3][WARNIN] monitor ceph3 does not exist in monmap
[ceph3][WARNIN] neither `public_addr` nor `public_network` keys are
defined for monitors
[ceph3][WARNIN] monitors may not be able to form quorum


The mon log for ceph3:
2013-11-20 12:32:07.622385 7f0c24b63780  0 ceph version 0.72.1
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 
2267

2013-11-20 12:32:07.711105 7f0c24b63780  0 mon.ceph3 does not exist
in monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.711318 7f0c24b63780 -1 no public_addr or
public_network specified, and mon.ceph3 not present in monmap or
ceph.conf
2013-11-20 12:32:07.730717 7f6b94ed6780  0 ceph version 0.72.1
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 
2277

2013-11-20 12:32:07.820180 7f6b94ed6780  0 mon.ceph3 does not exist
in monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.820402 7f6b94ed6780 -1 no public_addr or
public_network specified, and mon.ceph3 not present in monmap or
ceph.conf
2013-11-20 12:32:07.839424 7f62fa747780  0 ceph version 0.72.1
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 
2287

2013-11-20 12:32:07.929246 7f62fa747780  0 mon.ceph3 does not exist
in monmap, will attempt to join an existing cluster
2013-11-20 12:32:07.929533 7f62fa747780 -1 no public_addr or
public_network specified, and mon.ceph3 not present in monmap or
ceph.conf
2013-11-20 12:32:07.952320 7f32d060e780  0 ceph version 0.72.1
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 
2297

2013-11-20 12:32:08.022799 7f32d060e780  0 mon.ceph3 does not exist
in monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.023155 7f32d060e780 -1 no public_addr or
public_network specified, and mon.ceph3 not present in monmap or
ceph.conf
2013-11-20 12:32:08.042415 7fa5e81dc780  0 ceph version 0.72.1
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 
2307

2013-11-20 12:32:08.115243 7fa5e81dc780  0 mon.ceph3 does not exist
in monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.115528 7fa5e81dc780 -1 no public_addr or
public_network specified, and mon.ceph3 not present in monmap or
ceph.conf
2013-11-20 12:32:08.134854 7fd9929c4780  0 ceph version 0.72.1
(4d923861868f6a15dcb33fef7f50f674997322de), process ceph-mon, pid 
2317

2013-11-20 12:32:08.224308 7fd9929c4780  0 mon.ceph3 does not exist
in monmap, will attempt to join an existing cluster
2013-11-20 12:32:08.224541 7fd9929c4780 -1 no public_addr or
public_network specified, and mon.ceph3 not present in monmap or
ceph.conf


I have tried