Re: [ceph-users] Ceph and cinder bootable volume not working

2013-12-31 Thread Haomai Wang
Oh, this bug is registered at https://bugs.launchpad.net/nova/+bug/1257674

On Wed, Jan 1, 2014 at 2:21 AM, ed l  wrote:
> I managed to figure out the issues.
>
> One, I was trying to use the qcow format instead of raw.
> Second, adding the following to nova.conf fixed all the issues with file
> injection not being supported:
> libvirt_inject_password=false
> libvirt_inject_key=false
> libvirt_inject_partition=-2
> libvirt_images_type=rbd
> libvirt_images_rbd_pool=nova
> libvirt_images_rbd_ceph_conf=/etc/ceph/ceph.conf
>
>> Date: Tue, 31 Dec 2013 09:51:48 +0800
>> Subject: Re: [ceph-users] Ceph and cinder bootable volume not working
>> From: haomaiw...@gmail.com
>> To: u29...@hotmail.com
>> CC: ceph-users@lists.ceph.com
>
>>
>> It seemed that your nova secret uuid is incorrect. Maybe you can paste
>> your nova.conf
>>
>> On Tue, Dec 31, 2013 at 5:12 AM, ed l  wrote:
>> > Sorry, I thought I had pasted an important log file in the first email.
>> > This is from the nova compute logs. "File injection into a boot from
>> > volume instance is not supported" seems to be a weird error, and I don't
>> > know what to do next. Any ideas?
>> >
>> > 2013-12-30 05:54:40.625 1469 ERROR nova.virt.libvirt.driver [-]
>> > [instance:
>> > 74abd3e2-f020-4612-9ba7-219d42afefec] During wait destroy, instance
>> > disappeared.
>> > 2013-12-30 05:54:40.730 1469 ERROR nova.virt.libvirt.driver [-]
>> > [instance:
>> > 3899a5bb-6737-412d-be57-a76a31744e9b] During wait destroy, instance
>> > disappeared.
>> > 2013-12-30 05:54:40.758 1469 ERROR nova.virt.libvirt.driver [-]
>> > [instance:
>> > 1a4c61af-c767-4839-b8e0-4e5c225a6be2] During wait destroy, instance
>> > disappeared.
>> > 2013-12-30 05:54:40.788 1469 ERROR nova.virt.libvirt.driver [-]
>> > [instance:
>> > bc698cd9-582a-4da1-a2c9-f1967f15567a] During wait destroy, instance
>> > disappeared.
>> > 2013-12-30 05:55:00.028 1469 WARNING nova.virt.libvirt.driver
>> > [req-97d23860-e6d5-4be6-b12b-713a4627b8b3
>> > 0889af39b0dd4837815540dc9f7c9e3e
>> > 9ca58e5c52ad41efae313a5a0b477b22] [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] File injection into a boot from
>> > volume
>> > instance is not supported
>> > 2013-12-30 05:55:00.231 1469 ERROR nova.compute.manager
>> > [req-97d23860-e6d5-4be6-b12b-713a4627b8b3
>> > 0889af39b0dd4837815540dc9f7c9e3e
>> > 9ca58e5c52ad41efae313a5a0b477b22] [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] Instance failed to spawn
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] Traceback (most recent call last):
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] File
>> > "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1417,
>> > in
>> > _spawn
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] block_device_info)
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] File
>> > "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line
>> > 2067,
>> > in spawn
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] write_to_disk=True)
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] File
>> > "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line
>> > 3061,
>> > in to_xml
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] xml = conf.to_xml()
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] File
>> > "/usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py", line 68,
>> > in
>> > to_xml
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] root = self.format_dom()
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] File
>> > "/usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py", line
>> > 1138,
>> > in format_dom
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] self._format_devices(root)
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] File
>> > "/usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py", line
>> > 1115,
>> > in _format_devices
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] devices.append(dev.format_dom())
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compute.manager [instance:
>> > 1a28dd9e-d199-43cb-b858-a3c7c2909767] File
>> > "/usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py", line
>> > 521, in
>> > format_dom
>> > 2013-12-30 05:55:00.231 1469 TRACE nova.compu
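A note on the secret uuid Haomai mentions above: for RBD-backed volumes, the
uuid in nova.conf has to match a libvirt secret on the compute node that holds
the Ceph key. A minimal sketch of that wiring, assuming a Ceph user named
client.cinder and a placeholder uuid (both illustrative, not from this thread):

    # register the Ceph key with libvirt on the compute node
    cat > secret.xml <<EOF
    <secret ephemeral='no' private='no'>
      <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
      <usage type='ceph'><name>client.cinder secret</name></usage>
    </secret>
    EOF
    virsh secret-define --file secret.xml
    virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 \
        --base64 $(ceph auth get-key client.cinder)

    # then reference the same uuid in nova.conf, next to the settings above
    rbd_user=cinder
    rbd_secret_uuid=457eb676-33da-42ec-9a8c-9293d545c337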

Re: [ceph-users] shutting down for maintenance

2013-12-31 Thread James Harper
> 
> Most production clusters are large enough that you don't have to bring down
> the entire cluster to do maintenance on particular machines. If you're
> reconfiguring the entire network, that's a bit more involved. I'm not sure
> what your cluster looks like, so I can't advise. However, you mention
> changing IP addresses. Changing the IP addresses for the OSDs is okay, but
> you want to be careful when changing them for monitors. See
> http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address.
> Since monitors use the public network, this shouldn't be a problem in your
> case. You can change the config files if you've included OSD entries in your
> config file. You may also change the settings at runtime. See
> http://ceph.com/docs/master/rados/configuration/ceph-conf/#runtime-changes.
> 

I'm reconfiguring a switch, so it's going to affect all members.

James
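Since only the cluster network is changing, updating ceph.conf before the
daemons come back should be enough; OSDs re-register their addresses when they
restart. A minimal sketch with placeholder subnets (not taken from the thread):

    [global]
        public network  = 192.168.1.0/24   # unchanged
        cluster network = 10.10.0.0/24     # new subnet behind the reconfigured switch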


Re: [ceph-users] [Ceph-community] Ceph User Committee elections : call for participation

2013-12-31 Thread Sebastien Han
Hi,

I'm not sure I have full visibility into the role, but I will be more than
happy to take it over.
I believe that I can allocate some time for this.

Cheers.
 
Sébastien Han 
Cloud Engineer 

"Always give 100%. Unless you're giving blood.” 

Phone: +33 (0)1 49 70 99 72 
Mail: sebastien@enovance.com 
Address : 10, rue de la Victoire - 75009 Paris 
Web : www.enovance.com - Twitter : @enovance 

On 31 Dec 2013, at 09:18, Loic Dachary  wrote:

> Hi,
> 
> For personal reasons I have to step down as head of the Ceph User Committee
> at the end of January 2014. Who would be willing to take over this role? If
> there is enough interest I'll organize the election. Otherwise we'll have to
> figure out something ;-)
> 
> Cheers
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> 





Re: [ceph-users] shutting down for maintenance

2013-12-31 Thread John Wilkins
Most production clusters are large enough that you don't have to bring down
the entire cluster to do maintenance on particular machines. If you're
reconfiguring the entire network, that's a bit more involved. I'm not sure
what your cluster looks like, so I can't advise. However, you mention
changing IP addresses. Changing the IP addresses for the OSDs is okay, but
you want to be careful when changing them for monitors. See
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address.
Since monitors use the public network, this shouldn't be a problem in your
case. You can change the config files if you've included OSD entries in your
config file. You may also change the settings at runtime. See
http://ceph.com/docs/master/rados/configuration/ceph-conf/#runtime-changes.
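As a sketch of the runtime-changes approach from that last link: a setting can
be injected into a running daemon without a restart. The option shown here is
only an illustration, not a recommendation from this thread:

    # from any node with admin credentials
    ceph tell osd.0 injectargs '--osd_recovery_max_active 1'

    # or locally on the OSD host, via the admin socket
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set osd_recovery_max_active 1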


On Tue, Dec 31, 2013 at 9:35 AM, Scottix  wrote:

> The way I have done it is to make sure the OSDs don't get marked out.
>
> Check the link below
>
>
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing
>
>
> On Tue, Dec 31, 2013 at 12:43 AM, James Harper <
> james.har...@bendigoit.com.au> wrote:
>
>> I need to shut down ceph for maintenance to make some hardware changes.
>> Is it sufficient to just stop all services on all nodes, or is there a way
>> to put the whole cluster into standby or something first?
>>
>> And when things come back up, IP addresses on the cluster network will be
>> different (public network will not change though). Is it sufficient to just
>> change the config files and the OSDs will register themselves correctly,
>> or is there more involved?
>>
>> Thanks
>>
>> James
>
>
>
> --
> Follow Me: @Scottix 
> http://about.me/scottix
> scot...@gmail.com
>


-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com


[ceph-users] Ceph 2013, a year in review

2013-12-31 Thread Patrick McGarry
Hey Cephers,

I took a few minutes today to reflect on this past year of Ceph.  Hope
you are all gearing up to bring in the new year with a bang.  See you
in 2014.

http://ceph.com/community/ceph-in-2013-a-year-in-review/



Best Regards,

Patrick McGarry
Director, Community || Inktank
http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank


Re: [ceph-users] shutting down for maintenance

2013-12-31 Thread Scottix
The way I have done it is to make sure the OSDs don't get marked out.

Check the link below

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing
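A sketch of that procedure, assuming sysvinit-style service scripts (adjust for
upstart on Ubuntu):

    ceph osd set noout      # keep down OSDs from being marked out
    service ceph stop       # on each node, stop the daemons
    # ... do the maintenance ...
    service ceph start      # on each node, bring the daemons back
    ceph osd unset noout    # resume normal out-marking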


On Tue, Dec 31, 2013 at 12:43 AM, James Harper <
james.har...@bendigoit.com.au> wrote:

> I need to shut down ceph for maintenance to make some hardware changes. Is
> it sufficient to just stop all services on all nodes, or is there a way to
> put the whole cluster into standby or something first?
>
> And when things come back up, IP addresses on the cluster network will be
> different (public network will not change though). Is it sufficient to just
> change the config files and the OSDs will register themselves correctly,
> or is there more involved?
>
> Thanks
>
> James



-- 
Follow Me: @Scottix 
http://about.me/scottix
scot...@gmail.com


Re: [ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2013-12-31 Thread Guang Yang
Thanks Mark, my comments inline...

Date: Mon, 30 Dec 2013 07:36:56 -0600
From: Mark Nelson 
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph cluster performance degrade (radosgw)
    after running some time

On 12/30/2013 05:45 AM, Guang wrote:
> Hi ceph-users and ceph-devel,
> Merry Christmas and Happy New Year!
>
> We have a ceph cluster with radosgw, our customer is using S3 API to
> access the cluster.
>
> The basic information of the cluster is:
> bash-4.1$ ceph -s
>    cluster b9cb3ea9-e1de-48b4-9e86-6921e2c537d2
>    health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>    monmap e1: 3 mons at
> {osd151=10.194.0.68:6789/0,osd152=10.193.207.130:6789/0,osd153=10.193.207.131:6789/0},
> election epoch 40, quorum 0,1,2 osd151,osd152,osd153
>    osdmap e129885: 787 osds: 758 up, 758 in
>      pgmap v1884502: 22203 pgs: 22125 active+clean, 1
> active+clean+scrubbing, 1 active+clean+inconsistent, 76
> active+clean+scrubbing+deep; 96319 GB data, 302 TB used, 762 TB / 1065
> TB avail
>    mdsmap e1: 0/0/1 up
>
> #When the latency peak happened, there was no scrubbing, recovering or
> backfilling at the moment.#
>
> While the performance of the cluster (only with WRITE traffic) was stable
> until Dec 25th, our monitoring (of the radosgw access log) shows a
> significant increase in average latency and 99th percentile latency.
>
> And then I chose one host, grepped the slow request logs, and found
> that most of the slow requests were waiting for subops; take osd22 for
> example.
>
> osd[561-571] are hosted by osd22.
> -bash-4.1$ for i in {561..571}; do grep "slow request" ceph-osd.$i.log |
> grep "2013-12-25 16"| grep osd_op | grep -oP "\d+,\d+" ; done >
> ~/slow_osd.txt
> -bash-4.1$ cat ~/slow_osd.txt | sort | uniq -c | sort -nr
>    3586 656,598
>      289 467,629
>      284 598,763
>      279 584,598
>      203 172,598
>      182 598,6
>      155 629,646
>      83 631,598
>      65 631,593
>      21 616,629
>      20 609,671
>      20 609,390
>      13 609,254
>      12 702,629
>      12 629,641
>      11 665,613
>      11 593,724
>      11 361,591
>      10 591,709
>        9 681,609
>        9 609,595
>        9 591,772
>        8 613,662
>        8 575,591
>        7 674,722
>        7 609,603
>        6 585,605
>        5 613,691
>        5 293,629
>        4 774,591
>        4 717,591
>        4 613,776
>        4 538,629
>        4 485,629
>        3 702,641
>        3 608,629
>        3 593,580
>        3 591,676
>
> It turns out most of the slow requests were waiting for osd 598 and osd 629.
> I ran the same procedure on another host and got the same pattern.
>
> Then I turned to the host holding osd 598 and dumped the perf counters to
> do a comparison.
>
> -bash-4.1$ for i in {594..604}; do sudo ceph --admin-daemon
> /var/run/ceph/ceph-osd.$i.asok perf dump | ~/do_calc_op_latency.pl; done
> op_latency,subop_latency,total_ops
> 0.192097526753471,0.0344513450167198,7549045
> 1.99137797628122,1.42198426157216,9184472
> 0.198062399664129,0.0387090378926376,6305973
> 0.621697271315762,0.396549768986993,9726679
> 29.5222496247375,18.246379615, 10860858
> 0.229250239525916,0.0557482067611005,8149691
> 0.208981698303654,0.0375553180438224,6623842
> 0.47474766302086,0.292583928601509,9838777
> 0.339477790083925,0.101288409388438,9340212
> 0.186448840141895,0.0327296517417626,7081410
> 0.807598201207144,0.0139762289702332,6093531
> (osd 598 is op hotspot as well)
>
> This double-confirmed that osd 598 was having some performance issues
> (it has around *30 seconds average op latency*!).
> sar shows slightly higher disk I/O for osd 598 (/dev/sdf) but the
> latency difference is not as significant as we saw from osd perf.
> reads  kbread writes  kbwrite %busy  avgqu  await  svctm
> 37.3    459.9    89.8    4106.9  61.8    1.6      12.2    4.9
> 42.3    545.8    91.8    4296.3  69.7    2.4      17.6    5.2
> 42.0    483.8    93.1    4263.6  68.8    1.8      13.3    5.1
> 39.7    425.5    89.4    4327.0  68.5    1.8      14.0    5.3
>
> Another disk at the same time for comparison (/dev/sdb).
> reads  kbread writes  kbwrite %busy  avgqu  await  svctm
> 34.2    502.6    80.1    3524.3    53.4    1.3    11.8      4.7
> 35.3    560.9    83.7    3742.0    56.0    1.2    9.8      4.7
> 30.4    371.5  78.8    3631.4    52.2    1.7    15.8    4.8
> 33.0    389.4  78.8      3597.6  54.2    1.4      12.1    4.8
>
> Any idea why a couple of OSDs are so slow that they impact the performance
> of the entire cluster?

You may want to use the dump_historic_ops command in the admin socket 
for the slow OSDs.  That will give you some clues regarding where the 
ops are hanging up in the OSD.  You can also crank the osd debugging way 
up on that node and search through the logs to see if there are any 
patterns or trends (consistent slowness, pauses, etc).  It may also be 
useful to look and see if that OSD is pegging CPU and if so attach 
strace or perf to it and see what it's doing.
[yguang] We have a job dump_historic_ops but unfortunate
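For reference, a sketch of the admin-socket inspection Mark describes, assuming
the default socket path and using osd.598 as the example:

    # the recent slowest ops, with per-stage timestamps showing where each stalled
    ceph --admin-daemon /var/run/ceph/ceph-osd.598.asok dump_historic_ops

    # and, if your release supports it, a quick per-OSD commit/apply latency view
    ceph osd perf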

Re: [ceph-users] Ceph cluster performance degrade (radosgw) after running some time

2013-12-31 Thread Guang Yang
Thanks Wido, my comments inline...

>Date: Mon, 30 Dec 2013 14:04:35 +0100
>From: Wido den Hollander 
>To: ceph-users@lists.ceph.com
>Subject: Re: [ceph-users] Ceph cluster performance degrade (radosgw)
>    after running some time

>On 12/30/2013 12:45 PM, Guang wrote:
> [snip: same cluster status, slow-request breakdown, and per-OSD latency
> numbers as quoted in the previous message]
>
> Any idea why a couple of OSDs are so slow that they impact the performance
> of the entire cluster?
>

What filesystem are you using? Btrfs or XFS?

Btrfs still suffers from a performance degradation over time. So if you 
run btrfs, that might be the problem.

[yguang] We are running on XFS; the journal and data share the same disk on
different partitions.
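A quick way to confirm that layout on an OSD host; a sketch assuming the
default data path and a ceph-disk style journal symlink, with osd.598 as the
example:

    df -T /var/lib/ceph/osd/ceph-598                 # filesystem type of the data partition
    readlink -f /var/lib/ceph/osd/ceph-598/journal   # journal device, here on the same disk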

Wido

> Thanks,
> Guang
>
>

Re: [ceph-users] Add another node with osd and monitor

2013-12-31 Thread Alfredo Deza
On Thu, Dec 19, 2013 at 10:38 AM, Julien Calvet
 wrote:
> Hello,
>
> When I try to deploy a new monitor on a new node with ceph-deploy, I get
> this error:
>
>
> —
>
> ceph@p1:~$ ceph-deploy mon create s4.13h.com
> [ceph_deploy.cli][INFO  ] Invoked (1.3.3): /usr/bin/ceph-deploy mon create 
> s4.13h.com
> [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts s4.13h.com
> [ceph_deploy.mon][DEBUG ] detecting platform for host s4 ...
> [s4.13h.com][DEBUG ] connected to host: s4.13h.com
> [s4.13h.com][DEBUG ] detect platform information from remote host
> [s4.13h.com][DEBUG ] detect machine type
> [ceph_deploy.mon][INFO  ] distro info: Ubuntu 12.04 precise
> [s4][DEBUG ] determining if provided host has same hostname in remote
> [s4.13h.com][DEBUG ] get remote short hostname
> [s4][DEBUG ] deploying mon to s4
> [s4.13h.com][DEBUG ] get remote short hostname
> [s4.13h.com][DEBUG ] remote hostname: s4
> [s4.13h.com][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> [s4.13h.com][DEBUG ] create the mon path if it does not exist
> [s4.13h.com][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-s4/done
> [s4.13h.com][DEBUG ] create a done file to avoid re-doing the mon deployment
> [s4.13h.com][DEBUG ] create the init path if it does not exist
> [s4.13h.com][DEBUG ] locating the `service` executable...
> [s4.13h.com][INFO  ] Running command: sudo initctl emit ceph-mon cluster=ceph 
> id=s4
> [s4.13h.com][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon 
> /var/run/ceph/ceph-mon.s4.asok mon_status
> [s4][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] 
> No such file or directory
> [s4][WARNIN] monitor: mon.s4, might not be running yet
> [s4.13h.com][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon 
> /var/run/ceph/ceph-mon.s4.asok mon_status
> [s4][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] 
> No such file or directory
> [s4][WARNIN] s4 is not defined in `mon initial members`
> [s4][WARNIN] monitor s4 does not exist in monmap
> [s4][WARNIN] neither `public_addr` nor `public_network` keys are defined for 
> monitors
> [s4][WARNIN] monitors may not be able to form quorum
>
>
> —
>
> Could anyone help me ?

I don't think you can *add* monitors with ceph-deploy (see
http://tracker.ceph.com/issues/6552).

To add some monitors, for now, you will need to add them manually (not
with ceph-deploy).
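A condensed sketch of the manual procedure from the add-or-rm-mons doc,
assuming the new monitor id is s4 and using 192.168.0.14 as a placeholder
address:

    mkdir -p /var/lib/ceph/mon/ceph-s4
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i s4 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph mon add s4 192.168.0.14:6789
    start ceph-mon id=s4    # upstart on Ubuntu 12.04, as in the log above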

>
> Regards,
>
> Julien


[ceph-users] shutting down for maintenance

2013-12-31 Thread James Harper
I need to shut down ceph for maintenance to make some hardware changes. Is it
sufficient to just stop all services on all nodes, or is there a way to put the
whole cluster into standby or something first?

And when things come back up, IP addresses on the cluster network will be
different (public network will not change though). Is it sufficient to just
change the config files and the OSDs will register themselves correctly, or is
there more involved?

Thanks

James


[ceph-users] Ceph User Committee elections : call for participation

2013-12-31 Thread Loic Dachary
Hi,

For personal reasons I have to step down as head of the Ceph User Committee at
the end of January 2014. Who would be willing to take over this role? If there
is enough interest I'll organize the election. Otherwise we'll have to figure
out something ;-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre


