Re: [ceph-users] Ceph - Health and Monitoring

2017-01-04 Thread jiajia zhong
Actually, what you need is the ceph-common package (on Ubuntu), which contains
/usr/bin/ceph. You have to know which host the command is going to be executed
on, and make sure the keys and ceph.conf are correctly configured on that host.

You could just run the commands to make sure the configuration is OK, e.g.:
user@:/var/lib/shinken/libexec$ ./check_ceph_mon -I  -H 10.0.1.XXX
MON OK
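
Once that works from the host Shinken/Nagios runs on, hooking it up is just a
command and service definition, roughly like this (a sketch; the template name,
host name, monitor ID and address below are placeholders):

    define command {
        command_name    check_ceph_mon
        command_line    /var/lib/shinken/libexec/check_ceph_mon -I $ARG1$ -H $ARG2$
    }

    define service {
        use                     generic-service
        host_name               ceph-mon1
        service_description     Ceph MON
        check_command           check_ceph_mon!mon1!10.0.1.1
    }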




2017-01-05 3:50 GMT+08:00 Jeffrey Ollie :

> I can definitely recommend Prometheus but I prefer the exporter for Ceph
> that I wrote :)
>
> https://github.com/jcollie/ceph_exporter
>
>
> On Mon, Jan 2, 2017 at 7:55 PM, Craig Chi  wrote:
>
>> Hello,
>>
>> I suggest Prometheus with ceph_exporter
>>  and Grafana (UI). It can
>> also monitor the node's health and any other services you want.
>> And it has a beautiful UI.
>>
>> Sincerely,
>> Craig Chi
>>
>> On 2017-01-02 21:32, ulem...@polarzone.de wrote:
>>
>> Hi Andre,
>> I use check_ceph_dash on top of ceph-dash for this (it is a Nagios/Icinga
>> plugin).
>> https://github.com/Crapworks/ceph-dash
>> https://github.com/Crapworks/check_ceph_dash
>>
>> ceph-dash provides a simple, clear overview as a web dashboard.
>>
>>
>> Udo
>>
>> On 2017-01-02 12:42, Andre Forigato wrote:
>> > Hello,
>> >
>> > I am responsible for the health of the servers and the entire Ceph
>> > system.
>> > What should I use to monitor the entire Ceph environment?
>> > Monitor all objects.
>> >
>> > Which one is the best?
>> > Is it SNMP only?
>> >
>> >
>> > Thanks.
>> >
>> > Andre
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-us...@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> Jeff Ollie
> The majestik møøse is one of the mäni interesting furry animals in Sweden.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster pause - possible consequences

2017-01-04 Thread Brian Andrus
On Mon, Jan 2, 2017 at 6:46 AM, Wido den Hollander  wrote:

>
> > On 2 January 2017 at 15:43, Matteo Dacrema wrote:
> >
> >
> > Increasing pg_num will lead to several slow requests and a cluster freeze
> due to the PG creation operation, from what I've seen so far.
> > During the creation period all requests are frozen, and the creation
> period takes a lot of time even for 128 PGs.
> >
> > I've observed that during the creation period most of the OSDs go to 100%
> of their performance capacity. I think that without client operations running
> in the cluster I'll be able to increase pg_num quickly without causing
> downtime several times.
> >
>
> First, slowly increase pg_num to the number you want, then increase
> pgp_num in small baby steps as well.
>
> Wido
>

As Wido mentioned, low and slow is the way to go for production environments:
increase in small increments.

pg_num increases should be fairly transparent to client IO, but test first
by increasing your pool in small amounts. A pgp_num increase will cause
client interruption in a lot of cases, so that is what you'll need to be
wary of.

Here's some select logic from a quick and dirty script I wrote to do the
last PG increase job; maybe it will help in your endeavors:

https://gist.github.com/oddomatik/7cca9b64d7b13d17e800cc35894037ac
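
The gist boils down to the pattern Wido described. A stripped-down sketch of
the same idea (pool name, target and step size are placeholders, and the
health checks are simplistic; adapt before pointing it at a real cluster):

    #!/bin/bash
    # Hypothetical example: step pg_num, then pgp_num, up gradually on one pool.
    POOL=rbd
    TARGET=4096
    STEP=256
    CUR=$(ceph osd pool get "$POOL" pg_num | awk '{print $2}')
    while [ "$CUR" -lt "$TARGET" ]; do
        NEXT=$((CUR + STEP))
        if [ "$NEXT" -gt "$TARGET" ]; then NEXT=$TARGET; fi
        ceph osd pool set "$POOL" pg_num "$NEXT"
        # wait until the new PGs are created before touching pgp_num
        while ceph -s | grep -q creating; do sleep 10; done
        ceph osd pool set "$POOL" pgp_num "$NEXT"
        # let the cluster settle before the next step
        until ceph health | grep -q HEALTH_OK; do sleep 30; done
        CUR=$NEXT
    done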


>
> > Matteo
> >
> > > Il giorno 02 gen 2017, alle ore 15:02, c...@jack.fr.eu.org ha scritto:
> > >
> > > Well, as the doc said:
> > >> Set or clear the pause flags in the OSD map. If set, no IO requests
> will be sent to any OSD. Clearing the flags via unpause results in
> resending pending requests.
> > > If you do that on a production cluster, that means your cluster will no
> > > longer be in production :)
> > >
> > > Depending on your needs, but ..
> > > Maybe you want do this operation as fast as possible
> > > Or maybe you want to make that operation as transparent as possible,
> > > from a user point of view
> > >
> > > You may have a look at osd_recovery_op_priority &
> > > osd_client_op_priority, they might be interesting for you
> > >
> > > On 02/01/2017 14:37, Matteo Dacrema wrote:
> > >> Hi All,
> > >>
> > >> what happen if I set pause flag on a production cluster?
> > >> I mean, will all the request remain pending/waiting or all the
> volumes attached to the VMs will become read-only?
> > >>
> > >> I need to quickly upgrade placement group number from 3072 to 8192 or
> better to 165336 and I think doing it without client operations will be
> much faster.
> > >>
> > >> Thanks
> > >> Regards
> > >> Matteo
> > >>
> > >>
> > >>
> > >>
> > >> ___
> > >> ceph-users mailing list
> > >> ceph-users@lists.ceph.com
> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >>
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > >
> > >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Brian Andrus
Cloud Systems Engineer
DreamHost, LLC
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool Sizes

2017-01-04 Thread Brian Andrus
Think "many objects, few pools". The number of pools do not scale well
because of PG limitations. Keep a small number of pools with the proper
number of PGs. See this tool for pool sizing:

https://ceph.com/pgcalc
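
As a rough rule of thumb behind that calculator: the total PG count for a pool
is often sized around (number of OSDs x 100) / replica size, rounded up to the
next power of two. With a hypothetical 60 OSDs and size 3, for example:

    (60 x 100) / 3 = 2000  ->  round up to 2048 PGs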

By default, the developers have chosen a 4MB object size for the built-in
clients. This is a sensible choice and will result in good performance for
most workloads, but can depend on what type of operations you most
frequently perform and how your client interacts with the cluster.

I will let some other folks chime in with firsthand experience, but I have
worked with pools containing billions of objects and observed them
functioning fine. A few issues I can foresee off the top of my head are
potential underlying filesystem limits (should be okay) and keeping cluster
operations to a minimum (resizing/deleting pools).

Since we're talking about scale, CERN's videos are interesting for
examining the current challenges in Ceph at scale. (mostly hardware
observations)

https://youtu.be/A_VojOZjJTY

Yahoo chose a "super-cluster" architecture to work around former
limitations with large clusters, but I do believe many of the findings
CERN/Yahoo have uncovered have been addressed in recent versions of Ceph,
or are being targeted by developers in upcoming versions.

https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at


On Sat, Dec 31, 2016 at 3:53 PM, Kent Borg  wrote:

> More newbie questions about librados...
>
> I am making design decisions now that I want to scale to really big sizes
> in the future, and so need to understand where size limits and performance
> bottlenecks come from. Ceph has a reputation for being able to scale to
> exabytes, but I don't see much on how one should sensibly get to such
> scales. Do I make big objects? Pools with lots of objects in them? Lots of
> pools? A pool that has a thousand objects of a megabyte each vs. a pool
> that has a million objects of a thousand bytes each: why should one take
> one approach and when should one take the other? How big can a pool get? Is
> a billion objects a lot, something that Ceph works to handle, or is it
> something Ceph thinks is no big deal? Is a trillion objects a lot? Is a
> million pools a lot? A billion pools? How many is "lots" for Ceph?
>
> I plan to accumulate data indefinitely, I plan to add cluster capacity on
> a regular schedule, I want performance that doesn't degrade with size.
>
> Where do things break down? What is the wrong way to scale Ceph?
>
> Thanks,
>
> -kb, the Kent who guesses putting all his data in a single xattr or single
> RADOS object would be the wrong way.
>
> P.S. Happy New Year!
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Brian Andrus
Cloud Systems Engineer
DreamHost, LLC
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - Health and Monitoring

2017-01-04 Thread Jeffrey Ollie
I can definitely recommend Prometheus but I prefer the exporter for Ceph
that I wrote :)

https://github.com/jcollie/ceph_exporter
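
For what it's worth, scraping any of these exporters from Prometheus is just a
stanza in prometheus.yml, something like the following, where the host and port
are placeholders you'd replace with whatever your exporter actually listens on:

    scrape_configs:
      - job_name: 'ceph'
        static_configs:
          - targets: ['ceph-exporter.example.com:9128']  # host:port is a placeholder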


On Mon, Jan 2, 2017 at 7:55 PM, Craig Chi  wrote:

> Hello,
>
> I suggest Prometheus with ceph_exporter
>  and Grafana (UI). It can
> also monitor the node's health and any other services you want.
> And it has a beautiful UI.
>
> Sincerely,
> Craig Chi
>
> On 2017-01-02 21:32, ulem...@polarzone.de wrote:
>
> Hi Andre,
> I use check_ceph_dash on top of ceph-dash for this (it is a Nagios/Icinga
> plugin).
> https://github.com/Crapworks/ceph-dash
> https://github.com/Crapworks/check_ceph_dash
>
> ceph-dash provides a simple, clear overview as a web dashboard.
>
>
> Udo
>
> On 2017-01-02 12:42, Andre Forigato wrote:
> > Hello,
> >
> > I am responsible for the health of the servers and the entire Ceph
> > system.
> > What should I use to monitor the entire Ceph environment?
> > Monitor all objects.
> >
> > Which one is the best?
> > Is it SNMP only?
> >
> >
> > Thanks.
> >
> > Andre
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-us...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Jeff Ollie
The majestik møøse is one of the mäni interesting furry animals in Sweden.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - Health and Monitoring

2017-01-04 Thread Andre Forigato
Craig,

Hi,

I did not understand.

I installed the plugins on the Ceph server, but what should I install on the 
Nagios server?
The plugins installed on the Ceph server are working, but on the Nagios server 
they do not work, because the plugins need Ceph installed.
What should I do?

My plugins:
check-ceph-dash.py  check_ceph_df  check_ceph_health  check_ceph_mon  
check_ceph_osd  check_ceph_rgw

Site:
https://github.com/valerytschopp/ceph-nagios-plugins

Thank you.

André 
- Original message -
> From: "Craig Chi" 
> To: ulem...@polarzone.de
> Cc: "Andre Forigato" , ceph-users@lists.ceph.com
> Sent: Monday, January 2, 2017 23:55:21
> Subject: Re: [ceph-users] Ceph - Health and Monitoring

> Hello,
> I suggest Prometheus with ceph_exporter and Grafana (UI). It can also monitor
> the node's health and any other services you want.
> And it has a beautiful UI.
> Sincerely,
> Craig Chi
> On 2017-01-02 21:32, ulem...@polarzone.de wrote:

>> Hi Andre,
>> I use check_ceph_dash on top of ceph-dash for this (it is a Nagios/Icinga
>> plugin).

>> https://github.com/Crapworks/ceph-dash
>> https://github.com/Crapworks/check_ceph_dash

>> ceph-dash provides a simple, clear overview as a web dashboard.

>> Udo

>> On 2017-01-02 12:42, Andre Forigato wrote:
>> > Hello,

>> > I am responsible for the health of the servers and the entire Ceph
>> > system.
>> > What should I use to monitor the entire Ceph environment?
>> > Monitor all objects.

>> > Which one is the best?
>> > Is it SNMP only?


>> > Thanks.

>> > Andre
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw setup issue

2017-01-04 Thread Orit Wasserman
On Wed, Jan 4, 2017 at 7:08 PM, Brian Andrus  wrote:
> Regardless of whether it worked before, have you verified your RadosGWs have
> write access to monitors? They will need it if you want the RadosGW to
> create its own pools.
>
> ceph auth get 
>

I agree, it could be a permissions issue.

> On Wed, Jan 4, 2017 at 8:59 AM, Kamble, Nitin A 
> wrote:
>>
>>
>> > On Dec 26, 2016, at 2:48 AM, Orit Wasserman  wrote:
>> >
>> > On Fri, Dec 23, 2016 at 3:42 AM, Kamble, Nitin A
>> >  wrote:
>> >> I am trying to setup radosgw on a ceph cluster, and I am seeing some
>> >> issues where google is not helping. I hope some of the developers would be
>> >> able to help here.
>> >>
>> >>
>> >> I tried to create radosgw as mentioned here [0] on a jewel cluster. And
>> >> it gives the following error in log file after starting radosgw.
>> >>
>> >>
>> >> 2016-12-22 17:36:46.755786 7f084beeb9c0  0 set uid:gid to 167:167
>> >> (ceph:ceph)
>> >> 2016-12-22 17:36:46.755849 7f084beeb9c0  0 ceph version
>> >> 10.2.2-118-g894a5f8 (894a5f8d878d4b267f80b90a4bffce157f2b4ba7), process
>> >> radosgw, pid 10092
>> >> 2016-12-22 17:36:46.763821 7f084beeb9c0  1 -- :/0 messenger.start
>> >> 2016-12-22 17:36:46.764731 7f084beeb9c0  1 -- :/1011033520 -->
>> >> 39.0.16.7:6789/0 -- auth(proto 0 40 bytes epoch 0) v1 -- ?+0 
>> >> 0x7f084c8e9f60
>> >> con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.765055 7f084beda700  1 -- 39.0.16.9:0/1011033520
>> >> learned my addr 39.0.16.9:0/1011033520
>> >> 2016-12-22 17:36:46.765492 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> <== mon.0 39.0.16.7:6789/0 1  mon_map magic: 0 v1  195+0+0
>> >> (146652916 0 0) 0x7f0814000a60 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.765562 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> <== mon.0 39.0.16.7:6789/0 2  auth_reply(proto 2 0 (0) Success) v1 
>> >> 
>> >> 33+0+0 (1206278719 0 0) 0x7f0814000ee0 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.765697 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> --> 39.0.16.7:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0
>> >> 0x7f08180013b0 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.765968 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> <== mon.0 39.0.16.7:6789/0 3  auth_reply(proto 2 0 (0) Success) v1 
>> >> 
>> >> 222+0+0 (4230455906 0 0) 0x7f0814000ee0 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.766053 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> --> 39.0.16.7:6789/0 -- auth(proto 2 181 bytes epoch 0) v1 -- ?+0
>> >> 0x7f0818001830 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.766315 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> <== mon.0 39.0.16.7:6789/0 4  auth_reply(proto 2 0 (0) Success) v1 
>> >> 
>> >> 425+0+0 (3179848142 0 0) 0x7f0814001180 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.766383 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> --> 39.0.16.7:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 
>> >> 0x7f084c8ea440
>> >> con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.766452 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520
>> >> --> 39.0.16.7:6789/0 -- mon_subscribe({osdmap=0}) v2 -- ?+0 0x7f084c8ea440
>> >> con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.766518 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> <== mon.0 39.0.16.7:6789/0 5  mon_map magic: 0 v1  195+0+0
>> >> (146652916 0 0) 0x7f0814001110 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.766671 7f08227fc700  2
>> >> RGWDataChangesLog::ChangesRenewThread: start
>> >> 2016-12-22 17:36:46.766691 7f084beeb9c0 20 get_system_obj_state:
>> >> rctx=0x7ffec2850d00 obj=.rgw.root:default.realm state=0x7f084c8efdf8
>> >> s->prefetch_data=0
>> >> 2016-12-22 17:36:46.766750 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> <== mon.0 39.0.16.7:6789/0 6  osd_map(9506..9506 src has 8863..9506) 
>> >> v3
>> >>  66915+0+0 (689048617 0 0) 0x7f0814011680 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.767029 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520
>> >> --> 39.0.16.7:6789/0 -- mon_get_version(what=osdmap handle=1) v1 -- ?+0
>> >> 0x7f084c8f05f0 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.767163 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> <== mon.0 39.0.16.7:6789/0 7  mon_get_version_reply(handle=1
>> >> version=9506) v2  24+0+0 (2817198406 0 0) 0x7f0814001110 con
>> >> 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.767214 7f084beeb9c0 20 get_system_obj_state:
>> >> rctx=0x7ffec2850210 obj=.rgw.root:default.realm state=0x7f084c8efdf8
>> >> s->prefetch_data=0
>> >> 2016-12-22 17:36:46.767231 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520
>> >> --> 39.0.16.7:6789/0 -- mon_get_version(what=osdmap handle=2) v1 -- ?+0
>> >> 0x7f084c8f0ac0 con 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.767341 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
>> >> <== mon.0 39.0.16.7:6789/0 8  mon_get_version_reply(handle=2
>> >> version=9506) v2  24+0+0 (1826043941 0 0) 0x7f0814001110 con
>> >> 0x7f084c8e9480
>> >> 2016-12-22 17:36:46.767367 7f084beeb9c0 10 could not read realm id: (2)
>> >> No such file or directory
>> >> 2016-12-22 17:36:46.767390 7f084beeb9

Re: [ceph-users] Storage system

2017-01-04 Thread Chris Jones
Based on this limited info: object storage, if behind a proxy. We use Ceph
behind HAProxy and hardware load-balancers at Bloomberg. Our Chef recipes
are at https://github.com/ceph/ceph-chef and
https://github.com/bloomberg/chef-bcs. The chef-bcs cookbooks show the
HAProxy info.

Thanks,
Chris

On Wed, Jan 4, 2017 at 11:51 AM, Patrick McGarry 
wrote:

> Moving this to ceph-user list where it'll get some attention.
>
> On Thu, Dec 22, 2016 at 2:08 PM, SIBALA, SATISH  wrote:
>
>> Hi,
>>
>>
>>
>> Could you please give me a recommendation on the kind of Ceph storage to be
>> used with an NGINX proxy server (Object / Block / FileSystem)?
>>
>>
>>
>> Best Regards
>>
>> Satish
>>
>>
>>
>>
>
>
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Best Regards,
Chris Jones

cjo...@cloudm2.com
(p) 770.655.0770
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Estimate Max IOPS of Cluster

2017-01-04 Thread John Petrini
Thank you both for the tools and suggestions. I expected the response "there
are many variables" but this gives me a place to start in determining what
our configuration is capable of.

___

John Petrini

NOC Systems Administrator   //   *CoreDial, LLC*   //   coredial.com

Hillcrest I, 751 Arbor Way, Suite 150, Blue Bell PA, 19422
*P: *215.297.4400 x232   //   *F: *215.297.4401   //   *E: *
jpetr...@coredial.com



The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission,  dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipient is prohibited. If you received
this in error, please contact the sender and delete the material from any
computer.

On Wed, Jan 4, 2017 at 11:58 AM, Maged Mokhtar  wrote:

>
> if you are asking about what tools to use:
> http://tracker.ceph.com/projects/ceph/wiki/Benchmark_
> Ceph_Cluster_Performance
>
> You should run many concurrent processes on different clients
>
>
> *From:* Maged Mokhtar 
> *Sent:* Wednesday, January 04, 2017 6:45 PM
> *To:* John Petrini  ; ceph-users
> 
> *Subject:* Re: [ceph-users] Estimate Max IOPS of Cluster
>
>
> Max iops  depends on the hardware type/configuration for disks/cpu/network.
>
> For disks, the theoretical iops limit is
> read  = physical disk iops x number of disks
> write (with journal on same disk) = physical disk iops x number of disks /
> num of replicas / 3
> in practice real benchmarks will vary widely from this, I've seen numbers
> from 30 to 80 % of theoretical value.
>
> When the number of disks/cpu cores is high, the cpu bottleneck kicks in,
> again it depends on hardware but you could use a performance tool such as
> atop to know when this happens on your setup. There is no theoretical
> measure of this, but one good analysis i find is Nick Fisk:
> http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/
>
>
> Cheers
> /Maged
>
> *From:* John Petrini 
> *Sent:* Tuesday, January 03, 2017 10:15 PM
> *To:* ceph-users 
> *Subject:* [ceph-users] Estimate Max IOPS of Cluster
>
> Hello,
>
> Does any one have a reasonably accurate way to determine the max IOPS of a
> Ceph cluster?
>
> Thank You,
>
> ___
>
> John Petrini
>
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw setup issue

2017-01-04 Thread Brian Andrus
Regardless of whether it worked before, have you verified your RadosGWs
have write access to monitors? They will need it if you want the RadosGW to
create its own pools.

ceph auth get 
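
If the caps turn out to be too restrictive, adjusting them looks roughly like
this (the client name is a placeholder for whatever key your gateway actually
uses; mon 'allow rw' plus osd 'allow rwx' is what gateway keys commonly get):

    # inspect the current key and caps of the gateway client
    ceph auth get client.rgw.gateway1
    # grant it write access to the monitors and the usual OSD caps
    ceph auth caps client.rgw.gateway1 mon 'allow rw' osd 'allow rwx'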

On Wed, Jan 4, 2017 at 8:59 AM, Kamble, Nitin A 
wrote:

>
> > On Dec 26, 2016, at 2:48 AM, Orit Wasserman  wrote:
> >
> > On Fri, Dec 23, 2016 at 3:42 AM, Kamble, Nitin A
> >  wrote:
> >> I am trying to setup radosgw on a ceph cluster, and I am seeing some
> issues where google is not helping. I hope some of the developers would be
> able to help here.
> >>
> >>
> >> I tried to create radosgw as mentioned here [0] on a jewel cluster. And
> it gives the following error in log file after starting radosgw.
> >>
> >>
> >> 2016-12-22 17:36:46.755786 7f084beeb9c0  0 set uid:gid to 167:167
> (ceph:ceph)
> >> 2016-12-22 17:36:46.755849 7f084beeb9c0  0 ceph version
> 10.2.2-118-g894a5f8 (894a5f8d878d4b267f80b90a4bffce157f2b4ba7), process
> radosgw, pid 10092
> >> 2016-12-22 17:36:46.763821 7f084beeb9c0  1 -- :/0 messenger.start
> >> 2016-12-22 17:36:46.764731 7f084beeb9c0  1 -- :/1011033520 -->
> 39.0.16.7:6789/0 -- auth(proto 0 40 bytes epoch 0) v1 -- ?+0
> 0x7f084c8e9f60 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.765055 7f084beda700  1 -- 39.0.16.9:0/1011033520
> learned my addr 39.0.16.9:0/1011033520
> >> 2016-12-22 17:36:46.765492 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 1  mon_map magic: 0 v1  195+0+0
> (146652916 0 0) 0x7f0814000a60 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.765562 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 2  auth_reply(proto 2 0 (0) Success) v1
>  33+0+0 (1206278719 0 0) 0x7f0814000ee0 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.765697 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> --> 39.0.16.7:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0
> 0x7f08180013b0 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.765968 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 3  auth_reply(proto 2 0 (0) Success) v1
>  222+0+0 (4230455906 0 0) 0x7f0814000ee0 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.766053 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> --> 39.0.16.7:6789/0 -- auth(proto 2 181 bytes epoch 0) v1 -- ?+0
> 0x7f0818001830 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.766315 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 4  auth_reply(proto 2 0 (0) Success) v1
>  425+0+0 (3179848142 0 0) 0x7f0814001180 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.766383 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> --> 39.0.16.7:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0
> 0x7f084c8ea440 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.766452 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520
> --> 39.0.16.7:6789/0 -- mon_subscribe({osdmap=0}) v2 -- ?+0
> 0x7f084c8ea440 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.766518 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 5  mon_map magic: 0 v1  195+0+0
> (146652916 0 0) 0x7f0814001110 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.766671 7f08227fc700  2 
> >> RGWDataChangesLog::ChangesRenewThread:
> start
> >> 2016-12-22 17:36:46.766691 7f084beeb9c0 20 get_system_obj_state:
> rctx=0x7ffec2850d00 obj=.rgw.root:default.realm state=0x7f084c8efdf8
> s->prefetch_data=0
> >> 2016-12-22 17:36:46.766750 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 6  osd_map(9506..9506 src has 8863..9506)
> v3  66915+0+0 (689048617 0 0) 0x7f0814011680 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.767029 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520
> --> 39.0.16.7:6789/0 -- mon_get_version(what=osdmap handle=1) v1 -- ?+0
> 0x7f084c8f05f0 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.767163 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 7  mon_get_version_reply(handle=1
> version=9506) v2  24+0+0 (2817198406 0 0) 0x7f0814001110 con
> 0x7f084c8e9480
> >> 2016-12-22 17:36:46.767214 7f084beeb9c0 20 get_system_obj_state:
> rctx=0x7ffec2850210 obj=.rgw.root:default.realm state=0x7f084c8efdf8
> s->prefetch_data=0
> >> 2016-12-22 17:36:46.767231 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520
> --> 39.0.16.7:6789/0 -- mon_get_version(what=osdmap handle=2) v1 -- ?+0
> 0x7f084c8f0ac0 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.767341 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 8  mon_get_version_reply(handle=2
> version=9506) v2  24+0+0 (1826043941 0 0) 0x7f0814001110 con
> 0x7f084c8e9480
> >> 2016-12-22 17:36:46.767367 7f084beeb9c0 10 could not read realm id: (2)
> No such file or directory
> >> 2016-12-22 17:36:46.767390 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520
> --> 39.0.16.7:6789/0 -- mon_get_version(what=osdmap handle=3) v1 -- ?+0
> 0x7f084c8efe50 con 0x7f084c8e9480
> >> 2016-12-22 17:36:46.767496 7f082a7fc700  1 -- 39.0.16.9:0/1011033520
> <== mon.0 39.0.16.7:6789/0 9  mon_get_version_reply(handle=3
> version=9506) v2  24+0+0 (3600349867 0 0) 0x7f0814001110 con
> 0x7f084c8

Re: [ceph-users] radosgw setup issue

2017-01-04 Thread Kamble, Nitin A

> On Dec 26, 2016, at 2:48 AM, Orit Wasserman  wrote:
> 
> On Fri, Dec 23, 2016 at 3:42 AM, Kamble, Nitin A
>  wrote:
>> I am trying to setup radosgw on a ceph cluster, and I am seeing some issues 
>> where google is not helping. I hope some of the developers would be able to 
>> help here.
>> 
>> 
>> I tried to create radosgw as mentioned here [0] on a jewel cluster. And it 
>> gives the following error in log file after starting radosgw.
>> 
>> 
>> 2016-12-22 17:36:46.755786 7f084beeb9c0  0 set uid:gid to 167:167 (ceph:ceph)
>> 2016-12-22 17:36:46.755849 7f084beeb9c0  0 ceph version 10.2.2-118-g894a5f8 
>> (894a5f8d878d4b267f80b90a4bffce157f2b4ba7), process radosgw, pid 10092
>> 2016-12-22 17:36:46.763821 7f084beeb9c0  1 -- :/0 messenger.start
>> 2016-12-22 17:36:46.764731 7f084beeb9c0  1 -- :/1011033520 --> 
>> 39.0.16.7:6789/0 -- auth(proto 0 40 bytes epoch 0) v1 -- ?+0 0x7f084c8e9f60 
>> con 0x7f084c8e9480
>> 2016-12-22 17:36:46.765055 7f084beda700  1 -- 39.0.16.9:0/1011033520 learned 
>> my addr 39.0.16.9:0/1011033520
>> 2016-12-22 17:36:46.765492 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 1  mon_map magic: 0 v1  195+0+0 (146652916 0 
>> 0) 0x7f0814000a60 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.765562 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 2  auth_reply(proto 2 0 (0) Success) v1  
>> 33+0+0 (1206278719 0 0) 0x7f0814000ee0 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.765697 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 --> 
>> 39.0.16.7:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7f08180013b0 
>> con 0x7f084c8e9480
>> 2016-12-22 17:36:46.765968 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 3  auth_reply(proto 2 0 (0) Success) v1  
>> 222+0+0 (4230455906 0 0) 0x7f0814000ee0 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.766053 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 --> 
>> 39.0.16.7:6789/0 -- auth(proto 2 181 bytes epoch 0) v1 -- ?+0 0x7f0818001830 
>> con 0x7f084c8e9480
>> 2016-12-22 17:36:46.766315 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 4  auth_reply(proto 2 0 (0) Success) v1  
>> 425+0+0 (3179848142 0 0) 0x7f0814001180 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.766383 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 --> 
>> 39.0.16.7:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f084c8ea440 con 
>> 0x7f084c8e9480
>> 2016-12-22 17:36:46.766452 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520 --> 
>> 39.0.16.7:6789/0 -- mon_subscribe({osdmap=0}) v2 -- ?+0 0x7f084c8ea440 con 
>> 0x7f084c8e9480
>> 2016-12-22 17:36:46.766518 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 5  mon_map magic: 0 v1  195+0+0 (146652916 0 
>> 0) 0x7f0814001110 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.766671 7f08227fc700  2 
>> RGWDataChangesLog::ChangesRenewThread: start
>> 2016-12-22 17:36:46.766691 7f084beeb9c0 20 get_system_obj_state: 
>> rctx=0x7ffec2850d00 obj=.rgw.root:default.realm state=0x7f084c8efdf8 
>> s->prefetch_data=0
>> 2016-12-22 17:36:46.766750 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 6  osd_map(9506..9506 src has 8863..9506) v3  
>> 66915+0+0 (689048617 0 0) 0x7f0814011680 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.767029 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520 --> 
>> 39.0.16.7:6789/0 -- mon_get_version(what=osdmap handle=1) v1 -- ?+0 
>> 0x7f084c8f05f0 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.767163 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 7  mon_get_version_reply(handle=1 version=9506) 
>> v2  24+0+0 (2817198406 0 0) 0x7f0814001110 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.767214 7f084beeb9c0 20 get_system_obj_state: 
>> rctx=0x7ffec2850210 obj=.rgw.root:default.realm state=0x7f084c8efdf8 
>> s->prefetch_data=0
>> 2016-12-22 17:36:46.767231 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520 --> 
>> 39.0.16.7:6789/0 -- mon_get_version(what=osdmap handle=2) v1 -- ?+0 
>> 0x7f084c8f0ac0 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.767341 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 8  mon_get_version_reply(handle=2 version=9506) 
>> v2  24+0+0 (1826043941 0 0) 0x7f0814001110 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.767367 7f084beeb9c0 10 could not read realm id: (2) No 
>> such file or directory
>> 2016-12-22 17:36:46.767390 7f084beeb9c0  1 -- 39.0.16.9:0/1011033520 --> 
>> 39.0.16.7:6789/0 -- mon_get_version(what=osdmap handle=3) v1 -- ?+0 
>> 0x7f084c8efe50 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.767496 7f082a7fc700  1 -- 39.0.16.9:0/1011033520 <== 
>> mon.0 39.0.16.7:6789/0 9  mon_get_version_reply(handle=3 version=9506) 
>> v2  24+0+0 (3600349867 0 0) 0x7f0814001110 con 0x7f084c8e9480
>> 2016-12-22 17:36:46.767518 7f084beeb9c0 10 failed to list objects 
>> pool_iterate_begin() returned r=-2
>> 2016-12-22 17:36:46.767542 7f084beeb9c0 20 get_system_obj_state: 
>> rctx=0x7ffec2850420 obj=.rgw.root:zone_names.defau

Re: [ceph-users] Estimate Max IOPS of Cluster

2017-01-04 Thread Maged Mokhtar

if you are asking about what tools to use:
http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance

You should run many concurrent processes on different clients 



From: Maged Mokhtar 
Sent: Wednesday, January 04, 2017 6:45 PM
To: John Petrini ; ceph-users 
Subject: Re: [ceph-users] Estimate Max IOPS of Cluster



Max iops  depends on the hardware type/configuration for disks/cpu/network.

For disks, the theoretical iops limit is 
read  = physical disk iops x number of disks
write (with journal on same disk) = physical disk iops x number of disks / num 
of replicas / 3
In practice real benchmarks will vary widely from this; I've seen numbers from 
30 to 80% of the theoretical value.

When the number of disks/CPU cores is high, the CPU bottleneck kicks in. Again, 
it depends on hardware, but you could use a performance tool such as atop to 
see when this happens on your setup. There is no theoretical measure of this, 
but one good analysis I find is Nick Fisk's:
http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/


Cheers
/Maged



From: John Petrini 
Sent: Tuesday, January 03, 2017 10:15 PM
To: ceph-users 
Subject: [ceph-users] Estimate Max IOPS of Cluster


Hello, 


Does any one have a reasonably accurate way to determine the max IOPS of a Ceph 
cluster?


Thank You,

___


John Petrini






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Storage system

2017-01-04 Thread Patrick McGarry
Moving this to the ceph-users list where it'll get some attention.

On Thu, Dec 22, 2016 at 2:08 PM, SIBALA, SATISH  wrote:

> Hi,
>
>
>
> Could you please give me a recommendation on the kind of Ceph storage to be
> used with an NGINX proxy server (Object / Block / FileSystem)?
>
>
>
> Best Regards
>
> Satish
>
>
>
>



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephalocon Sponsorships Open

2017-01-04 Thread Patrick McGarry
Hey Wes,

We'd love to have you guys. I'll send out another note once we open
the CFP, though; this announcement is just for those who wish to sponsor
and help make it happen. Thanks for your interest, and keep an eye out for the
CFP! :)


On Thu, Dec 22, 2016 at 2:16 PM, Wes Dillingham
 wrote:
> I / my group / our organization would be interested in discussing our
> deployment of Ceph and how we are using it, deploying it, future plans etc.
> This sounds like an exciting event. We look forward to hearing more details.
>
> On Thu, Dec 22, 2016 at 1:44 PM, Patrick McGarry 
> wrote:
>>
>> Hey cephers,
>>
>> Just letting you know that we're opening the flood gates for
>> sponsorship opportunities at Cephalocon next year (23-25 Aug 2017,
>> Boston, MA). If you would be interested in sponsoring/exhibiting at
>> our inaugural Ceph conference, please drop me a line. Thanks!
>>
>>
>> --
>>
>> Best Regards,
>>
>> Patrick McGarry
>> Director Ceph Community || Red Hat
>> http://ceph.com  ||  http://community.redhat.com
>> @scuttlemonkey || @ceph
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Respectfully,
>
> Wes Dillingham
> wes_dilling...@harvard.edu
> Research Computing | Infrastructure Engineer
> Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 210
>



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Estimate Max IOPS of Cluster

2017-01-04 Thread Maged Mokhtar

Max iops  depends on the hardware type/configuration for disks/cpu/network.

For disks, the theoretical iops limit is 
read  = physical disk iops x number of disks
write (with journal on same disk) = physical disk iops x number of disks / num 
of replicas / 3
In practice real benchmarks will vary widely from this; I've seen numbers from 
30 to 80% of the theoretical value.
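
A quick worked example with hypothetical numbers, following the formulas above
for 30 spinning disks at ~100 IOPS each and 3 replicas with co-located journals:

    read  ~= 100 x 30          = 3000 IOPS
    write ~= 100 x 30 / 3 / 3  = ~333 IOPS

and then apply the 30-80% real-world factor on top of that.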

When the number of disks/CPU cores is high, the CPU bottleneck kicks in. Again, 
it depends on hardware, but you could use a performance tool such as atop to 
see when this happens on your setup. There is no theoretical measure of this, 
but one good analysis I find is Nick Fisk's:
http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/


Cheers
/Maged



From: John Petrini 
Sent: Tuesday, January 03, 2017 10:15 PM
To: ceph-users 
Subject: [ceph-users] Estimate Max IOPS of Cluster


Hello, 


Does any one have a reasonably accurate way to determine the max IOPS of a Ceph 
cluster?


Thank You,

___


John Petrini






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Tonight's CDM Cancelled

2017-01-04 Thread Patrick McGarry
Hey cephers,

Given the number of devs still out on holiday I am cancelling the Ceph
Developer Monthly call that was slated for 9p EST tonight. Sorry for
the short notice. We'll see you in a month at the 12p time slot.
Thanks.


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client.admin accidently removed caps/permissions

2017-01-04 Thread Jim Kilborn
Disregard, you can fix this by using the monitor ID and keyring file:



cd /var/lib/ceph/mon/monname

ceph -n mon. --keyring keyring auth caps client.admin mds 'allow *' osd 'allow *' mon 'allow *'





Sent from Mail for Windows 10



From: Jim Kilborn
Sent: Wednesday, January 4, 2017 9:19 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] client.admin accidently removed caps/permissions



Hello:

I was trying to fix a problem with mds caps, and caused my admin user to have 
no mon caps.
I ran:

ceph auth caps client.admin mds 'allow *'

I didn’t realize I had to pass the mon and osd caps as well. Now, when I try to 
run any command, I get

2017-01-04 08:58:44.009250 7f5441f62700  0 librados: client.admin 
authentication error (13) Permission denied
Error connecting to cluster: PermissionDeniedError

What is the simplest way to get my client.admin caps/permissions fixed?


Sent from Mail for Windows 10

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate cephfs metadata to SSD in running cluster

2017-01-04 Thread Mike Miller

Wido, all,

can you point me to the "recent benchmarks" so I can have a look?
How do you define "performance"? I would not expect CephFS throughput to 
change, but it is surprising to me that metadata on SSD would have no 
measurable effect on latency.


- mike

On 1/3/17 10:49 AM, Wido den Hollander wrote:



On 3 January 2017 at 2:49, Mike Miller wrote:


will metadata on SSD improve latency significantly?



No, as I said in my previous e-mail, recent benchmarks showed that storing 
CephFS metadata on SSD does not improve performance.

It still might be good to do since it's not that much data, so recovery will 
go quickly, but don't expect a CephFS performance improvement.

Wido


Mike

On 1/2/17 11:50 AM, Wido den Hollander wrote:



On 2 January 2017 at 10:33, Shinobu Kinjo wrote:


I've never done migration of cephfs_metadata from spindle disks to
ssds. But logically you could achieve this through 2 phases.

 #1 Configure CRUSH rule including spindle disks and ssds
 #2 Configure CRUSH rule for just pointing to ssds
  * This would cause massive data shuffling.


Not really, usually the CephFS metadata isn't that much data.

Recent benchmarks (can't find them now) show that storing CephFS metadata on 
SSD doesn't really improve performance though.

Wido




On Mon, Jan 2, 2017 at 2:36 PM, Mike Miller  wrote:

Hi,

Happy New Year!

Can anyone point me to specific walkthrough / howto instructions how to move
cephfs metadata to SSD in a running cluster?

How is crush to be modified step by step such that the metadata migrate to
SSD?

Thanks and regards,

Mike
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] client.admin accidently removed caps/permissions

2017-01-04 Thread Jim Kilborn
Hello:

I was trying to fix a problem with mds caps, and caused my admin user to have 
no mon caps.
I ran:

ceph auth caps client.admin mds 'allow *'

I didn’t realize I had to pass the mon and osd caps as well. Now, when I try to 
run any command, I get

2017-01-04 08:58:44.009250 7f5441f62700  0 librados: client.admin 
authentication error (13) Permission denied
Error connecting to cluster: PermissionDeniedError

What is the simplest way to get my client.admin caps/permissions fixed?


Sent from Mail for Windows 10

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph pg active+clean+inconsistent

2017-01-04 Thread Andras Pataki

# ceph pg debug unfound_objects_exist
FALSE

Andras

On 01/03/2017 11:38 PM, Shinobu Kinjo wrote:

Would you run:

  # ceph pg debug unfound_objects_exist

On Wed, Jan 4, 2017 at 5:31 AM, Andras Pataki
 wrote:

Here is the output of ceph pg query for one of hte active+clean+inconsistent
PGs:

{
 "state": "active+clean+inconsistent",
 "snap_trimq": "[]",
 "epoch": 342982,
 "up": [
 319,
 90,
 51
 ],
 "acting": [
 319,
 90,
 51
 ],
 "actingbackfill": [
 "51",
 "90",
 "319"
 ],
 "info": {
 "pgid": "6.92c",
 "last_update": "342982'41304",
 "last_complete": "342982'41304",
 "log_tail": "342980'38259",
 "last_user_version": 41304,
 "last_backfill": "MAX",
 "last_backfill_bitwise": 0,
 "purged_snaps": "[]",
 "history": {
 "epoch_created": 262553,
 "last_epoch_started": 342598,
 "last_epoch_clean": 342613,
 "last_epoch_split": 0,
 "last_epoch_marked_full": 0,
 "same_up_since": 342596,
 "same_interval_since": 342597,
 "same_primary_since": 342597,
 "last_scrub": "342982'41177",
 "last_scrub_stamp": "2017-01-02 18:19:48.081750",
 "last_deep_scrub": "342965'37465",
 "last_deep_scrub_stamp": "2016-12-20 16:31:06.438823",
 "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816"
 },
 "stats": {
 "version": "342982'41304",
 "reported_seq": "43600",
 "reported_epoch": "342982",
 "state": "active+clean+inconsistent",
 "last_fresh": "2017-01-03 15:27:15.075176",
 "last_change": "2017-01-02 18:19:48.081806",
 "last_active": "2017-01-03 15:27:15.075176",
 "last_peered": "2017-01-03 15:27:15.075176",
 "last_clean": "2017-01-03 15:27:15.075176",
 "last_became_active": "2016-11-01 16:21:23.328639",
 "last_became_peered": "2016-11-01 16:21:23.328639",
 "last_unstale": "2017-01-03 15:27:15.075176",
 "last_undegraded": "2017-01-03 15:27:15.075176",
 "last_fullsized": "2017-01-03 15:27:15.075176",
 "mapping_epoch": 342596,
 "log_start": "342980'38259",
 "ondisk_log_start": "342980'38259",
 "created": 262553,
 "last_epoch_clean": 342613,
 "parent": "0.0",
 "parent_split_bits": 0,
 "last_scrub": "342982'41177",
 "last_scrub_stamp": "2017-01-02 18:19:48.081750",
 "last_deep_scrub": "342965'37465",
 "last_deep_scrub_stamp": "2016-12-20 16:31:06.438823",
 "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816",
 "log_size": 3045,
 "ondisk_log_size": 3045,
 "stats_invalid": false,
 "dirty_stats_invalid": false,
 "omap_stats_invalid": false,
 "hitset_stats_invalid": false,
 "hitset_bytes_stats_invalid": false,
 "pin_stats_invalid": true,
 "stat_sum": {
 "num_bytes": 16929346269,
 "num_objects": 4881,
 "num_object_clones": 0,
 "num_object_copies": 14643,
 "num_objects_missing_on_primary": 0,
 "num_objects_missing": 0,
 "num_objects_degraded": 0,
 "num_objects_misplaced": 0,
 "num_objects_unfound": 0,
 "num_objects_dirty": 4881,
 "num_whiteouts": 0,
 "num_read": 7592,
 "num_read_kb": 19593996,
 "num_write": 42541,
 "num_write_kb": 47306915,
 "num_scrub_errors": 1,
 "num_shallow_scrub_errors": 1,
 "num_deep_scrub_errors": 0,
 "num_objects_recovered": 5807,
 "num_bytes_recovered": 22691211916,
 "num_keys_recovered": 0,
 "num_objects_omap": 0,
 "num_objects_hit_set_archive": 0,
 "num_bytes_hit_set_archive": 0,
 "num_flush": 0,
 "num_flush_kb": 0,
 "num_evict": 0,
 "num_evict_kb": 0,
 "num_promote": 0,
 "num_flush_mode_high": 0,
 "num_flush_mode_low": 0,
 "num_evict_mode_some": 0,
 "num_evict_mode_full": 0,
 "num_objects_pinned": 0
 },
 "up": [
 319,
 90,
 51
 ],
 "acting": [
 319,
 90,
 51
 ],
 "blocked_by": [],
 "up_primary": 319,
 "acting_primary": 319

Re: [ceph-users] performance with/without dmcrypt OSD

2017-01-04 Thread M Ranga Swami Reddy
On Tue, Jan 3, 2017 at 10:31 PM, Graham Allan  wrote:

> We did some CBT tests here a few months ago which included some dmcrypt
> comparisons - the performance hit was non-zero, but close enough, around
> ~2-3%.
>
> (CentOS 7.2 with E5-2630 v4 cpus, jewel release, default dmcrypt
> parameters which IIRC is AES-XTS now)



Is it possible to share the CBT results? just to check.

Thanks
Swami





>
>

> On 1/3/2017 7:48 AM, Adrien Gillard wrote:
>
>> There has been talks on the subject in the mailing list before [1] which
>> concur with Nick's experience as long as you use AES-XTS.
>>
>>
>> [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-
>> March/008444.html
>>
>> On Tue, Jan 3, 2017 at 2:30 PM, Nick Fisk > > wrote:
>>
>>
>> *From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com
>> ] *On Behalf Of *Kent Borg
>> *Sent:* 03 January 2017 12:47
>> *To:* M Ranga Swami Reddy > >
>> *Cc:* ceph-users > >
>> *Subject:* Re: [ceph-users] performance with/without dmcrypt OSD
>>
>>
>> On 01/03/2017 06:42 AM, M Ranga Swami Reddy wrote:
>>
>> 
>>
>> On Tue, Jan 3, 2017 at 6:17 AM, Kent Borg > > wrote:
>>
>>
>> Assuming I am understanding the question...
>>
>> If there isn't too big a performance hit, it makes disk
>> disposal (we expect disks to die, right?) much simpler.
>>
>>
>> OK. Thanks. But if I have a big volumes in TB size (10 TB
>> volume)  and writing/reading from the big volumes - will impact
>> on performance like write and read speed?
>>
>>
>> I'd like to know, too.
>>
>> -kb
>>
>>
>> Not specifically related to Ceph, but I built a 14 disk RAID 6 array
>> (mdadm) for a recent “secure high performance seeding device in a
>> briefcase” project and used dmcrypt on it. I could easily obtain
>> over 1GB/s reads and writes. From tests there was no noticeable
>> performance impact and CPU usage on a Xeon E3 was nothing to be
>> concerned about. All modern CPU’s will HW accelerate the process if
>> you use the AES-XTS cipher, I suspect there might be a severe
>> performance impact without.
>>
>> __ __
>>
>> Also as Ceph+network itself brings a fair amount of overhead, I
>> wouldn’t suspect that dmcrypt would introduce any noticeable
>> overhead of its own.
>>
>>
> --
> Graham Allan
> Minnesota Supercomputing Institute - g...@umn.edu
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance with/without dmcrypt OSD

2017-01-04 Thread M Ranga Swami Reddy
Thanks for the link.

On Tue, Jan 3, 2017 at 7:18 PM, Adrien Gillard 
wrote:

> There has been talks on the subject in the mailing list before [1] which
> concur with Nick's experience as long as you use AES-XTS.
>
>
> [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/
> 2016-March/008444.html
>
> On Tue, Jan 3, 2017 at 2:30 PM, Nick Fisk  wrote:
>
>>
>>
>>
>>
>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
>> Of *Kent Borg
>> *Sent:* 03 January 2017 12:47
>> *To:* M Ranga Swami Reddy 
>> *Cc:* ceph-users 
>> *Subject:* Re: [ceph-users] performance with/without dmcrypt OSD
>>
>>
>>
>> On 01/03/2017 06:42 AM, M Ranga Swami Reddy wrote:
>>
>> On Tue, Jan 3, 2017 at 6:17 AM, Kent Borg  wrote:
>>
>>
>>
>> Assuming I am understanding the question...
>>
>> If there isn't too big a performance hit, it makes disk disposal (we
>> expect disks to die, right?) much simpler.
>>
>>
>>
>>
>>
>> OK. Thanks. But if I have a big volumes in TB size (10 TB volume)  and
>> writing/reading from the big volumes - will impact on performance
>> like write and read speed?
>>
>>
>> I'd like to know, too.
>>
>> -kb
>>
>>
>>
>> Not specifically related to Ceph, but I built a 14 disk RAID 6 array
>> (mdadm) for a recent “secure high performance seeding device in a
>> briefcase” project and used dmcrypt on it. I could easily obtain over 1GB/s
>> reads and writes. From tests there was no noticeable performance impact and
>> CPU usage on a Xeon E3 was nothing to be concerned about. All modern CPU’s
>> will HW accelerate the process if you use the AES-XTS cipher, I suspect
>> there might be a severe performance impact without.
>>
>>
>>
>> Also as Ceph+network itself brings a fair amount of overhead, I wouldn’t
>> suspect that dmcrypt would introduce any noticeable overhead of its own.
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance with/without dmcrypt OSD

2017-01-04 Thread M Ranga Swami Reddy
Thank you Nick. To summarize: dmcrypt doesn't add a notable performance
impact to the Ceph cluster.
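
For anyone who wants to sanity-check that on their own hardware before
deploying, a quick look at the kernel's AES-XTS throughput (assuming cryptsetup
is installed) is:

    # per-cipher throughput via the kernel crypto API; look at the aes-xts lines
    cryptsetup benchmark
    # confirm the CPU exposes AES-NI at all
    grep -m1 -o aes /proc/cpuinfo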

Thanks
Swami

On Tue, Jan 3, 2017 at 7:00 PM, Nick Fisk  wrote:

>
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Kent Borg
> *Sent:* 03 January 2017 12:47
> *To:* M Ranga Swami Reddy 
> *Cc:* ceph-users 
> *Subject:* Re: [ceph-users] performance with/without dmcrypt OSD
>
>
>
> On 01/03/2017 06:42 AM, M Ranga Swami Reddy wrote:
>
> On Tue, Jan 3, 2017 at 6:17 AM, Kent Borg  wrote:
>
>
>
> Assuming I am understanding the question...
>
> If there isn't too big a performance hit, it makes disk disposal (we
> expect disks to die, right?) much simpler.
>
>
>
>
>
> OK. Thanks. But if I have a big volumes in TB size (10 TB volume)  and
> writing/reading from the big volumes - will impact on performance
> like write and read speed?
>
>
> I'd like to know, too.
>
> -kb
>
>
>
> Not specifically related to Ceph, but I built a 14 disk RAID 6 array
> (mdadm) for a recent “secure high performance seeding device in a
> briefcase” project and used dmcrypt on it. I could easily obtain over 1GB/s
> reads and writes. From tests there was no noticeable performance impact and
> CPU usage on a Xeon E3 was nothing to be concerned about. All modern CPU’s
> will HW accelerate the process if you use the AES-XTS cipher, I suspect
> there might be a severe performance impact without.
>
>
>
> Also as Ceph+network itself brings a fair amount of overhead, I wouldn’t
> suspect that dmcrypt would introduce any noticeable overhead of its own.
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Automatic OSD start on Jewel

2017-01-04 Thread Fabian Grünbichler
On Wed, Jan 04, 2017 at 12:55:56PM +0100, Florent B wrote:
> On 01/04/2017 12:18 PM, Fabian Grünbichler wrote:
> > On Wed, Jan 04, 2017 at 12:03:39PM +0100, Florent B wrote:
> >> Hi everyone,
> >>
> >> I have a problem with automatic start of OSDs on Debian Jessie with Ceph
> >> Jewel.
> >>
> >> My osd.0 is using /dev/sda5 for data and /dev/sda2 for journal, it is
> >> listed in ceph-disk list :
> >>
> >> /dev/sda :
> >>  /dev/sda1 other, 21686148-6449-6e6f-744e-656564454649
> >>  /dev/sda3 other, linux_raid_member
> >>  /dev/sda4 other, linux_raid_member
> >>  /dev/sda2 ceph journal, for /dev/sda5
> >>  /dev/sda5 ceph data, active, cluster ceph, osd.0, journal /dev/sda2
> >>
> >> It was created with ceph-disk prepare.
> >>
> >> When I run "ceph-disk activate /dev/sda5", it is mounted and started.
> >>
> >> If I run "systemctl start ceph-disk@/dev/sda5", the same, it's OK. But
> >> this is a service that can't be "enabled" !!
> >>
> >> But on reboot, nothing happens. The only thing which tries to start is
> >> ceph-osd@0 service (enabled by ceph-disk, not me), and of course it
> >> fails because its data is not mounted.
> >>
> >> I think udev rules should do this, but it does not seem to.
> >>
> >>
> >> root@host102:~# sgdisk -i 2 /dev/sda
> >> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
> >> Partition unique GUID: D0F4F00F-723D-4DAD-BA2E-93D52EB564C1
> >> First sector: 2048 (at 1024.0 KiB)
> >> Last sector: 9765887 (at 4.7 GiB)
> >> Partition size: 9763840 sectors (4.7 GiB)
> >> Attribute flags: 
> >> Partition name: 'ceph journal'
> >>
> >> root@host102:~# sgdisk -i 5 /dev/sda
> >> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> >> Partition unique GUID: 5AB4F732-AFBE-4DEA-A4C6-AD290C1302D9
> >> First sector: 123047424 (at 58.7 GiB)
> >> Last sector: 1953459199 (at 931.5 GiB)
> >> Partition size: 1830411776 sectors (872.8 GiB)
> >> Attribute flags: 
> >> Partition name: 'ceph data'
> >>
> >>
> >> Does someone have an idea of what's going on ?
> >>
> >> Thank you.
> >>
> >> Florent
> > are you using the packages from ceph.com? if so, you might be affected
> > by http://tracker.ceph.com/issues/18305 (and
> > http://tracker.ceph.com/issues/17889)
> >
> > did you mask the ceph.service unit generated from the ceph init script?
> >
> > what does "systemctl status '*ceph*'" show? what does "journalctl -b
> > '*ceph*'" show?
> >
> > what happens if you run "ceph-disk activate-all"? (this is what is
> > called last in the init script and will probably trigger mounting of the
> > OSD disk/partition and starting of the ceph-osd@..  service)
> >
> 
> Thank you, that was the problem: I disabled the ceph.service unit because I
> thought it was an "old" thing; I didn't know it is still used.
> Re-enabling it did the trick.
> 
> Isn't it an "old way" of doing things ?
> 

I am not sure if the init script was left on purpose or if nobody
realized that the existing systemd units don't cover all the activation
paths because the init script was forgotten and hides this fact quite
well. I assume the latter ;)

IMHO the current situation is wrong, which is why I filed the bug
(including a proposed fix). especially since the init script actually
starts monitors using systemd-run as transient units instead of via
ceph-mon@XYZ, so on monitor nodes the startup situation can get quite
confusing and racy. so far there hasn't been any feedback - maybe this
thread will help and get some more eyes to look at it..
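
In the meantime, the workaround Florent used boils down to roughly this
(assuming the ceph.com packages and the sysv-generated ceph.service):

    # make the init-script-generated unit active again so that
    # "ceph-disk activate-all" runs at boot
    systemctl unmask ceph.service
    systemctl enable ceph.service
    # or, after a reboot, mount and start the OSDs by hand
    ceph-disk activate-all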

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High OSD apply latency right after new year (the leap second?)

2017-01-04 Thread Alexandre DERUMIER
Yes,
same here on 3 production clusters.

No impact, but a nice happy new year alert ;)


It seems that Google provides NTP servers that smear the leap second, avoiding the brutal 1-second jump:

https://developers.google.com/time/smear
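
A minimal chrony configuration pointed only at the smearing servers could look
like this (a sketch; Google advises not to mix smeared and non-smeared time
sources):

    # /etc/chrony/chrony.conf (or equivalent "server" lines for ntpd)
    server time1.google.com iburst
    server time2.google.com iburst
    server time3.google.com iburst
    server time4.google.com iburst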


- Original message -
From: "Craig Chi" 
To: "ceph-users" 
Sent: Wednesday, January 4, 2017 11:26:21
Subject: [ceph-users] High OSD apply latency right after new year (the leap   
second?)

Hi List, 

Three of our Ceph OSDs got unreasonably high latency right after the first 
second of the new year (2017/01/01 00:00:00 UTC, I have attached the metrics 
and I am in UTC+8 timezone). There is exactly a pg (size=3) just contains these 
3 OSDs. 

The OSD apply latency is usually up to 25 minutes, and I can also see this 
large number randomly when I execute "ceph osd perf" command. But the 3 OSDs 
does not have strange behavior and are performing fine so far. 

I have no idea how "ceph osd perf" is implemented, but does it have any relation to 
the leap second this year? Since the cluster is not in production and the 
developers were all celebrating the new year at that time, I cannot think of other 
possibilities. 

Did your clusters also get this interesting, unexpected new year's gift? 
Sincerely, 
Craig Chi 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Automatic OSD start on Jewel

2017-01-04 Thread Fabian Grünbichler
On Wed, Jan 04, 2017 at 12:03:39PM +0100, Florent B wrote:
> Hi everyone,
> 
> I have a problem with automatic start of OSDs on Debian Jessie with Ceph
> Jewel.
> 
> My osd.0 is using /dev/sda5 for data and /dev/sda2 for journal; it is
> listed in ceph-disk list:
> 
> /dev/sda :
>  /dev/sda1 other, 21686148-6449-6e6f-744e-656564454649
>  /dev/sda3 other, linux_raid_member
>  /dev/sda4 other, linux_raid_member
>  /dev/sda2 ceph journal, for /dev/sda5
>  /dev/sda5 ceph data, active, cluster ceph, osd.0, journal /dev/sda2
> 
> It was created with ceph-disk prepare.
> 
> When I run "ceph-disk activate /dev/sda5", it is mounted and started.
> 
> If I run "systemctl start ceph-disk@/dev/sda5", the same thing happens, it's OK. But
> this is a service that can't be "enabled"!!
> 
> But on reboot, nothing happens. The only thing which tries to start is
> the ceph-osd@0 service (enabled by ceph-disk, not me), and of course it
> fails because its data is not mounted.
> 
> I think the udev rules should do this, but they do not seem to.
> 
> 
> root@host102:~# sgdisk -i 2 /dev/sda
> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
> Partition unique GUID: D0F4F00F-723D-4DAD-BA2E-93D52EB564C1
> First sector: 2048 (at 1024.0 KiB)
> Last sector: 9765887 (at 4.7 GiB)
> Partition size: 9763840 sectors (4.7 GiB)
> Attribute flags: 
> Partition name: 'ceph journal'
> 
> root@host102:~# sgdisk -i 5 /dev/sda
> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> Partition unique GUID: 5AB4F732-AFBE-4DEA-A4C6-AD290C1302D9
> First sector: 123047424 (at 58.7 GiB)
> Last sector: 1953459199 (at 931.5 GiB)
> Partition size: 1830411776 sectors (872.8 GiB)
> Attribute flags: 
> Partition name: 'ceph data'
> 
> 
> Does someone have an idea of what's going on?
> 
> Thank you.
> 
> Florent

are you using the packages from ceph.com? if so, you might be affected
by http://tracker.ceph.com/issues/18305 (and
http://tracker.ceph.com/issues/17889)

did you mask the ceph.service unit generated from the ceph init script?

what does "systemctl status '*ceph*'" show? what does "journalctl -b
'*ceph*'" show?

what happens if you run "ceph-disk activate-all"? (this is what is
called last in the init script and will probably trigger mounting of the
OSD disk/partition and starting of the ceph-osd@..  service)
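
condensed into commands, roughly (a sketch only, the unit names assume the
stock Jewel packaging):

systemctl list-unit-files 'ceph*'   # enabled/masked state of every ceph unit, including ceph.service
journalctl -b -u ceph-osd@0         # why the ceph-osd@0 unit failed at boot
ceph-disk activate-all              # should mount the data partition and start ceph-osd@0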

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] High OSD apply latency right after new year (the leap second?)

2017-01-04 Thread Craig Chi
Hi List,

Three of our Ceph OSDs got unreasonably high latency right after the first 
second of the new year (2017/01/01 00:00:00 UTC, I have attached the metrics 
and I am in UTC+8 timezone). There is exactly one PG (size=3) that contains just these 
3 OSDs.

The OSD apply latency is usually up to 25 minutes, and I can also see this 
large number randomly when I execute the "ceph osd perf" command. But the 3 OSDs 
do not show strange behavior and are performing fine so far.
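
For reference, the checks boil down to something like this (the OSD ids in the
grep are just placeholders for the three suspect ones):

ceph osd perf                              # fs_commit_latency(ms) / fs_apply_latency(ms) per OSD
ceph pg dump pgs_brief | grep '\[1,2,3\]'  # list the PGs whose up/acting set is these OSDs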

I have no idea how "ceph osd perf" is implemented, but does it have any relation to 
the leap second this year? Since the cluster is not in production and the 
developers were all celebrating the new year at that time, I cannot think of other 
possibilities.

Did your clusters also get this interesting, unexpected new year's gift?

Sincerely,
Craig Chi
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Is this a deadlock?

2017-01-04 Thread Shinobu Kinjo
On Wed, Jan 4, 2017 at 6:05 PM, 许雪寒  wrote:
> We've already restarted the OSD successfully.
> Now, we are trying to figure out why the OSD suicided itself.

A network issue that causes pretty unstable communication with other
OSDs in the same acting set is usually what causes a suicide.
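
If the network cannot be fixed right away, a band-aid some people use is to
loosen the heartbeat/suicide timeouts in ceph.conf, for example (the values are
only examples, the option names are worth double-checking against your release,
and this does not fix the underlying problem):

[osd]
osd heartbeat grace = 60              # default 20
osd op thread suicide timeout = 300   # default 150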

>
> Re: [ceph-users] Is this a deadlock?
>
> Hi, thanks for the quick reply.
>
> We manually deployed this OSD, and it has been running for more than half a 
> year. The output last night should be the latter one that you mentioned. Last 
> night, one of our switches had some problem and left the OSD unable to connect 
> to its peers, which in turn made the monitor wrongly mark the OSD down.
>
> Thank you:-)
>
>
>
> On Wed, 4 Jan 2017 07:49:03 + 许雪寒 wrote:
>
>> Hi, everyone.
>>
> >> Recently in one of our online ceph clusters, one OSD suicided itself after 
>> experiencing some network connectivity problem, and the OSD log is as 
>> follows:
>>
>
> Version of Ceph and all relevant things would help.
> Also "some network connectivity problem" is vague, if it were something like 
> a bad port or overloaded switch you'd think that more than one OSD would be 
> affected.
>
> [snip, I have nothing to comment on that part]
>>
>>
>
> >> And by the way, when we first tried to restart the OSD that committed suicide 
> >> through “/etc/init.d/ceph start osd.619”, an error was reported, and it said 
> >> something like “OSD.619 is not found”, which made it seem that OSD.619 was never 
> >> created in this cluster. We are really confused; please help us.
>>
> How did you create that OSD?
> Manually or with ceph-deploy?
> The fact that you're trying to use a SYS-V initscript suggests both an older 
> Ceph version and OS, and thus more likely a manual install.
>
> In which case that OSD needs to be defined in ceph.conf on that node.
> Full output of that error message would have told us these things, like:
> ---
> root@ceph-04:~# /etc/init.d/ceph start osd.444
> /etc/init.d/ceph: osd.444 not found (/etc/ceph/ceph.conf defines mon.ceph-04 
> osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24 , /var/lib/ceph 
> defines mon.ceph-04 osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24)
> ---
> The above is the output from a Hammer cluster with OSDs deployed with 
> ceph-deploy.
> And incidentally the "ceph.conf" part of the output is a blatant lie and just 
> a repetition of what it gathered from /var/lib/ceph.
>
> This is a Hammer cluster with manually deployed OSDs:
> ---
> engtest03:~# /etc/init.d/ceph start osd.33
> /etc/init.d/ceph: osd.33 not found (/etc/ceph/ceph.conf defines mon.engtest03 
> mon.engtest04 mon.engtest05 mon.irt03 mon.irt04 mds.engtest03 osd.20 osd.21 
> osd.22 osd.23, /var/lib/ceph defines )
> ---
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Is this a deadlock?

2017-01-04 Thread 许雪寒
We've already restarted the OSD successfully.
Now, we are trying to figure out why the OSD suicided itself.

Re: [ceph-users] Is this a deadlock?

Hi, thanks for the quick reply.

We manually deployed this OSD, and it has been running for more than half a 
year. The output last night should be the latter one that you mentioned. Last 
night, one of our switches had some problem and left the OSD unable to connect 
to its peers, which in turn made the monitor wrongly mark the OSD down.

Thank you:-)



On Wed, 4 Jan 2017 07:49:03 + 许雪寒 wrote:

> Hi, everyone.
> 
> Recently in one of our online ceph clusters, one OSD suicided itself after 
> experiencing some network connectivity problem, and the OSD log is as follows:
>

Version of Ceph and all relevant things would help.
Also "some network connectivity problem" is vague, if it were something like a 
bad port or overloaded switch you'd think that more than one OSD would be 
affected.

[snip, I have nothing to comment on that part]
> 
> 

> And by the way, when we first tried to restart the OSD that committed suicide 
> through “/etc/init.d/ceph start osd.619”, an error was reported, and it said 
> something like “OSD.619 is not found”, which made it seem that OSD.619 was never 
> created in this cluster. We are really confused; please help us.
> 
How did you create that OSD?
Manually or with ceph-deploy?
The fact that you're trying to use a SYS-V initscript suggests both an older 
Ceph version and OS, and thus more likely a manual install.

In which case that OSD needs to be defined in ceph.conf on that node.
Full output of that error message would have told us these things, like:
---
root@ceph-04:~# /etc/init.d/ceph start osd.444
/etc/init.d/ceph: osd.444 not found (/etc/ceph/ceph.conf defines mon.ceph-04 
osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24 , /var/lib/ceph defines 
mon.ceph-04 osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24)
---
The above is the output from a Hammer cluster with OSDs deployed with 
ceph-deploy.
And incidentally the "ceph.conf" part of the output is a blatant lie and just a 
repetition of what it gathered from /var/lib/ceph.

This is a Hammer cluster with manually deployed OSDs:
---
engtest03:~# /etc/init.d/ceph start osd.33
/etc/init.d/ceph: osd.33 not found (/etc/ceph/ceph.conf defines mon.engtest03 
mon.engtest04 mon.engtest05 mon.irt03 mon.irt04 mds.engtest03 osd.20 osd.21 
osd.22 osd.23, /var/lib/ceph defines )
---

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: Is this a deadlock?

2017-01-04 Thread 许雪寒
Hi, thanks for the quick reply.

We manually deployed this OSD, and it has been running for more than half a 
year. The output last night should be the latter one that you mentioned.
Last night, one of our switches had some problem and left the OSD unable to connect 
to its peers, which in turn made the monitor wrongly mark the OSD down.

Thank you:-)



On Wed, 4 Jan 2017 07:49:03 + 许雪寒 wrote:

> Hi, everyone.
> 
> Recently in one of our online ceph clusters, one OSD suicided itself after 
> experiencing some network connectivity problem, and the OSD log is as follows:
>

Version of Ceph and all relevant things would help.
Also "some network connectivity problem" is vague, if it were something like a 
bad port or overloaded switch you'd think that more than one OSD would be 
affected.

[snip, I have nothing to comment on that part]
> 
> 

> And by the way, when we first tried to restart the OSD that committed suicide 
> through “/etc/init.d/ceph start osd.619”, an error was reported, and it said 
> something like “OSD.619 is not found”, which made it seem that OSD.619 was never 
> created in this cluster. We are really confused; please help us.
> 
How did you create that OSD?
Manually or with ceph-deploy?
The fact that you're trying to use a SYS-V initscript suggests both an older 
Ceph version and OS, and thus more likely a manual install.

In which case that OSD needs to be defined in ceph.conf on that node.
Full output of that error message would have told us these things, like:
---
root@ceph-04:~# /etc/init.d/ceph start osd.444
/etc/init.d/ceph: osd.444 not found (/etc/ceph/ceph.conf defines mon.ceph-04 
osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24 , /var/lib/ceph defines 
mon.ceph-04 osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24)
---
The above is the output from a Hammer cluster with OSDs deployed with 
ceph-deploy.
And incidentally the "ceph.conf" part of the output is a blatant lie and just a 
repetition of what it gathered from /var/lib/ceph.

This is a Hammer cluster with manually deployed OSDs:
---
engtest03:~# /etc/init.d/ceph start osd.33
/etc/init.d/ceph: osd.33 not found (/etc/ceph/ceph.conf defines mon.engtest03 
mon.engtest04 mon.engtest05 mon.irt03 mon.irt04 mds.engtest03 osd.20 osd.21 
osd.22 osd.23, /var/lib/ceph defines )
---

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph monitor first deployment error

2017-01-04 Thread Gmail
Hi All,

I’m new to Ceph, I’m trying to install Ceph on VMs on my laptop.

I’m running CentOS Linux release 7.3.1611 (Core) with kernel 4.4.39 and Ceph 
10.2.5

I’ve the following config file:

[root@ceph-mon ~]# cat /etc/ceph/ceph.conf
fsid = 6f34b66d-1893-4d4b-8e20-08206525a0a5
mon initial members = ceph-mon
mon host = 192.168.56.101
cluster network = 192.168.56.101/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 2        # Write an object n times.
osd pool default min size = 1    # Allow writing n copies in a degraded state.
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1


then I ran the following:

[root@ceph-mon ~]# ceph-authtool --create-keyring /tmp/ceph.mon.keyring 
--gen-key -n mon. --cap mon 'allow *'
creating /tmp/ceph.mon.keyring
[root@ceph-mon ~]# ceph-authtool --create-keyring 
/etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap 
mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
creating /etc/ceph/ceph.client.admin.keyring
[root@ceph-mon ~]# ceph-authtool /tmp/ceph.mon.keyring --import-keyring 
/etc/ceph/ceph.client.admin.keyring
importing contents of /etc/ceph/ceph.client.admin.keyring into 
/tmp/ceph.mon.keyring
[root@ceph-mon ~]# monmaptool --create --add ceph-mon 192.168.56.101 --fsid 
6f34b66d-1893-4d4b-8e20-08206525a0a5 /tmp/monmap
monmaptool: monmap file /tmp/monmap
monmaptool: set fsid to 6f34b66d-1893-4d4b-8e20-08206525a0a5
monmaptool: writing epoch 0 to /tmp/monmap (1 monitors)
[root@ceph-mon ~]# chmod +r /tmp/ceph.mon.keyring
[root@ceph-mon ~]# chmod +r /tmp/monmap
[root@ceph-mon ~]# sudo -u ceph ceph-mon --cluster ceph --mkfs -i ceph-mon 
--monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
ceph-mon: set fsid to 6f34b66d-1893-4d4b-8e20-08206525a0a5
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-ceph-mon for mon.ceph-mon
[root@ceph-mon ~]# touch /var/lib/ceph/mon/ceph-ceph-mon/done
[root@ceph-mon ~]# systemctl start ceph.target


Then I try to verify that the monitor is running, and I get this:

[root@ceph-mon ~]# ceph -s
2017-01-03 23:59:10.741193 7ff1c8580700  0 -- :/915241400 >> 
192.168.56.101:6789/0 pipe(0x7ff1c4064290 sd=3 :0 s=1 pgs=0 cs=0 l=1 
c=0x7ff1c405c820).fault
^CTraceback (most recent call last):
  File "/usr/bin/ceph", line 948, in <module>
retval = main()
  File "/usr/bin/ceph", line 852, in main
prefix='get_command_descriptions')
  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 1300, in 
json_command
raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
RuntimeError: "None": exception "['{"prefix": "get_command_descriptions"}']": 
exception You cannot perform that operation on a Rados object in state 
configuring.
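
For reference, this is a minimal sketch of how the monitor daemon itself could
be checked directly (assuming the per-daemon systemd unit is named
ceph-mon@ceph-mon, i.e. ceph-mon@<mon id>):

systemctl enable ceph-mon@ceph-mon
systemctl start ceph-mon@ceph-mon
systemctl status ceph-mon@ceph-mon
ss -ltnp | grep 6789    # the monitor should be listening on this port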


Any clue on what I am doing wrong?!


— Bishoy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is this a deadlock?

2017-01-04 Thread Christian Balzer
On Wed, 4 Jan 2017 07:49:03 + 许雪寒 wrote:

> Hi, everyone.
> 
> Recently in one of our online ceph clusters, one OSD suicided itself after 
> experiencing some network connectivity problem, and the OSD log is as follows:
>

Version of Ceph and all relevant things would help.
Also "some network connectivity problem" is vague, if it were something
like a bad port or overloaded switch you'd think that more than one OSD
would be affected.

[snip, I have nothing to comment on that part]
> 
> 

> And by the way, when we first tried to restart the OSD that committed suicide 
> through “/etc/init.d/ceph start osd.619”, an error was reported, and it said 
> something like “OSD.619 is not found”, which made it seem that OSD.619 was never 
> created in this cluster. We are really confused; please help us.
> 
How did you create that OSD?
Manually or with ceph-deploy?
The fact that you're trying to use a SYS-V initscript suggests both an
older Ceph version and OS, and thus more likely a manual install.

In which case that OSD needs to be defined in ceph.conf on that node.
Full output of that error message would have told us these things, like:
---
root@ceph-04:~# /etc/init.d/ceph start osd.444
/etc/init.d/ceph: osd.444 not found (/etc/ceph/ceph.conf defines mon.ceph-04 
osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24 , /var/lib/ceph defines 
mon.ceph-04 osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24)
---
The above is the output from a Hammer cluster with OSDs deployed with
ceph-deploy.
And incidentally the "ceph.conf" part of the output is a blatant lie and
just a repetition of what it gathered from /var/lib/ceph.

This is a Hammer cluster with manually deployed OSDs:
---
engtest03:~# /etc/init.d/ceph start osd.33
/etc/init.d/ceph: osd.33 not found (/etc/ceph/ceph.conf defines mon.engtest03 
mon.engtest04 mon.engtest05 mon.irt03 mon.irt04 mds.engtest03 osd.20 osd.21 
osd.22 osd.23, /var/lib/ceph defines )
---
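
For a manually deployed OSD the ceph.conf bit it needs is just a stanza like
this (the hostname below is a placeholder, and the data/journal lines are only
needed if you deviate from the default layout):

[osd.619]
    host = <short hostname of that node>
    # osd data = /var/lib/ceph/osd/ceph-619
    # osd journal = /var/lib/ceph/osd/ceph-619/journal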

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com