Re: [ceph-users] ceph-mon fails to start on raspberry pi (raspbian 8.0)

2017-12-15 Thread Joao Eduardo Luis

On 12/15/2017 07:03 PM, Andrew Knapp wrote:
Has anyone else tried this and had similar problems?  Any advice on how 
to proceed or work around this issue?


The daemon's log, somewhere in /var/log/ceph/ceph-mon..log, should 
have more info. Upload that somewhere and we'll take a look.
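
If nothing useful shows up around the crash, one more thing you can try (just a
sketch; "ceph-master" is the mon id taken from your systemd output) is to raise
mon debugging and run the daemon in the foreground so the output lands on stderr:

# optionally add "debug mon = 20" under [mon] in ceph.conf first for more detail
systemctl stop ceph-mon@ceph-master
ceph-mon -d --cluster ceph --id ceph-master --setuser ceph --setgroup ceph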


  -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple independent rgw instances on same cluster

2017-12-15 Thread David Turner
I have 3 realms running in 2 datacenters, 4 realms in total since 2 of them
are running with multi-site between the datacenters. We have RGWs for each of
the realms, and each gateway serves only one realm.

Be careful about how many PGs you create for everything, as there are a lot of
pools required for this.
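
If it helps, a minimal sketch of bootstrapping a second, completely independent
realm looks roughly like this (names and endpoints are made up; double-check the
flags against your Luminous radosgw-admin):

radosgw-admin realm create --rgw-realm=realm-b
radosgw-admin zonegroup create --rgw-zonegroup=zg-b --rgw-realm=realm-b \
    --endpoints=http://rgw-b.example.com:8080 --master
radosgw-admin zone create --rgw-realm=realm-b --rgw-zonegroup=zg-b \
    --rgw-zone=zone-b --endpoints=http://rgw-b.example.com:8080 --master
radosgw-admin period update --commit --rgw-realm=realm-b

# then pin the gateway that should serve only this realm to it in ceph.conf:
[client.rgw.gateway-b]
rgw_realm = realm-b
rgw_zonegroup = zg-b
rgw_zone = zone-b

Each zone gets its own set of pools, which is where the PG budgeting above comes in.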

On Fri, Dec 15, 2017, 6:16 PM Graham Allan  wrote:

> I'm just looking for a sanity check on this...
>
> I want two separate rados gateways on the same (luminous) cluster to be
> completely independent - separate pools, users, data, no sync.
>
> After some experimentation it seems like the appropriate thing is to set
> them up using separate realms. Does that make sense?
>
> Thanks,
>
> Graham
> --
> Graham Allan
> Minnesota Supercomputing Institute - g...@umn.edu
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multiple independent rgw instances on same cluster

2017-12-15 Thread Graham Allan

I'm just looking for a sanity check on this...

I want two separate rados gateways on the same (luminous) cluster to be 
completely independent - separate pools, users, data, no sync.


After some experimentation it seems like the appropriate thing is to set 
them up using separate realms. Does that make sense?


Thanks,

Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Logging pool

2017-12-15 Thread Robin H. Johnson
On Fri, Dec 15, 2017 at 05:21:37PM +, David Turner wrote:
> We're trying to build an auditing system for when a user key pair performs
> an operation on a bucket (put, delete, creating a bucket, etc) and so far
> we were only able to find this information in the level 10 debug logging in
> the rgw system logs.
> 
> We noticed that our rgw log pool has been growing somewhat indefinitely and
> we had to move it off the NVMes and onto HDDs due to its growing
> size.  What is in that pool and how can it be accessed?  I haven't found
> the right search terms to turn up anything about what's in this pool on
> the ML or on Google.
> 
> What I would like to do is export the log to ElasticSearch, cleanup the log
> on occasion, and hopefully find the information we're looking for to
> fulfill our user auditing without having our RGW daemons running on debug
> level 10 (which is a lot of logging!).
I have a terrible solution in HAProxy's Lua that recognizes most S3
operations and spits out UDP/logs based on that.

It's not ideal and has LOTS of drawbacks (mostly in duplication of code,
incl. S3 signature stuff).

I'd be very interested in writing useful log data out either in a
different channel or as part of the HTTP response (key, bucket, object,
operation, actual bytes moved [esp. for in-place S3 COPY]).

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


signature.asc
Description: Digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-15 Thread Cary
James,

You can set these values in ceph.conf.

[global]
...
osd pool default size = 3
osd pool default min size  = 2
...

New pools that are created will use those values.
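
Note those defaults only apply to pools created afterwards. For pools that
already exist you have to set the values per pool, something like (the pool
name is just an example):

ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2
ceph osd pool get rbd min_size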

If you run a "ceph -s"  and look at the "usage" line, it shows how
much space is: 1 used, 2 available, 3 total. ie.

usage:   19465 GB used, 60113 GB / 79578 GB avail
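
As a rough rule of thumb with size 3 everywhere (ignoring full-ratio headroom),
usable capacity is about total / 3, e.g. 79578 GB / 3 ≈ 26526 GB of net data
for the cluster above.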

We choose to use Openstack with Ceph in this decade and do the other
things, not because they are easy, but because they are hard...;-p


Cary
-Dynamic

On Fri, Dec 15, 2017 at 10:12 PM, David Turner  wrote:
> In conjunction with increasing the pool size to 3, also increase the pool
> min_size to 2.  `ceph df` and `ceph osd df` will eventually show the full
> size in use in your cluster.  In particular, the per-pool available size in
> the output of `ceph df` takes into account the pool's replication size.
> Continue watching ceph -s or ceph -w to see when the backfilling for your
> change to replication size finishes.
>
> On Fri, Dec 15, 2017 at 5:06 PM James Okken 
> wrote:
>>
>> This whole effort went extremely well, thanks to Cary, and I'm not used to
>> that with CEPH so far. (And with OpenStack, ever.)
>> Thank you Cary.
>>
>> I've upped the replication factor and now I see "replicated size 3" in each
>> of my pools. Is this the only place to check replication level? Is there a
>> Global setting or only a setting per Pool?
>>
>> ceph osd pool ls detail
>> pool 0 'rbd' replicated size 3..
>> pool 1 'images' replicated size 3...
>> ...
>>
>> One last question!
>> At this replication level how can I tell how much total space I actually
>> have now?
>> Do I just 1/3 the Global size?
>>
>> ceph df
>> GLOBAL:
>> SIZE   AVAIL  RAW USED %RAW USED
>> 13680G 12998G 682G  4.99
>> POOLS:
>> NAMEID USED %USED MAX AVAIL OBJECTS
>> rbd 0 0 0 6448G   0
>> images  1  216G  3.24 6448G   27745
>> backups 2 0 0 6448G   0
>> volumes 3  117G  1.79 6448G   30441
>> compute 4 0 0 6448G   0
>>
>> ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE VAR  PGS
>>  0 0.81689  1.0   836G 36549M   800G 4.27 0.86  67
>>  4 3.7  1.0  3723G   170G  3553G 4.58 0.92 270
>>  1 0.81689  1.0   836G 49612M   788G 5.79 1.16  56
>>  5 3.7  1.0  3723G   192G  3531G 5.17 1.04 282
>>  2 0.81689  1.0   836G 33639M   803G 3.93 0.79  58
>>  3 3.7  1.0  3723G   202G  3521G 5.43 1.09 291
>>   TOTAL 13680G   682G 12998G 4.99
>> MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67
>>
>> Thanks!
>>
>> -Original Message-
>> From: Cary [mailto:dynamic.c...@gmail.com]
>> Sent: Friday, December 15, 2017 4:05 PM
>> To: James Okken
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server
>> cluster)
>>
>> James,
>>
>>  Those errors are normal. Ceph creates the missing files. You can check
>> "/var/lib/ceph/osd/ceph-6", before and after you run those commands to see
>> what files are added there.
>>
>>  Make sure you get the replication factor set.
>>
>>
>> Cary
>> -Dynamic
>>
>> On Fri, Dec 15, 2017 at 6:11 PM, James Okken 
>> wrote:
>> > Thanks again Cary,
>> >
>> > Yes, once all the backfilling was done I was back to a Healthy cluster.
>> > I moved on to the same steps for the next server in the cluster, it is
>> > backfilling now.
>> > Once that is done I will do the last server in the cluster, and then I
>> > think I am done!
>> >
>> > Just checking on one thing. I get these messages when running this
>> > command. I assume this is OK, right?
>> > root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid
>> > 25c21708-f756-4593-bc9e-c5506622cf07
>> > 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open:
>> > disabling aio for non-block journal.  Use journal_force_aio to force
>> > use of aio anyway
>> > 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open:
>> > disabling aio for non-block journal.  Use journal_force_aio to force
>> > use of aio anyway
>> > 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1
>> > filestore(/var/lib/ceph/osd/ceph-4) could not find
>> > #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or
>> > directory
>> > 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store
>> > /var/lib/ceph/osd/ceph-4 for osd.4 fsid
>> > 2b9f7957-d0db-481e-923e-89972f6c594f
>> > 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file:
>> > /var/lib/ceph/osd/ceph-4/keyring: can't open
>> > /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
>> > 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring
>> > /var/lib/ceph/osd/ceph-4/keyring
>> >
>> > thanks
>> >
>> > -Original Message-
>> > From: Cary 

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-15 Thread David Turner
In conjunction with increasing the pool size to 3, also increase the pool
min_size to 2.  `ceph df` and `ceph osd df` will eventually show the full
size in use in your cluster.  In particular, the per-pool available size in
the output of `ceph df` takes into account the pool's replication size.
Continue watching ceph -s or ceph -w to see when the backfilling for your
change to replication size finishes.
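
As a rough sanity check (back-of-the-envelope only, since MAX AVAIL is computed
per pool from the most-full OSD and the pool's replication size): with 3x
replication, net capacity is about raw / 3, i.e. 13680G / 3 ≈ 4560G of data
before any full-ratio headroom, so don't be surprised if the per-pool MAX AVAIL
figures settle lower than the 6448G shown once the size change and backfilling
finish.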

On Fri, Dec 15, 2017 at 5:06 PM James Okken 
wrote:

> This whole effort went extremely well, thanks to Cary, and I'm not used to
> that with CEPH so far. (And with OpenStack, ever.)
> Thank you Cary.
>
> I've upped the replication factor and now I see "replicated size 3" in each
> of my pools. Is this the only place to check replication level? Is there a
> Global setting or only a setting per Pool?
>
> ceph osd pool ls detail
> pool 0 'rbd' replicated size 3..
> pool 1 'images' replicated size 3...
> ...
>
> One last question!
> At this replication level how can I tell how much total space I actually
> have now?
> Do I just 1/3 the Global size?
>
> ceph df
> GLOBAL:
> SIZE   AVAIL  RAW USED %RAW USED
> 13680G 12998G 682G  4.99
> POOLS:
> NAMEID USED %USED MAX AVAIL OBJECTS
> rbd 0 0 0 6448G   0
> images  1  216G  3.24 6448G   27745
> backups 2 0 0 6448G   0
> volumes 3  117G  1.79 6448G   30441
> compute 4 0 0 6448G   0
>
> ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE VAR  PGS
>  0 0.81689  1.0   836G 36549M   800G 4.27 0.86  67
>  4 3.7  1.0  3723G   170G  3553G 4.58 0.92 270
>  1 0.81689  1.0   836G 49612M   788G 5.79 1.16  56
>  5 3.7  1.0  3723G   192G  3531G 5.17 1.04 282
>  2 0.81689  1.0   836G 33639M   803G 3.93 0.79  58
>  3 3.7  1.0  3723G   202G  3521G 5.43 1.09 291
>   TOTAL 13680G   682G 12998G 4.99
> MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67
>
> Thanks!
>
> -Original Message-
> From: Cary [mailto:dynamic.c...@gmail.com]
> Sent: Friday, December 15, 2017 4:05 PM
> To: James Okken
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server
> cluster)
>
> James,
>
>  Those errors are normal. Ceph creates the missing files. You can check
> "/var/lib/ceph/osd/ceph-6", before and after you run those commands to see
> what files are added there.
>
>  Make sure you get the replication factor set.
>
>
> Cary
> -Dynamic
>
> On Fri, Dec 15, 2017 at 6:11 PM, James Okken 
> wrote:
> > Thanks again Cary,
> >
> > Yes, once all the backfilling was done I was back to a Healthy cluster.
> > I moved on to the same steps for the next server in the cluster, it is
> backfilling now.
> > Once that is done I will do the last server in the cluster, and then I
> think I am done!
> >
> > Just checking on one thing. I get these messages when running this
> command. I assume this is OK, right?
> > root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid
> > 25c21708-f756-4593-bc9e-c5506622cf07
> > 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open:
> > disabling aio for non-block journal.  Use journal_force_aio to force
> > use of aio anyway
> > 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open:
> > disabling aio for non-block journal.  Use journal_force_aio to force
> > use of aio anyway
> > 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1
> > filestore(/var/lib/ceph/osd/ceph-4) could not find
> > #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or
> > directory
> > 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store
> > /var/lib/ceph/osd/ceph-4 for osd.4 fsid
> > 2b9f7957-d0db-481e-923e-89972f6c594f
> > 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file:
> > /var/lib/ceph/osd/ceph-4/keyring: can't open
> > /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
> > 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring
> > /var/lib/ceph/osd/ceph-4/keyring
> >
> > thanks
> >
> > -Original Message-
> > From: Cary [mailto:dynamic.c...@gmail.com]
> > Sent: Thursday, December 14, 2017 7:13 PM
> > To: James Okken
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server
> > cluster)
> >
> > James,
> >
> >  Usually once the misplaced data has balanced out the cluster should
> reach a healthy state. If you run a "ceph health detail" Ceph will show you
> some more detail about what is happening.  Is Ceph still recovering, or has
> it stalled? has the "objects misplaced (62.511%"
> > changed to a lower %?
> >
> > Cary
> > -Dynamic
> >
> > On Thu, Dec 14, 2017 at 10:52 PM, James Okken 
> wrote:
> >> Thanks Cary!
> >>
> >> Your directions worked on my 

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-15 Thread Ronny Aasen
If you have a global setting in ceph.conf it will only affect the
creation of new pools. I recommend using the defaults

size: 3 + min_size: 2

Also check that your existing pools have min_size=2.
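
A quick way to check them all at once (just a sketch):

for p in $(ceph osd pool ls); do
    echo -n "$p: "; ceph osd pool get $p min_size
done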

kind regards
Ronny Aasen

On 15.12.2017 23:00, James Okken wrote:

This whole effort went extremely well, thanks to Cary, and I'm not used to that
with CEPH so far. (And with OpenStack, ever.)
Thank you Cary.

I've upped the replication factor and now I see "replicated size 3" in each of
my pools. Is this the only place to check replication level? Is there a Global setting or 
only a setting per Pool?

ceph osd pool ls detail
pool 0 'rbd' replicated size 3..
pool 1 'images' replicated size 3...
...

One last question!
At this replication level how can I tell how much total space I actually have 
now?
Do I just 1/3 the Global size?

ceph df
GLOBAL:
 SIZE   AVAIL  RAW USED %RAW USED
 13680G 12998G 682G  4.99
POOLS:
 NAMEID USED %USED MAX AVAIL OBJECTS
 rbd 0 0 0 6448G   0
 images  1  216G  3.24 6448G   27745
 backups 2 0 0 6448G   0
 volumes 3  117G  1.79 6448G   30441
 compute 4 0 0 6448G   0

ceph osd df
ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE VAR  PGS
  0 0.81689  1.0   836G 36549M   800G 4.27 0.86  67
  4 3.7  1.0  3723G   170G  3553G 4.58 0.92 270
  1 0.81689  1.0   836G 49612M   788G 5.79 1.16  56
  5 3.7  1.0  3723G   192G  3531G 5.17 1.04 282
  2 0.81689  1.0   836G 33639M   803G 3.93 0.79  58
  3 3.7  1.0  3723G   202G  3521G 5.43 1.09 291
   TOTAL 13680G   682G 12998G 4.99
MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67

Thanks!

-Original Message-
From: Cary [mailto:dynamic.c...@gmail.com]
Sent: Friday, December 15, 2017 4:05 PM
To: James Okken
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

James,

  Those errors are normal. Ceph creates the missing files. You can check 
"/var/lib/ceph/osd/ceph-6", before and after you run those commands to see what 
files are added there.

  Make sure you get the replication factor set.


Cary
-Dynamic

On Fri, Dec 15, 2017 at 6:11 PM, James Okken  wrote:

Thanks again Cary,

Yes, once all the backfilling was done I was back to a Healthy cluster.
I moved on to the same steps for the next server in the cluster, it is 
backfilling now.
Once that is done I will do the last server in the cluster, and then I think I 
am done!

Just checking on one thing. I get these messages when running this command. I 
assume this is OK, right?
root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid
25c21708-f756-4593-bc9e-c5506622cf07
2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open:
disabling aio for non-block journal.  Use journal_force_aio to force
use of aio anyway
2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open:
disabling aio for non-block journal.  Use journal_force_aio to force
use of aio anyway
2017-12-15 17:28:22.856444 7fd2f9e928c0 -1
filestore(/var/lib/ceph/osd/ceph-4) could not find
#-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or
directory
2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store
/var/lib/ceph/osd/ceph-4 for osd.4 fsid
2b9f7957-d0db-481e-923e-89972f6c594f
2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file:
/var/lib/ceph/osd/ceph-4/keyring: can't open
/var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring
/var/lib/ceph/osd/ceph-4/keyring

thanks

-Original Message-
From: Cary [mailto:dynamic.c...@gmail.com]
Sent: Thursday, December 14, 2017 7:13 PM
To: James Okken
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server
cluster)

James,

  Usually once the misplaced data has balanced out the cluster should reach a healthy state. If you 
run a "ceph health detail" Ceph will show you some more detail about what is happening.  
Is Ceph still recovering, or has it stalled? has the "objects misplaced (62.511%"
changed to a lower %?

Cary
-Dynamic

On Thu, Dec 14, 2017 at 10:52 PM, James Okken  wrote:

Thanks Cary!

Your directions worked on my first server. (Once I found the missing carriage
return in your list of commands; the email must have messed it up.)

For anyone else:
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd
'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring really is 2 
commands:
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4  and ceph auth add osd.4
osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring

Cary, what am I looking for in ceph -w and ceph -s to show the status of the 
data moving?
Seems 

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-15 Thread James Okken
This whole effort went extremely well, thanks to Cary, and I'm not used to that
with CEPH so far. (And with OpenStack, ever.)
Thank you Cary.

I've upped the replication factor and now I see "replicated size 3" in each of
my pools. Is this the only place to check replication level? Is there a Global 
setting or only a setting per Pool?

ceph osd pool ls detail
pool 0 'rbd' replicated size 3..
pool 1 'images' replicated size 3...
...

One last question!
At this replication level how can I tell how much total space I actually have 
now?
Do I just 1/3 the Global size?

ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
13680G 12998G 682G  4.99
POOLS:
NAMEID USED %USED MAX AVAIL OBJECTS
rbd 0 0 0 6448G   0
images  1  216G  3.24 6448G   27745
backups 2 0 0 6448G   0
volumes 3  117G  1.79 6448G   30441
compute 4 0 0 6448G   0

ceph osd df
ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE VAR  PGS
 0 0.81689  1.0   836G 36549M   800G 4.27 0.86  67
 4 3.7  1.0  3723G   170G  3553G 4.58 0.92 270
 1 0.81689  1.0   836G 49612M   788G 5.79 1.16  56
 5 3.7  1.0  3723G   192G  3531G 5.17 1.04 282
 2 0.81689  1.0   836G 33639M   803G 3.93 0.79  58
 3 3.7  1.0  3723G   202G  3521G 5.43 1.09 291
  TOTAL 13680G   682G 12998G 4.99
MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67

Thanks!

-Original Message-
From: Cary [mailto:dynamic.c...@gmail.com] 
Sent: Friday, December 15, 2017 4:05 PM
To: James Okken
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

James,

 Those errors are normal. Ceph creates the missing files. You can check 
"/var/lib/ceph/osd/ceph-6", before and after you run those commands to see what 
files are added there.

 Make sure you get the replication factor set.


Cary
-Dynamic

On Fri, Dec 15, 2017 at 6:11 PM, James Okken  wrote:
> Thanks again Cary,
>
> Yes, once all the backfilling was done I was back to a Healthy cluster.
> I moved on to the same steps for the next server in the cluster, it is 
> backfilling now.
> Once that is done I will do the last server in the cluster, and then I think 
> I am done!
>
> Just checking on one thing. I get these messages when running this command. I 
> assume this is OK, right?
> root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 
> 25c21708-f756-4593-bc9e-c5506622cf07
> 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: 
> disabling aio for non-block journal.  Use journal_force_aio to force 
> use of aio anyway
> 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: 
> disabling aio for non-block journal.  Use journal_force_aio to force 
> use of aio anyway
> 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or 
> directory
> 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store 
> /var/lib/ceph/osd/ceph-4 for osd.4 fsid 
> 2b9f7957-d0db-481e-923e-89972f6c594f
> 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: 
> /var/lib/ceph/osd/ceph-4/keyring: can't open 
> /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
> 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring 
> /var/lib/ceph/osd/ceph-4/keyring
>
> thanks
>
> -Original Message-
> From: Cary [mailto:dynamic.c...@gmail.com]
> Sent: Thursday, December 14, 2017 7:13 PM
> To: James Okken
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server 
> cluster)
>
> James,
>
>  Usually once the misplaced data has balanced out the cluster should reach a 
> healthy state. If you run a "ceph health detail" Ceph will show you some more 
> detail about what is happening.  Is Ceph still recovering, or has it stalled? 
> has the "objects misplaced (62.511%"
> changed to a lower %?
>
> Cary
> -Dynamic
>
> On Thu, Dec 14, 2017 at 10:52 PM, James Okken  
> wrote:
>> Thanks Cary!
>>
>> Your directions worked on my first server. (Once I found the missing carriage
>> return in your list of commands; the email must have messed it up.)
>>
>> For anyone else:
>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd 
>> 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring really is 
>> 2 commands:
>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4  and ceph auth add osd.4 
>> osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
>>
>> Cary, what am I looking for in ceph -w and ceph -s to show the status of the 
>> data moving?
>> Seems like the data is moving and that I have some issue...
>>
>> root@node-53:~# ceph -w
>> cluster 

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-15 Thread Cary
James,

 Those errors are normal. Ceph creates the missing files. You can
check "/var/lib/ceph/osd/ceph-6", before and after you run those
commands to see what files are added there.

 Make sure you get the replication factor set.


Cary
-Dynamic

On Fri, Dec 15, 2017 at 6:11 PM, James Okken  wrote:
> Thanks again Cary,
>
> Yes, once all the backfilling was done I was back to a Healthy cluster.
> I moved on to the same steps for the next server in the cluster, it is 
> backfilling now.
> Once that is done I will do the last server in the cluster, and then I think 
> I am done!
>
> Just checking on one thing. I get these messages when running this command. I 
> assume this is OK, right?
> root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 
> 25c21708-f756-4593-bc9e-c5506622cf07
> 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: 
> disabling aio for non-block journal.  Use journal_force_aio to force use of 
> aio anyway
> 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: 
> disabling aio for non-block journal.  Use journal_force_aio to force use of 
> aio anyway
> 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
> 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store 
> /var/lib/ceph/osd/ceph-4 for osd.4 fsid 2b9f7957-d0db-481e-923e-89972f6c594f
> 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: 
> /var/lib/ceph/osd/ceph-4/keyring: can't open 
> /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
> 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring 
> /var/lib/ceph/osd/ceph-4/keyring
>
> thanks
>
> -Original Message-
> From: Cary [mailto:dynamic.c...@gmail.com]
> Sent: Thursday, December 14, 2017 7:13 PM
> To: James Okken
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
>
> James,
>
>  Usually once the misplaced data has balanced out the cluster should reach a 
> healthy state. If you run a "ceph health detail" Ceph will show you some more 
> detail about what is happening.  Is Ceph still recovering, or has it stalled? 
> has the "objects misplaced (62.511%"
> changed to a lower %?
>
> Cary
> -Dynamic
>
> On Thu, Dec 14, 2017 at 10:52 PM, James Okken  
> wrote:
>> Thanks Cary!
>>
>> Your directions worked on my first server. (Once I found the missing carriage
>> return in your list of commands; the email must have messed it up.)
>>
>> For anyone else:
>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd
>> 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring really is 
>> 2 commands:
>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4  and ceph auth add osd.4
>> osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
>>
>> Cary, what am I looking for in ceph -w and ceph -s to show the status of the 
>> data moving?
>> Seems like the data is moving and that I have some issue...
>>
>> root@node-53:~# ceph -w
>> cluster 2b9f7957-d0db-481e-923e-89972f6c594f
>>  health HEALTH_WARN
>> 176 pgs backfill_wait
>> 1 pgs backfilling
>> 27 pgs degraded
>> 1 pgs recovering
>> 26 pgs recovery_wait
>> 27 pgs stuck degraded
>> 204 pgs stuck unclean
>> recovery 10322/84644 objects degraded (12.195%)
>> recovery 52912/84644 objects misplaced (62.511%)
>>  monmap e3: 3 mons at 
>> {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}
>> election epoch 138, quorum 0,1,2 node-45,node-44,node-43
>>  osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs
>> flags sortbitwise,require_jewel_osds
>>   pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects
>> 370 GB used, 5862 GB / 6233 GB avail
>> 10322/84644 objects degraded (12.195%)
>> 52912/84644 objects misplaced (62.511%)
>>  308 active+clean
>>  176 active+remapped+wait_backfill
>>   26 active+recovery_wait+degraded
>>1 active+remapped+backfilling
>>1 active+recovering+degraded recovery io 100605
>> kB/s, 14 objects/s
>>   client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr
>>
>> 2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1
>> activating, 1 active+recovering+degraded, 26
>> active+recovery_wait+degraded, 1 active+remapped+backfilling, 307
>> active+clean, 176 active+remapped+wait_backfill; 333 GB data, 369 GB
>> used, 5863 GB / 6233 GB avail; 0 B/s rd, 101107 B/s wr, 19 op/s;
>> 10354/84644 objects degraded (12.232%); 52912/84644 objects misplaced
>> (62.511%); 12224 kB/s, 2 objects/s recovering
>> 2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1
>> 

Re: [ceph-users] RGW Logging pool

2017-12-15 Thread ceph . novice
we never managed to make it work, but I guess the "RGW metadata search" 
[c|sh]ould have been "the official solution"...

- http://ceph.com/geen-categorie/rgw-metadata-search/
- https://marc.info/?l=ceph-devel=149152531005431=2
- http://ceph.com/rgw/new-luminous-rgw-metadata-search/

there was also a solution based on HAproxy acting as the "middleware" between
the S3 clients and the RGW service, which I cannot find now...

should you solve your problem, PLEASE post how you did it (with real 
examples/commands)... because
- exactly this was one of the core requirements (besides lifecycle, which
didn't work either :| ) in a PoC here and CEPH/RGW failed
- I would still like to push CEPH for upcoming projects... but all of them have
the "metasearch" requirement

Thanks and regards
 

Gesendet: Freitag, 15. Dezember 2017 um 18:21 Uhr
Von: "David Turner" 
An: ceph-users , "Yehuda Sadeh-Weinraub" 

Betreff: [ceph-users] RGW Logging pool

We're trying to build an auditing system for when a user key pair performs an
operation on a bucket (put, delete, creating a bucket, etc) and so far we were
only able to find this information in the level 10 debug logging in the rgw
system logs.

We noticed that our rgw log pool has been growing somewhat indefinitely and we
had to move it off the NVMes and onto HDDs due to its growing size.
What is in that pool and how can it be accessed?  I haven't found the right
search terms to turn up anything about what's in this pool on the ML or on
Google.
 
What I would like to do is export the log to ElasticSearch, cleanup the log on 
occasion, and hopefully find the information we're looking for to fulfill our 
user auditing without having our RGW daemons running on debug level 10 (which 
is a lot of logging!).
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-mon fails to start on raspberry pi (raspbian 8.0)

2017-12-15 Thread Andrew Knapp
I recently purchased 3 raspberry pi nodes to create a small storage cluster
to test with at my home.  I found a couple of procedures on setting this up
so it appears folks have successfully done this (
https://www.linkedin.com/pulse/ceph-raspberry-pi-rahul-vijayan/).

I am running Raspbian GNU/Linux 8.0 (jessie).  I'm using ceph-deploy to
install the cluster and it appears to install version 10.2.5-7.2+rpi1 of
the ceph ARM packages.

When I try to start the ceph-mon service I get the following error from
systemd:

Dec 14 19:59:46 ceph-master systemd[1]: Starting Ceph cluster monitor
daemon...
Dec 14 19:59:46 ceph-master systemd[1]: Started Ceph cluster monitor
daemon.
Dec 14 19:59:47 ceph-master ceph-mon[28237]: *** Caught signal
(Segmentation fault) **
Dec 14 19:59:47 ceph-master ceph-mon[28237]: in thread 756a5c30
thread_name:admin_socket
Dec 14 19:59:47 ceph-master systemd[1]: ceph-mon@ceph-master.service:
main process exited, code=killed, status=11/SEGV
Dec 14 19:59:47 ceph-master systemd[1]: Unit
ceph-mon@ceph-master.service entered failed state.
Dec 14 19:59:47 ceph-master systemd[1]: ceph-mon@ceph-master.service
holdoff time over, scheduling restart.
Dec 14 19:59:47 ceph-master systemd[1]: Stopping Ceph cluster monitor
daemon...
Dec 14 19:59:47 ceph-master systemd[1]: Starting Ceph cluster monitor
daemon...
Dec 14 19:59:47 ceph-master systemd[1]: Started Ceph cluster monitor
daemon.
Dec 14 19:59:49 ceph-master ceph-mon[28256]: *** Caught signal
(Segmentation fault) **
Dec 14 19:59:49 ceph-master ceph-mon[28256]: in thread 75654c30
thread_name:admin_socket
Dec 14 19:59:49 ceph-master ceph-mon[28256]: ceph version 10.2.5
(c461ee19ecbc0c5c330aca20f7392c9a00730367)
Dec 14 19:59:49 ceph-master ceph-mon[28256]: 1: (()+0x4b1348)
[0x54fae348]
Dec 14 19:59:49 ceph-master ceph-mon[28256]: 2:
(__default_sa_restorer()+0) [0x768bb480]
Dec 14 19:59:49 ceph-master ceph-mon[28256]: 3:
(AdminSocket::do_accept()+0x28) [0x550ca154]
Dec 14 19:59:49 ceph-master ceph-mon[28256]: 4:
(AdminSocket::entry()+0x22c) [0x550cc458]
Dec 14 19:59:49 ceph-master systemd[1]: ceph-mon@ceph-master.service:
main process exited, code=killed, status=11/SEGV
Dec 14 19:59:49 ceph-master systemd[1]: Unit
ceph-mon@ceph-master.service entered failed state.
Dec 14 19:59:49 ceph-master systemd[1]: ceph-mon@ceph-master.service
holdoff time over, scheduling restart.
Dec 14 19:59:49 ceph-master systemd[1]: Stopping Ceph cluster monitor
daemon...
Dec 14 19:59:49 ceph-master systemd[1]: Starting Ceph cluster monitor
daemon...
Dec 14 19:59:49 ceph-master systemd[1]: Started Ceph cluster monitor
daemon.
Dec 14 19:59:50 ceph-master ceph-mon[28271]: *** Caught signal
(Segmentation fault) **
Dec 14 19:59:50 ceph-master ceph-mon[28271]: in thread 755fcc30
thread_name:admin_socket
Dec 14 19:59:50 ceph-master systemd[1]: ceph-mon@ceph-master.service:
main process exited, code=killed, status=11/SEGV
Dec 14 19:59:50 ceph-master systemd[1]: Unit
ceph-mon@ceph-master.service entered failed state.
Dec 14 19:59:50 ceph-master systemd[1]: ceph-mon@ceph-master.service
holdoff time over, scheduling restart.
Dec 14 19:59:50 ceph-master systemd[1]: Stopping Ceph cluster monitor
daemon...
Dec 14 19:59:50 ceph-master systemd[1]: Starting Ceph cluster monitor
daemon...
Dec 14 19:59:50 ceph-master systemd[1]: ceph-mon@ceph-master.service
start request repeated too quickly, refusing to start.
Dec 14 19:59:50 ceph-master systemd[1]: Failed to start Ceph cluster
monitor daemon.
Dec 14 19:59:50 ceph-master systemd[1]: Unit
ceph-mon@ceph-master.service entered failed state.

I'm looking for guidance here as I'm not sure why this doesn't work.  I am
using the following URLs for my apt repos:

root@ceph-master:~# cat /etc/apt/sources.list
deb http://mirrordirector.raspbian.org/raspbian/ testing main contrib
non-free rpi

root@ceph-master:~# cat /etc/apt/sources.list.d/ceph.list
deb https://download.ceph.com/debian-jewel/ jessie main

Has anyone else tried this and had similar problems?  Any advice on how to
proceed or work around this issue?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-15 Thread James Okken
Thanks again Cary,

Yes, once all the backfilling was done I was back to a Healthy cluster.
I moved on to the same steps for the next server in the cluster, it is 
backfilling now.
Once that is done I will do the last server in the cluster, and then I think I 
am done!

Just checking on one thing. I get these messages when running this command. I 
assume this is OK, right?
root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 
25c21708-f756-4593-bc9e-c5506622cf07
2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 filestore(/var/lib/ceph/osd/ceph-4) 
could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or 
directory
2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store 
/var/lib/ceph/osd/ceph-4 for osd.4 fsid 2b9f7957-d0db-481e-923e-89972f6c594f
2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: 
/var/lib/ceph/osd/ceph-4/keyring: can't open /var/lib/ceph/osd/ceph-4/keyring: 
(2) No such file or directory
2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring 
/var/lib/ceph/osd/ceph-4/keyring

thanks

-Original Message-
From: Cary [mailto:dynamic.c...@gmail.com] 
Sent: Thursday, December 14, 2017 7:13 PM
To: James Okken
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

James,

 Usually once the misplaced data has balanced out the cluster should reach a 
healthy state. If you run a "ceph health detail" Ceph will show you some more 
detail about what is happening.  Is Ceph still recovering, or has it stalled? 
has the "objects misplaced (62.511%"
changed to a lower %?

Cary
-Dynamic

On Thu, Dec 14, 2017 at 10:52 PM, James Okken  wrote:
> Thanks Cary!
>
> Your directions worked on my first server. (Once I found the missing carriage
> return in your list of commands; the email must have messed it up.)
>
> For anyone else:
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd 
> 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring really is 2 
> commands:
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4  and ceph auth add osd.4 
> osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
>
> Cary, what am I looking for in ceph -w and ceph -s to show the status of the 
> data moving?
> Seems like the data is moving and that I have some issue...
>
> root@node-53:~# ceph -w
> cluster 2b9f7957-d0db-481e-923e-89972f6c594f
>  health HEALTH_WARN
> 176 pgs backfill_wait
> 1 pgs backfilling
> 27 pgs degraded
> 1 pgs recovering
> 26 pgs recovery_wait
> 27 pgs stuck degraded
> 204 pgs stuck unclean
> recovery 10322/84644 objects degraded (12.195%)
> recovery 52912/84644 objects misplaced (62.511%)
>  monmap e3: 3 mons at 
> {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}
> election epoch 138, quorum 0,1,2 node-45,node-44,node-43
>  osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs
> flags sortbitwise,require_jewel_osds
>   pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects
> 370 GB used, 5862 GB / 6233 GB avail
> 10322/84644 objects degraded (12.195%)
> 52912/84644 objects misplaced (62.511%)
>  308 active+clean
>  176 active+remapped+wait_backfill
>   26 active+recovery_wait+degraded
>1 active+remapped+backfilling
>1 active+recovering+degraded recovery io 100605 
> kB/s, 14 objects/s
>   client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr
>
> 2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1 
> activating, 1 active+recovering+degraded, 26 
> active+recovery_wait+degraded, 1 active+remapped+backfilling, 307 
> active+clean, 176 active+remapped+wait_backfill; 333 GB data, 369 GB 
> used, 5863 GB / 6233 GB avail; 0 B/s rd, 101107 B/s wr, 19 op/s; 
> 10354/84644 objects degraded (12.232%); 52912/84644 objects misplaced 
> (62.511%); 12224 kB/s, 2 objects/s recovering
> 2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1 
> active+recovering+degraded, 26 active+recovery_wait+degraded, 1 
> active+remapped+backfilling, 308 active+clean, 176 
> active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 
> 6233 GB avail; 0 B/s rd, 92788 B/s wr, 61 op/s; 10322/84644 objects 
> degraded (12.195%); 52912/84644 objects misplaced (62.511%); 100605 
> kB/s, 14 objects/s recovering
> 2017-12-14 22:46:00.474335 mon.0 [INF] pgmap v3936176: 512 pgs: 1 
> active+recovering+degraded, 26 

[ceph-users] RGW Logging pool

2017-12-15 Thread David Turner
We're trying to build an auditing system for when a user key pair performs
an operation on a bucket (put, delete, creating a bucket, etc) and so far
we were only able to find this information in the level 10 debug logging in
the rgw system logs.

We noticed that our rgw log pool has been growing somewhat indefinitely and
we had to move it off the NVMes and onto HDDs due to its growing
size.  What is in that pool and how can it be accessed?  I haven't found
the right search terms to turn up anything about what's in this pool on
the ML or on Google.

What I would like to do is export the log to ElasticSearch, cleanup the log
on occasion, and hopefully find the information we're looking for to
fulfill our user auditing without having our RGW daemons running on debug
level 10 (which is a lot of logging!).
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache tier unexpected behavior: promote on lock

2017-12-15 Thread Gregory Farnum
On Thu, Dec 14, 2017 at 9:11 AM, Захаров Алексей  wrote:
> Hi, Gregory,
> Thank you for your answer!
>
> Is there a way to not promote on "locking", when not using EC pools?
> Is it possible to make this configurable?
>
> We don't use EC pool. So, for us this meachanism is overhead. It only adds
> more load on both pools and network.

Unfortunately I don't think there's an easy way to avoid it that
exists right now. The caching is generally not set up well for
handling these kinds of things, but it's possible the logic to proxy
class operations onto replicated pools might not be *too*
objectionable
-Greg

>
> 14.12.2017, 01:16, "Gregory Farnum" :
>
> Voluntary “locking” in RADOS is an “object class” operation. These are not
> part of the core API and cannot run on EC pools, so any operation using them
> will cause an immediate promotion.
> On Wed, Dec 13, 2017 at 4:02 AM Захаров Алексей 
> wrote:
>
> Hello,
>
> I've found that when a client gets a lock on an object, Ceph ignores any
> promotion settings and promotes the object immediately.
>
> Is it a bug or a feature?
> Is it configurable?
>
> Hope for any help!
>
> Ceph version: 10.2.10 and 12.2.2
> We use libradosstriper-based clients.
>
> Cache pool settings:
> size: 3
> min_size: 2
> crash_replay_interval: 0
> pg_num: 2048
> pgp_num: 2048
> crush_ruleset: 0
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: true
> nodeep-scrub: false
> hit_set_type: bloom
> hit_set_period: 60
> hit_set_count: 30
> hit_set_fpp: 0.05
> use_gmt_hitset: 1
> auid: 0
> target_max_objects: 0
> target_max_bytes: 18819770744832
> cache_target_dirty_ratio: 0.4
> cache_target_dirty_high_ratio: 0.6
> cache_target_full_ratio: 0.8
> cache_min_flush_age: 60
> cache_min_evict_age: 180
> min_read_recency_for_promote: 15
> min_write_recency_for_promote: 15
> fast_read: 0
> hit_set_grade_decay_rate: 50
> hit_set_search_last_n: 30
>
> To get lock via cli (to test behavior) we use:
> # rados -p poolname lock get --lock-tag weird_ceph_locks --lock-cookie
> `uuid` objectname striper.lock
> Right after that object could be found in caching pool.
>
>
> --
> Regards,
> Aleksei Zakharov
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Regards,
> Aleksei Zakharov
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Snap trim queue length issues

2017-12-15 Thread Sage Weil
On Fri, 15 Dec 2017, Piotr Dałek wrote:
> On 17-12-14 05:31 PM, David Turner wrote:
> > I've tracked this in a much more manual way.  I would grab a random subset
> > [..]
> > 
> > This was all on a Hammer cluster.  The changes to the snap trimming queues
> > going into the main osd thread made it so that our use case was not viable
> > on Jewel until changes to Jewel that happened after I left.  It's exciting
> > that this will actually be a reportable value from the cluster.
> > 
> > Sorry that this story doesn't really answer your question, except to say
> > that people aware of this problem likely have a work around for it.  However
> > I'm certain that a lot more clusters are impacted by this than are aware of
> > it and being able to quickly see that would be beneficial to troubleshooting
> > problems.  Backporting would be nice.  I run a few Jewel clusters that have
> > some VM's and it would be nice to see how well the cluster handle snap
> > trimming.  But they are much less critical on how much snapshots they do.
> 
> Thanks for your response, it pretty much confirms what I thought:
> - users aware of the issue have their own hacks that don't need to be efficient
> or convenient.
> - users unaware of the issue are, well, unaware and at risk of serious service
> disruption once disk space is all used up.
> 
> Hopefully it'll be convincing enough for devs. ;)

Your PR looks great!  I commented with a nit on the format of the warning 
itself.

I expect this is trivial to backport to luminous; it will need to be 
partially reimplemented for jewel (with some care around the pg_stat_t and 
a different check for the jewel-style health checks).

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to raise priority for a pg repair

2017-12-15 Thread David Turner
The method I've used in the past to initiate a repair quickly was to set
osd_max_deep_scrubs to 0 across the cluster and then set it to 2 on only
the osds that were involved in the pg.  Alternatively you could just
increase that setting to 3 or more on only those osds involved in the pg to
trigger the pg to start repairing, but that will allow those osds to
potentially have multiple deep scrubs happening at the same time.
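
As a concrete sketch (the option name is taken from the paragraph above and the
PG/OSD ids are placeholders; verify the exact option against your release before
injecting it):

ceph pg map 1.2f3                                     # find the acting OSDs
ceph tell osd.* injectargs '--osd_max_deep_scrubs 0'  # pause deep scrubs everywhere
ceph tell osd.12 injectargs '--osd_max_deep_scrubs 2'
ceph tell osd.34 injectargs '--osd_max_deep_scrubs 2'
ceph tell osd.56 injectargs '--osd_max_deep_scrubs 2'
ceph pg repair 1.2f3                                  # kick the repair, then watch ceph -w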

On Fri, Dec 15, 2017 at 9:03 AM Vincent Godin  wrote:

> We have some scrub errors on our cluster. A ceph pg repair x.xxx is
> taken into account only after hours. It seems to be linked to deep-scrubs
> which are running at the same time. It looks like it has to wait for
> a slot before launching the repair. I then have two questions:
> is it possible to launch a repair while the flag nodeep-scrub is set?
> is it possible to raise the priority of the pg repair so it starts quickly?
>
> Thanks
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 objects deleted but storage doesn't free space

2017-12-15 Thread David Turner
You can check to see how backed up your GC is with `radosgw-admin gc list |
wc -l`.  In one of our clusters, we realized that early testing and
re-configuring of the realm completely messed up the GC and that realm had
never actually deleted an object in all the time it had been running in
production.  The way we got out of that mess was to create a new realm
and manually copy over bucket contents from one to the other (for any data
that couldn't be lost) and blast away the rest.  We now no longer have
any GC problems in RGW with the fresh realm with an identical workload.

It took about 2 weeks for the data pool of the bad realm to finish deleting
from the cluster (60% full 10TB drives).
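
For gauging how far behind GC is and nudging it along, something like this
should work (a sketch; the --include-all flag is worth verifying against your
radosgw-admin version):

radosgw-admin gc list --include-all | grep -c oid   # pending entries, even those not yet past their min wait
radosgw-admin gc process                            # force a GC pass now instead of waiting for the next cycle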

On Thu, Dec 14, 2017 at 7:41 PM Jan-Willem Michels 
wrote:

>
> Hi there all,
> Perhaps someone can help.
>
> We tried to free some storage, so we deleted a lot of S3 objects. The bucket
> also has valuable data, so we can't delete the whole bucket.
> Everything went fine, but the used storage space doesn't get any less.  We are
> expecting several TB of data to be freed.
>
> We then learned of garbage collection, so we thought let's wait. But
> even days later there was no real change.
> We started "radosgw-admin gc process", which never finished or
> displayed any error or anything.
> We couldn't find anything like -verbose or debug for this command, or a
> log to see what radosgw-admin is doing while it works.
>
> We tried to change the default settings, we got from old posting.
> We have put them in global and tried  also in [client.rgw..]
> rgw_gc_max_objs =7877 ( but also rgw_gc_max_objs =200 or
> rgw_gc_max_objs =1000)
> rgw_lc_max_objs = 7877
> rgw_gc_obj_min_wait = 300
> rgw_gc_processor_period = 600
> rgw_gc_processor_max_time = 600
>
> We restarted the ceph-radosgw daemons several times, and the computers, over a
> period of days, etc. Tried radosgw-admin gc process a few times etc.
> Did not find any references in the radosgw logs like gc:: delete etc., but we
> don't know what to look for.
> The system is fine, no errors or warnings. But the system is in use (we are
> loading up data) -> Will GC only run when idle?
>
> When we count them with "radosgw-admin gc list | grep oid | wc -l" we get
> 11:00 18.086.665 objects
> 13:00 18.086.665 objects
> 15:00 18.086.665 objects
> so no change in objects after hours
>
> When we list "radosgw-admin gc list" we get files like
>   radosgw-admin gc list | more
> [
>  {
>  "tag": "b5687590-473f-4386-903f-d91a77b8d5cd.7354141.21122\u",
>  "time": "2017-12-06 11:04:56.0.459704s",
>  "objs": [
>  {
>  "pool": "default.rgw.buckets.data",
>  "oid":
>
> "b5687590-473f-4386-903f-d91a77b8d5cd.44121.4__shadow_.5OtA02n_GU8TkP08We_SLrT5GL1ihuS_1",
>  "key": "",
>  "instance": ""
>  },
>  {
>  "pool": "default.rgw.buckets.data",
>  "oid":
>
> "b5687590-473f-4386-903f-d91a77b8d5cd.44121.4__shadow_.5OtA02n_GU8TkP08We_SLrT5GL1ihuS_2",
>  "key": "",
>  "instance": ""
>  },
>  {
>  "pool": "default.rgw.buckets.data",
>  "oid":
>
> "b5687590-473f-4386-903f-d91a77b8d5cd.44121.4__shadow_.5OtA02n_GU8TkP08We_SLrT5GL1ihuS_3",
>  "key": "",
>  "instance": ""
>  },
>
>   A few questions ->
>
> Who purges the gc list? Is it done on the radosgw machines, or is it done
> distributed on the OSDs?
> Where do I have to change the default "rgw_gc_max_objs = 1000"? We tried
> everywhere. We have used "tell" to change them on OSD and MON systems
> and also on the RGW endpoints, which we restarted.
>
> We have two radosgw endpoints. Is there a lock so that only one will act,
> or will they both try to delete? Can we free / display such a lock?
>
> How can I debug the radosgw-admin application? In which log files should we
> look, and what would an example message look like?
>
> If I know an oid like the ones above, can I manually delete such an oid?
>
> Suppose we would delete the complete bucket with "radosgw-admin bucket
> rm --bucket=mybucket --purge-objects --inconsistent-index" would that
> also get rid of the GC entries that are already there?
>
> Thanks in advance for your time,
>
> JW Michels
>
>
>
>
>
>
>
> q
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to raise priority for a pg repair

2017-12-15 Thread Vincent Godin
We have some scrub errors on our cluster. A ceph pg repair x.xxx is
taken into account only after hours. It seems to be linked to deep-scrubs
which are running at the same time. It looks like it has to wait for
a slot before launching the repair. I then have two questions:
is it possible to launch a repair while the flag nodeep-scrub is set?
is it possible to raise the priority of the pg repair so it starts quickly?

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mds millions of caps

2017-12-15 Thread Webert de Souza Lima
So,

On Fri, Dec 15, 2017 at 10:58 AM, Yan, Zheng  wrote:

>
> 300k is already quite a lot. Opening them requires a long time. Does your
> mail server really open so many files?


Yes, probably. It's a commercial solution. A few thousand domains, tens
of thousands of users and god knows how many mailboxes.
From the daemonperf output you can see the write workload is high, so yes, too
many files being opened (dovecot mdbox stores multiple e-mails per file, split
into many files).

> I checked the 4.4 kernel, it includes the code that trims the cache when mds
> recovers.


Ok, all nodes are running 4.4.0-75-generic. The fix might have been
included in a newer version.
I'll upgrade it asap.
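
In the meantime, a quick way to watch how many caps each client session is
actually holding (a sketch; "mds1" is a placeholder for the daemon name and the
field names can differ slightly between releases):

ceph daemon mds.mds1 session ls | grep num_caps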


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mds millions of caps

2017-12-15 Thread Webert de Souza Lima
Thanks

On Fri, Dec 15, 2017 at 10:46 AM, Yan, Zheng  wrote:

>  recent
> versions of the kernel client and ceph-fuse should trim their cache
> aggressively when mds recovers.
>

So the bug (not sure if I can call it a bug) is already fixed in newer
kernels? Can I just update the kernel and expect this to be fixed?
Could you tell me which kernel version?


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*

>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mds millions of caps

2017-12-15 Thread Yan, Zheng
On Fri, Dec 15, 2017 at 8:46 PM, Yan, Zheng  wrote:
> On Fri, Dec 15, 2017 at 6:54 PM, Webert de Souza Lima
>  wrote:
>> Hello, Mr. Yan
>>
>> On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng  wrote:
>>>
>>>
>>> The client hold so many capabilities because kernel keeps lots of
>>> inodes in its cache. Kernel does not trim inodes by itself if it has
>>> no memory pressure. It seems you have set mds_cache_size config to a
>>> large value.
>>
>>
>> Yes, I have set mds_cache_size = 300
>> I usually set this value according to the number of ceph.dir.rentries in
>> cephfs. Isn't that a good approach?
>>
>> I have 2 directories in cephfs root, sum of ceph.dir.rentries is 4670933,
>> for which I would set mds_cache_size to 5M (if I had enough RAM for that in
>> the MDS server).
>>
>> # getfattr -d -m ceph.dir.* index
>> # file: index
>> ceph.dir.entries="776"
>> ceph.dir.files="0"
>> ceph.dir.rbytes="52742318965"
>> ceph.dir.rctime="1513334528.09909569540"
>> ceph.dir.rentries="709233"
>> ceph.dir.rfiles="459512"
>> ceph.dir.rsubdirs="249721"
>> ceph.dir.subdirs="776"
>>
>>
>> # getfattr -d -m ceph.dir.* mail
>> # file: mail
>> ceph.dir.entries="786"
>> ceph.dir.files="1"
>> ceph.dir.rbytes="15000378101390"
>> ceph.dir.rctime="1513334524.0993982498"
>> ceph.dir.rentries="3961700"
>> ceph.dir.rfiles="3531068"
>> ceph.dir.rsubdirs="430632"
>> ceph.dir.subdirs="785"
>>
>>
>>> mds cache size isn't large enough, so mds does not ask
>>> the client to trim its inode cache neither. This can affect
>>> performance. we should make mds recognize idle client and ask idle
>>> client to trim its caps more aggressively
>>
>>
>> I think you mean that the mds cache IS large enough, right? So it doesn't
>> bother the clients.

yes, I mean the cache config is large enough.

>>
>>> This can affect performance. we should make mds recognize idle client and
>>> ask idle client to trim its caps more aggressively
>>
>>
>> One recurrent problem I have, which I guess is caused by a network issue
>> (ceph cluster in vrack), is that my MDS servers start switching who is the
>> active.
>> This happens after a lease_timeout occur in the mon, then I get "dne in the
>> mds map" from the active MDS and it suicides.
>> Even though I use standby-replay, the standby takes from 15min up to 2 hours
>> to take over as active. I see that it loads all inodes (by issuing "perf
>> dump mds" on the mds daemon).
>>
>> So, question is: if the number of caps is as low as it is supposed to be
>> (around 300k) instead if 5M, would the MDS be active faster in such case of
>> a failure?

300k is already quite a lot. Opening them requires a long time. Does your
mail server really open so many files?

>
> yes, mds recovery should be faster when clients hold fewer caps. recent
> versions of the kernel client and ceph-fuse should trim their cache
> aggressively when mds recovers.
>

I checked the 4.4 kernel; it includes the code that trims the cache when mds recovers.

> Regards
> Yan, Zheng
>
>>
>> Regards,
>>
>> Webert Lima
>> DevOps Engineer at MAV Tecnologia
>> Belo Horizonte - Brasil
>> IRC NICK - WebertRLZ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mds millions of caps

2017-12-15 Thread Yan, Zheng
On Fri, Dec 15, 2017 at 6:54 PM, Webert de Souza Lima
 wrote:
> Hello, Mr. Yan
>
> On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng  wrote:
>>
>>
>> The client hold so many capabilities because kernel keeps lots of
>> inodes in its cache. Kernel does not trim inodes by itself if it has
>> no memory pressure. It seems you have set mds_cache_size config to a
>> large value.
>
>
> Yes, I have set mds_cache_size = 300
> I usually set this value according to the number of ceph.dir.rentries in
> cephfs. Isn't that a good approach?
>
> I have 2 directories in cephfs root, sum of ceph.dir.rentries is 4670933,
> for which I would set mds_cache_size to 5M (if I had enough RAM for that in
> the MDS server).
>
> # getfattr -d -m ceph.dir.* index
> # file: index
> ceph.dir.entries="776"
> ceph.dir.files="0"
> ceph.dir.rbytes="52742318965"
> ceph.dir.rctime="1513334528.09909569540"
> ceph.dir.rentries="709233"
> ceph.dir.rfiles="459512"
> ceph.dir.rsubdirs="249721"
> ceph.dir.subdirs="776"
>
>
> # getfattr -d -m ceph.dir.* mail
> # file: mail
> ceph.dir.entries="786"
> ceph.dir.files="1"
> ceph.dir.rbytes="15000378101390"
> ceph.dir.rctime="1513334524.0993982498"
> ceph.dir.rentries="3961700"
> ceph.dir.rfiles="3531068"
> ceph.dir.rsubdirs="430632"
> ceph.dir.subdirs="785"
>
>
>> mds cache size isn't large enough, so mds does not ask
>> the client to trim its inode cache neither. This can affect
>> performance. we should make mds recognize idle client and ask idle
>> client to trim its caps more aggressively
>
>
> I think you mean that the mds cache IS large enough, right? So it doesn't
> bother the clients.
>
>> This can affect performance. we should make mds recognize idle client and
>> ask idle client to trim its caps more aggressively
>
>
> One recurrent problem I have, which I guess is caused by a network issue
> (ceph cluster in vrack), is that my MDS servers start switching which one is
> the active.
> This happens after a lease_timeout occurs in the mon; then I get "dne in the
> mds map" from the active MDS and it suicides.
> Even though I use standby-replay, the standby takes from 15min up to 2 hours
> to take over as active. I see that it loads all inodes (by issuing "perf
> dump mds" on the mds daemon).
>
> So, the question is: if the number of caps were as low as it is supposed to
> be (around 300k) instead of 5M, would the MDS become active faster in such a
> failure?

yes, mds recovery should be faster when clients hold fewer caps. recent
kernel clients and ceph-fuse should trim their caches
aggressively when the mds recovers.

Regards
Yan, Zheng

>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> IRC NICK - WebertRLZ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Odd object blocking IO on PG

2017-12-15 Thread Gregory Farnum
For those following along at home, already done:
http://tracker.ceph.com/issues/22440



On Fri, Dec 15, 2017 at 1:57 AM Brad Hubbard  wrote:

> On Wed, Dec 13, 2017 at 11:39 PM, Nick Fisk  wrote:
> > Boom!! Fixed it. Not sure if the behavior I stumbled across is correct, but
> > this has the potential to break a few things for people moving from Jewel to
> > Luminous if they had a few too many PG's.
> >
> >
> >
> > Firstly, how I stumbled across it. I whacked the logging up to max on
> > OSD 68 and saw this mentioned in the logs:
> >
> >
> >
> > osd.68 106454 maybe_wait_for_max_pg withhold creation of pg 0.1cf: 403 >=
> > 400
> >
> >
> >
> > This made me search through the code for this warning string
> >
> >
> >
> > https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L4221
> >
> >
> >
> > Which jogged my memory about the changes in Luminous regarding max PG’s
> > warning, and in particular these two config options
> >
> > mon_max_pg_per_osd
> >
> > osd_max_pg_per_osd_hard_ratio
> >
> >
> >
> > In my cluster I have just over 200 PG's per OSD, but the node with OSD.68
> > in has 8TB disks instead of the 3TB used in the rest of the cluster. This
> > means these OSD's were taking a lot more PG's than the average would
> > suggest. So in Luminous 200x2 gives a hard limit of 400, which is what that
> > error message in the log suggests is the limit. I set the
> > osd_max_pg_per_osd_hard_ratio option to 3 and restarted the OSD and hey
> > presto everything fell into line.
> >
> >
> >
> > Now a question. I get the idea behind these settings, to stop people
> > creating too many pools or pools with too many PG's. But is it correct that
> > they can break an existing pool which is merely placing a PG on a new OSD
> > because the CRUSH layout has been modified?
>
> It would be good to capture this in a tracker Nick so it can be
> explored in  more depth.
>
> >
> >
> >
> > Nick
> >
> >
> >
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Nick Fisk
> > Sent: 13 December 2017 11:14
> > To: 'Gregory Farnum' 
> > Cc: 'ceph-users' 
> > Subject: Re: [ceph-users] Odd object blocking IO on PG
> >
> >
> >
> >
> >
> > On Tue, Dec 12, 2017 at 12:33 PM Nick Fisk  wrote:
> >
> >
> >> That doesn't look like an RBD object -- any idea who is
> >> "client.34720596.1:212637720"?
> >
> > So I think these might be proxy ops from the cache tier, as there are also
> > blocked ops on one of the cache tier OSD's, but this time it actually lists
> > the object name. Blocked op on the cache tier:
> >
> >"description": "osd_op(client.34720596.1:212637720 17.ae78c1cf
> > 17:f3831e75:::rbd_data.15a5e20238e1f29.000388ad:head
> > [set-alloc-hint
> > object_size 4194304 write_size 4194304,write 2584576~16384] snapc 0=[]
> > RETRY=2 ondisk+retry+write+known_if_redirected e104841)",
> > "initiated_at": "2017-12-12 16:25:32.435718",
> > "age": 13996.681147,
> > "duration": 13996.681203,
> > "type_data": {
> > "flag_point": "reached pg",
> > "client_info": {
> > "client": "client.34720596",
> > "client_addr": "10.3.31.41:0/2600619462",
> > "tid": 212637720
> >
> > I'm a bit baffled at the moment about what's going on. The pg query
> > (attached) is not showing in the main status that it has been blocked from
> > peering or that there are any missing objects. I've tried restarting all
> > OSD's I can see relating to the PG in case they needed a bit of a nudge.
> >
> >
> >
> > Did that fix anything? I don't see anything immediately obvious but I'm not
> > practiced in quickly reading that pg state output.
> >
> >
> >
> > What's the output of "ceph -s"?
> >
> >
> >
> > Hi Greg,
> >
> >
> >
> > No, restarting OSD's didn't seem to help. But I did make some progress late
> > last night. By stopping OSD.68 the cluster unlocks itself and IO can
> > progress. However as soon as it starts back up, 0.1cf and a couple of other
> > PG's again get stuck in an activating state. If I out the OSD, either with
> > it up or down, then some other PG's seem to get hit by the same problem as
> > CRUSH moves PG mappings around to other OSD's.
> >
> >
> >
> > So there definitely seems to be some sort of weird peering issue somewhere.
> > I have seen a very similar issue before on this cluster where, after running
> > the crush reweight script to balance OSD utilization, the weight got set too
> > low and PG's were unable to peer. I'm not convinced this is what's happening
> > here as none of the weights have changed, but I'm intending to explore this
> > further just in case.
> >
> >
> >
> > With 68 down
> >
> > pgs: 1071783/48650631 objects degraded (2.203%)
> >
> >  5923 active+clean
> >
> >  399  
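
For anyone wanting to check how close their OSDs are to the new Luminous
per-OSD PG limits discussed above, a rough sketch (it assumes the defaults of
mon_max_pg_per_osd=200 and osd_max_pg_per_osd_hard_ratio=2, and that
'ceph pg dump --format json' is available; adjust to your own settings):

import json, subprocess
from collections import Counter

MON_MAX_PG_PER_OSD = 200   # Luminous default; change if you override it
HARD_RATIO = 2.0           # osd_max_pg_per_osd_hard_ratio default
hard_limit = MON_MAX_PG_PER_OSD * HARD_RATIO

dump = json.loads(subprocess.check_output(
    ["ceph", "pg", "dump", "--format", "json"]))
# the key layout differs slightly between releases
pg_stats = dump.get("pg_stats") or dump.get("pg_map", {}).get("pg_stats", [])

per_osd = Counter()
for pg in pg_stats:
    for osd in pg.get("up", []):
        per_osd[osd] += 1

for osd, count in sorted(per_osd.items(), key=lambda kv: -kv[1]):
    note = "  <-- at/above hard limit, new PGs will be withheld" if count >= hard_limit else ""
    print("osd.%-4s %4d PGs%s" % (osd, count, note))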

Re: [ceph-users] Any RGW admin frontends?

2017-12-15 Thread Lenz Grimmer
Hi Dan,

On 12/15/2017 10:13 AM, Dan van der Ster wrote:

> As we are starting to ramp up our internal rgw service, I am wondering
> if someone already developed some "open source" high-level admin tools
> for rgw. On the one hand, we're looking for a web UI for users to create
> and see their credentials, quota, usage, and maybe a web bucket browser.

Except for a bucket browser, openATTIC 3.6 should provide the RGW
management features you are looking for.

> Then from our service PoV, we're additionally looking for tools for
> usage reporting, generating periodic high-level reports, ... 

We use Prometheus/Grafana for that, not sure if that would work for you?

> I'm aware of the OpenStack "Object Store" integration with rgw, but I'm
> curious what exists outside the OS universe.

The Inkscope folks have also started adding RGW management, but I'm not
sure how active this project is.

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph metric exporter HTTP Error 500

2017-12-15 Thread Lenz Grimmer
Hi,

On 12/15/2017 11:53 AM, Falk Mueller-Braun wrote:

> since we upgraded to Luminous (12.2.2), we use the internal Ceph
> exporter for getting the Ceph metrics to Prometheus. At random times we
> get a Internal Server Error from the Ceph exporter, with python having a
> key error with some random metric. Often it is "pg_*".
> 
> Here is an example of the python exception:
> 
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 
> 670, in respond
> response.body = self.handler()
>   File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 
> 217, in __call__
> self.body = self.oldhandler(*args, **kwargs)
>   File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 
> 61, in __call__
> return self.callable(*self.args, **self.kwargs)
>   File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
> metrics = global_instance().collect()
>   File "/usr/lib/ceph/mgr/prometheus/module.py", line 324, in collect
> self.get_pg_status()
>   File "/usr/lib/ceph/mgr/prometheus/module.py", line 266, in 
> get_pg_status
> self.metrics[path].set(value)
> KeyError: 'pg_deep'
> 
> After a certain time (could be 3-5 minutes or sometimes even 40
> minutes), the metric sending starts working again without any help.
> 
> Has anyone got an idea what could be done about that, or does anyone
> experience similar problems?

This seems to be a regression in 12.2.2 -
http://tracker.ceph.com/issues/22441 (which is a duplicate of
http://tracker.ceph.com/issues/22116)

And then there's another one that might be related:
http://tracker.ceph.com/issues/22313
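
Until those fixes land, a possible stop-gap on the scraping side is simply to
retry when the exporter returns HTTP 500. A minimal sketch (it assumes the mgr
prometheus module is listening on its default port 9283):

import time
import urllib.request, urllib.error

URL = "http://localhost:9283/metrics"   # default mgr prometheus endpoint (assumption)

def scrape(retries=5, delay=2.0):
    for _ in range(retries):
        try:
            return urllib.request.urlopen(URL, timeout=10).read()
        except urllib.error.HTTPError as e:
            if e.code != 500:
                raise
            time.sleep(delay)   # transient KeyError inside the module, try again
    raise RuntimeError("exporter kept returning HTTP 500")

print(scrape().decode())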

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Latency metrics for mons, osd applies and commits

2017-12-15 Thread Falk Mueller-Braun
Hello,

since using the internal ceph exporter (after having upgraded to Ceph
Luminous [12.2.2]), I can't find any latency-related metrics. Before, I
used the latency of the monitors and the osd apply and commit latency,
which were pretty useful for monitoring.

Is there any possibility to get these metrics or are they just missing?
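
The raw OSD values can at least still be pulled by hand; a rough sketch
(assuming the 'ceph osd perf -f json' layout of earlier releases is unchanged):

import json, subprocess

# 'ceph osd perf' reports per-OSD commit/apply latency in milliseconds
perf = json.loads(subprocess.check_output(["ceph", "osd", "perf", "-f", "json"]))
for osd in perf.get("osd_perf_infos", []):
    stats = osd.get("perf_stats", {})
    print("osd.%-4s commit %4d ms  apply %4d ms" % (
        osd.get("id"),
        stats.get("commit_latency_ms", 0),
        stats.get("apply_latency_ms", 0)))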

Thanks,
Falk

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mds millions of caps

2017-12-15 Thread Webert de Souza Lima
Hello, Mr. Yan

On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng  wrote:

>
> The client hold so many capabilities because kernel keeps lots of
> inodes in its cache. Kernel does not trim inodes by itself if it has
> no memory pressure. It seems you have set mds_cache_size config to a
> large value.


Yes, I have set mds_cache_size = 300
I usually set this value according to the number of ceph.dir.rentries in
cephfs. Isn't that a good approach?

I have 2 directories in cephfs root, sum of ceph.dir.rentries is 4670933,
for which I would set mds_cache_size to 5M (if I had enough RAM for that in
the MDS server).

# getfattr -d -m ceph.dir.* index
# file: index
ceph.dir.entries="776"
ceph.dir.files="0"
ceph.dir.rbytes="52742318965"
ceph.dir.rctime="1513334528.09909569540"
ceph.dir.rentries="709233"
ceph.dir.rfiles="459512"
ceph.dir.rsubdirs="249721"
ceph.dir.subdirs="776"


# getfattr -d -m ceph.dir.* mail
# file: mail
ceph.dir.entries="786"
ceph.dir.files="1"
ceph.dir.rbytes="15000378101390"
ceph.dir.rctime="1513334524.0993982498"
ceph.dir.rentries="3961700"
ceph.dir.rfiles="3531068"
ceph.dir.rsubdirs="430632"
ceph.dir.subdirs="785"
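
For reference, a trivial sketch of that sizing approach (the mountpoint and
directory names below are just examples matching the output above; it assumes
the virtual xattrs are readable from the client):

import os

CEPHFS_ROOT = "/mnt/cephfs"    # assumption: where the filesystem is mounted
DIRS = ["index", "mail"]       # the two top-level directories shown above

total = 0
for d in DIRS:
    # ceph exposes the recursive entry count as a virtual xattr
    total += int(os.getxattr(os.path.join(CEPHFS_ROOT, d), "ceph.dir.rentries"))

print("sum of ceph.dir.rentries: %d" % total)
print("so I would round mds_cache_size up to about %dM" % (total // 10**6 + 1))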


> mds cache size isn't large enough, so mds does not ask
> the client to trim its inode cache neither. This can affect
> performance. we should make mds recognize idle client and ask idle
> client to trim its caps more aggressively
>

I think you mean that the mds cache IS large enough, right? So it doesn't
bother the clients.

> This can affect performance. we should make mds recognize idle client and
> ask idle client to trim its caps more aggressively
>

One recurrent problem I have, which I guess is caused by a network issue
(ceph cluster in vrack), is that my MDS servers start switching which one is
the active.
This happens after a lease_timeout occurs in the mon; then I get "dne in the
mds map" from the active MDS and it suicides.
Even though I use standby-replay, the standby takes from 15min up to 2
hours to take over as active. I see that it loads all inodes (by issuing
"perf dump mds" on the mds daemon).

So, the question is: if the number of caps were as low as it is supposed to
be (around 300k) instead of 5M, would the MDS become active faster in such a
failure?

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph metric exporter HTTP Error 500

2017-12-15 Thread Falk Mueller-Braun
Hello,

since we upgraded to Luminous (12.2.2), we use the internal Ceph
exporter for getting the Ceph metrics to Prometheus. At random times we
get an Internal Server Error from the Ceph exporter, with Python raising a
KeyError for some random metric. Often it is "pg_*".

Here is an example of the python exception:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, 
in respond
response.body = self.handler()
  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 
217, in __call__
self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, 
in __call__
return self.callable(*self.args, **self.kwargs)
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
metrics = global_instance().collect()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 324, in collect
self.get_pg_status()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 266, in get_pg_status
self.metrics[path].set(value)
KeyError: 'pg_deep'

After a certain time (could be 3-5 minutes or sometimes even 40
minutes), the metric sending starts working again without any help.


Has anyone got an idea what could be done about that, or does anyone
experience similar problems?

Thanks,
Falk

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems understanding 'ceph features' output

2017-12-15 Thread Massimo Sgaravatto
Thanks for your answer

Actually I have the very same configuration on the three "client hosts": on
each of them I simply mapped a single rbd volume ...

Cheers, Massimo

2017-12-15 11:10 GMT+01:00 Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de>:

> Hi,
>
>
> On 12/15/2017 10:56 AM, Massimo Sgaravatto wrote:
>
>> Hi
>>
>> I tried the jewel --> luminous update on a small testbed composed by:
>>
>> - 3 mon + mgr nodes
>> - 3 osd nodes (4 OSDs per each of this node)
>> - 3 clients (each client maps a single volume)
>>
>> *snipsnap*
>
>>
>>
>> [*]
>> "client": {
>> "group": {
>> "features": "0x40106b84a842a52",
>> "release": "jewel",
>> "num": 3
>> },
>> "group": {
>> "features": "0x1ffddff8eea4fffb",
>> "release": "luminous",
>> "num": 5
>> }
>>
> AFAIK "client" does not refer to a host, but to the application running on
> the host. If you have several qemu+rbd based VMs running on a host, each VM
> will be considered an individual client.
>
> So I assume there are 3 ceph applications (e.g. three VMs) on the jewel
> host, and 5 applications on the two luminous hosts.
>
> Regards,
> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems understanding 'ceph features' output

2017-12-15 Thread Burkhard Linke

Hi,


On 12/15/2017 10:56 AM, Massimo Sgaravatto wrote:

Hi

I tried the jewel --> luminous update on a small testbed composed by:

- 3 mon + mgr nodes
- 3 osd nodes (4 OSDs per each of this node)
- 3 clients (each client maps a single volume)


*snipsnap*



[*]
    "client": {
        "group": {
            "features": "0x40106b84a842a52",
            "release": "jewel",
            "num": 3
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 5
        }
AFAIK "client" does not refer to a host, but to the application running 
on the host. If you have several qemu+rbd based VMs running on a host, 
each VM will be considered an individual client.


So I assume there are 3 ceph applications (e.g. three VMs) on the jewel 
host, and 5 applications on the two luminous hosts.
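
If you want to see which addresses those client entries actually correspond
to, the monitor admin socket can help. A rough sketch (it assumes a Luminous
mon whose session dump includes the peer address and feature bits; the mon id
is an example):

import json, subprocess

MON_ID = "mon-a"   # assumption: id of the local monitor daemon

# every connected entity (VM, kernel mount, daemon, CLI tool) is one session
sessions = json.loads(subprocess.check_output(
    ["ceph", "daemon", "mon." + MON_ID, "sessions"]))
for s in sessions:
    if "client." in str(s):
        print(s)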


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-15 Thread John Spray
On Fri, Dec 15, 2017 at 1:45 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi
>
> i used 3 nodes to deploy mds (each node also has mon on it)
>
> my config:
> [mds.ceph-node-10-101-4-17]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> [mds.ceph-node-10-101-4-21]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> [mds.ceph-node-10-101-4-22]
> mds_standby_replay = true
> mds_standby_for_rank = 0
>
> the mds stat:
> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1
> up:standby
>
> I mount the cephfs on the ceph client and run a test script that writes data
> into a file under the cephfs dir.
> When I reboot the master mds, I find that the data is not written into the
> file.
> After 15 seconds, data can be written into the file again.
>
> so my question is:
> is this normal when rebooting the master mds?
> when will the up:standby-replay mds take over the cephfs?

The standby takes over after the active daemon has not reported to the
monitors for `mds_beacon_grace` seconds, which as you have noticed is
15s by default.

If you know you are rebooting something, you can pre-empt the timeout
mechanism by using "ceph mds fail" on the active daemon, to cause
another to take over right away.
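
For a planned reboot that could look something like this minimal sketch
(the daemon name is taken from the 'ceph mds stat' output above):

import subprocess

ACTIVE_MDS = "ceph-node-10-101-4-22"   # the currently active daemon

# hand the rank over to the standby(-replay) daemon before rebooting the box,
# instead of waiting out the mds_beacon_grace (15s by default) timeout
subprocess.check_call(["ceph", "mds", "fail", ACTIVE_MDS])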

John

> thanks
>
> 
> 13605702...@163.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problems understanding 'ceph features' output

2017-12-15 Thread Massimo Sgaravatto
Hi

I tried the jewel --> luminous update on a small testbed composed by:

- 3 mon + mgr nodes
- 3 osd nodes (4 OSDs per each of this node)
- 3 clients (each client maps a single volume)


In short:

- I updated the 3 mons
- I deployed mgr on the 3 mon hosts
- I updated the 3 osd nodes
- I updated 2 client nodes (one is still running jewel)

After the update, everything seems ok, but I am not able to understand the
"ceph features" output, which, for the client part, reports[*] 3 jewel
clients and 5 luminous clients (?!?)

A "grep  0x1ffddff8eea4fffb" (luminous) on the mon log file reports the
three mon-mgr nodes. While a "grep 0x40106b84a842a52" (jewel) doesn't
return anything

Any hints ?

Thanks, Massimo



[*]
"client": {
"group": {
"features": "0x40106b84a842a52",
"release": "jewel",
"num": 3
},
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 5
}
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Any RGW admin frontends?

2017-12-15 Thread Dan van der Ster
Hi all,

As we are starting to ramp up our internal rgw service, I am wondering if
someone already developed some "open source" high-level admin tools for
rgw. On the one hand, we're looking for a web UI for users to create and
see their credentials, quota, usage, and maybe a web bucket browser. Then
from our service PoV, we're additionally looking for tools for usage
reporting, generating periodic high-level reports, ...
I'm aware of the OpenStack "Object Store" integration with rgw, but I'm
curious what exists outside the OS universe.
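
For the usage reporting side, the raw numbers are at least obtainable from
radosgw-admin already; a rough sketch of the kind of thing we would want to
wrap (it assumes usage logging is enabled and the usual JSON summary layout):

import json, subprocess

# per-user totals since the given date; assumes 'rgw enable usage log = true'
usage = json.loads(subprocess.check_output(
    ["radosgw-admin", "usage", "show", "--show-log-entries=false",
     "--start-date=2017-12-01"]))

for entry in usage.get("summary", []):
    total = entry.get("total", {})
    print("user %-20s ops %8d  bytes sent %14d  bytes received %14d" % (
        entry.get("user"), total.get("ops", 0),
        total.get("bytes_sent", 0), total.get("bytes_received", 0)))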

Best Regards,

Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Snap trim queue length issues

2017-12-15 Thread Piotr Dałek

On 17-12-14 05:31 PM, David Turner wrote:
I've tracked this in a much more manual way.  I would grab a random subset 
[..]
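
(For reference, that kind of manual spot-check can be as simple as the sketch
below, assuming 'ceph pg <pgid> query' exposes the snap_trimq field:)

import json, random, subprocess

SAMPLE = 20   # how many random PGs to poke

dump = json.loads(subprocess.check_output(
    ["ceph", "pg", "dump", "--format", "json"]))
pg_stats = dump.get("pg_stats") or dump.get("pg_map", {}).get("pg_stats", [])
pgids = [p["pgid"] for p in pg_stats]

for pgid in random.sample(pgids, min(SAMPLE, len(pgids))):
    q = json.loads(subprocess.check_output(["ceph", "pg", pgid, "query"]))
    # snap_trimq is an interval set like "[1~3,8~2]"; its size is a rough
    # proxy for how much trimming work is still queued on this PG
    print("%-10s snap_trimq %s" % (pgid, q.get("snap_trimq", "n/a")))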


This was all on a Hammer cluster.  The changes to the snap trimming queues 
going into the main osd thread made it so that our use case was not viable 
on Jewel until changes to Jewel that happened after I left.  It's exciting 
that this will actually be a reportable value from the cluster.


Sorry that this story doesn't really answer your question, except to say 
that people aware of this problem likely have a workaround for it.  However 
I'm certain that a lot more clusters are impacted by this than are aware of 
it, and being able to quickly see that would be beneficial to troubleshooting 
problems.  Backporting would be nice.  I run a few Jewel clusters that have 
some VM's and it would be nice to see how well those clusters handle snap 
trimming.  But they are much less critical in terms of how many snapshots they do.


Thanks for your response, it pretty much confirms what I thought:
- users aware of the issue have their own hacks, which don't need to be efficient 
or convenient.
- users unaware of the issue are, well, unaware and at risk of serious service 
disruption once disk space is all used up.


Hopefully it'll be convincing enough for devs. ;)

--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com