[ceph-users] /var/lib/ceph/osd/ceph-xxx/current/meta shows "Structure needs cleaning"

2018-03-07 Thread 赵贺东
Hi All,

Every time after we activate an OSD, we get “Structure needs cleaning” in 
/var/lib/ceph/osd/ceph-xxx/current/meta.


/var/lib/ceph/osd/ceph-xxx/current/meta
# ls -l
ls: reading directory .: Structure needs cleaning
total 0

Could anyone say something about this error?

Thank you!




Re: [ceph-users] Multipart Upload - POST fails

2018-03-07 Thread Ingo Reimann
No-one?


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Ingo Reimann
Sent: Friday, 2 March 2018 14:15
To: ceph-users
Subject: [ceph-users] Multipart Upload - POST fails

Hi,

we discovered a problem with our installation - multipart upload is not
working.

What we did:
* tried upload with cyberduck as well as with script from
http://tracker.ceph.com/issues/12790
* tried against jewel gateways and luminous gateways from old cluster
* tried against 12.2.4 gateway with jewel-era cluster

Surprisingly, this is not a signature problem as in the issue above; instead, I
get the following in the logs:

2018-03-02 13:59:04.927353 7fe2053ca700  1 == starting new request
req=0x7fe2053c42c0 =
2018-03-02 13:59:04.927383 7fe2053ca700  2 req 61:0.30::POST
/luminous-12-2-4/Data128MB::initializing for trans_id =
tx0003d-005a994a98-10c84997-default
2018-03-02 13:59:04.927396 7fe2053ca700 10 rgw api priority: s3=5
s3website=4
2018-03-02 13:59:04.927399 7fe2053ca700 10 host=cephrgw01.dunkel.de
2018-03-02 13:59:04.927422 7fe2053ca700 20 subdomain=
domain=cephrgw01.dunkel.de in_hosted_domain=1 in_hosted_domain_s3website=0
2018-03-02 13:59:04.927427 7fe2053ca700 20 final domain/bucket subdomain=
domain=cephrgw01.dunkel.de in_hosted_domain=1 in_hosted_domain_s3website=0
s->info.domain=cephrgw01.dunkel.de
s->info.request_uri=/luminous-12-2-4/Data128MB
2018-03-02 13:59:04.927447 7fe2053ca700 10 meta>>
HTTP_X_AMZ_CONTENT_SHA256
2018-03-02 13:59:04.927454 7fe2053ca700 10 meta>> HTTP_X_AMZ_DATE
2018-03-02 13:59:04.927459 7fe2053ca700 10 x>>
x-amz-content-sha256:254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e4
52b97453917
2018-03-02 13:59:04.927464 7fe2053ca700 10 x>> x-amz-date:20180302T125904Z
2018-03-02 13:59:04.927493 7fe2053ca700 20 get_handler
handler=22RGWHandler_REST_Obj_S3
2018-03-02 13:59:04.927500 7fe2053ca700 10
handler=22RGWHandler_REST_Obj_S3
2018-03-02 13:59:04.927505 7fe2053ca700  2 req 61:0.000152:s3:POST
/luminous-12-2-4/Data128MB::getting op 4
2018-03-02 13:59:04.927512 7fe2053ca700 10
op=28RGWInitMultipart_ObjStore_S3
2018-03-02 13:59:04.927514 7fe2053ca700  2 req 61:0.000161:s3:POST
/luminous-12-2-4/Data128MB:init_multipart:verifying requester
2018-03-02 13:59:04.927519 7fe2053ca700 20
rgw::auth::StrategyRegistry::s3_main_strategy_t: trying
rgw::auth::s3::AWSAuthStrategy
2018-03-02 13:59:04.927524 7fe2053ca700 20 rgw::auth::s3::AWSAuthStrategy:
trying rgw::auth::s3::S3AnonymousEngine
2018-03-02 13:59:04.927531 7fe2053ca700 20
rgw::auth::s3::S3AnonymousEngine denied with reason=-1
2018-03-02 13:59:04.927533 7fe2053ca700 20 rgw::auth::s3::AWSAuthStrategy:
trying rgw::auth::s3::LocalEngine
2018-03-02 13:59:04.927569 7fe2053ca700 10 v4 signature format =
48cc8c61a70dde17932d925f65f843116199c1ca10094db83e7de05bfbd57dc4
2018-03-02 13:59:04.927584 7fe2053ca700 10 v4 credential format =
8DGDGA57XL9YPM8DGEQQ/20180302/us-east-1/s3/aws4_request
2018-03-02 13:59:04.927587 7fe2053ca700 10 access key id =
8DGDGA57XL9YPM8DGEQQ
2018-03-02 13:59:04.927589 7fe2053ca700 10 credential scope =
20180302/us-east-1/s3/aws4_request
2018-03-02 13:59:04.927620 7fe2053ca700 10 canonical headers format =
content-type:application/octet-stream
date:Fri, 02 Mar 2018 12:59:04 GMT
host:cephrgw01.dunkel.de
x-amz-content-sha256:254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e4
52b97453917
x-amz-date:20180302T125904Z

2018-03-02 13:59:04.927634 7fe2053ca700 10 payload request hash =
254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917
2018-03-02 13:59:04.927690 7fe2053ca700 10 canonical request = POST
/luminous-12-2-4/Data128MB uploads= content-type:application/octet-stream
date:Fri, 02 Mar 2018 12:59:04 GMT
host:cephrgw01.dunkel.de
x-amz-content-sha256:254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e4
52b97453917
x-amz-date:20180302T125904Z

content-type;date;host;x-amz-content-sha256;x-amz-date
254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917
2018-03-02 13:59:04.927696 7fe2053ca700 10 canonical request hash =
54e9858263535b46a3c4e51b2ae5c1d0bf5e7a7690c5bba722eea749e7b936c4
2018-03-02 13:59:04.927716 7fe2053ca700 10 string to sign =
AWS4-HMAC-SHA256
20180302T125904Z
20180302/us-east-1/s3/aws4_request
54e9858263535b46a3c4e51b2ae5c1d0bf5e7a7690c5bba722eea749e7b936c4
2018-03-02 13:59:04.927920 7fe2053ca700 10 date_k=
dcef1f3be70873f1cb3240f7a56320e3c6763e7cf4bfae0e3182d2f9525292cd
2018-03-02 13:59:04.927954 7fe2053ca700 10 region_k  =
3d83dd9161cf7ba15e6c8c28d264f6cfce9b848e927359f34364a6c8c98209b7
2018-03-02 13:59:04.927963 7fe2053ca700 10 service_k =
e0708e00dc6b52aa1d889f45cd1dcced2bb1b2eee1b62e94ad9813c555e8eda9
2018-03-02 13:59:04.927972 7fe2053ca700 10 signing_k =
1ae362c4b2f1666786404fdb56c62d4f393635b2ce76d46ba325097fd3aa645e
2018-03-02 13:59:04.928021 7fe2053ca700 10 generated signature =
48cc8c61a70dde17932d925f65f843116199c1ca10094db83e7de05bfbd57dc4
2018-03-02 13:59:04.928031 7fe2053ca700 15 
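
For reference, the date_k / region_k / service_k / signing_k lines above are the
standard AWS SigV4 signing-key chain. A rough, untested shell sketch of that
derivation (the secret key is a placeholder; date, region and service are taken
from the log, and $string_to_sign would hold the multi-line "string to sign"):

  secret="MY_SECRET_KEY"            # hypothetical secret for access key 8DGDGA57XL9YPM8DGEQQ
  hmac_hex() { openssl dgst -sha256 -mac HMAC -macopt "$1" | awk '{print $NF}'; }

  date_k=$(printf '20180302'        | hmac_hex "key:AWS4${secret}")
  region_k=$(printf 'us-east-1'     | hmac_hex "hexkey:${date_k}")
  service_k=$(printf 's3'           | hmac_hex "hexkey:${region_k}")
  signing_k=$(printf 'aws4_request' | hmac_hex "hexkey:${service_k}")
  # the generated signature is then HMAC(signing_k, string-to-sign)
  printf '%s' "$string_to_sign"     | hmac_hex "hexkey:${signing_k}"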

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-07 Thread shadow_lin
Hi David,
Thanks for the info.
Could I assume that if we use active/passive multipath with the RBD exclusive lock,
then all targets which support RBD (via a block device) are safe?
2018-03-08 

shadow_lin 



From: David Disseldorp
Sent: 2018-03-08 08:47
Subject: Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock
To: "shadow_lin"
Cc: "Mike Christie","Lazuardi Nasution","Ceph Users"

Hi shadowlin, 

On Wed, 7 Mar 2018 23:24:42 +0800, shadow_lin wrote: 

> Is it safe to use active/active multipath if using the SUSE kernel with 
> target_core_rbd? 
> Thanks. 

A cross-gateway failover race-condition similar to what Mike described 
is currently possible with active/active target_core_rbd. It's a corner 
case that is dependent on a client assuming that unacknowledged I/O has 
been implicitly terminated and can be resumed via an alternate path, 
while the original gateway at the same time issues the original request 
such that it reaches the Ceph cluster after differing I/O to the same 
region via the alternate path. 
It's not something that we've observed in the wild, but is nevertheless 
a bug that is being worked on, with a resolution that should also be 
usable for active/active tcmu-runner. 

Cheers, David


Re: [ceph-users] improve single job sequential read performance.

2018-03-07 Thread Alex Gorbachev
On Wed, Mar 7, 2018 at 8:37 PM, Alex Gorbachev  wrote:
> On Wed, Mar 7, 2018 at 9:43 AM, Cassiano Pilipavicius
>  wrote:
>> Hi all, this issue has already been discussed in older threads and I've
>> already tried most of the solutions proposed there.
>>
>>
>> I have a small and old Ceph cluster (started in Hammer and upgraded up to
>> Luminous 12.2.2), connected through a single shared 1 GbE link (I know this is
>> not optimal, but for my workload it is handling the load reasonably well). I
>> use RBD for small VMs in libvirt/qemu.
>>
>> My problem is: if I need to copy a large file (cp, dd, tar), the read
>> speed is very low (15 MB/s). I've tested the write speed of a single job with
>> dd zero (direct) > file and the speed is good enough for my environment
>> (80 MB/s).
>>
>> If I run parallel jobs, I can saturate the network connection; the speed
>> scales with the number of jobs. I've tried setting read-ahead in ceph.conf
>> and in the guest OS.
>>
>> I've never heard any report of a cluster using a single 1 GbE link; maybe this
>> speed is what I should expect? Next week I will be upgrading the network to 2
>> x 10 GbE (private and public), but I would like to know if I have any issue
>> that I need to address before then, as the problem could be masked by the
>> network upgrade.
>>
>> If anyone can throw some light on this, point me in any direction, or tell me
>> this is what I should expect, I would really appreciate it. If anyone needs
>> more info, please let me know.
>
> Workarounds I have heard of or used:
>
> 1. Use fancy striping and parallelize that way (a rough example follows
> after this list):
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-April/017744.html
>
> 2. Use lvm and set up a striped volume over multiple RBDs
>
> 3. Weird but we had seen improvement in sequential speeds with larger
> object size (16 MB) in the past
>
> 4. Caching solutions may help smooth out peaks and valleys of IO -
> bcache, flashcache and we have successfully used EnhanceIO with
> writethrough mode
>
> 5. Better SSD journals help if using filestore
>
> 6. Caching controllers, e.g. Areca
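
A rough, untested illustration of option 1 (pool, image name, and stripe
parameters below are placeholders and would need tuning for the actual workload):

  # fancy striping spreads each logical chunk of the image across more objects/OSDs
  rbd create rbd/stripedvol --size 500G \
      --object-size 4M --stripe-unit 64K --stripe-count 16

Note that such striping-v2 images may not be usable with older krbd kernels;
librbd (qemu) clients handle them.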
>
> --
> Alex Gorbachev
> Storcium
>
>
>>
>>


Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-07 Thread David Disseldorp
Hi shadowlin,

On Wed, 7 Mar 2018 23:24:42 +0800, shadow_lin wrote:

> Is it safe to use active/active multipath if using the SUSE kernel with 
> target_core_rbd?
> Thanks.

A cross-gateway failover race-condition similar to what Mike described
is currently possible with active/active target_core_rbd. It's a corner
case that is dependent on a client assuming that unacknowledged I/O has
been implicitly terminated and can be resumed via an alternate path,
while the original gateway at the same time issues the original request
such that it reaches the Ceph cluster after differing I/O to the same
region via the alternate path.
It's not something that we've observed in the wild, but is nevertheless
a bug that is being worked on, with a resolution that should also be
usable for active/active tcmu-runner.

Cheers, David


Re: [ceph-users] Don't use ceph mds set max_mds

2018-03-07 Thread Patrick Donnelly
On Wed, Mar 7, 2018 at 5:29 AM, John Spray  wrote:
> On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster  wrote:
>> Hi all,
>>
>> What is the purpose of
>>
>>    ceph mds set max_mds <int>
>>
>> ?
>>
>> We just used that by mistake on a cephfs cluster when attempting to
>> decrease from 2 to 1 active mds's.
>>
>> The correct command to do this is of course
>>
>>   ceph fs set <fs_name> max_mds <int>
>>
>> So, is `ceph mds set max_mds` useful for something? If not, should it
>> be removed from the CLI?
>
> It's the legacy version of the command from before we had multiple
> filesystems.  Those commands are marked as obsolete internally so that
> they're not included in the --help output, but they're still handled
> (applied to the "default" filesystem) if called.
>
> The multi-fs stuff went in for Jewel, so maybe we should think about
> removing the old commands in Mimic: any thoughts Patrick?

These commands have already been removed (obsoleted) in master/Mimic.
You can no longer use them. In Luminous, the commands are deprecated
(basically, omitted from --help).

See also: https://tracker.ceph.com/issues/20596

-- 
Patrick Donnelly


Re: [ceph-users] pg inconsistent

2018-03-07 Thread Brad Hubbard
On Thu, Mar 8, 2018 at 1:22 AM, Harald Staub  wrote:
> "ceph pg repair" leads to:
> 5.7bd repair 2 errors, 0 fixed
>
> Only an empty list from:
> rados list-inconsistent-obj 5.7bd --format=json-pretty
>
> Inspired by http://tracker.ceph.com/issues/12577 , I tried again with more
> verbose logging and searched the OSD logs, e.g. for "!=" and "mismatch", but
> could not find anything interesting. Oh well, these are several million
> lines ...
>
> Any hint what I could look for?

Try searching for "scrub_compare_maps" and looking for "5.7bd" in that context.
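
Something along these lines, as a rough sketch (default log location, and assuming
the verbose logging from the earlier attempt is still enabled):

  # look for the scrub map comparison for this PG on each OSD in its acting set
  grep -n 'scrub_compare_maps' /var/log/ceph/ceph-osd.*.log | grep '5\.7bd'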

>
> The 3 OSDs involved are running on 12.2.4, one of them is on BlueStore.
>
> Cheers
>  Harry



-- 
Cheers,
Brad


Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-07 Thread shadow_lin
Hi Christie,
Is it safe to use active/passive multipath with krbd with exclusive lock for 
LIO/TGT/SCST/TCMU?
Is it safe to use active/active multipath if using the SUSE kernel with 
target_core_rbd?
Thanks.

2018-03-07 


shadowlin




From: Mike Christie
Sent: 2018-03-07 03:51
Subject: Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock
To: "Lazuardi Nasution","Ceph Users"
Cc:

On 03/06/2018 01:17 PM, Lazuardi Nasution wrote: 
> Hi, 
>  
> I want to do load balanced multipathing (multiple iSCSI gateway/exporter 
> nodes) of iSCSI backed with RBD images. Should I disable exclusive lock 
> feature? What if I don't disable that feature? I'm using TGT (manual 
> way) since I get so many CPU stuck error messages when I was using LIO. 
>  

You are using LIO/TGT with krbd right? 

You cannot or shouldn't do active/active multipathing. If you have the 
lock enabled then it bounces between paths for each IO and will be slow. 
If you do not have it enabled then you can end up with stale IO 
overwriting current data.
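
For reference, an active/passive (failover) setup on the initiator side would look
roughly like the sketch below; the vendor/product strings are assumptions for a
LIO-based gateway and must match whatever the target actually reports:

  # /etc/multipath.conf (illustrative only)
  devices {
      device {
          vendor                 "LIO-ORG"
          product                ".*"
          path_grouping_policy   "failover"   # one active path, the rest on standby
          path_checker           "tur"
          failback               "manual"
          no_path_retry          "queue"
      }
  }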


[ceph-users] pg inconsistent

2018-03-07 Thread Harald Staub

"ceph pg repair" leads to:
5.7bd repair 2 errors, 0 fixed

Only an empty list from:
rados list-inconsistent-obj 5.7bd --format=json-pretty

Inspired by http://tracker.ceph.com/issues/12577 , I tried again with 
more verbose logging and searched the OSD logs, e.g. for "!=" and 
"mismatch", but could not find anything interesting. Oh well, these are 
several million lines ...


Any hint what I could look for?

The 3 OSDs involved are running on 12.2.4, one of them is on BlueStore.

Cheers
 Harry


Re: [ceph-users] CephFS Client Capabilities questions

2018-03-07 Thread John Spray
On Wed, Mar 7, 2018 at 2:45 PM, Kenneth Waegeman
 wrote:
> Hi all,
>
> I am playing with limiting client access to certain subdirectories of cephfs
> running latest 12.2.4 and latest centos 7.4 kernel, both using kernel client
> and fuse
>
> I am following  http://docs.ceph.com/docs/luminous/cephfs/client-auth/:
>
> To completely restrict the client to the bar directory, omit the root
> directory
>
> ceph fs authorize cephfs client.foo /bar rw
>
> When I mount this directory with fuse, this works. When I try to mount the
> subdirectory directly with the kernel client, I get
>
> mount error 13 = Permission denied
>
>
> This only seems to work when the root is readable.
>
> --> Is there a way to mount subdirectory with kernel client when parent in
> cephfs is not readable ?

The latest CentOS kernel isn't necessarily very recent: it sounds like
the version in use there is a little older (at one point the subdir
mount support had this quirk with the kclient that required the root
be readable).

> Then I checked the data pool with rados, but I can list/get/.. every object
> in the data pool using the client.foo key.
>
> I saw in the docs of master
> http://docs.ceph.com/docs/master/cephfs/client-auth/ that you can add a tag
> cephfs, but if I add this I can't write anything to cephfs anymore, so I
> guess this is not yet supported in luminous.
>
> --> Is there a way to limit the cephfs user to his data only (through
> cephfs) instead of being able to do everything on the pool, without needing
> a pool for every single cephfs client?

Yes.  You can do this with namespaces: set the
ceph.dir.layout.pool_namespace on the restricted subdir (before any
files are written in there), and then restrict the client's OSD caps
to that namespace within the pool, with a cap like "allow rw pool=foo
namespace=baz".
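
A rough sketch of that (the directory, pool, and namespace names here are made up,
and the xattr has to be set before any files are created under the directory):

  # on an admin-mounted cephfs, pin the subdirectory's objects to a namespace
  setfattr -n ceph.dir.layout.pool_namespace -v baz /mnt/cephfs/bar

  # then restrict the client's OSD caps to that namespace
  ceph auth caps client.foo \
      mds 'allow rw path=/bar' \
      mon 'allow r' \
      osd 'allow rw pool=cephfs_data namespace=baz'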

John

>
>
> Thanks!!
>
> Kenneth
>
>


[ceph-users] CephFS Client Capabilities questions

2018-03-07 Thread Kenneth Waegeman

Hi all,

I am playing with limiting client access to certain subdirectories of 
cephfs running latest 12.2.4 and latest centos 7.4 kernel, both using 
kernel client and fuse


I am following http://docs.ceph.com/docs/luminous/cephfs/client-auth/:

To completely restrict the client to the bar directory, omit the
root directory:

ceph fs authorize cephfs client.foo /bar rw

When I mount this directory with fuse, this works. When I try to mount 
the subdirectory directly with the kernel client, I get


mount error 13 = Permission denied

This only seems to work when the root is readable.

--> Is there a way to mount subdirectory with kernel client when parent 
in cephfs is not readable ?



Then I checked the data pool with rados, but I can list/get/.. every 
object in the data pool using the client.foo key.


I saw in the docs of master 
http://docs.ceph.com/docs/master/cephfs/client-auth/ that you can add a 
tag cephfs, but if I add this I can't write anything to cephfs anymore, 
so I guess this is not yet supported in luminous.


--> Is there a way to limit the cephfs user to his data only (through 
cephfs) instead of being able to do everything on the pool, without 
needing a pool for every single cephfs client?




Thanks!!

Kenneth


[ceph-users] improve single job sequential read performance.

2018-03-07 Thread Cassiano Pilipavicius
Hi all, this issue has already been discussed in older threads and I've 
already tried most of the solutions proposed there.



I have a small and old Ceph cluster (started in Hammer and upgraded 
up to Luminous 12.2.2), connected through a single shared 1 GbE link (I know 
this is not optimal, but for my workload it is handling the load 
reasonably well). I use RBD for small VMs in libvirt/qemu.


My problem is: if I need to copy a large file (cp, dd, tar), the read 
speed is very low (15 MB/s). I've tested the write speed of a single job 
with dd zero (direct) > file and the speed is good enough for my 
environment (80 MB/s).
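
The kind of test described above can be reproduced with something like the
following (file name and sizes are arbitrary; run inside a guest on the
RBD-backed filesystem):

  # single-job sequential write, bypassing the page cache
  dd if=/dev/zero of=testfile bs=4M count=1024 oflag=direct

  # single-job sequential read of the same file
  dd if=testfile of=/dev/null bs=4M iflag=direct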


If I run parallel jobs, I can saturate the network connection; the speed 
scales with the number of jobs. I've tried setting read-ahead in 
ceph.conf and in the guest OS.
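
The read-ahead knobs in question look roughly like this (the values are
illustrative examples, not recommendations):

  # ceph.conf on the client/hypervisor side (librbd)
  [client]
      rbd readahead max bytes = 4194304
      rbd readahead trigger requests = 10
      rbd readahead disable after bytes = 0

  # inside the guest, raise the block-device read-ahead (in 512-byte sectors)
  blockdev --setra 8192 /dev/vda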


I've never heard any report of a cluster using a single 1 GbE link; maybe 
this speed is what I should expect? Next week I will be upgrading the 
network to 2 x 10 GbE (private and public), but I would like to know if I 
have any issue that I need to address before then, as the problem could be 
masked by the network upgrade.


If anyone can throw some light on this, point me in any direction, or tell 
me this is what I should expect, I would really appreciate it. If anyone 
needs more info, please let me know.


Re: [ceph-users] Don't use ceph mds set max_mds

2018-03-07 Thread John Spray
On Wed, Mar 7, 2018 at 2:02 PM, Dan van der Ster  wrote:
> On Wed, Mar 7, 2018 at 2:29 PM, John Spray  wrote:
>> On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster  
>> wrote:
>>> Hi all,
>>>
>>> What is the purpose of
>>>
>>>    ceph mds set max_mds <int>
>>>
>>> ?
>>>
>>> We just used that by mistake on a cephfs cluster when attempting to
>>> decrease from 2 to 1 active mds's.
>>>
>>> The correct command to do this is of course
>>>
>>>   ceph fs set <fs_name> max_mds <int>
>>>
>>> So, is `ceph mds set max_mds` useful for something? If not, should it
>>> be removed from the CLI?
>>
>> It's the legacy version of the command from before we had multiple
>> filesystems.  Those commands are marked as obsolete internally so that
>> they're not included in the --help output,
>
> Ahhh! It is indeed omitted from --help but I hadn't noticed because it
> is still rather helpful if you go ahead and run the command:
>
> # ceph mds set
> Invalid command:  missing required parameter
> var(max_mds|max_file_size|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags)
> mds set
> max_mds|max_file_size|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags
> <val> {<confirm>} :  set mds parameter <var> to <val>
> Error EINVAL: invalid command
>
> I suppose we just need a new generation of operators that would never
> even try these old deprecated commands ;)
>
>> but they're still handled
>> (applied to the "default" filesystem) if called.
>
> Hmm... does it apply if we never set the default fs (though we only have one)?
> (How do we even see/get the default fs?)

It'll automatically be set to the first filesystem created.

Now that I go look for the setting, I remember it's actually got the
slightly esoteric internal name of "legacy_client_fscid" (because it's
the filesystem ID that will get mounted by a legacy client that
doesn't know which filesystem it wants).  You set it with "ceph fs
set-default", but it looks like it got left out of FSMap::dump, so
there's no easy way to peek at it.

Created https://github.com/ceph/ceph/pull/20780

John





> What happened in our case is that I did `ceph mds set max_mds 1` then
> deactivated rank 2. This caused some sort of outage which deadlocked
> the mds's (they recovered after restarting). I assume the outage
> happened because I deactivated rank 2 while we still had max_mds=2 at
> the fs scope (and we had no standbys -- due to the v12.2.2->4 upgrade
> breakage).
>
> Thanks John!
>
> Dan
>
>>
>> The multi-fs stuff went in for Jewel, so maybe we should think about
>> removing the old commands in Mimic: any thoughts Patrick?
>>
>> John
>>
>>>
>>> Cheers, Dan


Re: [ceph-users] Don't use ceph mds set max_mds

2018-03-07 Thread Dan van der Ster
On Wed, Mar 7, 2018 at 2:29 PM, John Spray  wrote:
> On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster  wrote:
>> Hi all,
>>
>> What is the purpose of
>>
>>    ceph mds set max_mds <int>
>>
>> ?
>>
>> We just used that by mistake on a cephfs cluster when attempting to
>> decrease from 2 to 1 active mds's.
>>
>> The correct command to do this is of course
>>
>>   ceph fs set <fs_name> max_mds <int>
>>
>> So, is `ceph mds set max_mds` useful for something? If not, should it
>> be removed from the CLI?
>
> It's the legacy version of the command from before we had multiple
> filesystems.  Those commands are marked as obsolete internally so that
> they're not included in the --help output,

Ahhh! It is indeed omitted from --help but I hadn't noticed because it
is still rather helpful if you go ahead and run the command:

# ceph mds set
Invalid command:  missing required parameter
var(max_mds|max_file_size|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags)
mds set
max_mds|max_file_size|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags
<val> {<confirm>} :  set mds parameter <var> to <val>
Error EINVAL: invalid command

I suppose we just need a new generation of operators that would never
even try these old deprecated commands ;)

> but they're still handled
> (applied to the "default" filesystem) if called.

Hmm... does it apply if we never set the default fs (though we only have one)?
(How do we even see/get the default fs?)

What happened in our case is that I did `ceph mds set max_mds 1` then
deactivated rank 2. This caused some sort of outage which deadlocked
the mds's (they recovered after restarting). I assume the outage
happened because I deactivated rank 2 while we still had max_mds=2 at
the fs scope (and we had no standbys -- due to the v12.2.2->4 upgrade
breakage).
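
For the record, the intended sequence would have looked roughly like the following
(the filesystem name and rank are illustrative, and `ceph mds deactivate` is the
Luminous-era command):

  # reduce the allowed number of active MDS daemons at the fs scope
  ceph fs set cephfs max_mds 1

  # then stop the highest active rank (ranks are numbered from 0)
  ceph mds deactivate cephfs:1

  # and watch it wind down
  ceph fs status cephfs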

Thanks John!

Dan

>
> The multi-fs stuff went in for Jewel, so maybe we should think about
> removing the old commands in Mimic: any thoughts Patrick?
>
> John
>
>>
>> Cheers, Dan


Re: [ceph-users] Don't use ceph mds set max_mds

2018-03-07 Thread John Spray
On Wed, Mar 7, 2018 at 10:11 AM, Dan van der Ster  wrote:
> Hi all,
>
> What is the purpose of
>
>    ceph mds set max_mds <int>
>
> ?
>
> We just used that by mistake on a cephfs cluster when attempting to
> decrease from 2 to 1 active mds's.
>
> The correct command to do this is of course
>
>   ceph fs set <fs_name> max_mds <int>
>
> So, is `ceph mds set max_mds` useful for something? If not, should it
> be removed from the CLI?

It's the legacy version of the command from before we had multiple
filesystems.  Those commands are marked as obsolete internally so that
they're not included in the --help output, but they're still handled
(applied to the "default" filesystem) if called.

The multi-fs stuff went in for Jewel, so maybe we should think about
removing the old commands in Mimic: any thoughts Patrick?

John

>
> Cheers, Dan


[ceph-users] Uneven pg distribution cause high fs_apply_latency on osds with more pgs

2018-03-07 Thread shadow_lin
Hi list,
   The Ceph version is Jewel 10.2.10 and all OSDs are using FileStore.
The cluster has 96 OSDs and 1 pool with size=2 replication and 4096 PGs (based on 
the PG calculation method from the Ceph docs, targeting about 100 PGs per OSD).
The OSD with the highest PG count has 104 PGs, and 6 OSDs have more than 100 
PGs.
Most of the OSDs have around 70-90 PGs.
The OSD with the lowest PG count has 58 PGs.

During the write test, some of the OSDs have very high fs_apply_latency, around 
1000-4000 ms, while the normal ones are around 100-600 ms. The OSDs with high 
latency are always the ones with more PGs on them.

iostat on the high-latency OSDs shows the HDDs at about 95%-96% %util, while 
the normal ones are at 40%-60%.

I think the reason is that the OSDs with more PGs have to handle more write 
requests. Is this right?
But even though the PG distribution is not even, the variation is not that 
large. How can the performance be so sensitive to it?

Is there anything I can do to improve the performance and reduce the latency?

How can I make the PG distribution more even?
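
For reference, on Jewel the per-OSD PG counts can be inspected, and a
PG-count-based reweight tried, along these lines (the 110 oversubscription
threshold is only an example, and the test-* form makes no changes):

  # show per-OSD utilization and PG counts
  ceph osd df tree

  # dry-run a reweight based on PG count
  ceph osd test-reweight-by-pg 110

  # apply it if the proposed changes look sane
  ceph osd reweight-by-pg 110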

Thanks


2018-03-07



shadowlin


[ceph-users] Journaling feature causes cluster to have slow requests and inconsistent PG

2018-03-07 Thread Alex Gorbachev
We first noticed this problem in our ESXi/iSCSI cluster, but now I can
replicate it in the lab with just Ubuntu:

1. Create an image with journaling (and required exclusive-lock) feature
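
For example, something along these lines (the size is a guess; the pool/image
name matches the map command below):

  rbd create matte/scuttle2 --size 3T \
      --image-feature layering,exclusive-lock,journaling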

2. Mount the image, make a fs and write a large file to it:

rbd-nbd map matte/scuttle2
/dev/nbd0

mkfs.xfs  /dev/nbd0
mount -t xfs /dev/nbd0 /srv/exports/sclun69
xfs_io -c "extsize 256M" /srv/exports/sclun69

root@lumd1:/var/log# dd if=/dev/zero of=/srv/exports/sclun69/junk
bs=1M count=280
280+0 records in
280+0 records out
293601280 bytes (2.9 TB, 2.7 TiB) copied, 35199.2 s, 83.4 MB/s

3. At some point, slow requests begin.

2018-03-06 22:00:00.000175 mon.lumc1 [INF] overall HEALTH_OK
2018-03-06 22:27:27.945814 mon.lumc1 [WRN] Health check failed: 1 slow
requests are blocked > 32 sec (REQUEST_SLOW)
2018-03-06 22:27:34.406352 mon.lumc1 [WRN] Health check update: 10
slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-03-06 22:27:38.496184 mon.lumc1 [INF] Health check cleared:
REQUEST_SLOW (was: 10 slow requests are blocked > 32 sec)
2018-03-06 22:27:38.496215 mon.lumc1 [INF] Cluster is now healthy
2018-03-06 23:00:00.000196 mon.lumc1 [INF] overall HEALTH_OK
2018-03-06 23:29:45.538387 osd.4 [ERR] 12.308 shard 17: soid
12:10dbc229:::rbd_data.39e1022ae8944a.000cd96d:head candidate
had a read error
2018-03-06 23:29:56.937346 mon.lumc1 [ERR] Health check failed: 1
scrub errors (OSD_SCRUB_ERRORS)
2018-03-06 23:29:56.937415 mon.lumc1 [ERR] Health check failed:
Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2018-03-06 23:29:54.835693 osd.4 [ERR] 12.308 deep-scrub 0 missing, 1
inconsistent objects
2018-03-06 23:29:54.835703 osd.4 [ERR] 12.308 deep-scrub 1 errors
2018-03-07 00:00:00.000155 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub
errors; Possible data damage: 1 pg inconsistent
2018-03-07 01:00:00.000201 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub
errors; Possible data damage: 1 pg inconsistent
2018-03-07 02:00:00.000179 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub
errors; Possible data damage: 1 pg inconsistent
2018-03-07 03:00:00.000235 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub
errors; Possible data damage: 1 pg inconsistent


ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)



--
Alex Gorbachev
Storcium


Re: [ceph-users] No more Luminous packages for Debian Jessie ??

2018-03-07 Thread Fabian Grünbichler
On Wed, Mar 07, 2018 at 02:04:52PM +0100, Fabian Grünbichler wrote:
> On Wed, Feb 28, 2018 at 10:24:50AM +0100, Florent B wrote:
> > Hi,
> > 
> > Since yesterday, the "ceph-luminous" repository does not contain any
> > package for Debian Jessie.
> > 
> > Is it expected ?
> 
> AFAICT the packages are all there[2], but the Packages file only
> references the ceph-deploy package so apt does not find the rest.
> 
> IMHO this looks like something went wrong when generating the repository
> metadata files - so maybe it's just a question of getting the people who
> maintain the repository to notice this thread ;)
> 

and as alfredo just pointed out on IRC, it has already been fixed!



Re: [ceph-users] No more Luminous packages for Debian Jessie ??

2018-03-07 Thread Fabian Grünbichler
On Wed, Feb 28, 2018 at 10:24:50AM +0100, Florent B wrote:
> Hi,
> 
> Since yesterday, the "ceph-luminous" repository does not contain any
> package for Debian Jessie.
> 
> Is it expected ?

AFAICT the packages are all there[2], but the Packages file only
references the ceph-deploy package so apt does not find the rest.

IMHO this looks like something went wrong when generating the repository
metadata files - so maybe it's just a question of getting the people who
maintain the repository to notice this thread ;)

2: http://download.ceph.com/debian-luminous/pool/main/c/ceph/



Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-07 Thread Jan Pekař - Imatic

On 6.3.2018 22:28, Gregory Farnum wrote:
On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote:


Hi all,

I have few problems on my cluster, that are maybe linked together and
now caused OSD down during pg repair.

First few notes about my cluster:

4 nodes, 15 OSDs installed on Luminous (no upgrade).
Replicated pools with 1 pool (pool 6) cached by ssd disks.
I don't detect any hardware failures (disk IO errors, restarts,
corrupted data etc).
I'm running RBDs using libvirt on debian wheezy and jessie (stable and
oldstable).
I'm snapshotting RBD's using Luminous client on Debian Jessie only.


When you say "cached by", do you mean there's a cache pool? Or are you 
using bcache or something underneath?


I mean a cache pool.




Now problems, from light to severe:

1)
Almost every day I notice some health problems after a deep scrub:
1-2 inconsistent PGs with "read_error" on some OSDs.
When I don't repair them, they disappear after a few days (another deep
scrub?). There are no read errors on the disks (disk checks are OK, no errors
logged in syslog).


2)
I noticed on my pool 6 (the cached pool) that scrub reports some objects
that shouldn't be there:

2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20(
v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299
ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710)
[1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod
8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps
no head for 6:07ffbc7b:::rbd_data.967992ae8944a.00061cb8:c2
(have MIN)

I think that means an orphaned snap object without its head replica. Maybe
snaptrim left it there? Why? Maybe an error during snaptrim? Or
fstrim/discard removed the "head" object (this is, I hope, nonsense)?

3)
I ended up with one object (probably a snap object) that has only 1 replica
(out of size 3), and when I try to repair it, my OSD crashes with

/build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
recovery_info.ss.clone_snaps.end())
I guess that it detected the orphaned snap object I noticed in 2) and does not
repair it, just asserts and stops the OSD. Am I right?

I noticed the comment "// hmm, should we warn?" in the Ceph source at that
assert. So should someone remove that assert?


There's a ticket https://tracker.ceph.com/issues/23030, which links to a 
much longer discussion on this mailing list between Sage and Stefan 
which discusses this particular assert. I'm not entirely clear from the 
rest of your story (and the long history in that thread) if there are 
other potential causes, or if your story might help diagnose it. But I'd 
start there since AFAIK it's still a mystery that looks serious but has 
only a very small number of incidences. :/

-Greg


Thank you, I will go through it, but it does not look related to my issue.
Now I have added a new disk to the cluster and upgraded to 12.2.4 on some 
nodes, and so far there are no scrub errors.
If there is no clear answer to my OSD crash, I will try to wipe the OSD 
containing the problematic object (the one that causes my primary OSD to 
fail) and rebuild it from the other copies. I hope that without that object 
the crash will not appear.
If you think that osdmaptool is safe for a fuse mount and deleting the 
object by hand, I can try it. But I'm looking for some tool to do that 
"online" on all PG copies and with checks (that the snap object is not 
referenced somewhere).


With regards
Jan Pekar




And my questions are:

How can I fix the issue with the crashing OSD?
How can I safely remove those objects with a missing head? Is there any
tool, or a force-snaptrim on non-existent snapshots? It is a production
cluster, so I want to be careful. I have no problems with data availability
right now.
My last idea is to move the RBDs to another pool, but I don't have enough
space to do that (as far as I know, an RBD image can only be copied, not
moved), so I'm looking for another clean solution.
And the last question - how can I find out what is causing the read_errors
and the snap object leftovers?

Should I paste my whole log? It is bigger than the allowed post size.
Pasting the most important events:

     -23> 2018-02-27 23:43:07.903368 7f4b3820e700  2 osd.1 pg_epoch:
8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20
repair 1 missing, 0 inconsistent objects
     -22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1
log_channel(cluster)
log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
     -21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1
log_channel(cluster)
log [ERR] : 6.20 repair 3 errors, 2 fixed
   

Re: [ceph-users] No more Luminous packages for Debian Jessie ??

2018-03-07 Thread Sean Purdy
On Wed,  7 Mar 2018, Wei Jin said:
> Same issue here.
> Will Ceph community support Debian Jessie in the future?

Seems odd to stop it right in the middle of minor point releases.  Maybe it was 
an oversight?  Jessie's still supported in Debian as oldstable and not even in 
LTS yet.


Sean

 
> On Mon, Mar 5, 2018 at 6:33 PM, Florent B  wrote:
> > Is Jessie no longer supported??
> > https://download.ceph.com/debian-luminous/dists/jessie/main/binary-amd64/Packages
> > only contains ceph-deploy package !
> >
> >
> > On 28/02/2018 10:24, Florent B wrote:
> >> Hi,
> >>
> >> Since yesterday, the "ceph-luminous" repository does not contain any
> >> package for Debian Jessie.
> >>
> >> Is it expected ?
> >>
> >> Thank you.
> >>
> >> Florent
> >>
> >


Re: [ceph-users] No more Luminous packages for Debian Jessie ??

2018-03-07 Thread Wei Jin
Same issue here.
Will the Ceph community support Debian Jessie in the future?

On Mon, Mar 5, 2018 at 6:33 PM, Florent B  wrote:
> Is Jessie no longer supported??
> https://download.ceph.com/debian-luminous/dists/jessie/main/binary-amd64/Packages
> only contains ceph-deploy package !
>
>
> On 28/02/2018 10:24, Florent B wrote:
>> Hi,
>>
>> Since yesterday, the "ceph-luminous" repository does not contain any
>> package for Debian Jessie.
>>
>> Is it expected ?
>>
>> Thank you.
>>
>> Florent
>>
>


[ceph-users] Don't use ceph mds set max_mds

2018-03-07 Thread Dan van der Ster
Hi all,

What is the purpose of

   ceph mds set max_mds <int>

?

We just used that by mistake on a cephfs cluster when attempting to
decrease from 2 to 1 active mds's.

The correct command to do this is of course

  ceph fs set <fs_name> max_mds <int>

So, is `ceph mds set max_mds` useful for something? If not, should it
be removed from the CLI?

Cheers, Dan


Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

2018-03-07 Thread shadow_lin
What you said makes sense.
I have encountered a few hardware-related issues that caused one OSD to work 
abnormally and block all I/O of the whole cluster (all OSDs are in one pool), 
which makes me think about how to avoid this situation.

2018-03-07 

shadow_lin 



From: David Turner
Sent: 2018-03-07 13:51
Subject: Re: Re: [ceph-users] Why one crippled osd can slow down or block all request 
to the whole ceph cluster?
To: "shadow_lin"
Cc: "ceph-users"

Marking osds down is not without risks. You are taking away one of the copies 
of data for every PG on that osd. Also you are causing every PG on that osd to 
peer. If that osd comes back up, every PG on it again needs to peer and then 
they need to recover.


That is a lot of load and risks to automate into the system. Now let's take 
into consideration other causes of slow requests like having more IO load than 
your spindle can handle, backfilling settings set too aggressively (related to 
the first option), or networking problems. If the mon is detecting slow 
requests on OSDs and marking them down, you could end up marking half of your 
cluster down or causing corrupt data by flapping OSDs.


The mon will mark osds down if those settings I mentioned are met. If the osd 
isn't unresponsive enough to not respond to other OSDs or the mons, then there 
really isn't much that ceph can do to automate this safely. There are just so 
many variables. If ceph was a closed system on specific hardware, it could 
certainly be monitoring that hardware closely for early warning signs... But 
people are running Ceph on everything they can compile it for including 
raspberry pis. The cluster admin, however, should be able to add their own 
early detection for failures.


You can monitor a lot about disks including things such as average await in a 
host to see if the disks are taking longer than normal to respond. That 
particular check led us to find that we had several storage nodes with bad 
cache batteries on the controllers. Finding that explained some slowness we had 
noticed in the cluster. It also led us to a better method to catch that 
scenario sooner.
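
For example, something as simple as watching the await column over time (the
interval and device list are arbitrary):

  iostat -x 5 /dev/sd[a-d]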


On Tue, Mar 6, 2018, 11:22 PM shadow_lin  wrote:

Hi Turner,
Thanks for your insight.
I am wondering: if the mon can detect slow/blocked requests from a certain OSD, why 
can't the mon mark an OSD with blocked requests down if the requests have been 
blocked for a certain time?

2018-03-07 

shadow_lin 



From: David Turner
Sent: 2018-03-06 23:56
Subject: Re: [ceph-users] Why one crippled osd can slow down or block all request to 
the whole ceph cluster?
To: "shadow_lin"
Cc: "ceph-users"

There are multiple settings that affect this.  osd_heartbeat_grace is probably 
the most apt.  If an OSD is not getting a response from another OSD for more 
than the heartbeat_grace period, then it will tell the mons that the OSD is 
down.  Once mon_osd_min_down_reporters have told the mons that an OSD is down, 
then the OSD will be marked down by the cluster.  If the OSD does not then talk 
to the mons directly to say that it is up, it will be marked out after 
mon_osd_down_out_interval is reached.  If it does talk to the mons to say that 
it is up, then it should be responding again and be fine. 
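
For reference, those knobs live in ceph.conf; the values shown below are the usual 
defaults (from memory, so verify them against your release) rather than 
recommendations:

  [osd]
      osd heartbeat grace = 20          # seconds without a heartbeat reply before reporting a peer down

  [mon]
      mon osd min down reporters = 2    # distinct reporters needed before the mon marks an OSD down
      mon osd down out interval = 600   # seconds an OSD stays down before being marked out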


In your case where the OSD is half up, half down... I believe all you can 
really do is monitor your cluster and troubleshoot OSDs causing problems like 
this.  Basically every storage solution is vulnerable to this.  Sometimes an 
OSD just needs to be restarted due to being in a bad state somehow, or simply 
removed from the cluster because the disk is going bad.
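
Manually, that intervention is usually something like the following (OSD id 12 is 
just an example, and the systemctl form assumes a systemd-based install):

  # restart a misbehaving OSD on its host
  systemctl restart ceph-osd@12

  # or take it out of data placement entirely so recovery can proceed
  ceph osd out 12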


On Sun, Mar 4, 2018 at 2:28 AM shadow_lin  wrote:

Hi list,
During my tests of Ceph, I find that sometimes the whole cluster is blocked and 
the reason is one non-functional OSD. Ceph can heal itself if an OSD is down, 
but it seems that if an OSD is half dead (it has a heartbeat but can't handle 
requests), then all the requests which are directed to that OSD are blocked. 
If all OSDs are in one pool, the whole cluster can be blocked due to that 
one hung OSD.
I think this is because Ceph distributes requests across all OSDs, and if one 
of the OSDs won't confirm that a request is done, then everything is blocked.
Is there a way to let Ceph mark the crippled OSD down if the requests 
directed to that OSD are blocked for more than a certain time, to avoid 
blocking the whole cluster?

2018-03-04


shadow_lin 


Re: [ceph-users] Delete a Pool - how hard should be?

2018-03-07 Thread Max Cuttins

On 06/03/2018 16:23, David Turner wrote:
That said, I do like the idea of being able to disable buckets, rbds, 
pools, etc so that no client could access them. That is useful for 
much more than just data deletion and won't prevent people from 
deleting data prematurely.


To me, if nobody can access the data for 30 days and the customer didn't 
call me within those days, it's OK to delete the data for good.

Which is the way it should be:
make it easy for the admin to delete data when he really wants to, and
make it possible for the user to spend some days without their data until 
that data is obsolete and useless.
The autopurge of your mailbox's trash works in the same way, and it seems 
to me a reasonable way to handle precious data such as personal emails.


It could be added as a requisite step to deleting a pool, rbd, etc. 
The process would need to be refactored as adding another step isn't 
viable.
This feature is much more complicated than it may seem on the surface. 
For pools, you could utilize cephx, except not everyone uses that... 
So maybe logic added to the osd map. Buckets would have to be 
completely in rgw. Rbds would probably have to be in the osd map as 
well. This is not a trivial change.


Mine was just a "nice-to-have" proposal.
There is no hurry to implement a secondary feature such as this one.

About the logic, would it be possible to use something like this:

 * snapshot the pool with a special pool name
 * remove the original pool
 * give the possibility to restore the snapshot with its original name.

I think this would immediately stop all connections to the original 
pool but leave all the data intact.

Maybe.


