[ceph-users] Re: ACL for user in another tenant

2020-05-14 Thread Vishwas Bm
Hi Pritha,

Thanks for the response. Yes, with the boto package I was able to access the
bucket contents.

*Thanks & Regards,*

*Vishwas *


On Thu, May 14, 2020 at 9:32 PM Pritha Srivastava 
wrote:

> Hi Vishwas,
>
> In the following bucket policy:
> Policy:{
>   "Version": "2012-10-17",
>   "Statement": [
> {
>   "Principal": {"AWS": ["arn:aws:iam::tenant1:user/Tom"]},
>   "Action": ["s3:ListBucket"],
>   "Effect": "Allow",
>   "Resource": "s3://tenant2/jerry-bucket"
> }
>   ]
> }
> 'Resource' should follow the AWS ARN format
> (arn:aws:s3::tenant2:jerry-bucket)
>
> Also, you won't be able to pass in a tenant name with bucket name using
> s3cmd. You can use boto for the same with bucket names of the format
> 'tenant:bucket' and disable bucket name validation using
> s3client.meta.events.unregister('before-parameter-build.s3',
> validate_bucket_name), if you plan to use boto3.
>
> Thanks,
> Pritha
>
> On Thu, May 14, 2020 at 2:01 PM Vishwas Bm  wrote:
>
>> When I tried as below as well, a similar error comes up:
>>
>> [root@vishwas-test cluster]# s3cmd --access_key=GY40PHWVK40A2G4XQH2D
>> --secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls
>> s3://tenant2/jerry-bucket
>> ERROR: Bucket 'tenant2' does not exist
>> ERROR: S3 error: 404 (NoSuchBucket)
>>
>>
>> [root@vishwas-test cluster]# s3cmd  --access_key=GY40PHWVK40A2G4XQH2D
>> --secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls
>> s3://tenant2:jerry-bucket
>> ERROR: S3 error: 403 (SignatureDoesNotMatch)
>>
>>
>> *Thanks & Regards,*
>>
>> *Vishwas *
>>
>>
>> On Thu, May 14, 2020 at 1:54 PM Vishwas Bm  wrote:
>>
>>> Hi Pritha,
>>>
>>> Thanks for the reply. Please find the user list, bucket list and also
>>> the command which I have used.
>>>
>>> [root@vishwas-test cluster]# radosgw-admin user list
>>> [
>>> "tenant2$Jerry",
>>> "tenant1$Tom"
>>> ]
>>>
>>> [root@vishwas-test cluster]# radosgw-admin bucket list
>>> [
>>> "tenant2/jerry-bucket"
>>> ]
>>>
>>> [root@vishwas-test cluster]# s3cmd info
>>> --access_key=HVTKORMH8LLDF76TKQGI
>>> --secret_key=9XFcvgMm4yBncA8D9SguEMVSBsUkhuuRLSbyuUPp s3://jerry-bucket
>>> s3://jerry-bucket/ (bucket):
>>>Location:  default
>>>Payer: BucketOwner
>>>Expiration Rule: none
>>>Policy:{
>>>   "Version": "2012-10-17",
>>>   "Statement": [
>>> {
>>>   "Principal": {"AWS": ["arn:aws:iam::tenant1:user/Tom"]},
>>>   "Action": ["s3:ListBucket"],
>>>   "Effect": "Allow",
>>>   "Resource": "s3://tenant2/jerry-bucket"
>>> }
>>>   ]
>>> }
>>>CORS:  none
>>>ACL:   Jerry: FULL_CONTROL
>>>
>>>
>>> When I try to list using Tom's access keys, I get the error below:
>>> [root@vishwas-test cluster]# s3cmd --access_key=GY40PHWVK40A2G4XQH2D
>>> --secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls s3://jerry-bucket
>>>
>>> *ERROR: Bucket 'jerry-bucket' does not exist*
>>> *ERROR: S3 error: 404 (NoSuchBucket)*
>>>
>>>
>>> *Thanks & Regards,*
>>>
>>> *Vishwas *
>>>
>>>
>>> On Thu, May 14, 2020 at 11:54 AM Pritha Srivastava 
>>> wrote:
>>>
 Hi Vishwas,

 Bucket policy should let you access buckets in another tenant.
 What exact command are you using?

 Thanks,
 Pritha

 On Thursday, May 14, 2020, Vishwas Bm  wrote:

> > Hi,
> >
> > I have two users, both belonging to different tenants.
> >
> > Can I give permission to the user in another tenant to access the
> bucket
> > using the setacl or setPolicy command?
> > I tried the setacl command and the setpolicy command, but they were not
> working.
> > It used to say bucket not found when the grantee tried to access it.
> >
> > Is this supported ?
> >
> > *Thanks & Regards,*
> > *Vishwas *
> >
>
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] stale+active+clean PG

2020-05-14 Thread tomislav . raseta
Dear all,

We're running Ceph Luminous and we've recently hit an issue with some OSDs 
(auto-out states, IO/CPU overload) which unfortunately resulted in one 
placement group with the state "stale+active+clean". It's a placement group 
from the .rgw.root pool:

1.15  objects 0  log 1  state stale+active+clean  state stamp 2020-05-11 23:22:51.396288
      version 40'1  reported 2142:152  up [3,2,6] (primary 3)  acting [3,2,6] (primary 3)
      last scrub 40'1 2020-04-22 00:46:05.904418  last deep scrub 40'1 2020-04-20 20:18:13.371396

I guess there is no active replica of that object anywhere on the cluster. 
Restarting osd.3, osd.2 or osd.6 daemons does not help.

I've used ceph-objectstore-tool and successfully exported the placement group from 
osd.3, osd.2 and osd.6, and tried to import it on a completely different OSD. 
The exports differ slightly in file size, but the export from osd.3, which was the 
latest primary, is the biggest, so I tried to import that one on a different OSD. 
When starting it up I see the following (this is from osd.1):
2020-05-14 21:43:19.779740 7f7880ac3700  1 osd.1 pg_epoch: 2459 pg[1.15( v 40'1 
(0'0,40'1] local-lis/les=2073/2074 n=0 ec=73/39 lis/c 2073/2073 les/c/f 
2074/2074/633 2145/39/2145) [] r=-1 lpr=2455 crt=40'1 lcod 0'0 unknown NOTIFY] 
state: transitioning to Stray

I see from previous pg dumps (several weeks before while it was still 
active+clean) that it was 115 bytes with zero objects in it but I am not sure 
how to interpret that.

As this is a PG from the .rgw.root pool, I cannot get any response from the cluster 
when accessing it (everything times out).

What is the correct course of action with this pg?

Any help would be greatly appreciated.

Thanks,
Tomislav
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using rbd-mirror in existing pools

2020-05-14 Thread Anthony D'Atri
Understandable concern.

FWIW I’ve used rbd-mirror to move thousands of volumes between clusters with 
zero clobbers.
 —aad


> On May 14, 2020, at 9:46 AM, Kees Meijs | Nefos  wrote:
> 
> My main concern is pulling images into a non-empty pool. It would be
> (very) bad if rbd-mirror tries to be smart and removes images that don't
> exist in the source pool.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using rbd-mirror in existing pools

2020-05-14 Thread Kees Meijs | Nefos
Thanks for clearing that up, Jason.

K.

On 14-05-2020 20:11, Jason Dillaman wrote:
> rbd-mirror can only remove images that (1) have mirroring enabled and
> (2) are not split-brained with its peer. It's totally fine to only
> mirror a subset of images within a pool and it's fine to only mirror
> one-way.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using rbd-mirror in existing pools

2020-05-14 Thread Jason Dillaman
On Thu, May 14, 2020 at 12:47 PM Kees Meijs | Nefos  wrote:

> Hi Anthony,
>
> A one-way mirror suits fine in my case (the old cluster will be
> dismantled in the meantime) so I guess a single rbd-mirror daemon should
> suffice.
>
> The pool consists of OpenStack Cinder volumes containing a UUID (i.e.
> volume-ca69183a-9601-11ea-8e82-63973ea94e82 and such). The chance of
> conflicts is near zero.
>
> My main concern is pulling images into a non-empty pool. It would be
> (very) bad if rbd-mirror tries to be smart and removes images that don't
> exist in the source pool.
>

rbd-mirror can only remove images that (1) have mirroring enabled and (2)
are not split-brained with its peer. It's totally fine to only mirror a
subset of images within a pool and it's fine to only mirror one-way.


>
> Regards and thanks again,
> Kees
>
> On 14-05-2020 17:41, Anthony D'Atri wrote:
> > When you set up the rbd-mirror daemons with each others’ configs, and
> initiate mirroring of a volume, the destination will create the volume in
> the destination cluster and pull over data.
> >
> > Hopefully you’re creating unique volume names so there won’t be
> conflicts, but that said if the destination has a collision, it won’t be
> overwritten.
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using rbd-mirror in existing pools

2020-05-14 Thread Kees Meijs | Nefos
Hi Anthony,

A one-way mirror suits fine in my case (the old cluster will be
dismantled in the meantime) so I guess a single rbd-mirror daemon should
suffice.

The pool consists of OpenStack Cinder volumes containing a UUID (i.e.
volume-ca69183a-9601-11ea-8e82-63973ea94e82 and such). The chance of
conflicts is near zero.

My main concern is pulling images into a non-empty pool. It would be
(very) bad if rbd-mirror tries to be smart and removes images that don't
exist in the source pool.

Regards and thanks again,
Kees

On 14-05-2020 17:41, Anthony D'Atri wrote:
> When you set up the rbd-mirror daemons with each others’ configs, and 
> initiate mirroring of a volume, the destination will create the volume in the 
> destination cluster and pull over data.
>
> Hopefully you’re creating unique volume names so there won’t be conflicts, 
> but that said if the destination has a collision, it won’t be overwritten.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ACL for user in another tenant

2020-05-14 Thread Pritha Srivastava
Hi Vishwas,

In the following bucket policy:
Policy:{
  "Version": "2012-10-17",
  "Statement": [
{
  "Principal": {"AWS": ["arn:aws:iam::tenant1:user/Tom"]},
  "Action": ["s3:ListBucket"],
  "Effect": "Allow",
  "Resource": "s3://tenant2/jerry-bucket"
}
  ]
}
'Resource' should follow the AWS ARN format
(arn:aws:s3::tenant2:jerry-bucket)

Also, you won't be able to pass in a tenant name with bucket name using
s3cmd. You can use boto for the same with bucket names of the format
'tenant:bucket' and disable bucket name validation using
s3client.meta.events.unregister('before-parameter-build.s3',
validate_bucket_name), if you plan to use boto3.
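
For reference, a minimal boto3 sketch of both points; the endpoint URL below is a
placeholder and the keys are just the ones quoted in this thread, so adjust them
to your setup:

import json
import boto3
from botocore.handlers import validate_bucket_name

# Client for Tom (tenant1); the endpoint is a placeholder for your RGW endpoint.
s3_tom = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',
    aws_access_key_id='GY40PHWVK40A2G4XQH2D',
    aws_secret_access_key='bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk',
)

# Disable bucket name validation so a 'tenant:bucket' name is accepted.
s3_tom.meta.events.unregister('before-parameter-build.s3', validate_bucket_name)

# List the bucket that lives in the other tenant.
resp = s3_tom.list_objects_v2(Bucket='tenant2:jerry-bucket')
for obj in resp.get('Contents', []):
    print(obj['Key'])

# The bucket owner (Jerry) would attach a policy whose Resource uses the
# ARN form rather than an s3:// URL, for example:
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Principal": {"AWS": ["arn:aws:iam::tenant1:user/Tom"]},
        "Action": ["s3:ListBucket"],
        "Effect": "Allow",
        "Resource": "arn:aws:s3::tenant2:jerry-bucket",
    }],
}
policy_json = json.dumps(policy)
# With a client created from Jerry's keys (call it s3_jerry):
#     s3_jerry.put_bucket_policy(Bucket='jerry-bucket', Policy=policy_json)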

Thanks,
Pritha

On Thu, May 14, 2020 at 2:01 PM Vishwas Bm  wrote:

> When I tried as below as well, a similar error comes up:
>
> [root@vishwas-test cluster]# s3cmd --access_key=GY40PHWVK40A2G4XQH2D
> --secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls
> s3://tenant2/jerry-bucket
> ERROR: Bucket 'tenant2' does not exist
> ERROR: S3 error: 404 (NoSuchBucket)
>
>
> [root@vishwas-test cluster]# s3cmd  --access_key=GY40PHWVK40A2G4XQH2D
> --secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls
> s3://tenant2:jerry-bucket
> ERROR: S3 error: 403 (SignatureDoesNotMatch)
>
>
> *Thanks & Regards,*
>
> *Vishwas *
>
>
> On Thu, May 14, 2020 at 1:54 PM Vishwas Bm  wrote:
>
>> Hi Pritha,
>>
>> Thanks for the reply. Please find the user list, bucket list and also the
>> command which I have used.
>>
>> [root@vishwas-test cluster]# radosgw-admin user list
>> [
>> "tenant2$Jerry",
>> "tenant1$Tom"
>> ]
>>
>> [root@vishwas-test cluster]# radosgw-admin bucket list
>> [
>> "tenant2/jerry-bucket"
>> ]
>>
>> [root@vishwas-test cluster]# s3cmd info
>> --access_key=HVTKORMH8LLDF76TKQGI
>> --secret_key=9XFcvgMm4yBncA8D9SguEMVSBsUkhuuRLSbyuUPp s3://jerry-bucket
>> s3://jerry-bucket/ (bucket):
>>Location:  default
>>Payer: BucketOwner
>>Expiration Rule: none
>>Policy:{
>>   "Version": "2012-10-17",
>>   "Statement": [
>> {
>>   "Principal": {"AWS": ["arn:aws:iam::tenant1:user/Tom"]},
>>   "Action": ["s3:ListBucket"],
>>   "Effect": "Allow",
>>   "Resource": "s3://tenant2/jerry-bucket"
>> }
>>   ]
>> }
>>CORS:  none
>>ACL:   Jerry: FULL_CONTROL
>>
>>
>> When I try to list using Tom's access keys, I get the error below:
>> [root@vishwas-test cluster]# s3cmd --access_key=GY40PHWVK40A2G4XQH2D
>> --secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls s3://jerry-bucket
>>
>> *ERROR: Bucket 'jerry-bucket' does not exist*
>> *ERROR: S3 error: 404 (NoSuchBucket)*
>>
>>
>> *Thanks & Regards,*
>>
>> *Vishwas *
>>
>>
>> On Thu, May 14, 2020 at 11:54 AM Pritha Srivastava 
>> wrote:
>>
>>> Hi Vishwas,
>>>
>>> Bucket policy should let you access buckets in another tenant.
>>> What exact command are you using?
>>>
>>> Thanks,
>>> Pritha
>>>
>>> On Thursday, May 14, 2020, Vishwas Bm  wrote:
>>>
 > Hi,
 >
 > I have two users, both belonging to different tenants.
 >
 > Can I give permission to the user in another tenant to access the
 bucket
 > using the setacl or setPolicy command?
 > I tried the setacl command and the setpolicy command, but they were not
 working.
 > It used to say bucket not found when the grantee tried to access it.
 >
 > Is this supported ?
 >
 > *Thanks & Regards,*
 > *Vishwas *
 >

 >
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-14 Thread Anthony D'Atri
What is a saturated network with modern switched technologies?  Links to 
individual hosts?  Uplinks from TORS (public)?  Switch backplane (cluster)?


> That is correct. I didn't explain it clearly. I said that because in
> some write-only scenarios the public network and cluster network will
> both be saturated at the same time.
> linyunfan
> 
> Janne Johansson  wrote on Thursday, May 14, 2020 at 3:42 PM:
>> 
>> On Thursday, May 14, 2020 at 08:42, lin yunfan  wrote:
>>> 
>>> Besides the recovery scenario, in a write-only scenario the cluster
>>> network will use almost the same bandwidth as the public network.
>> 
>> 
>> That would depend on the replication factor. If it is high, I would assume 
>> every MB from the client network would make (repl-factor - 1) times the data 
>> on the private network to send replication requests to the other OSD hosts 
>> with the same amount of data.
>> 
>> --
>> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph meltdown, need help

2020-05-14 Thread Frank Schilder
Dear Marc,

thank you for your endurance. I had another slightly different "meltdown", this 
time throwing the MGRs out and I adjusted yet another beacon grace time. 
Fortunately, after your communication, I didn't need to look very long.

To harden our cluster a bit further, I would like to adjust a number of 
advanced parameters I found after your hints. I would be most grateful if you 
(or anyone else receiving this) still have enough endurance left and could 
check whether what I want to do makes sense and if the choices I suggest will 
achieve what I want.

Parameters are listed with the relevant section of the documentation, the default in "{}", 
the current value plain, and the new value prefixed with "*". There is also an error in the 
documentation; please let me know if my interpretation is correct.


MON-MGR beacon adjustments
--
https://docs.ceph.com/docs/mimic/mgr/administrator/

mon mgr beacon grace {30}  300

This helped mitigate the second type of meltdown. I took 2 times the longest 
observed "mon slow op" time to be safe (MGR beacon handling was the slow op). Our 
MGRs are no longer thrown out in case of the incident (see the very end for more 
info).


MON-OSD communication adjustments

https://docs.ceph.com/docs/mimic/rados/configuration/mon-osd-interaction/

osd beacon report interval {300}        300
mon osd report timeout {900}            3600
mon osd min down reporters {2}         *3
mon osd reporter subtree level {host}  *datacenter
mon osd down out subtree limit {rack}  *host

"mon osd report timeout" is increased after your recommendation. It is set to a 
really high value as I don't see this critical for fail-over (the default 
time-out suggests that this is merely for clean-up and not essential for 
healthy I/O). OSDs are no longer thrown out in case of the incident (see very 
end for more info).

"down reporter options": We have 3 sites (sub-clusters) under region in our 
crush map (see below). Each of these regions can be considered "equally laggy" 
as described in the documentation. I do not want a laggy site to mark down OSDs 
from another (healthy) site without a single OSD of the other site confirming 
an issue. I would like to require that at least 1 OSD from each site needs to 
report an OSD down before something happens. Does "3" and "datacenter" achieve 
what I want? Is this a reasonable choice with our crush map?

Note that, as a special case, DC2 currently links to some hosts of DC3 (this will change 
in the future).

"mon osd down out subtree limit": A host in our cluster is currently the atomic 
unit which, if it goes down, should not trigger rebalancing on the cluster as 
this indicates a server and not a disk fail. In addition, if I understand it 
correctly, this will also act as an automatic "noout" on host level if, for 
example, a host gets rebooted.


mon osd laggy *

I saw tuning parameters for laggy OSDs. However, our incidents happen very 
sporadically and are extremely radical. I do not think that any reasonable 
estimator will be able to handle that. So my working hypothesis is, that I 
should not touch these.


Error in documentation


https://docs.ceph.com/docs/mimic/rados/configuration/mon-osd-interaction/#osds-report-their-status

osd_mon_report_interval_max {Error ENOENT:}
osd beacon report interval

The documentation mentions "osd mon report interval max", which doesn't exist. 
However "osd beacon report interval" exists but is not mentioned. I assume the 
second replaced the first?


Condensed crush tree


region R1
    datacenter DC1
        room DC1-R1
            host ceph-08
            host ceph-09
            host ceph-10
            host ceph-11
            host ceph-12
            host ceph-13
            host ceph-14
            host ceph-15
            host ceph-16
            host ceph-17
    datacenter DC2
        host ceph-04
        host ceph-05
        host ceph-06
        host ceph-07
        host ceph-18
        host ceph-19
        host ceph-20
    datacenter DC3
        room DC3-R1
            host ceph-04
            host ceph-05
            host ceph-06
            host ceph-07
            host ceph-18
            host ceph-19
            host ceph-20
            host ceph-21
            host ceph-22

Additional info about our meltdowns:

With "mon mgr beacon grace" and "mon osd report timeout" set to really high 
values, I finally managed to isolate a signal in our recordings that is 
connected with these strange incidents. It looks like a packet storm is 
hitting exactly two MON+MGR nodes, leading to beacon time-outs with default 
settings. I will not continue this here, but rather prepare another thread 
"Cluster outage due to client IO" after checking network hardware. It looks as 
if two MON+MGR nodes are desperately trying to talk to each other but fail.

And this after only 1.5 years of relationship :)

Thanks for making it a second time!

Best regards,
=
Frank Schilder
AIT Risø Campus

[ceph-users] Re: Using rbd-mirror in existing pools

2020-05-14 Thread Anthony D'Atri


When you set up the rbd-mirror daemons with each others’ configs, and initiate 
mirroring of a volume, the destination will create the volume in the 
destination cluster and pull over data.

Hopefully you’re creating unique volume names so there won’t be conflicts, but 
that said if the destination has a collision, it won’t be overwritten.

> 
> Hi list,
> 
> Thanks again for pointing me towards rbd-mirror!
> 
> I've read documentation, old mailing list posts, blog posts and some
> additional guides. Seems like the tool to help me through my data migration.
> 
> Given one-way synchronisation and image-based (so, not pool based)
> configuration, it's still unclear to me how the mirroring will cope with
> an existing target pool, already consisting of (a lot of) images.
> 
> Has someone done this already? It feels quite scary with doom scenario's
> like "cleaning up" the target pool and such in mind...
> 
> To sum up: my goal is to mirror clusterA/somepool with some specific
> images to clusterB/someotherpool where already other images reside. The
> mirrored images should be kept in sync and the other images should be
> left alone completely.
> 
> Cheers,
> Kees
> 
> -- 
> https://nefos.nl/contact
> 
> Nefos IT bv
> Ambachtsweg 25 (industrienummer 4217)
> 5627 BZ Eindhoven
> Nederland
> 
> KvK 66494931
> 
> /Available on Monday, Tuesday, Wednesday and Friday/
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-ansible replicated crush rule

2020-05-14 Thread Marc Boisis
Hello,

With ceph-ansible, the default replicated crush rule is:
{
   "rule_id": 0,
   "rule_name": "replicated_rule",
   "ruleset": 0,
   "type": 1,
   "min_size": 1,
   "max_size": 10,
   "steps": [
   {
   "op": "take",
   "item": -1,
   "item_name": "default"
   },
   {
   "op": "chooseleaf_firstn",
   "num": 0,
   "type": "host"
   },
   {
   "op": "emit"
   }
   ]
   }

And I would like to have this:
{
   "rule_id": 0,
   "rule_name": "replicated_rule",
   "ruleset": 0,
   "type": 1,
   "min_size": 2,
   "max_size": 4,
   "steps": [
   {
   "op": "take",
   "item": -1,
   "item_name": "default"
   },
   {
   "op": "chooseleaf_firstn",
   "num": 0,
   "type": "rack"
   },
   {
   "op": "emit"
   }
   ]
   }

How can I do this within the ceph-ansible playbook?
I tried:
crush_rule_replicated:
 name: replicated_rule
 root: default
 ruleset: 0
 type: replicated
 min_size: 2
 max_size: 4
 step: 
   - take default
   - chooseleaf firstn 0 type rack
   - emit
 default: true

crush_rules:
 - "{{ crush_rule_replicated }}" 


The task at roles/ceph-osd/tasks/crush_rules.yml:32 tries to change it but it fails:
stdout_lines:
   - ''
   - 
'{"rule_id":0,"rule_name":"replicated_rule","ruleset":0,"type":1,"min_size":1,"max_size":10,"steps":[{"op":"take","item":-1,"item_name":"default"},{"op":"chooseleaf_firstn","num":0,"type":"host"},{"op":"emit"}]}'

Anyone have an idea ?
Regards

Marc
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using rbd-mirror in existing pools

2020-05-14 Thread Eugen Block
The pool names in both clusters have to be identical in addition to  
the required journal feature. It’s probably an advantage if the  
existing pool in the second cluster has a different name. In that case  
you can set up the mirror for a new pool without affecting the other  
pool and after mirroring has completed move the images into the  
destination pool.


Zitat von Zhenshi Zhou :


In my experience, rbd-mirror only copies the images with the
journaling feature from clusterA to clusterB. It doesn't affect the other
images in the pool on clusterB. You'd better run a test on it though.

Kees Meijs | Nefos  wrote on Thursday, May 14, 2020 at 10:22 PM:


Hi list,

Thanks again for pointing me towards rbd-mirror!

I've read documentation, old mailing list posts, blog posts and some
additional guides. Seems like the tool to help me through my data
migration.

Given one-way synchronisation and image-based (so, not pool based)
configuration, it's still unclear to me how the mirroring will cope with
an existing target pool, already consisting of (a lot of) images.

Has someone done this already? It feels quite scary with doom scenario's
like "cleaning up" the target pool and such in mind...

To sum up: my goal is to mirror clusterA/somepool with some specific
images to clusterB/someotherpool where already other images reside. The
mirrored images should be kept in sync and the other images should be
left alone completely.

Cheers,
Kees

--
https://nefos.nl/contact

Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland

KvK 66494931

/Available on Monday, Tuesday, Wednesday and Friday/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
Eugen Block voice   : +49 40 5595175
NDE Netzdesign und -entwicklung AG  fax : +49 40 5595177
Postfach 61 03 15   e-mail  : usern...@nde.ag
D-22423 Hamburg

 Authorized executive board: Jens-U. Mozdzen
  Chairwoman of the supervisory board: Angelika Torlée-Mozdzen
   Registered office and court of registration: Hamburg, HRB 90934
  VAT ID: DE 814 013 983
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using rbd-mirror in existing pools

2020-05-14 Thread Zhenshi Zhou
In my experience, rbd-mirror only copies the images with the
journaling feature from clusterA to clusterB. It doesn't affect the other
images in the pool on clusterB. You'd better run a test on it though.

Kees Meijs | Nefos  wrote on Thursday, May 14, 2020 at 10:22 PM:

> Hi list,
>
> Thanks again for pointing me towards rbd-mirror!
>
> I've read documentation, old mailing list posts, blog posts and some
> additional guides. Seems like the tool to help me through my data
> migration.
>
> Given one-way synchronisation and image-based (so, not pool based)
> configuration, it's still unclear to me how the mirroring will cope with
> an existing target pool, already consisting of (a lot of) images.
>
> Has someone done this already? It feels quite scary with doom scenario's
> like "cleaning up" the target pool and such in mind...
>
> To sum up: my goal is to mirror clusterA/somepool with some specific
> images to clusterB/someotherpool where already other images reside. The
> mirrored images should be kept in sync and the other images should be
> left alone completely.
>
> Cheers,
> Kees
>
> --
> https://nefos.nl/contact
>
> Nefos IT bv
> Ambachtsweg 25 (industrienummer 4217)
> 5627 BZ Eindhoven
> Nederland
>
> KvK 66494931
>
> /Available on Monday, Tuesday, Wednesday and Friday/
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What is a pgmap?

2020-05-14 Thread Frank Schilder
Unfortunately, my e-mail client does not collect threads properly.

Think I got my answer.

From Janne Johansson:
> Since using computer time and date is fraught with peril, having the whole
> cluster just bump that single number every second (and writing it to the PG
> on each write) would allow a mostly idle PG that comes back after an hour
> of unexpected downtime to easily know if it needs no recovery, a little bit
> of delta to get up-to-date or a full copy from the primary in order to
> become a part of the replica set for that PG.

So an increase every second is expected.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: 14 May 2020 12:37
To: Nghia Viet Tran; Bryan Henderson; Ceph users mailing list
Subject: [ceph-users] Re: What is a pgmap?

Hi, I also observe an increase in pgmap version every second or so, see snippet 
below. I run mimic 13.2.8 without any PG scaling/upmapping. Why does the 
version increase so often?

May 14 12:33:50 ceph-03 journal: cluster 2020-05-14 12:33:48.521546 mgr.ceph-02 
mgr.27460080 192.168.32.66:0/63 114833 : cluster [DBG] pgmap v114860: 2545 pgs: 
2 active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 
1.5 PiB / 1.8 PiB avail; 4.8 MiB/s rd, 11 MiB/s wr, 1.48 kop/s

May 14 12:33:50 ceph-02 journal: 2020-05-14 12:33:50.543 7fdb57c5b700  0 
log_channel(cluster) log [DBG] : pgmap v114861: 2545 pgs: 2 
active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 
PiB / 1.8 PiB avail; 5.6 MiB/s rd, 11 MiB/s wr, 1.21 kop/s

May 14 12:33:52 ceph-02 journal: 2020-05-14 12:33:52.565 7fdb57c5b700  0 
log_channel(cluster) log [DBG] : pgmap v114862: 2545 pgs: 2 
active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 
PiB / 1.8 PiB avail; 8.9 MiB/s rd, 16 MiB/s wr, 1.59 kop/s

The version increases every second, here from pgmap v114860 to  pgmap v114862. 
Current cluster status:

[root@gnosis]# ceph status
  cluster:
id: ---
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
mgr: ceph-02(active), standbys: ceph-01, ceph-03
mds: con-fs2-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby-replay
osd: 288 osds: 268 up, 268 in

  data:
pools:   10 pools, 2545 pgs
objects: 80.80 M objects, 195 TiB
usage:   249 TiB used, 1.5 PiB / 1.8 PiB avail
pgs: 2543 active+clean
             2    active+clean+scrubbing+deep

  io:
client:   20 MiB/s rd, 21 MiB/s wr, 578 op/s rd, 1.08 kop/s wr

Thanks for any info!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Nghia Viet Tran 
Sent: 14 May 2020 03:49:38
To: Bryan Henderson; Ceph users mailing list
Subject: [ceph-users] Re: What is a pgmap?

If your Ceph cluster is running on the latest version of Ceph then the 
pg_autoscaler is probably the reason. After a period of time, Ceph will 
check the cluster status and increase/decrease the number of PGs in the cluster 
if needed.

On 5/14/20, 03:37, "Bryan Henderson"  wrote:

I'm surprised I couldn't find this explained anywhere (I did look), but ...

What is the pgmap and why does it get updated every few seconds on a tiny
cluster that's mostly idle?

I do know what a placement group (PG) is and that when documentation talks
about placement group maps, it is talking about something else -- mapping of
PGs to OSDs by CRUSH and OSD maps.

--
Bryan Henderson   San Jose, California
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What is a pgmap?

2020-05-14 Thread Frank Schilder
Hi, I also observe an increase in pgmap version every second or so, see snippet 
below. I run mimic 13.2.8 without any PG scaling/upmapping. Why does the 
version increase so often?

May 14 12:33:50 ceph-03 journal: cluster 2020-05-14 12:33:48.521546 mgr.ceph-02 
mgr.27460080 192.168.32.66:0/63 114833 : cluster [DBG] pgmap v114860: 2545 pgs: 
2 active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 
1.5 PiB / 1.8 PiB avail; 4.8 MiB/s rd, 11 MiB/s wr, 1.48 kop/s

May 14 12:33:50 ceph-02 journal: 2020-05-14 12:33:50.543 7fdb57c5b700  0 
log_channel(cluster) log [DBG] : pgmap v114861: 2545 pgs: 2 
active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 
PiB / 1.8 PiB avail; 5.6 MiB/s rd, 11 MiB/s wr, 1.21 kop/s

May 14 12:33:52 ceph-02 journal: 2020-05-14 12:33:52.565 7fdb57c5b700  0 
log_channel(cluster) log [DBG] : pgmap v114862: 2545 pgs: 2 
active+clean+scrubbing+deep, 2543 active+clean; 195 TiB data, 249 TiB used, 1.5 
PiB / 1.8 PiB avail; 8.9 MiB/s rd, 16 MiB/s wr, 1.59 kop/s

The version increases every second, here from pgmap v114860 to  pgmap v114862. 
Current cluster status:

[root@gnosis]# ceph status
  cluster:
id: ---
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
mgr: ceph-02(active), standbys: ceph-01, ceph-03
mds: con-fs2-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby-replay
osd: 288 osds: 268 up, 268 in

  data:
pools:   10 pools, 2545 pgs
objects: 80.80 M objects, 195 TiB
usage:   249 TiB used, 1.5 PiB / 1.8 PiB avail
pgs: 2543 active+clean
             2    active+clean+scrubbing+deep

  io:
client:   20 MiB/s rd, 21 MiB/s wr, 578 op/s rd, 1.08 kop/s wr

Thanks for any info!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Nghia Viet Tran 
Sent: 14 May 2020 03:49:38
To: Bryan Henderson; Ceph users mailing list
Subject: [ceph-users] Re: What is a pgmap?

If your Ceph cluster is running on the latest version of Ceph then the 
pg_autoscaler is probably the reason. After a period of time, Ceph will 
check the cluster status and increase/decrease the number of PGs in the cluster 
if needed.

On 5/14/20, 03:37, "Bryan Henderson"  wrote:

I'm surprised I couldn't find this explained anywhere (I did look), but ...

What is the pgmap and why does it get updated every few seconds on a tiny
cluster that's mostly idle?

I do know what a placement group (PG) is and that when documentation talks
about placement group maps, it is talking about something else -- mapping of
PGs to OSDs by CRUSH and OSD maps.

--
Bryan Henderson   San Jose, California
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Using rbd-mirror in existing pools

2020-05-14 Thread Kees Meijs | Nefos
Hi list,

Thanks again for pointing me towards rbd-mirror!

I've read documentation, old mailing list posts, blog posts and some
additional guides. Seems like the tool to help me through my data migration.

Given one-way synchronisation and image-based (so, not pool based)
configuration, it's still unclear to me how the mirroring will cope with
an existing target pool, already consisting of (a lot of) images.

Has someone done this already? It feels quite scary with doom scenario's
like "cleaning up" the target pool and such in mind...

To sum up: my goal is to mirror clusterA/somepool with some specific
images to clusterB/someotherpool where already other images reside. The
mirrored images should be kept in sync and the other images should be
left alone completely.

Cheers,
Kees

-- 
https://nefos.nl/contact

Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland

KvK 66494931

/Available on Monday, Tuesday, Wednesday and Friday/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-14 Thread Jason Dillaman
On Thu, May 14, 2020 at 3:12 AM Brad Hubbard  wrote:

> On Wed, May 13, 2020 at 6:00 PM Lomayani S. Laizer 
> wrote:
> >
> > Hello,
> >
> > Below is the full debug log of the 2 minutes before the virtual machine crash.
> Download it from the URL below:
> >
> > https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU
>
> This log has rbd debug output, but not rados :(
>
> I guess you'll need to try and capture a coredump if you can't get a
> backtrace.
>
> I'd also suggest opening a tracker in case one of the rbd devs has any
> ideas on this, or has seen something similar. Without a backtrace or
> core it will be impossible to definitively identify the issue though.
>

+1 to needing the backtrace. I don't see any indications of a problem in
that log.


> >
> >
> > apport.log
> >
> > Wed May 13 09:35:30 2020: host pid 4440 crashed in a separate mount
> namespace, ignoring
> >
> > kernel.log
> > May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]:
> segfault at 0 ip 7f4c8c85d7ed sp 7f4c66ffc470 error 4 in
> librbd.so.1.12.0[7f4c8c65a000+5cb000]
> > May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81
> c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03
> 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08
> 48 8b 0b 48 21 f8 48 39 0c
> > May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port
> 1(tap33511c4d-2c) entered disabled state
> > May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c
> left promiscuous mode
> > May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port
> 1(tap33511c4d-2c) entered disabled state
> >
> > syslog
> > compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip
> 7f4c8c85d7ed sp 7f4c66ffc470 error 4 i
> > n librbd.so.1.12.0[7f4c8c65a000+5cb000]
> > May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81
> c3 d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 8
> > 9 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08
> 48 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
> > May 13 09:35:30 compute5 libvirtd[1844]: internal error: End of file
> from qemu monitor
> > May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Link
> DOWN
> > May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Lost
> carrier
> > May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port
> 1(tap33511c4d-2c) entered disabled state
> > May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c
> left promiscuous mode
> > May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port
> 1(tap33511c4d-2c) entered disabled state
> > May 13 09:35:33 compute5 networkd-dispatcher[1614]: Failed to request
> link: No such device
> >
> > On Fri, May 8, 2020 at 5:40 AM Brad Hubbard  wrote:
> >>
> >> On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer 
> wrote:
> >> >
> >> > Hello,
> >> > On my side, at the point of the VM crash these are the logs below. At the moment
> my debug level is at 10. I will raise it to 20 for full debug. These crashes
> are random and so far happen on very busy VMs. After downgrading the clients on the host
> to Nautilus these crashes disappear
> >>
> >> You could try adding debug_rados as well but you may get a very large
> >> log so keep an eye on things.
> >>
> >> >
> >> > Qemu is not shutting down in general because other vms on the same
> host continues working
> >>
> >> A process can not reliably continue after encountering a segfault so
> >> the qemu-kvm process must be ending and therefore it should be
> >> possible to capture a coredump with the right configuration.
> >>
> >> In the following example, if you were to search for pid 6060 you would
> >> find it is no longer running.
> >> >> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip
> 7f8165cc0a50 sp 7f81397f6490 error 4 in
> librbd.so.1.12.0[7f8165ab4000+537000]
> >>
> >> Without a backtrace at a minimum it may be very difficult to work out
> >> what's going on with certainty. If you open a tracker for the issue
> >> though maybe one of the devs specialising in rbd may have some
> >> feedback.
> >>
> >> >
> >> > 2020-05-07T13:02:12.121+0300 7f88d57fa700 10 librbd::io::ReadResult:
> 0x7f88c80bfbf0 finish:  got {} for [0,24576] bl 24576
> >> > 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult:
> 0x7f88c80f9330 finish: C_ObjectReadRequest: r=0
> >> > 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult:
> 0x7f88c80f9330 finish:  got {} for [0,16384] bl 16384
> >> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState:
> 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
> >> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState:
> 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
> >> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10
> librbd::image::CloseRequest: 0x7f88c8175fd0 send_block_image_watcher
> >> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageWatche

[ceph-users] Bucket - radosgw-admin reshard process

2020-05-14 Thread CUZA Frédéric
Hi everyone,

I am facing an issue with bucket resharding.
It started with a warning about my Ceph cluster health:

[root@ceph_monitor01 ~]# ceph -s
  cluster:
   id: 2da0734-2521-1p7r-8b4c-4a265219e807
health: HEALTH_WARN
1 large omap objects

Turns out I had a problem with a bucket:
"buckets": [
{
"bucket": "bucket-elementary-1",
"tenant": "",
"num_objects": 615915,
"num_shards": 3,
"objects_per_shard": 205305,
"fill_status": "OVER 100.00%"
},

I was wondering why dynamic resharding wasn't doing its job, but found out it 
was in the resharding queue:

[root@ceph_monitor01 ~]# radosgw-admin reshard list
[
{
"time": "2020-05-14 09:42:10.905080Z",
"tenant": "",
"bucket_name": " bucket-elementary-1",
"bucket_id": "97c1cfac-009f-4f7d-8d9d-9097c322c606.51988974.133",
"new_instance_id": "",
"old_num_shards": 3,
"new_num_shards": 12
}
]

As I wanted to process the task in this queue, I tried to run:

radosgw-admin reshard process

But it resulted in an error:

[root@ceph_monitor01 ~]# radosgw-admin reshard process
ERROR: failed to process reshard logs, error=(22) Invalid argument
2020-05-14 14:15:10.225362 7f99b0437dc0  0 RGWReshardLock::lock failed to 
acquire lock on 
bucket-college-35:97c1cfac-009f-4f7d-8d9d-9097c322c606.51988974.133 ret=-22
2020-05-14 14:15:10.225376 7f99b0437dc0  0 process_single_logshardERROR in 
reshard_bucket bucket-elementary-1:(22) Invalid argument

I tried to cancel it to do it manually, but had the same error:

[root@ceph_monitor01 ~]# radosgw-admin reshard cancel --bucket 
bucket-elementary-1
Error canceling bucket bucket-elementary-1 resharding: (22) Invalid argument
2020-05-14 14:16:42.196023 7fa0b1655dc0  0 RGWReshardLock::lock failed to 
acquire lock on 
bucket-elementary-1:97c1cfac-009f-4f7d-8d9d-9097c322c606.51988974.133 ret=-22

I found this post searching for an answer : 
https://tracker.ceph.com/issues/39970

But it seems that it will not help me since my whole cluster (monitors / data 
nodes / rgw) is on :
[root@ceph_monitor01 ~]# ceph --version
ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous 
(stable)

This is what I have on the status of this bucket :
[root@ceph_monitor01 ~]# radosgw-admin reshard status --bucket= 
bucket-elementary-1
[
{
"reshard_status": "CLS_RGW_RESHARD_NONE",
"new_bucket_instance_id": "",
"num_shards": 18446744073709551615
},
{
"reshard_status": "CLS_RGW_RESHARD_NONE",
"new_bucket_instance_id": "",
"num_shards": 18446744073709551615
},
{
"reshard_status": "CLS_RGW_RESHARD_NONE",
"new_bucket_instance_id": "",
"num_shards": 18446744073709551615
}
]

Any help would be appreciated.

Regards,

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW and the orphans

2020-05-14 Thread EDH - Manuel Rios
Hi Eric 

Any update about that?

Cluster status is critical and there's no simple tool or CLI provided in 
current releases that helps keep our S3 clusters healthy.

Right now, with the multipart/sharding bugs, it looks like a bunch of scrap.

Regards
Manuel



-Original Message-
From: EDH - Manuel Rios
Sent: Tuesday, May 5, 2020 15:34
To: Katarzyna Myrek ; Eric Ivancich
CC: ceph-users@ceph.io
Subject: [ceph-users] Re: RGW and the orphans

Hi Eric,

In which Nautilus version is your tool expected to be included? Maybe the next release?

Best Regards
Manuel


-Original Message-
From: Katarzyna Myrek  Sent: Monday, April 20, 2020 12:19
To: Eric Ivancich
CC: EDH - Manuel Rios ; ceph-users@ceph.io
Subject: Re: [ceph-users] RGW and the orphans

Hi Eric,

I will try your tool this week on lab clusters. Will get back to you when I get 
the results.

Kind regards / Pozdrawiam,
Katarzyna Myrek


On Fri, Apr 17, 2020 at 21:12, Eric Ivancich  wrote:
>
> On Apr 17, 2020, at 9:38 AM, Katarzyna Myrek  wrote:
>
> Hi Eric,
>
> Would it be possible to use it with an older cluster version (like 
> running new radosgw-admin in the container, connecting to the cluster 
> on 14.2.X)?
>
> Kind regards / Pozdrawiam,
> Katarzyna Myrek
>
>
> I did mention the nautilus backport PR in a separate reply 
> (https://github.com/ceph/ceph/pull/34127).
>
> You can try the master version and see. To the best of my recollection the 
> code porting did not involve any at-rest data structures. Instead it involved 
> internal reorganization of the code. I suspect it would work, but if you try 
> it, please report back what you find. Of course this is currently an 
> experimental feature and care (e.g., sanity checking) should be taken before 
> using the list produced to feed into a massive delete process.
>
> Eric
>
> --
> J. Eric Ivancich
>
> he / him / his
> Red Hat Storage
> Ann Arbor, Michigan, USA
>
>
>
> On Thu, Apr 16, 2020 at 19:58, EDH - Manuel Rios
> wrote:
>
>
> Hi Eric,
>
>
>
> Are there any ETA for get those script backported maybe in 14.2.10?
>
>
>
> Regards
>
> Manuel
>
>
>
>
>
> From: Eric Ivancich  Sent: Thursday, April 16, 2020 19:05
> To: Katarzyna Myrek ; EDH - Manuel Rios
> CC: ceph-users@ceph.io
> Subject: Re: [ceph-users] RGW and the orphans
>
>
>
> There is currently a PR for an “orphans list” capability. I’m currently 
> working on the testing side to make sure it’s part of our teuthology suite.
>
>
>
> See: https://github.com/ceph/ceph/pull/34148
>
>
>
> Eric
>
>
>
>
>
> On Apr 16, 2020, at 9:26 AM, Katarzyna Myrek  wrote:
>
>
>
> Hi
>
> Thanks for the quick response.
>
> To be honest my cluster is getting full because of that trash and I am 
> at the point where I have to do the removal manually ;/.
>
> Kind regards / Pozdrawiam,
> Katarzyna Myrek
>
> On Thu, Apr 16, 2020 at 13:09, EDH - Manuel Rios
> wrote:
>
>
> Hi,
>
> From my experience, orphans find hasn't worked for several releases, and the
> command should be re-coded or deprecated because it's not working.
>
> In our case it loops over the generated shards until the RGW daemon crashes.
>
> I'm interested in this post; in our case orphans find takes more than 24 hours
> to start looping over shards, but it never gets past shard 0 or 1.
>
> The Ceph RGW devs should provide a workaround script, a new tool, or something to
> maintain our RGW clusters, because with the latest bugs all RGW clusters got a
> ton of trash, wasting resources and money.
>
> And manual cleaning is neither trivial nor easy.
>
> Waiting for more info,
>
> Manuel
>
>
> -Original Message-
> From: Katarzyna Myrek  Sent: Thursday, April 16, 2020 12:38
> To: ceph-users@ceph.io
> Subject: [ceph-users] RGW and the orphans
>
> Hi
>
> Is there any new way to find and remove orphans from RGW pools on Nautilus?  
> I have found info that "orphans find" is now deprecated?
>
> I can see that I have tons of orphans in one of our clusters. Was wondering 
> how to safely remove them - make sure that they are really orphans.
> Does anyone have a good method for that?
>
> My cluster mostly has orphans from multipart uploads.
>
>
> Kind regards / Pozdrawiam,
> Katarzyna Myrek
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph orch ps => osd (Octopus 15.2.1)

2020-05-14 Thread Ml Ml
Hello,

any idea what's wrong with my osd.34+35?

root@ceph01:~# ceph orch ps
NAME    HOST    STATUS   REFRESHED  AGE  VERSION  IMAGE NAME  IMAGE ID  CONTAINER ID
(...)
osd.34  ceph04  running  -          -
osd.35  ceph04  running  -          -
(...)

root9411  0.3  0.1 4471132 55732 ?   Ssl  May04  42:43
/usr/sbin/dockerd -H fd://
root9429  0.2  0.0 4644008 28456 ?   Ssl  May04  29:15  \_
docker-containerd --config /var/run/docker/containerd/containerd.toml
--log-level info
root   15536  0.0  0.0 774832  3980 ?Sl   May04   0:48
 \_ docker-containerd-shim -namespace moby -workdir
/var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/a2f6
16715553  4.7 12.4 5905888 4068064 ? Ssl  May04 654:21
 |   \_ /usr/bin/ceph-osd -n osd.34 -f --setuser ceph --setgroup ceph
--default-log-to-file=false --default-log-to-stderr=true
root   17168  0.0  0.0 848628  3736 ?Sl   May04   0:50
 \_ docker-containerd-shim -namespace moby -workdir
/var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/5135
16717186  4.6 12.7 6006116 4179644 ? Ssl  May04 632:40
 |   \_ /usr/bin/ceph-osd -n osd.35 -f --setuser ceph --setgroup ceph
--default-log-to-file=false --default-log-to-stderr=true

ceph osd tree
-11  10.68115  host ceph04
 34   hdd   2.67029  osd.34   up   0.90002  1.0
 35   hdd   2.67029  osd.35   up   0.80005  1.0
 44   hdd   2.67029  osd.44   up   0.95001  1.0
 45   hdd   2.67029  osd.45   up   1.0      1.0

Thanks,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Memory usage of OSD

2020-05-14 Thread Igor Fedotov

Rafal,

just to mention - stupid allocator is known to cause high memory usage 
in certain scenarios but it uses bluestore_alloc mempool.



Thanks,

Igor


On 5/13/2020 6:52 PM, Rafał Wądołowski wrote:

Mark,
Unfortunately I closed the terminal with the mempool output. But there were a lot of bytes used 
by bluestore_cache_other. That was the highest value (about 85%). The onode 
cache takes about 10%. PGlog and osdmaps were okay, low values. I saw some ideas 
that maybe compression_mode force on a pool can make a mess.
One more thing, we are running the stupid allocator. Right now I am decreasing 
osd_memory_target to 3 GiB and will wait to see if the RAM problem occurs again.




Regards,

Rafał Wądołowski


From: Mark Nelson 
Sent: Wednesday, May 13, 2020 3:30 PM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: Memory usage of OSD

On 5/13/20 12:43 AM, Rafał Wądołowski wrote:

Hi,
I noticed a strange situation in one of our clusters. The OSD daemons are taking 
too much RAM.
We are running 12.2.12 and have the default configuration of osd_memory_target 
(4 GiB).
Heap dump shows:

osd.2969 dumping heap profile now.

MALLOC: 6381526944 ( 6085.9 MiB) Bytes in use by application
MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
MALLOC: +173373288 (  165.3 MiB) Bytes in central cache freelist
MALLOC: + 17163520 (   16.4 MiB) Bytes in transfer cache freelist
MALLOC: + 95339512 (   90.9 MiB) Bytes in thread cache freelists
MALLOC: + 28995744 (   27.7 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =   6696399008 ( 6386.2 MiB) Actual memory used (physical + swap)
MALLOC: +218267648 (  208.2 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =   691456 ( 6594.3 MiB) Virtual address space used
MALLOC:
MALLOC: 408276  Spans in use
MALLOC: 75  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

IMO "Bytes in use by application" should be less than osd_memory_target. Am I 
correct?
I checked heap dump with google-pprof and got following results.
Total: 149.4 MB
  60.5  40.5%  40.5% 60.5  40.5% 
rocksdb::UncompressBlockContentsForCompressionType
  34.2  22.9%  63.4% 34.2  22.9% ceph::buffer::create_aligned_in_mempool
  11.9   7.9%  71.3% 12.1   8.1% std::_Rb_tree::_M_emplace_hint_unique
  10.7   7.1%  78.5% 71.2  47.7% rocksdb::ReadBlockContents

Does it mean that most of RAM is used by rocksdb?


It looks like your heap dump is only accounting for 149.4MB of the
memory so probably not representative across the whole ~6.5G. Instead
could you try dumping the mempools via "ceph daemon osd.2969 dump_mempools"?



How can I take a deeper look into memory usage ?


Beyond looking at the mempools, you can see the bluestore cache
allocation information by either enabling debug bluestore and debug
priority_cache_manager 5, or potentially looking at the PCM perf
counters (I'm not sure if those were in 14.2.12 though). Between the
heap data, mempool data, and priority cache records, it should become
clearer what's going on.


Mark




Regards,

Rafał Wądołowski



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: iscsi issues with ceph (Nautilus) + tcmu-runner

2020-05-14 Thread Phil Regnauld
Mike Christie (mchristi) writes:
> 
> I've never seen this kernel crash before. It might be helpful to send
> more of the log before the kernel warning below.

These are the messages leading up to the warning (pretty much the same, with
the occasional notice about an ongoing deep scrub,
2 active+clean+scrubbing+deep+repair), and at 05:09:34 the first ABORT_TASK
shows up. I don't know how busy the cluster was, but we normally have no
backups running at that time (though Veeam might have been taking snapshots).


May 12 05:09:32 ceph4 ceph-mon[485955]: 2020-05-12 05:09:32.727 7f80cb9c4700  0 
mon.ceph4@3(peon) e13 handle_command mon_command({"prefix": "status", "format": 
"json"} v 0) v1
May 12 05:09:33 ceph8 ceph-mon[256815]: 2020-05-12 05:09:33.431 7f2082335700  0 
log_channel(audit) log [DBG] : from='client.61723502 10.254.212.60:0/72439561' 
entity='client.admin' cmd=[{"prefix": "osd lspools", "format": "json"}]: 
dispatch
May 12 05:09:33 ceph8 ceph-mon[256815]: 2020-05-12 05:09:33.431 7f2082335700  0 
mon.ceph8@2(peon) e13 handle_command mon_command({"prefix": "osd lspools", 
"format": "json"} v 0) v1
May 12 05:09:34 ceph10 ceph-mgr[242781]: 2020-05-12 05:09:34.547 7f2813721700  
0 log_channel(cluster) log [DBG] : pgmap v848512: 896 pgs: 2 
active+clean+scrubbing+deep+repair, 894 active+clean; 10 TiB data, 32 TiB used, 
87 TiB / 120 TiB avail; 3.3 MiB/s rd, 4.8 MiB/s wr, 313 op/s
May 12 05:09:34 ceph1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 
94555972
May 12 05:09:34 ceph1 kernel: Call Trace:
May 12 05:09:34 ceph1 kernel:  __cancel_work_timer+0x10a/0x190
May 12 05:09:34 ceph1 kernel: Code: 69 2c 04 00 0f 0b e9 4a d3 ff ff 48 c7 c7 
d8 f9 83 be e8 56 2c 04 00 0f 0b e9 41 d6 ff ff 48 c7 c7 d8 f9 83 be e8 43 2c 
04 00 <0f> 0b 45 31 ed e9 2b d6 ff ff 49 8d b4 24 b0 00 00 00 48 c7 c7 b8
May 12 05:09:34 ceph1 kernel:  core_tmr_abort_task+0xd6/0x130 [target_core_mod]
May 12 05:09:34 ceph1 kernel: CPU: 11 PID: 2448784 Comm: kworker/u32:0 Tainted: 
GW 4.19.0-8-amd64 #1 Debian 4.19.98-1
May 12 05:09:34 ceph1 kernel: CR2: 555de2e07000 CR3: 00048240a005 CR4: 
003606e0
May 12 05:09:34 ceph1 kernel:  ? create_worker+0x1a0/0x1a0
May 12 05:09:34 ceph1 kernel: CS:  0010 DS:  ES:  CR0: 80050033
May 12 05:09:34 ceph1 kernel: [ cut here ]
May 12 05:09:34 ceph1 kernel:  dca usbcore i2c_i801 scsi_mod mfd_core mdio 
usb_common nvme_core crc32c_intel i2c_algo_bit wmi button
May 12 05:09:34 ceph1 kernel: DR0:  DR1:  DR2: 

May 12 05:09:34 ceph1 kernel: DR3:  DR6: fffe0ff0 DR7: 
0400
May 12 05:09:34 ceph1 kernel: ---[ end trace 3835b5fe0aa98ff0 ]---
May 12 05:09:34 ceph1 kernel: FS:  () 
GS:94f77fac() knlGS:
May 12 05:09:34 ceph1 kernel:  ? get_work_pool+0x40/0x40
May 12 05:09:34 ceph1 kernel: Hardware name: Supermicro SYS-6018R-TD8/X10DDW-i, 
BIOS 3.2 12/16/2019
May 12 05:09:34 ceph1 kernel:  ? irq_work_queue+0x46/0x50
May 12 05:09:34 ceph1 kernel:  ? __irq_work_queue_local+0x50/0x60
May 12 05:09:34 ceph1 kernel:  kthread+0x112/0x130
May 12 05:09:34 ceph1 kernel:  ? kthread_bind+0x30/0x30
May 12 05:09:34 ceph1 kernel: Modules linked in: fuse cbc ceph libceph 
libcrc32c crc32c_generic fscache target_core_pscsi target_core_file 
target_core_iblock iscsi_target_mod target_core_user uio target_core_mod 
configfs binfmt_misc 8021q garp stp mrp llc bonding intel_rapl sb_edac 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel ast cryptd ipmi_ssif ttm intel_cstate 
drm_kms_helper intel_uncore mei_me iTCO_wdt drm pcc_cpufreq intel_rapl_perf 
joydev pcspkr sg iTCO_vendor_support mei ioatdma ipmi_si ipmi_devintf 
ipmi_msghandler acpi_pad acpi_power_meter evdev dm_mod sunrpc ip_tables 
x_tables autofs4 squashfs zstd_decompress xxhash loop overlay hid_generic 
usbhid hid sd_mod ahci xhci_pci ehci_pci libahci ehci_hcd xhci_hcd libata ixgbe 
igb nvme mxm_wmi lpc_ich
May 12 05:09:34 ceph1 kernel:  ? printk+0x58/0x6f
May 12 05:09:34 ceph1 kernel:  process_one_work+0x1a7/0x3a0

[...]

> From the bit of log you sent, it looks like commands might have started
> to timeout on the vmware side. That then kicked off the target layer's
> error handler which we then barf in.
> 
> You used the ceph-iscsi tools to set this up right? You didn't use
> targetcli with tcmu-runner directly?

Actually this bit is managed by croit.io, so I'm not 100% sure
(we're using their software as we didn't want to deal with iscsi setup,
among other things). They said they hadn't seen this particular error
before, which is why I'm asking here (and totally ok if that means we're
on our own).

I've got the option of trying a 5.x kernel, if it helps.

> Just to double check you didn't see old/slow requests like descri

[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminus

2020-05-14 Thread Zhenshi Zhou
I tested this command on a mimic cluster. I can set the option to
luminous and back to jewel as well. I think the root cause of why you
cannot set it back is that some client may have set a flag which conflicts
with jewel when you set that option to luminous. So you are not
permitted to set it to jewel again. I think you should find this
client and deal with it.
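
Something like this should show which clients still connect with pre-luminous
features (a rough sketch; mon.ceph1 is only an example daemon name):

ceph features
# per-session detail, including client addresses and feature bits (run on the mon host):
ceph daemon mon.ceph1 sessions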

Zhenshi Zhou  wrote on Thu, May 14, 2020 at 4:56 PM:

> The doc says "This subcommand will fail if any connected daemon or client
> is not compatible with the features offered by the given ". The
> command
> could be done if the client is disconnected, I guess.
>
>  wrote on Thu, May 14, 2020 at 4:50 PM:
>
>> HI
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] why don't ceph daemon output their log to /var/log/ceph

2020-05-14 Thread 展荣臻(信泰)
Hi all,
When Ceph runs in a container, why don't the Ceph daemons output their logs to
/var/log/ceph?
I built the Ceph image with ceph-container and deployed Ceph via ceph-ansible,
but I found no logs under /var/log/ceph. Why don't the Ceph daemons write their
logs there?
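
For reference, what I have found so far: the containers seem to send the daemon
logs to stdout/stderr, so they end up in journald or the docker log driver rather
than under /var/log/ceph. A rough sketch of where I looked, plus how file logging
could presumably be turned back on (the OSD id and container name are only
examples; log_to_file needs a release with the centralized config store):

journalctl -u ceph-osd@3                    # systemd-managed containers
docker logs ceph-osd-3                      # plain docker deployments
ceph config set global log_to_file true    # re-enable classic file logging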
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-14 Thread Janne Johansson
On Thu, May 14, 2020 at 10:46, Amudhan P  wrote:

> Will EC-based writes benefit from separate public and cluster networks?
>

I guess this depends on what parameters you use.

All in all I think using one network is probably better. In the cases
where I have seen missing heartbeats, it was not the network that prevented
packets from coming over; it was the OSDs making themselves busy doing
something else instead of heartbeating that made them flip out. So if you
can set it up with LACP or other kinds of bonding, you allow the OSD hosts
to use the network optimally regardless of what state they are in.
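
For what it's worth, the single-network variant is simply a matter of not defining
a cluster network at all; a minimal sketch (the subnet is a placeholder):

[global]
    public network = 10.0.0.0/24
    # no "cluster network" entry: replication, recovery and heartbeats
    # all share the (bonded) public interface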


> On Thu, May 14, 2020 at 1:39 PM lin yunfan  wrote:
>
>> That is correct. I didn't explain it clearly. I said that because in
>> some write-only scenarios the public network and the cluster network will
>> both be saturated at the same time.
>> linyunfan
>>
>> Janne Johansson  wrote on Thu, May 14, 2020 at 3:42 PM:
>> >
>> > On Thu, May 14, 2020 at 08:42, lin yunfan  wrote:
>> >>
>> >> Besides the recovery scenario, in a write-only scenario the cluster
>> >> network will use almost the same bandwidth as the public network.
>>
>
-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs taking too much memory, for pglog

2020-05-14 Thread Wout van Heeswijk

Hi Harald,

Your cluster has a lot of objects per OSD/PG and the pg logs will grow 
fast and large because of this. The pg_logs will keep growing as long as 
your cluster's PGs are not active+clean. This means you are now in a 
loop where you cannot get stably running OSDs because the pg_logs take 
too much memory, and therefore the OSDs cannot purge the pg_logs...


I suggest you lower the values for both osd_min_pg_log_entries and 
osd_max_pg_log_entries. Lowering these values will cause Ceph to go 
into backfilling much earlier, but the memory usage of the OSDs will go 
down significantly, enabling them to run stably. The default is 3000 for 
both of these values.


You can lower them to 500 by executing:

ceph config set osd osd_min_pg_log_entries 500
ceph config set osd osd_max_pg_log_entries 500

When you lower these values, you will get more backfilling instead of 
recoveries but I think it will help you get through this situation.
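
To confirm the new values are active and to keep an eye on the effect, something
like this should do (osd.0 is just an example id):

ceph daemon osd.0 config get osd_max_pg_log_entries
ceph daemon osd.0 dump_mempools | grep -A 2 osd_pglog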


kind regards,

Wout
42on

On 13-05-2020 07:27, Harald Staub wrote:

Hi Mark

Thank you for your feedback!

The maximum number of PGs per OSD is only 123. But we have PGs with a 
lot of objects. For RGW, there is an EC pool 8+3 with 1024 PGs with 
900M objects, maybe this is the problematic part. The OSDs are 510 
hdd, 32 ssd.


Not sure, do you suggest to use something like
ceph-objectstore-tool --op trim-pg-log ?

When done correctly, would the risk be a lot of backfilling? Or also 
data loss?


Also, getting the cluster up is one thing; keeping it running seems to 
be a real challenge right now (OOM killer) ...


Cheers
 Harry

On 13.05.20 07:10, Mark Nelson wrote:

Hi Harald,


Changing the bluestore cache settings will have no effect at all on 
pglog memory consumption.  You can try either reducing the number of 
PGs (you might want to check and see how many PGs you have and 
specifically how many PGs are on that OSD), or decreasing the number of 
pglog entries per PG.  Keep in mind that fewer PG log entries may 
impact recovery.  FWIW, 8.5GB of memory usage for pglog implies that 
you have a lot of PGs per OSD, so that's probably the first place to 
look.



Good luck!

Mark


On 5/12/20 5:10 PM, Harald Staub wrote:
Several OSDs of one of our clusters are down currently because RAM 
usage has increased during the last days. Now it is more than we can 
handle on some systems. Frequently OSDs get killed by the OOM 
killer. Looking at "ceph daemon osd.$OSD_ID dump_mempools", it shows 
that nearly all (about 8.5 GB) is taken by osd_pglog, e.g.


    "osd_pglog": {
    "items": 461859,
    "bytes": 8445595868
    },

We tried to reduce it, with "osd memory target" and even with 
"bluestore cache autotune = false" (together with "bluestore cache 
size hdd"), but there was no effect at all.


I remember the pglog_hardlimit parameter, but that is already set by 
default with Nautilus I read. I.e. this is on Nautilus, 14.2.8.


Is there a way to limit this pglog memory?

Cheers
 Harry
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminous

2020-05-14 Thread Zhenshi Zhou
The doc says "This subcommand will fail if any connected daemon or client
is not compatible with the features offered by the given ". The
command
could be done if the client is disconnected, I guess.

 wrote on Thu, May 14, 2020 at 4:50 PM:

> HI
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Kees Meijs
Thanks all, I'm going to investigate rbd-mirror further.

K.

On 14-05-2020 09:30, Anthony D'Atri wrote:
> It’s entirely possible — and documented — to mirror individual images.  Your 
> proposal to use snapshots is reinventing the wheel, but with less efficiency.
>
> https://docs.ceph.com/docs/nautilus/rbd/rbd-mirroring/#image-configuration
>
>
> ISTR that in Octopus the need for RBD journals is gone, but am not positive.
>
> For doing 1-2 volumes at a time you’ll want to increase two tunables to avoid 
> protracted syncing.  Without these I’ve experienced a volume of just a few TB 
> take multiple hours to converge, and some that got increasingly behind over 
> time.
>
>
>   rbd_mirror_journal_max_fetch_bytes:
> section: "client"
> value: "33554432"
>
>   rbd_journal_max_payload_bytes:
> section: "client"
> value: "8388608"
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminous

2020-05-14 Thread luuvuong91
HI
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-14 Thread Amudhan P
Will EC-based writes benefit from separate public and cluster networks?


On Thu, May 14, 2020 at 1:39 PM lin yunfan  wrote:

> That is correct. I didn't explain it clearly. I said that because in
> some write-only scenarios the public network and the cluster network will
> both be saturated at the same time.
> linyunfan
>
> Janne Johansson  wrote on Thu, May 14, 2020 at 3:42 PM:
> >
> > On Thu, May 14, 2020 at 08:42, lin yunfan  wrote:
> >>
> >> Besides the recovery scenario, in a write-only scenario the cluster
> >> network will use almost the same bandwidth as the public network.
> >
> >
> > That would depend on the replication factor. If it is high, I would
> assume every MB from the client network would make (repl-factor - 1) times
> the data on the private network to send replication requests to the other
> OSD hosts with the same amount of data.
> >
> > --
> > May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Remove or recreate damaged PG in erasure coding pool

2020-05-14 Thread Francois Legrand

Hello,
We run a Nautilus 14.2.8 Ceph cluster.
After a big crash in which we lost some disks we had a PG down (erasure 
coding 3+2 pool), and trying to fix it we followed this guide: 
https://medium.com/opsops/recovering-ceph-from-reduced-data-availability-3-pgs-inactive-3-pgs-incomplete-b97cbcb4b5a1
As the PG was reported with 0 objects, we first marked a shard as 
complete with ceph-objectstore-tool and restarted the OSD.

The PG thus went active but reported lost objects!
As we consider the data on this PG as lost, we tried to get rid of it 
with ceph pg 30.3 mark_unfound_lost delete.


This produced some logs like (~3 lines/hour):

2020-05-12 14:45:05.251830 osd.103 (osd.103) 886 : cluster [ERR] 30.3s0 
Unexpected Error: recovery ending with 41: 
{30:c000e27d:::rbd_data.34.c963b6314efb84.0
100:head=435293'2 flags = 
delete,30:c01f1248:::rbd_data.34.7f0c0d1df22f45.0325:head=435293'3 
flags = delete,30:c05e82b2:::rbd_data.34.674d063bdc66d2.0
015:head=435293'4 flags = 
delete,30:c0b2d8e7:::rbd_data.34.6bc88749c741cb.07d0:head=435293'5 
flags = delete,30:c0c3e20e:::rbd_data.34.674d063b
dc66d2.00fb:head=435293'6 flags = 
delete,30:c0c89740:::rbd_data.34.a7f2202210bb39.0bbc:head=435293'7 
flags = delete,30:c0e59ffa:::rbd_data.34.
7f0c0d1df22f45.02fb:head=435293'8 flags = 
delete,30:c0e72bf4:::rbd_data.34.7f0c0d1df22f45.00fa:head=435293'9 
flags = delete,30:c10ab507:::rbd_
data.34.80695c646d9535.0327:head=435293'10 flags = 
delete,30:c219e412:::rbd_data.34.a7f2202210bb39.0fa0:head=435293'11 
flags = delete,30:c29ae
ba3:::rbd_data.34.8038585a0eb9f6.0eb2:head=435293'12 flags = 
delete,30:c29fae09:::rbd_data.34.674d063bdc66d2.148a:head=435293'13 
flags = delet
e,30:c2b77a99:::rbd_data.34.7f0c0d1df22f45.031d:head=435293'14 
flags = 
delete,30:c2c8598f:::rbd_data.34.674d063bdc66d2.02f5:head=435293'15 
fla
gs = 
delete,30:c2dd39fe:::rbd_data.34.6494fb1b0f88bf.030b:head=435293'16 
flags = 
delete,30:c2f6ce39:::rbd_data.34.806ab864459ae5.0109:head=435
293'17 flags = 
delete,30:c2f8a62f:::rbd_data.34.ed0c58ebdc770f.002a:head=435293'18 
flags = delete,30:c306cd86:::rbd_data.34.ed0c58ebdc770f.020
5:head=435293'19 flags = 
delete,30:c30f5230:::rbd_data.34.7f0c0d1df22f45.02f5:head=435293'20 
flags = delete,30:c32b81df:::rbd_data.34.c79f6d1f78a707.0
100:head=435293'21 flags = 
delete,30:c3374080:::rbd_data.34.7f217e33dd742c.07d0:head=435293'22 
flags = delete,30:c3cdbeb5:::rbd_data.34.674dcefe97
f606.0109:head=435293'23 flags = 
delete,30:c3cdd149:::rbd_data.34.674dcefe97f606.0019:head=435293'24 
flags = delete,30:c40946c0:::rbd_data.34.
ded8d21a9d3d8f.02a8:head=435293'25 flags = 
delete,30:c42ed4fd:::rbd_data.34.a6985314ad8dad.0200:head=435293'26 
flags = delete,30:c483a99b:::rb
d_data.34.ed0c58ebdc770f.0a00:head=435293'27 flags = 
delete,30:c49f09d6:::rbd_data.34.7e1c1abf436885.0bb8:head=435293'28 
flags = delete,30:c51
5a4e8:::rbd_data.34.ed0c58ebdc770f.0106:head=435293'29 flags 
= 
delete,30:c5181a8e:::rbd_data.34.9385d45172fa0f.020c:head=435293'30 
flags = del
ete,30:c531de44:::rbd_data.34.6bc88749c741cb.0102:head=435293'31 
flags = 
delete,30:c5427518:::rbd_data.34.806ab864459ae5.06db:head=435293'32 
f
lags = 
delete,30:c5693b53:::rbd_data.34.6494fb1b0f88bf.148a:head=435293'33 
flags = 
delete,30:c5804bc9:::rbd_data.34.ed0cb8730e020c.0105:head=4
35293'34 flags = 
delete,30:c598117e:::rbd_data.34.7f0811fbac0b9d.0327:head=435293'35 
flags = delete,30:c5a64fbd:::rbd_data.34.c963b6314efb84.0
010:head=435293'36 flags = 
delete,30:c5f9e0e5:::rbd_data.34.ed0c58ebdc770f.0f01:head=435293'37 
flags = delete,30:c5ffe1d8:::rbd_data.34.6bc88749c741cb.000
00abe:head=435293'38 flags = 
delete,30:c6ecfaa1:::rbd_data.34.9385d45172fa0f.0002:head=435293'39 
flags = delete,30:c70f:::rbd_data.34.6494fb1b
0f88bf.0106:head=435293'40 flags = 
delete,30:c7a730f4:::rbd_data.34.7f217e33dd742c.06e1:head=435293'41 
flags = delete,30:c7aa79f7:::rbd_data.3

4.674dcefe97f606.0108:head=435293'42 flags = delete}

But yesterday it started to flood the logs (~9 GB of logs/day !) with 
lines like :


2020-05-14 10:36:03.851258 osd.29 [ERR] Error -2 reading object 
30:c24a0173:::rbd_data.34.806ab864459ae5.022d:head
2020-05-14 10:36:03.851333 osd.29 [ERR] Error -2 reading object 
30:c4a41972:::rbd_data.34.6bc88749c741cb.0320:head
2020-05-14 10:36:03.851382 osd.29 [ERR] Error -2 reading object 
30:c543da6f:::rbd_data.34.80695c646d9535.0dce:head
2020-05-14 10:36:03.859900 osd.29 [ERR] Error -2 reading object 
30:c24a0173:::rbd_data.34.806ab864459ae5.

[ceph-users] Re: ACL for user in another teant

2020-05-14 Thread Vishwas Bm
When I tried as below also, similar error is coming:

[root@vishwas-test cluster]# s3cmd --access_key=GY40PHWVK40A2G4XQH2D
--secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls
s3://tenant2/jerry-bucket
ERROR: Bucket 'tenant2' does not exist
ERROR: S3 error: 404 (NoSuchBucket)


[root@vishwas-test cluster]# s3cmd  --access_key=GY40PHWVK40A2G4XQH2D
--secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls
s3://tenant2:jerry-bucket
ERROR: S3 error: 403 (SignatureDoesNotMatch)


*Thanks & Regards,*

*Vishwas *


On Thu, May 14, 2020 at 1:54 PM Vishwas Bm  wrote:

> Hi Pritha,
>
> Thanks for the reply. Please find the user list, bucket list and also the
> command which I have used.
>
> [root@vishwas-test cluster]# radosgw-admin user list
> [
> "tenant2$Jerry",
> "tenant1$Tom"
> ]
>
> [root@vishwas-test cluster]# radosgw-admin bucket list
> [
> "tenant2/jerry-bucket"
> ]
>
> [root@vishwas-test cluster]# s3cmd info --access_key=HVTKORMH8LLDF76TKQGI
> --secret_key=9XFcvgMm4yBncA8D9SguEMVSBsUkhuuRLSbyuUPp s3://jerry-bucket
> s3://jerry-bucket/ (bucket):
>Location:  default
>Payer: BucketOwner
>Expiration Rule: none
>Policy:{
>   "Version": "2012-10-17",
>   "Statement": [
> {
>   "Principal": {"AWS": ["arn:aws:iam::tenant1:user/Tom"]},
>   "Action": ["s3:ListBucket"],
>   "Effect": "Allow",
>   "Resource": "s3://tenant2/jerry-bucket"
> }
>   ]
> }
>CORS:  none
>ACL:   Jerry: FULL_CONTROL
>
>
> When I try to list using Tom access keys, I get below error:
> [root@vishwas-test cluster]# s3cmd --access_key=GY40PHWVK40A2G4XQH2D
> --secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls s3://jerry-bucket
>
> *ERROR: Bucket 'jerry-bucket' does not existERROR: S3 error: 404
> (NoSuchBucket)*
>
>
> *Thanks & Regards,*
>
> *Vishwas *
>
>
> On Thu, May 14, 2020 at 11:54 AM Pritha Srivastava 
> wrote:
>
>> Hi Vishwas,
>>
>> Bucket policy should let you access buckets in another tenant.
>> What exact command are you using?
>>
>> Thanks,
>> Pritha
>>
>> On Thursday, May 14, 2020, Vishwas Bm  wrote:
>>
>>> > Hi,
>>> >
>>> > I have two users both belong to different tenant.
>>> >
>>> > Can I give permission for the user in another tenant to access the
>>> bucket
>>> > using setacl or setPolicy command ?
>>> > I tried the setacl command and setpolicy command, but it was not
>>> working ?
>>> > It used to say bucket not found, when the grantee tried to access.
>>> >
>>> > Is this supported ?
>>> >
>>> > *Thanks & Regards,*
>>> > *Vishwas *
>>> >
>>>
>>> >
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ACL for user in another teant

2020-05-14 Thread Vishwas Bm
Hi Pritha,

Thanks for the reply. Please find the user list, bucket list and also the
command which I have used.

[root@vishwas-test cluster]# radosgw-admin user list
[
"tenant2$Jerry",
"tenant1$Tom"
]

[root@vishwas-test cluster]# radosgw-admin bucket list
[
"tenant2/jerry-bucket"
]

[root@vishwas-test cluster]# s3cmd info --access_key=HVTKORMH8LLDF76TKQGI
--secret_key=9XFcvgMm4yBncA8D9SguEMVSBsUkhuuRLSbyuUPp s3://jerry-bucket
s3://jerry-bucket/ (bucket):
   Location:  default
   Payer: BucketOwner
   Expiration Rule: none
   Policy:{
  "Version": "2012-10-17",
  "Statement": [
{
  "Principal": {"AWS": ["arn:aws:iam::tenant1:user/Tom"]},
  "Action": ["s3:ListBucket"],
  "Effect": "Allow",
  "Resource": "s3://tenant2/jerry-bucket"
}
  ]
}
   CORS:  none
   ACL:   Jerry: FULL_CONTROL


When I try to list using Tom access keys, I get below error:
[root@vishwas-test cluster]# s3cmd --access_key=GY40PHWVK40A2G4XQH2D
--secret_key=bKq36rs5t1nZEL3MedAtDY3JCfBoOs1DEou0xfOk ls s3://jerry-bucket

*ERROR: Bucket 'jerry-bucket' does not existERROR: S3 error: 404
(NoSuchBucket)*


*Thanks & Regards,*

*Vishwas *


On Thu, May 14, 2020 at 11:54 AM Pritha Srivastava 
wrote:

> Hi Vishwas,
>
> Bucket policy should let you access buckets in another tenant.
> What exact command are you using?
>
> Thanks,
> Pritha
>
> On Thursday, May 14, 2020, Vishwas Bm  wrote:
>
>> > Hi,
>> >
>> > I have two users both belong to different tenant.
>> >
>> > Can I give permission for the user in another tenant to access the
>> bucket
>> > using setacl or setPolicy command ?
>> > I tried the setacl command and setpolicy command, but it was not
>> working ?
>> > It used to say bucket not found, when the grantee tried to access.
>> >
>> > Is this supported ?
>> >
>> > *Thanks & Regards,*
>> > *Vishwas *
>> >
>>
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-14 Thread lin yunfan
That is correct. I didn't explain it clearly. I said that because in
some write-only scenarios the public network and the cluster network will
both be saturated at the same time.
linyunfan

Janne Johansson  wrote on Thu, May 14, 2020 at 3:42 PM:
>
> On Thu, May 14, 2020 at 08:42, lin yunfan  wrote:
>>
>> Besides the recovery scenario, in a write-only scenario the cluster
>> network will use almost the same bandwidth as the public network.
>
>
> That would depend on the replication factor. If it is high, I would assume 
> every MB from the client network would make (repl-factor - 1) times the data 
> on the private network to send replication requests to the other OSD hosts 
> with the same amount of data.
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD weight on Luminous

2020-05-14 Thread jesper

unless you have enabled some balancing - then this is very normal (actually 
pretty good normal)
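
if you want ceph to even this out by itself, the mgr balancer module is the usual
route - roughly like below (a sketch; upmap mode needs
require-min-compat-client luminous or newer):

ceph mgr module enable balancer
ceph balancer mode upmap
ceph balancer on
ceph balancer status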

Jesper


Sent from myMail for iOS


Thursday, 14 May 2020, 09.35 +0200 from Florent B.  :
>Hi,
>
>I have something strange on a Ceph Luminous cluster.
>
>All OSDs have the same size, the same weight, and one of them is used at
>88% by Ceph (osd.3) while others are around 40 to 50% usage :
>
>#  ceph osd df
>ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP    META    AVAIL   %USE  VAR  PGS
> 2   hdd 0.49179  1.0  504GiB  264GiB  263GiB 63.7MiB  960MiB  240GiB 52.34 1.14  81
>13   hdd 0.49179  1.0  504GiB  267GiB  266GiB 55.7MiB 1.37GiB  236GiB 53.09 1.16  94
>20   hdd 0.49179  1.0  504GiB  235GiB  234GiB 62.5MiB  962MiB  268GiB 46.70 1.02  99
>21   hdd 0.49179  1.0  504GiB  306GiB  305GiB 65.2MiB  991MiB  198GiB 60.75 1.32  87
>22   hdd 0.49179  1.0  504GiB  185GiB  184GiB 51.9MiB  972MiB  318GiB 36.83 0.80  73
>23   hdd 0.49179  1.0  504GiB  167GiB  166GiB 60.9MiB  963MiB  337GiB 33.07 0.72  80
>24   hdd 0.49179  1.0  504GiB  235GiB  234GiB 67.5MiB  956MiB  268GiB 46.74 1.02  90
>25   hdd 0.49179  1.0  504GiB  183GiB  182GiB 68.8MiB  955MiB  321GiB 36.32 0.79 100
> 3   hdd 0.49179  1.0  504GiB  442GiB  440GiB 77.5MiB 1.15GiB 61.9GiB 87.70 1.91 103
>26   hdd 0.49179  1.0  504GiB  220GiB  219GiB 61.2MiB  963MiB  283GiB 43.78 0.95  80
>29   hdd 0.49179  1.0  504GiB  298GiB  296GiB 77.4MiB 1013MiB  206GiB 59.09 1.29 106
>30   hdd 0.49179  1.0  504GiB  183GiB  182GiB 60.2MiB  964MiB  321GiB 36.32 0.79  88
>10   hdd 0.49179  1.0  504GiB  176GiB  175GiB 56.5MiB  968MiB  327GiB 35.02 0.76  85
>11   hdd 0.49179  1.0  504GiB  209GiB  208GiB 62.5MiB  961MiB  295GiB 41.42 0.90  89
> 0   hdd 0.49179  1.0  504GiB  253GiB  252GiB 55.7MiB  968MiB  251GiB 50.18 1.09  76
> 1   hdd 0.49179  1.0  504GiB  199GiB  198GiB 60.4MiB  964MiB  305GiB 39.51 0.86  92
>16   hdd 0.49179  1.0  504GiB  219GiB  218GiB 58.2MiB  966MiB  284GiB 43.51 0.95  85
>17   hdd 0.49179  1.0  504GiB  231GiB  230GiB 69.0MiB  955MiB  272GiB 45.97 1.00  97
>14   hdd 0.49179  1.0  504GiB  210GiB  209GiB 61.0MiB  963MiB  293GiB 41.72 0.91  74
>15   hdd 0.49179  1.0  504GiB  182GiB  181GiB 50.7MiB  973MiB  322GiB 36.10 0.79  72
>18   hdd 0.49179  1.0  504GiB  297GiB  296GiB 53.7MiB  978MiB  206GiB 59.03 1.29  87
>19   hdd 0.49179  1.0  504GiB  125GiB  124GiB 61.9MiB  962MiB  379GiB 24.81 0.54  82
>    TOTAL 10.8TiB 4.97TiB 4.94TiB 1.33GiB 21.4GiB 5.85TiB 45.91
>MIN/MAX VAR: 0.54/1.91  STDDEV: 12.80
>
>
>Is it a normal situation? Is there any way to let Ceph handle this
>alone or am I forced to reweight the OSD manually?
>
>Thank you.
>
>Florent
>___
>ceph-users mailing list --  ceph-users@ceph.io
>To unsubscribe send an email to  ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Memory usage of OSD

2020-05-14 Thread Rafał Wądołowski
Mark, good news!

Adam, if you need some more information or debugging, feel free to contact me on 
IRC: xelexin
I can confirm that this issue exists in luminous (12.2.12)


Regards,

Rafał Wądołowski

CloudFerro sp. z o.o.
ul. Fabryczna 5A
00-446 Warszawa
www.cloudferro.com



From: Janne Johansson 
Sent: Thursday, May 14, 2020 9:39 AM
To: Amudhan P 
Cc: Mark Nelson ; Rafał Wądołowski 
; ceph-users@ceph.io ; Adam 
Kupczyk 
Subject: Re: [ceph-users] Re: Memory usage of OSD



On Thu, May 14, 2020 at 03:52, Amudhan P (amudha...@gmail.com) wrote:
For Ceph releases before Nautilus, osd_memory_target changes need an OSD
service restart to take effect.
I had a similar issue in Mimic and did the same in my test setup.
Before restarting the OSD service, ensure you set osd nodown and osd noout
(or similar commands) so it doesn't trigger OSD down and recovery.

Noout and norebalance seems like a good option to set before rebooting a host 
or restarting OSDs.

Nodown is kind of evil, since it will make clients send IO against the OSD 
thinking it is still up which it isn't, so client IO can stall.
Also, with nodown, it will get bad if some failure elsewhere occurs while you 
are doing maintenance, since the cluster will send IO to that part too.

Noout is ok, that means the cluster waits for it to come back, but sends 
requests to the other replicas in the meantime without starting to rebuild a 
new replica, and norebalance to prevent balancing while you are doing this.
The PGs will be degraded (since they are missing one replica) but the cluster 
goes on.

--
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-14 Thread Janne Johansson
On Thu, May 14, 2020 at 08:42, lin yunfan  wrote:

> Besides the recovery scenario, in a write-only scenario the cluster
> network will use almost the same bandwidth as the public network.
>

That would depend on the replication factor. If it is high, I would assume
every MB from the client network would make (repl-factor - 1) times the
data on the private network to send replication requests to the other OSD
hosts with the same amount of data.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminous

2020-05-14 Thread luuvuong91
Hi,
Output of the ceph features command:
root@ceph05:~# ceph features
{
"mon": {
"group": {
"features": "0x3ffddff8eeacfffb",
"release": "luminous",
"num": 4
}
},
"osd": {
"group": {
"features": "0x3ffddff8eeacfffb",
"release": "luminous",
"num": 27
}
},
"client": {
"group": {
"features": "0x3ffddff8eeacfffb",
"release": "luminous",
"num": 7
}
}
}
root@ceph05:~#
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminous

2020-05-14 Thread luuvuong91
Hi,
I have already rebooted the servers of my cluster and rebooted the compute node, but it did not fix the problem.

I have updated the kernel on one compute node to 4.4, but it did not fix it.

I have run the command "ceph osd crush tunables legacy", but it did not fix it.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Memory usage of OSD

2020-05-14 Thread Janne Johansson
On Thu, May 14, 2020 at 03:52, Amudhan P  wrote:

> For Ceph releases before Nautilus, osd_memory_target changes need an OSD
> service restart to take effect.
> I had a similar issue in Mimic and did the same in my test setup.
> Before restarting the OSD service, ensure you set osd nodown and osd noout
> (or similar commands) so it doesn't trigger OSD down and recovery.
>

Noout and norebalance seems like a good option to set before rebooting a
host or restarting OSDs.

Nodown is kind of evil, since it will make clients send IO against the OSD
thinking it is still up which it isn't, so client IO can stall.
Also, with nodown, it will get bad if some failure elsewhere occurs while
you are doing maintenance, since the cluster will send IO to that part too.

Noout is ok, that means the cluster waits for it to come back, but sends
requests to the other replicas in the meantime without starting to rebuild
a new replica, and norebalance to prevent balancing while you are doing
this.
The PGs will be degraded (since they are missing one replica) but the
cluster goes on.
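
A rough sketch of that sequence for a single OSD (the id and the memory target are
just examples; on releases without the central config store, set osd_memory_target
in ceph.conf instead):

ceph osd set noout
ceph osd set norebalance
ceph config set osd osd_memory_target 4294967296   # 4 GiB, example value
systemctl restart ceph-osd@3
# wait until the PGs are active+clean again, then:
ceph osd unset norebalance
ceph osd unset noout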

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminous

2020-05-14 Thread luuvuong91
Hi,

Output of command above

root@ceph07:~# ceph osd dump | grep min_compat_client
require_min_compat_client luminous
min_compat_client luminous
root@ceph07:~#


I tried to reduce it to jewel but without success.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Zhenshi Zhou
rbd-mirror can work on a single image in the pool,
and I did a test of image copy from 13.2 to 14.2.
However, the new data in the source image didn't
get copied to the destination image. I'm not sure if this
is normal.
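
Checking the mirroring state on both sides usually tells whether replay is
actually happening; a sketch (pool and image names are placeholders):

rbd info rbd/vm-disk | grep features        # journaling should be listed
rbd mirror image status rbd/vm-disk         # ideally reports up+replaying on the target
rbd mirror pool status rbd --verbose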

Kees Meijs  wrote on Thu, May 14, 2020 at 3:24 PM:

> I need to mirror single RBDs while rbd-mirror: "mirroring is configured
> on a per-pool basis" (according documentation).
>
> On 14-05-2020 09:13, Anthony D'Atri wrote:
> > So?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What is a pgmap?

2020-05-14 Thread Janne Johansson
On Wed, May 13, 2020 at 22:37, Bryan Henderson  wrote:

> I'm surprised I couldn't find this explained anywhere (I did look), but ...
> What is the pgmap and why does it get updated every few seconds on a tiny
> cluster that's mostly idle?
>
>
I was sure it was updated exactly once per second.


> I do know what a placement group (PG) is and that when documentation talks
> about placement group maps, it is talking about something else -- mapping
> of
> PGs to OSDs by CRUSH and OSD maps.
>

I thought it was a method (the method?) for a PG that comes back from a
crashed OSD/host to know whether it was up-to-date or old, since it would have
an older timestamp.

Since using computer time and date is fraught with peril, having the whole
cluster just bump that single number every second (and writing it to the PG
on each write) would allow a mostly idle PG that comes back after an hour
of unexpected downtime to easily know if it needs no recovery, a little bit
of delta to get up-to-date or a full copy from the primary in order to
become a part of the replica set for that PG.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Konstantin Shalygin

On 5/14/20 1:27 PM, Kees Meijs wrote:

Thank you very much. That's a good question.

The implementations of OpenStack and Ceph and "the other" OpenStack and
Ceph are, apart from networking, completely separate.


Actually I thought you were performing an OpenStack and Ceph upgrade, not a 
migration to another OpenStack and another Ceph.


If downtime is acceptable I would suggest using `qemu-img convert` 
instead of rbd import/export, because:

* it copies directly from one cluster to the other;

* speed is up to 3Gbit/s (my maximum from the qemu network stack in 2018);

qemu-img convert -m 16 -W -p -n -f raw -O raw \
  rbd:/:id=cinder:key=:mon_host=172.16.16.2 \
  rbd:/:id=cinder:key=:mon_host=172.16.17.2




k

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Anthony D'Atri
It’s entirely possible — and documented — to mirror individual images.  Your 
proposal to use snapshots is reinventing the wheel, but with less efficiency.

https://docs.ceph.com/docs/nautilus/rbd/rbd-mirroring/#image-configuration
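
In short, per-image mode boils down to something like this on the primary side
(pool and image names are placeholders):

rbd mirror pool enable volumes image
rbd feature enable volumes/vm-disk exclusive-lock journaling
rbd mirror image enable volumes/vm-disk
# plus a one-time "rbd mirror pool peer add ..." so the clusters know about each other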


ISTR that in Octopus the need for RBD journals is gone, but am not positive.

For doing 1-2 volumes at a time you’ll want to increase two tunables to avoid 
protracted syncing.  Without these I’ve experienced a volume of just a few TB 
take multiple hours to converge, and some that got increasingly behind over 
time.


  rbd_mirror_journal_max_fetch_bytes:
section: "client"
value: "33554432"

  rbd_journal_max_payload_bytes:
section: "client"
value: "8388608"

> On May 14, 2020, at 12:23 AM, Kees Meijs  wrote:
> 
> I need to mirror single RBDs while rbd-mirror: "mirroring is configured
> on a per-pool basis" (according documentation).
> 
> On 14-05-2020 09:13, Anthony D'Atri wrote:
>> So?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminous

2020-05-14 Thread Zhenshi Zhou
What the command "ceph osd dump | grep min_compat_client"
and "ceph features" output

Eugen Block  wrote on Thu, May 14, 2020 at 3:17 PM:

> Can you share what you have tried so far? It's unclear at which point
> it's failing so I'd suggest to stop the instances, restart
> nova-compute.service and then start instances again.
>
>
> Zitat von luuvuon...@gmail.com:
>
> > Dear,
> > in compute node, i have update ceph to version luminus 12.2.13 but not
> fix
> > Plaese detail away fix it help me
> > Thanks
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Eugen Block

You can also mirror on a per-image basis.


Quoting Kees Meijs :


I need to mirror single RBDs while rbd-mirror: "mirroring is configured
on a per-pool basis" (according documentation).

On 14-05-2020 09:13, Anthony D'Atri wrote:

So?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Kees Meijs
I need to mirror single RBDs while rbd-mirror: "mirroring is configured
on a per-pool basis" (according documentation).

On 14-05-2020 09:13, Anthony D'Atri wrote:
> So?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminous

2020-05-14 Thread Eugen Block
Can you share what you have tried so far? It's unclear at which point  
it's failing so I'd suggest to stop the instances, restart  
nova-compute.service and then start instances again.



Quoting luuvuon...@gmail.com:


Dear,
On the compute node I have updated Ceph to version luminous 12.2.13, but it did not fix the problem.
Please give me detailed steps to fix it.
Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Anthony D'Atri
So?

> 
> Hi Anthony,
> 
> Thanks as well.
> 
> Well, it's a one-time job.
> 
> K.
> 
> On 14-05-2020 09:10, Anthony D'Atri wrote:
>> Why not use rbd-mirror to handle the volumes?
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Kees Meijs
Hi Anthony,

Thanks as well.

Well, it's a one-time job.

K.

On 14-05-2020 09:10, Anthony D'Atri wrote:
> Why not use rbd-mirror to handle the volumes?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-14 Thread Brad Hubbard
On Wed, May 13, 2020 at 6:00 PM Lomayani S. Laizer  wrote:
>
> Hello,
>
> Below is full debug log of 2 minutes before crash of virtual machine. 
> Download from below url
>
> https://storage.habari.co.tz/index.php/s/31eCwZbOoRTMpcU

This log has rbd debug output, but not rados :(

I guess you'll need to try and capture a coredump if you can't get a backtrace.

I'd also suggest opening a tracker in case one of the rbd devs has any
ideas on this, or has seen something similar. Without a backtrace or
core it will be impossible to definitively identify the issue though.
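
For the record, roughly what I'd put in place on the compute node to catch it next
time; paths and values are examples, not a definitive recipe:

# /etc/ceph/ceph.conf on the hypervisor (the log directory must be writable by the qemu user)
[client]
    debug rbd = 20
    debug rados = 20
    log file = /var/log/ceph/qemu-guest.$pid.log

# and let libvirt/qemu produce a core on crash, e.g. in /etc/libvirt/qemu.conf:
#   max_core = "unlimited"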

>
>
> apport.log
>
> Wed May 13 09:35:30 2020: host pid 4440 crashed in a separate mount 
> namespace, ignoring
>
> kernel.log
> May 13 09:35:30 compute5 kernel: [123071.373217] fn-radosclient[4485]: 
> segfault at 0 ip 7f4c8c85d7ed sp 7f4c66ffc470 error 4 in 
> librbd.so.1.12.0[7f4c8c65a000+5cb000]
> May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81 c3 
> d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 89 02 48 8b 03 48 89 
> 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 8b 44 24 08 48 8b 0b 
> 48 21 f8 48 39 0c
> May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port 
> 1(tap33511c4d-2c) entered disabled state
> May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c left 
> promiscuous mode
> May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port 
> 1(tap33511c4d-2c) entered disabled state
>
> syslog
> compute5 kernel: [123071.373217] fn-radosclient[4485]: segfault at 0 ip 
> 7f4c8c85d7ed sp 7f4c66ffc470 error 4 i
> n librbd.so.1.12.0[7f4c8c65a000+5cb000]
> May 13 09:35:30 compute5 kernel: [123071.373228] Code: 8d 44 24 08 48 81 c3 
> d8 3e 00 00 49 21 f9 48 c1 e8 30 83 c0 01 48 c1 e0 30 48 8
> 9 02 48 8b 03 48 89 04 24 48 8b 34 24 48 21 fe <48> 8b 06 48 89 44 24 08 48 
> 8b 44 24 08 48 8b 0b 48 21 f8 48 39 0c
> May 13 09:35:30 compute5 libvirtd[1844]: internal error: End of file from 
> qemu monitor
> May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Link DOWN
> May 13 09:35:33 compute5 systemd-networkd[1326]: tap33511c4d-2c: Lost carrier
> May 13 09:35:33 compute5 kernel: [123074.832700] brqa72d845b-e9: port 
> 1(tap33511c4d-2c) entered disabled state
> May 13 09:35:33 compute5 kernel: [123074.838520] device tap33511c4d-2c left 
> promiscuous mode
> May 13 09:35:33 compute5 kernel: [123074.838527] brqa72d845b-e9: port 
> 1(tap33511c4d-2c) entered disabled state
> May 13 09:35:33 compute5 networkd-dispatcher[1614]: Failed to request link: 
> No such device
>
> On Fri, May 8, 2020 at 5:40 AM Brad Hubbard  wrote:
>>
>> On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer  
>> wrote:
>> >
>> > Hello,
>> > On my side at point of vm crash these are logs below. At the moment my 
>> > debug is at 10 value. I will rise to 20 for full debug. these crashes are 
>> > random and so far happens on very busy vms. Downgrading clients in host to 
>> > Nautilus these crashes disappear
>>
>> You could try adding debug_rados as well but you may get a very large
>> log so keep an eye on things.
>>
>> >
>> > Qemu is not shutting down in general because other vms on the same host 
>> > continues working
>>
>> A process can not reliably continue after encountering a segfault so
>> the qemu-kvm process must be ending and therefore it should be
>> possible to capture a coredump with the right configuration.
>>
>> In the following example, if you were to search for pid 6060 you would
>> find it is no longer running.
>> >> > [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 
>> >> > 7f8165cc0a50 sp 7f81397f6490 error 4 in 
>> >> > librbd.so.1.12.0[7f8165ab4000+537000]
>>
>> Without a backtrace at a minimum it may be very difficult to work out
>> what's going on with certainty. If you open a tracker for the issue
>> though maybe one of the devs specialising in rbd may have some
>> feedback.
>>
>> >
>> > 2020-05-07T13:02:12.121+0300 7f88d57fa700 10 librbd::io::ReadResult: 
>> > 0x7f88c80bfbf0 finish:  got {} for [0,24576] bl 24576
>> > 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult: 
>> > 0x7f88c80f9330 finish: C_ObjectReadRequest: r=0
>> > 2020-05-07T13:02:12.193+0300 7f88d57fa700 10 librbd::io::ReadResult: 
>> > 0x7f88c80f9330 finish:  got {} for [0,16384] bl 16384
>> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState: 
>> > 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
>> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageState: 
>> > 0x5569b5da9bb0 0x5569b5da9bb0 send_close_unlock
>> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image::CloseRequest: 
>> > 0x7f88c8175fd0 send_block_image_watcher
>> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::ImageWatcher: 
>> > 0x7f88c400dfe0 block_notifies
>> > 2020-05-07T13:02:28.694+0300 7f890ba90500  5 librbd::Watcher: 
>> > 0x7f88c400dfe0 block_notifies: blocked_count=1
>> > 2020-05-07T13:02:28.694+0300 7f890ba90500 10 librbd::image

[ceph-users] Re: Migrating clusters (and versions)

2020-05-14 Thread Anthony D'Atri
Why not use rbd-mirror to handle the volumes?

> On May 13, 2020, at 11:27 PM, Kees Meijs  wrote:
> 
> Hi Konstantin,
> 
> Thank you very much. That's a good question.
> 
> The implementations of OpenStack and Ceph and "the other" OpenStack and
> Ceph are, apart from networking, completely separate.
> 
> In terms of OpenStack I can recreate the compute instances and storage
> volumes but obviously need to copy the data as well. This is fine for
> small to medium volumes but now I stumble into some large ones: up to 10
> TiBs in size.
> 
> My idea (and hope) is to be able to use Ceph's snapshot mechanism.
> First: take and copy a snapshot from a certain RBD volume to the other
> cluster. Then, another snapshot and copy the delta. And then once again
> but with the compute instance shut off.
> 
> K.
> 
> On 14-05-2020 06:24, Konstantin Shalygin wrote:
>> Why just not migrate your VM's?
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: all VMs in compute node openstack connecting to this ceph cluster error connect after run command ceph osd set-require-min-compat-client luminous

2020-05-14 Thread luuvuong91
Dear,
On the compute node I have updated Ceph to version luminous 12.2.13, but it did not fix the problem.
Please give me detailed steps to fix it.
Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io