[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Patrick Donnelly
On Fri, Sep 17, 2021 at 11:30 AM Robert Sander
 wrote:
>
> On 17.09.21 16:40, Patrick Donnelly wrote:
>
> > Stopping NFS should not have been necessary. But, yes, reducing
> > max_mds to 1 and disabling allow_standby_replay is required. See:
> > https://docs.ceph.com/en/pacific/cephfs/upgrading/#upgrading-the-mds-cluster
>
> I do not read upgrade notes anymore because I just run
>
> ceph orch upgrade start
>
> Why does the orchestrator not run the necessary steps?

With cephadm and the automatic deployment of standby-replay daemons,
the logic to reduce max_mds in cephadm was not enough. Unfortunately,
we (I) forgot to update cephadm to also disable standby-replay (which
was only recently added to the upgrade procedure). Sorry about that. I
am working on a fix [1]. I will also ensure testing is in place so the
community doesn't trip on this again.

[1] https://github.com/ceph/ceph/pull/43214
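
For anyone hitting this before the cephadm fix lands, the manual steps from the
upgrade procedure linked above look roughly like this (a sketch only; the fs
name "cephfs" is the one used in this thread, and the linked docs remain the
authoritative reference):

ceph fs set cephfs allow_standby_replay false
ceph fs set cephfs max_mds 1
ceph fs status cephfs        # wait until there is one active rank and no stopping ranks
# upgrade and restart the MDS daemons, then restore the previous settings:
ceph fs set cephfs max_mds 2
ceph fs set cephfs allow_standby_replay true   # only if it was enabled before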

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Patrick Donnelly
On Fri, Sep 17, 2021 at 6:57 PM Eric Dold  wrote:
>
> Hi Patrick
>
> Here's the output of ceph fs dump:
>
> e226256
> enable_multiple, ever_enabled_multiple: 0,1
> default compat: compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2,10=snaprealm v2}
> legacy client fscid: 2
>
> Filesystem 'cephfs' (2)
> fs_name cephfs
> epoch   226254
> flags   12
> created 2019-03-20T14:06:32.588328+0100
> modified    2021-09-17T14:47:08.513192+0200
> tableserver 0
> root0
> session_timeout 60
> session_autoclose   300
> max_file_size   1099511627776
> required_client_features{}
> last_failure0
> last_failure_osd_epoch  91941
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
> table,9=file layout v2,10=snaprealm v2}
> max_mds 1
> in  0,1
> up  {}
> failed  0,1

Run:

ceph fs compat add_incompat cephfs 7 "mds uses inline data"


It's interesting you're in the same situation (two ranks). Are you
using cephadm? If not, were you not aware of the MDS upgrade procedure
[1]?

[1] https://docs.ceph.com/en/pacific/cephfs/upgrading/

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Eric Dold
Hi Patrick

Here's the output of ceph fs dump:

e226256
enable_multiple, ever_enabled_multiple: 0,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 2

Filesystem 'cephfs' (2)
fs_name cephfs
epoch   226254
flags   12
created 2019-03-20T14:06:32.588328+0100
modified    2021-09-17T14:47:08.513192+0200
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
required_client_features{}
last_failure0
last_failure_osd_epoch  91941
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
table,9=file layout v2,10=snaprealm v2}
max_mds 1
in  0,1
up  {}
failed  0,1
damaged
stopped
data_pools  [3]
metadata_pool   4
inline_data disabled
balancer
standby_count_wanted1


Standby daemons:

[mds.ceph3{-1:4694171} state up:standby seq 1 addr [v2:
192.168.1.72:6800/2991378711,v1:192.168.1.72:6801/2991378711] compat
{c=[1],r=[1],i=[7ff]}]
dumped fsmap epoch 226256

On Fri, Sep 17, 2021 at 4:41 PM Patrick Donnelly 
wrote:

> On Fri, Sep 17, 2021 at 8:54 AM Eric Dold  wrote:
> >
> > Hi,
> >
> > I get the same after upgrading to 16.2.6. All mds daemons are standby.
> >
> > After setting
> > ceph fs set cephfs max_mds 1
> > ceph fs set cephfs allow_standby_replay false
> > the mds still wants to be standby.
> >
> > 2021-09-17T14:40:59.371+0200 7f810a58f600  0 ceph version 16.2.6
> > (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable), process
> > ceph-mds, pid 7113
> > 2021-09-17T14:40:59.371+0200 7f810a58f600  1 main not setting numa
> affinity
> > 2021-09-17T14:40:59.371+0200 7f810a58f600  0 pidfile_write: ignore empty
> > --pid-file
> > 2021-09-17T14:40:59.375+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map
> to
> > version 226251 from mon.0
> > 2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map
> to
> > version 226252 from mon.0
> > 2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Monitors have
> > assigned me to become a standby.
> >
> > setting add_incompat 1 does also not work:
> > # ceph fs compat cephfs add_incompat 1
> > Error EINVAL: adding a feature requires a feature string
> >
> > Any ideas?
>
> Please share `ceph fs dump`.
>
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread 胡 玮文
Thanks again. Now my CephFS is back online!

I ended up building ceph-mon from source myself with the following patch applied,
and replacing only the mon leader seems to be sufficient.

Now I’m interested in why such a routine automated minor version upgrade could 
get the cluster into such a state in the first place.

diff --git a/src/mon/MDSMonitor.cc b/src/mon/MDSMonitor.cc
index 4373938..786f227 100644
--- a/src/mon/MDSMonitor.cc
+++ b/src/mon/MDSMonitor.cc
@@ -1526,7 +1526,7 @@ int MDSMonitor::filesystem_command(
 ss << "removed mds gid " << gid;
 return 0;
 }
-  } else if (prefix == "mds rmfailed") {
+  } else if (prefix == "mds addfailed") {
 bool confirm = false;
 cmd_getval(cmdmap, "yes_i_really_mean_it", confirm);
 if (!confirm) {
@@ -1554,10 +1554,10 @@ int MDSMonitor::filesystem_command(
 role.fscid,
 [role](std::shared_ptr fs)
 {
-  fs->mds_map.failed.erase(role.rank);
+  fs->mds_map.failed.insert(role.rank);
 });

-ss << "removed failed mds." << role;
+ss << "added failed mds." << role;
 return 0;
 /* TODO: convert to fs commands to update defaults */
   } else if (prefix == "mds compat rm_compat") {
diff --git a/src/mon/MonCommands.h b/src/mon/MonCommands.h
index 463419b..5c6a927 100644
--- a/src/mon/MonCommands.h
+++ b/src/mon/MonCommands.h
@@ -334,7 +334,7 @@ COMMAND("mds repaired name=role,type=CephString",
COMMAND("mds rm "
"name=gid,type=CephInt,range=0",
"remove nonactive mds", "mds", "rw")
-COMMAND_WITH_FLAG("mds rmfailed name=role,type=CephString "
+COMMAND_WITH_FLAG("mds addfailed name=role,type=CephString "
 "name=yes_i_really_mean_it,type=CephBool,req=false",
"remove failed rank", "mds", "rw", FLAG(HIDDEN))
COMMAND_WITH_FLAG("mds cluster_down", "take MDS cluster down", "mds", "rw", 
FLAG(OBSOLETE))
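
(For anyone wanting to try the same approach: building only the patched ceph-mon
can be done roughly as below, using the standard Ceph build steps; the exact
options depend on the environment and these are not the exact commands used in
this thread.)

git clone https://github.com/ceph/ceph.git && cd ceph
git checkout v16.2.6
git submodule update --init --recursive
# apply the patch above, then build just the mon binary:
./install-deps.sh
./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo
cd build && ninja ceph-mon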

From: Patrick Donnelly
Sent: September 18, 2021 5:06
To: 胡 玮文
Cc: Eric Dold; ceph-users
Subject: Re: Cephfs - MDS all up:standby, not becoming up:active

On Fri, Sep 17, 2021 at 3:17 PM 胡 玮文  wrote:
>
> > Did you run the command I suggested before or after you executed `rmfailed` 
> > below?
>
>
>
> I ran “rmfailed” before reading your mail. Then the MONs crashed. I fixed
> the crash by setting max_mds=2. Then I tried the command you suggested.
>
>
>
> By reading the code[1], I think I really need to undo the “rmfailed” to get 
> my MDS out of standby state.

Exactly. If you install the repositories from (available in about ~1 hour):

https://shaman.ceph.com/repos/ceph/ceph-mds-addfailed-pacific/9a1ccf41c32446e1b31328e7d01ea8e4aaea8cbb/

for the monitors (only), and then run:

for i in 0 1; do ceph mds addfailed cephfs:$i --yes-i-really-mean-it ; done

it should fix it for you.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Patrick Donnelly
On Fri, Sep 17, 2021 at 3:17 PM 胡 玮文  wrote:
>
> > Did you run the command I suggested before or after you executed `rmfailed` 
> > below?
>
>
>
> I ran “rmfailed” before reading your mail. Then the MONs crashed. I fixed
> the crash by setting max_mds=2. Then I tried the command you suggested.
>
>
>
> By reading the code[1], I think I really need to undo the “rmfailed” to get 
> my MDS out of standby state.

Exactly. If you install the repositories from (available in about ~1 hour):

https://shaman.ceph.com/repos/ceph/ceph-mds-addfailed-pacific/9a1ccf41c32446e1b31328e7d01ea8e4aaea8cbb/

for the monitors (only), and then run:

for i in 0 1; do ceph mds addfailed cephfs:$i --yes-i-really-mean-it ; done

it should fix it for you.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HEALTH_WARN: failed to probe daemons or devices after upgrade to 16.2.6

2021-09-17 Thread Fyodor Ustinov
Hi!

> Was there a MON running previously on that host? Do you see the daemon
> when running 'cephadm ls'? If so, remove it with 'cephadm rm-daemon
> --name mon.s-26-9-17'

Hmm. 'cephadm ls' run directly on the node does show that there is a mon. I
don't quite understand where it came from, and I don't understand why 'ceph orch
ps' didn't show this service.

Thank you very much for your help.

P.S. Perhaps you know: is it normal for a service to be in this state?
{
"style": "cephadm:v1",
"name": "node-exporter.s-26-9-17",
"fsid": "46e2b13c-dab7-11eb-810b-a5ea707f1ea1",
"systemd_unit": 
"ceph-46e2b13c-dab7-11eb-810b-a5ea707f1ea1@node-exporter.s-26-9-17",
"enabled": true,
"state": "error",
"service_name": "node-exporter",
"ports": [
9100
],
"ip": null,
"deployed_by": [

"docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949"
],
"memory_request": null,
"memory_limit": null,
"container_id": null,
"container_image_name": "docker.io/prom/node-exporter:v0.18.1",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2021-07-03T01:50:48.371104Z",
"deployed": "2021-07-03T01:50:47.855103Z",
"configured": "2021-07-03T01:50:48.371104Z"
},
{
"style": "cephadm:v1",
"name": "node-exporter.s-26-9-17",
"fsid": "1ef45b26-dbac-11eb-a357-616c355f48cb",
"systemd_unit": 
"ceph-1ef45b26-dbac-11eb-a357-616c355f48cb@node-exporter.s-26-9-17",
"enabled": true,
"state": "running",
"service_name": "node-exporter",
"ports": [
9100
],
"ip": null,
"deployed_by": [

"quay.io/ceph/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530",

"quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac"
],
"rank": null,
"rank_generation": null,
"memory_request": null,
"memory_limit": null,
"container_id": 
"73d4fb20f2fddf9aa5738b5e3c7c9b098862702989a088c32bad528275f90c19",
"container_image_name": "quay.io/prometheus/node-exporter:v0.18.1",
"container_image_id": 
"e5a616e4b9cf68dfcad7782b78e118be4310022e874d52da85c55923fb615f87",
"container_image_digests": [

"docker.io/prom/node-exporter@sha256:a2f29256e53cc3e0b64d7a472512600b2e9410347d53cdc85b49f659c17e02ee",

"docker.io/prom/node-exporter@sha256:b630fb29d99b3483c73a2a7db5fc01a967392a3d7ad754c8eccf9f4a67e7ee31",

"quay.io/prometheus/node-exporter@sha256:a2f29256e53cc3e0b64d7a472512600b2e9410347d53cdc85b49f659c17e02ee",

"quay.io/prometheus/node-exporter@sha256:b630fb29d99b3483c73a2a7db5fc01a967392a3d7ad754c8eccf9f4a67e7ee31"
],
"memory_usage": 26843545,
"version": "0.18.1",
"started": "2021-09-17T10:04:31.495483Z",
"created": "2021-07-03T03:37:26.519462Z",
"deployed": "2021-09-17T08:12:12.116009Z",
"configured": "2021-09-17T08:12:15.887998Z"
},

 
WBR,
Fyodor.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread 胡 玮文
> Did you run the command I suggested before or after you executed `rmfailed` 
> below?

I ran “rmfailed” before reading your mail. Then the MONs crashed. I fixed the
crash by setting max_mds=2. Then I tried the command you suggested.

By reading the code[1], I think I really need to undo the “rmfailed” to get my 
MDS out of standby state.

> I will compile an addfailed command in a branch but you'll need to download 
> the packages and run it.

Recompiling can be hard; I’m not familiar with the procedure. Now I’m going to
modify the logic of [2] a little bit with gdb to insert the failed rank.

> Please be careful running hidden/debugging commands.

I will definitely be more careful in the future. Thanks again for your help.

[1]: https://github.com/ceph/ceph/blob/v16.2.6/src/mon/MDSMonitor.cc#L2238
[2]: https://github.com/ceph/ceph/blob/v16.2.6/src/mds/FSMap.cc#L1031

> What was the crash?

ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
1: /lib64/libpthread.so.0(+0x12b20) [0x7f7e9e32cb20]
2: gsignal()
3: abort()
4: /lib64/libstdc++.so.6(+0x9009b) [0x7f7e9d94509b]
5: /lib64/libstdc++.so.6(+0x9653c) [0x7f7e9d94b53c]
6: /lib64/libstdc++.so.6(+0x96597) [0x7f7e9d94b597]
7: /lib64/libstdc++.so.6(+0x967f8) [0x7f7e9d94b7f8]
8: /lib64/libstdc++.so.6(+0x9204b) [0x7f7e9d94704b]
9: (MDSMonitor::maybe_resize_cluster(FSMap&, int)+0xb7f) [0x558c2f6d72ff]
10: (MDSMonitor::tick()+0x161) [0x558c2f6d9cd1]
11: (MDSMonitor::on_active()+0x2c) [0x558c2f6c34bc]
12: (PaxosService::_active()+0x1f5) [0x558c2f5fd865]
13: (Context::complete(int)+0xd) [0x558c2f4eaead]
14: (void finish_contexts 
> >(ceph::common::CephContext*, …
15: (Paxos::finish_round()+0x169) [0x558c2f5f4139]
16: (Paxos::commit_finish()+0x8c0) [0x558c2f5f6c60]
17: (C_Committed::finish(int)+0x45) [0x558c2f5fa885]
18: (Context::complete(int)+0xd) [0x558c2f4eaead]
19: (MonitorDBStore::C_DoTransaction::finish(int)+0x98) [0x558c2f5fa5a8]
20: (Context::complete(int)+0xd) [0x558c2f4eaead]
21: (Finisher::finisher_thread_entry()+0x1a5) [0x7f7ea068d6d5]
22: /lib64/libpthread.so.0(+0x814a) [0x7f7e9e32214a]
23: clone()
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Patrick Donnelly
On Fri, Sep 17, 2021 at 2:32 PM 胡 玮文  wrote:
>
> Thank you very much. But the MDS daemons still don’t go active.

Did you run the command I suggested before or after you executed
`rmfailed` below?

> While trying to resolve this, I ran:
>
> ceph mds rmfailed 0 --yes-i-really-mean-it
>
> ceph mds rmfailed 1 --yes-i-really-mean-it

Oh, that's not good! ...

> Then 3 out of 5 MONs crashed.

What was the crash?

> I was able to keep the MONs up by making MDSMonitor::maybe_resize_cluster return
> false directly with gdb. Then I set max_mds back to 2. Now my MONs do not
> crash.
>
>
>
> I’ve really learnt a lesson from this..
>
> Now I suppose I need to figure out how to undo the “mds rmfailed” command?

There's no CLI to add the ranks back into the failed set. You may be
able to reset your FSMap using `ceph fs reset` but this should be a
last resort as it's not well tested with multiple ranks (you have rank
0 and 1). It's likely you'd lose metadata.

I will compile an addfailed command in a branch but you'll need to
download the packages and run it. Please be careful running
hidden/debugging commands.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing swift with RGW

2021-09-17 Thread Michel Niyoyita
Hello Eugen

Thank you very much for your guidance and support; now everything is
working fine, and RGW has replaced Swift as I wanted.

Michel

On Thu, 9 Sep 2021, 13:59 Michel Niyoyita,  wrote:

> Hello Eugen,
>
> Are there other configs needed on the OpenStack side apart from creating
> endpoints? I would like to check whether I am missing something in authentication,
> because up to now the dashboard has the same problem: once I click on the
> containers it automatically disconnects.
>
> Thanks for your  usual support
>
> On Mon, Sep 6, 2021 at 4:50 PM Eugen Block  wrote:
>
>> It's hard to tell what is wrong with your setup. I don't really use
>> mine, but this was the last working config I had that could create
>> Swift containers directly in RGW:
>>
>> ---snip---
>> # ceph.conf:
>>
>> [client.rgw.ses6-mon1]
>> rgw frontends = "beast port=80"
>> rgw dns name = ses6-mon1.example.com
>> rgw enable usage log = true
>>
>> rgw thread pool size = 512
>> rgw keystone api version = 3
>> rgw keystone url = http://control-node.example.com:5000
>>
>> rgw keystone admin user = rgw
>> rgw keystone admin password = 
>> rgw keystone admin domain = default
>> rgw keystone admin project = service
>> rgw keystone accepted roles = admin,Member,_member_,member
>> rgw keystone verify ssl = false
>> rgw s3 auth use keystone = true
>> rgw keystone revocation interval = 0
>>
>>
>> # User role (I don't think admin is required)
>>
>> openstack role add --user rgw --project 9e8a67da237a4b26afb2819d2dea2219
>> admin
>>
>>
>> # Create keystone endpoints
>>
>> openstack endpoint create --region RegionOne swift admin "http://ses6-mon1.example.com:80/swift/v1"
>> openstack endpoint create --region RegionOne swift internal "http://ses6-mon1.example.com:80/swift/v1"
>> openstack endpoint create --region RegionOne swift public "http://ses6-mon1.example.com:80/swift/v1"
>>
>>
>> # Create container and files
>>
>> openstack container create swift1
>>
>> +---------+-----------+-------------------------------+
>> | account | container | x-trans-id                    |
>> +---------+-----------+-------------------------------+
>> | v1      | swift1    | tx1-0060b4ba48-d724dc-default |
>> +---------+-----------+-------------------------------+
>>
>> openstack object create --name file1 swift1 chef-client.log
>> +--------+-----------+----------------------------------+
>> | object | container | etag                             |
>> +--------+-----------+----------------------------------+
>> | file1  | swift1    | 56a1ed3b201c1e753bcbe80c640349f7 |
>> +--------+-----------+----------------------------------+
>> ---snip---
>>
>>
>> You are mixing DNS names and IP addresses; I can't tell if that's a
>> problem, but it probably should work, I'm not sure. Compared to my
>> ceph.conf these are the major differences:
>>
>> rgw keystone verify ssl = false
>> rgw s3 auth use keystone = true
>> rgw keystone revocation interval = 0
>>
>> And I don't use rgw_keystone_token_cache_size. Maybe try again with
>> the options I use.
>>
>>
>> Zitat von Michel Niyoyita :
>>
>> > Hello,
>> >
>> > I am trying to replace Swift with RGW as backend storage, but I fail as soon
>> > as I try to post a container from the OpenStack side, even though all interfaces
>> > are configured (admin, public and internal). Once I post from the RGW host it
>> > is created. Another issue is that object storage does not appear on the
>> > Horizon dashboard. I have deployed OpenStack all-in-one using
>> > kolla-ansible and the OS is Ubuntu.
>> >
>> > (kolla-open1) stack@kolla-open1:~$ swift -v post myswift
>> > Container POST failed: http://ceph-osd3:8080/swift/v1/myswift 401
>> > Unauthorized   b'AccessDenied'
>> > Failed Transaction ID: tx8-006135dcbd-87d63-default
>> >
>> > (kolla-open1) stack@kolla-open1:~$ swift list
>> > Account GET failed: http://ceph-osd3:8080/swift/v1?format=json 401
>> > Unauthorized  [first 60 chars of response]
>> > b'{"Code":"AccessDenied","RequestId":"txc-'
>> > Failed Transaction ID: txc-006135de42-87d63-default
>> >
>> > Kindly help to solve the issue
>> >
>> > Michel
>> >
>> > On Thu, Sep 2, 2021 at 4:28 PM Alex Schultz 
>> wrote:
>> >
>> >> The swift docs are a bit out of date as they still reference python2
>> >> despite python3 being supported for some time now.  Replace python-
>> with
>> >> python3- and try again.
>> >>
>> >>
>> >> On Thu, Sep 2, 2021 at 7:35 AM Michel Niyoyita 
>> wrote:
>> >>
>> >>>
>> >>>
>> >>> -- Forwarded message -
>> >>> From: Michel Niyoyita 
>> >>> Date: Thu, Sep 2, 2021 at 12:17 PM
>> >>> Subject: Fwd: [ceph-users] Re: Replacing swift with RGW
>> >>> To: 
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> -- Forwarded message -
>> >>> From: Eugen Block 
>> >>> Date: Thu, Sep 2, 2021 at 10:39 AM
>> >>> Subject: Re: [ceph-users] Re: Replacing swift with RGW
>> >>> To: Miche

[ceph-users] Re: Ceph Community Ambassador Sync

2021-09-17 Thread Michel Niyoyita
Hello Mike

Where can we find a list of ambassadors and their respective regions? I
ask that to know if our region has someone who represents us.

Thank you


On Fri, 17 Sep 2021, 19:25 Mike Perez,  wrote:

> Hi everyone,
>
> We first introduced the Ceph Community Ambassador program in Ceph
> Month back in June. The group is planning to meet for the first time
> on September 23rd at 6:00 UTC to sync on ideas and what's going on in
> their particular region. This is an open event on the Ceph community
> calendar:
>
>
> https://calendar.google.com/calendar/b/1?cid=OXRzOWM3bHQ3dTF2aWMyaWp2dnFxbGZwbzBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
>
> Our agenda can be found on this pad:
>
> https://pad.ceph.com/p/community-ambassadors
>
> Please join us if you're interested in becoming an ambassador for a
> region that isn't listed on the etherpad, or you would just like to
> listen in and provide feedback.
>
> --
> Mike Perez
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread 胡 玮文
Thank you very much. But the MDS daemons still don’t go active.

While trying to resolve this, I ran:
ceph mds rmfailed 0 --yes-i-really-mean-it
ceph mds rmfailed 1 --yes-i-really-mean-it

Then 3 out of 5 MONs crashed. I was able to keep the MONs up by making
MDSMonitor::maybe_resize_cluster return false directly with gdb. Then I set
max_mds back to 2. Now my MONs do not crash.
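
(A rough sketch of the kind of gdb session described above, assuming ceph-mon
debug symbols are installed; the exact commands were not posted in the thread:)

gdb -p $(pidof ceph-mon)
(gdb) break MDSMonitor::maybe_resize_cluster
(gdb) continue
# each time the breakpoint fires, force an early return of false:
(gdb) return (bool) false
(gdb) continue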

I’ve really learnt a lesson from this..
Now I suppose I need to figure out how to undo the “mds rmfailed” command?

Current “ceph fs dump”: (note 7 is added to “incompat”, “max_mds” is 2, 
“failed” is cleared)

e41448
enable_multiple, ever_enabled_multiple: 0,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout 
v2,10=snaprealm v2}
legacy client fscid: 2

Filesystem 'cephfs' (2)
fs_name cephfs
epoch   41442
flags   12
created 2020-09-15T04:10:53.585782+
modified    2021-09-17T17:51:57.582372+
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
required_client_features{}
last_failure0
last_failure_osd_epoch  43315
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no 
anchor table,9=file layout v2,10=snaprealm v2}
max_mds 2
in  0,1
up  {}
failed
damaged
stopped
data_pools  [5,13,16]
metadata_pool   4
inline_data disabled
balancer
standby_count_wanted1


Standby daemons:

[mds.cephfs.gpu024.rpfbnh{-1:7918294} state up:standby seq 1 join_fscid=2 addr 
[v2:202.38.247.187:6800/94739959,v1:202.38.247.187:6801/94739959] compat 
{c=[1],r=[1],i=[7ff]}]
dumped fsmap epoch 41448

From: Patrick Donnelly
Sent: September 18, 2021 0:24
To: 胡 玮文
Cc: Eric Dold; ceph-users
Subject: Re: [ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

On Fri, Sep 17, 2021 at 11:11 AM 胡 玮文  wrote:
>
> We are experiencing the same when upgrading to 16.2.6 with cephadm.
>
>
>
> I tried
>
>
>
> ceph fs set cephfs max_mds 1
>
> ceph fs set cephfs allow_standby_replay false
>
>
>
> , but still all MDS daemons go to standby. It seems all ranks are marked failed. Do
> we have a way to clear this flag?
> [...]
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds 
> uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file 
> layout v2,10=snaprealm v2}

Please run:

ceph fs compat add_incompat cephfs 7 "mds uses inline data"

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Community Ambassador Sync

2021-09-17 Thread Mike Perez
Hi everyone,

We first introduced the Ceph Community Ambassador program in Ceph
Month back in June. The group is planning to meet for the first time
on September 23rd at 6:00 UTC to sync on ideas and what's going on in
their particular region. This is an open event on the Ceph community
calendar:

https://calendar.google.com/calendar/b/1?cid=OXRzOWM3bHQ3dTF2aWMyaWp2dnFxbGZwbzBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

Our agenda can be found on this pad:

https://pad.ceph.com/p/community-ambassadors

Please join us if you're interested in becoming an ambassador for a
region that isn't listed on the etherpad, or you would just like to
listen in and provide feedback.

-- 
Mike Perez

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Patrick Donnelly
On Fri, Sep 17, 2021 at 11:11 AM 胡 玮文  wrote:
>
> We are experiencing the same when upgrading to 16.2.6 with cephadm.
>
>
>
> I tried
>
>
>
> ceph fs set cephfs max_mds 1
>
> ceph fs set cephfs allow_standby_replay false
>
>
>
> , but still all MDS daemons go to standby. It seems all ranks are marked failed. Do
> we have a way to clear this flag?
> [...]
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds 
> uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file 
> layout v2,10=snaprealm v2}

Please run:

ceph fs compat add_incompat cephfs 7 "mds uses inline data"

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] aws-sdk-cpp-s3 alternative for ceph

2021-09-17 Thread Marc


I was wondering if there is some patched aws-sdk that allows it to be used with
Ceph RGW, for instance one that removes such things:

:EC2MetadataClient: Can not retrieve resource from 
http://169.254.169.254/latest/meta-data/placement/availability-zone
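
(Not a patched SDK, but the stock AWS SDKs and CLI can usually be told to skip
the EC2 metadata probe and talk to RGW directly. The lines below are an untested
sketch; whether your aws-sdk-cpp version honours these settings, and the RGW
endpoint name, are assumptions to verify:)

export AWS_EC2_METADATA_DISABLED=true       # skip the 169.254.169.254 lookup
export AWS_ACCESS_KEY_ID=<rgw access key>
export AWS_SECRET_ACCESS_KEY=<rgw secret key>
aws --endpoint-url https://rgw.example.com s3 ls   # point the client at RGW instead of AWS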




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debugging radosgw sync errors

2021-09-17 Thread Boris Behrens
While searching for other things I came across this:
[root ~]# radosgw-admin metadata list bucket | grep www1
"www1",
[root ~]# radosgw-admin metadata list bucket.instance | grep www1
"www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103",
"www1.company.dev",
[root ~]# radosgw-admin bucket list | grep www1
"www1",
[root ~]# radosgw-admin metadata rm bucket.instance:www1.company.dev
ERROR: can't remove key: (22) Invalid argument

Maybe this is part of the problem.

Did somebody see this and know what to do?
-- 
The self-help group "UTF-8-Probleme" is meeting in the large hall this time, as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Marc
> > Stopping NFS should not have been necessary. But, yes, reducing
> > max_mds to 1 and disabling allow_standby_replay is required. See:
> > https://docs.ceph.com/en/pacific/cephfs/upgrading/#upgrading-the-mds-
> cluster
> 
> I do not read upgrade notes anymore because I just run
> 
> ceph orch upgrade start
> 
> Why does the orchestrator not run the necessary steps?
> 

Indeed! I can remember that this was an argument in the discussion for using 
the automation with containers.




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Robert Sander

On 17.09.21 16:40, Patrick Donnelly wrote:


Stopping NFS should not have been necessary. But, yes, reducing
max_mds to 1 and disabling allow_standby_replay is required. See:
https://docs.ceph.com/en/pacific/cephfs/upgrading/#upgrading-the-mds-cluster


I do not read upgrade notes anymore because I just run

ceph orch upgrade start

Why does the orchestrator not run the necessary steps?

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw find buckets which use the s3website feature

2021-09-17 Thread Boris Behrens
Found it:

for bucket in `radosgw-admin metadata list bucket.instance | jq .[] | cut -f2 -d\"`; do
  if radosgw-admin metadata get --metadata-key=bucket.instance:$bucket | grep --silent website_conf; then
    echo $bucket
  fi
done

On Thu, Sep 16, 2021 at 09:49, Boris Behrens  wrote:

> Hi people,
>
> is there a way to find bucket that use the s3website feature?
>
> Cheers
>  Boris
>


-- 
The self-help group "UTF-8-Probleme" is meeting in the large hall this time, as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread 胡 玮文
We are experiencing the same when upgrading to 16.2.6 with cephadm.

I tried

ceph fs set cephfs max_mds 1
ceph fs set cephfs allow_standby_replay false

, but still all MDS daemons go to standby. It seems all ranks are marked failed. Do
we have a way to clear this flag?

Please help. Our cluster is down. Thanks.

# ceph fs status
cephfs - 0 clients
==================
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0    failed
 1    failed
          POOL              TYPE      USED   AVAIL
  cephfs.cephfs.meta       metadata    114G    404G
  cephfs.cephfs.data         data     84.9T   17.6T
  cephfs.cephfs.data_ssd     data         0    606G
  cephfs.cephfs.data_mixed   data     9879G    404G
STANDBY MDS
cephfs.gpu023.aetiph
cephfs.gpu018.ovxvoz
cephfs.gpu006.ddpekw
cephfs.gpu024.rpfbnh
MDS version: ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) 
pacific (stable)

# ceph fs dump
e41422
enable_multiple, ever_enabled_multiple: 0,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout 
v2,10=snaprealm v2}
legacy client fscid: 2

Filesystem 'cephfs' (2)
fs_name cephfs
epoch   41422
flags   12
created 2020-09-15T04:10:53.585782+
modified    2021-09-17T15:05:26.239956+
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
required_client_features{}
last_failure0
last_failure_osd_epoch  43315
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout 
v2,10=snaprealm v2}
max_mds 1
in  0,1
up  {}
failed  0,1
damaged
stopped
data_pools  [5,13,16]
metadata_pool   4
inline_data disabled
balancer
standby_count_wanted1


Standby daemons:

[mds.cephfs.gpu023.aetiph{-1:7908668} state up:standby seq 1 join_fscid=2 addr 
[v2:202.38.247.186:6800/3495351337,v1:202.38.247.186:6801/3495351337] compat 
{c=[1],r=[1],i=[7ff]}]
[mds.cephfs.gpu018.ovxvoz{:78cdb8} state up:standby seq 1 join_fscid=2 
addr [v2:202.38.247.181:1a90/94680caa,v1:202.38.247.181:1a91/94680caa] compat 
{c=[1],r=[1],i=[7ff]}]
[mds.cephfs.gpu006.ddpekw{:78f84f} state up:standby seq 1 join_fscid=2 
addr [v2:202.38.247.175:1a90/fdd0fd1a,v1:202.38.247.175:1a91/fdd0fd1a] compat 
{c=[1],r=[1],i=[7ff]}]
[mds.cephfs.gpu024.rpfbnh{:78fc4e} state up:standby seq 1 join_fscid=2 
addr [v2:202.38.247.187:1a90/4e2e69dc,v1:202.38.247.187:1a91/4e2e69dc] compat 
{c=[1],r=[1],i=[7ff]}]
dumped fsmap epoch 41422

From: Patrick Donnelly
Sent: September 17, 2021 22:42
To: Eric Dold
Cc: ceph-users
Subject: [ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

On Fri, Sep 17, 2021 at 8:54 AM Eric Dold  wrote:
>
> Hi,
>
> I get the same after upgrading to 16.2.6. All mds daemons are standby.
>
> After setting
> ceph fs set cephfs max_mds 1
> ceph fs set cephfs allow_standby_replay false
> the mds still wants to be standby.
>
> 2021-09-17T14:40:59.371+0200 7f810a58f600  0 ceph version 16.2.6
> (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable), process
> ceph-mds, pid 7113
> 2021-09-17T14:40:59.371+0200 7f810a58f600  1 main not setting numa affinity
> 2021-09-17T14:40:59.371+0200 7f810a58f600  0 pidfile_write: ignore empty
> --pid-file
> 2021-09-17T14:40:59.375+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map to
> version 226251 from mon.0
> 2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map to
> version 226252 from mon.0
> 2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Monitors have
> assigned me to become a standby.
>
> setting add_incompat 1 does also not work:
> # ceph fs compat cephfs add_incompat 1
> Error EINVAL: adding a feature requires a feature string
>
> Any ideas?

Please share `ceph fs dump`.


--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Patrick Donnelly
On Fri, Sep 17, 2021 at 8:19 AM Joshua West  wrote:
>
> Thanks Patrick,
>
> Similar to Robert, when trying that, I simply receive "Error EINVAL:
> adding a feature requires a feature string" 10x times.
>
> I attempted to downgrade, but wasn't able to successfully get my mons
> to come back up, as they had quincy specific "mon data structure
> changes" or something like that.
> So, I've settled into "17.0.0-6762-g0ff2e281889" on my cluster.
>
> cephfs is still down all this time later. (Good thing this is a
> learning cluster not in production, haha)

Yes, sorry the command should have been (note for other readers,
please do not blindly do this):

ceph fs compat add_incompat 1 "base v0.20"
ceph fs compat add_incompat 2 "client writeable ranges"
ceph fs compat add_incompat 3 "default file layouts on dirs"
ceph fs compat add_incompat 4 "dir inode in separate object"
ceph fs compat add_incompat 5 "mds uses versioned encoding"
ceph fs compat add_incompat 6 "dirfrag is stored in omap"
ceph fs compat add_incompat 7 "mds uses inline data"
ceph fs compat add_incompat 8 "no anchor table"
ceph fs compat add_incompat 9 "file layout v2"
ceph fs compat add_incompat 10 "snaprealm v2"

> I began to feel more and more that the issue was related to a damaged
> cephfs, from a recent set of server malfunctions on a single node
> causing mayhem on the cluster.

No, it's not related. The fs was not damaged in any way from this situation.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Patrick Donnelly
On Fri, Sep 17, 2021 at 8:54 AM Eric Dold  wrote:
>
> Hi,
>
> I get the same after upgrading to 16.2.6. All mds daemons are standby.
>
> After setting
> ceph fs set cephfs max_mds 1
> ceph fs set cephfs allow_standby_replay false
> the mds still wants to be standby.
>
> 2021-09-17T14:40:59.371+0200 7f810a58f600  0 ceph version 16.2.6
> (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable), process
> ceph-mds, pid 7113
> 2021-09-17T14:40:59.371+0200 7f810a58f600  1 main not setting numa affinity
> 2021-09-17T14:40:59.371+0200 7f810a58f600  0 pidfile_write: ignore empty
> --pid-file
> 2021-09-17T14:40:59.375+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map to
> version 226251 from mon.0
> 2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map to
> version 226252 from mon.0
> 2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Monitors have
> assigned me to become a standby.
>
> setting add_incompat 1 does also not work:
> # ceph fs compat cephfs add_incompat 1
> Error EINVAL: adding a feature requires a feature string
>
> Any ideas?

Please share `ceph fs dump`.


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Patrick Donnelly
On Fri, Sep 17, 2021 at 5:54 AM Robert Sander
 wrote:
>
> Hi,
>
> I had to run
>
> ceph fs set cephfs max_mds 1
> ceph fs set cephfs allow_standby_replay false
>
> and stop all MDS and NFS containers and start one after the other again
> to clear this issue.

Stopping NFS should not have been necessary. But, yes, reducing
max_mds to 1 and disabling allow_standby_replay is required. See:
https://docs.ceph.com/en/pacific/cephfs/upgrading/#upgrading-the-mds-cluster

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.6 Pacific released

2021-09-17 Thread Adrian Nicolae

Hi,

Does the 16.2.6 version fix the following bug:

https://github.com/ceph/ceph/pull/42690

?

It's not listed in the changelog.




Message: 3
Date: Thu, 16 Sep 2021 15:48:42 -0400
From: David Galloway 
Subject: [ceph-users] v16.2.6 Pacific released
To: ceph-annou...@ceph.io, ceph-users@ceph.io, d...@ceph.io,
ceph-maintain...@ceph.io
Message-ID: <1d402d62-5b3e-b62e-c68c-3fb2b30f1...@redhat.com>
Content-Type: text/plain; charset=utf-8

We're happy to announce the 6th backport release in the Pacific series.
We recommend users update to this release. For detailed release
notes with links & a changelog, please refer to the official blog entry at
https://ceph.io/en/news/blog/2021/v16-2-6-pacific-released

Notable Changes
---

* MGR: The pg_autoscaler has a new default 'scale-down' profile which
provides more performance from the start for new pools (for newly
created clusters). Existing clusters will retain the old behavior, now
called the 'scale-up' profile. For more details, see:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/

* CephFS: the upgrade procedure for CephFS is now simpler. It is no
longer necessary to stop all MDS before upgrading the sole active MDS.
After disabling standby-replay, reducing max_mds to 1, and waiting for
the file systems to become stable (each fs with 1 active and 0 stopping
daemons), a rolling upgrade of all MDS daemons can be performed.

* Dashboard: now allows users to set up and display a custom message
(MOTD, warning, etc.) in a sticky banner at the top of the page. For
more details, see:
https://docs.ceph.com/en/pacific/mgr/dashboard/#message-of-the-day-motd

* Several fixes in BlueStore, including a fix for the deferred write
regression, which led to excessive RocksDB flushes and compactions.
Previously, when bluestore_prefer_deferred_size_hdd was equal to or more
than bluestore_max_blob_size_hdd (both set to 64K), all the data was
deferred, which led to increased consumption of the column family used
to store deferred writes in RocksDB. Now, the
bluestore_prefer_deferred_size parameter independently controls deferred
writes, and only writes smaller than this size use the deferred write path.

* The default value of osd_client_message_cap has been set to 256, to
provide better flow control by limiting maximum number of in-flight
client requests.

* PGs no longer show a active+clean+scrubbing+deep+repair state when
osd_scrub_auto_repair is set to true, for regular deep-scrubs with no
repair required.

* The ceph-mgr-modules-core debian package does not recommend ceph-mgr-rook
anymore, as the latter depends on python3-numpy, which cannot be imported
multiple times in different Python sub-interpreters if the version of
python3-numpy is older than 1.19. Since apt-get installs the Recommends
packages by default, ceph-mgr-rook was always installed along with the
ceph-mgr debian package as an indirect dependency. If your workflow
depends on this behavior, you might want to install ceph-mgr-rook
separately.

* This is the first release built for Debian Bullseye.


Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-16.2.6.tar.gz
* Containers at https://hub.docker.com/r/ceph/ceph/tags?name=v16.2.6
* For packages, see https://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: ee28fb57e47e9f88813e24bbf4c14496ca299d31


--


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] debugging radosgw sync errors

2021-09-17 Thread Boris Behrens
Hello again,

as my tests with some fresh clusters answered most of my config questions, I
now wanted to start with our production cluster. The basic setup looks
good, but the sync does not work:

[root@3cecef5afb05 ~]# radosgw-admin sync status
  realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (company)
  zonegroup f6f3f550-89f0-4c0d-b9b0-301a06c52c16 (bc01)
   zone a7edb6fe-737f-4a1c-a333-0ba0566bb3dd (bc01)
  metadata sync preparing for full sync
full sync: 64/64 shards
full sync: 0 entries to sync
failed to fetch master sync status: (5) Input/output error

[root@3cecef5afb05 ~]# radosgw-admin metadata sync run
2021-09-17 16:23:08.346 7f6c83c63840  0 meta sync: ERROR: failed to fetch
metadata sections
ERROR: sync.run() returned ret=-5
2021-09-17 16:23:08.474 7f6c83c63840  0 RGW-SYNC:meta: ERROR: failed to
fetch all metadata keys (r=-5)

And when I check "radosgw-admin period get", the sync_status is just an
array of empty strings:
[root@3cecef5afb05 ~]# radosgw-admin period get
{
"id": "e8fc96f1-ae86-4dc1-b432-470b0772fded",
"epoch": 71,
"predecessor_uuid": "5349ac85-3d6d-4088-993f-7a1d4be3835a",
"sync_status": [
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",
"",

How can I debug what is going wrong?
I tried to dig into the logs and see a lot of these messages:
2021-09-17 14:06:04.144 7f755b4e7700  1 civetweb: 0x5641a22b33a8:
IPV6_OF_OUR_HAPROXY - - [17/Sep/2021:14:06:04 +] "GET
/admin/log/?type=metadata&status&rgwx-zonegroup=da651dc1-2663-4e1b-af2e-ac4454f24c9d
HTTP/1.1" 403 439 - -
2021-09-17 14:06:11.646 7f755f4ef700  1 civetweb: 0x5641a22ae4e8:
IPV6_OF_OUR_HAPROXY - - [17/Sep/2021:14:06:11 +] "POST
/admin/realm/period?period=e8fc96f1-ae86-4dc1-b432-470b0772fded&epoch=71&rgwx-zonegroup=da651dc1-2663-4e1b-af2e-ac4454f24c9d
HTTP/1.1" 403 439 - -

The 403 status makes me think I might have an access problem, but pulling
the realm/period from the master was successful. Also the period commit
from the new cluster worked fine.
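
(One common way to get more detail, not shown in the thread, is to raise the RGW
debug levels while re-running the sync; "client.rgw" below assumes the default
radosgw daemon naming:)

# verbose client-side logging for the failing command
radosgw-admin metadata sync run --debug-rgw=20 --debug-ms=1 2>&1 | tee sync-debug.log
# or temporarily raise the running radosgw's log levels
ceph config set client.rgw debug_rgw 20
ceph config set client.rgw debug_ms 1
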
-- 
The self-help group "UTF-8-Probleme" is meeting in the large hall this time, as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-17 Thread Eugen Block
Since I'm trying to test different erasure coding plugins and
techniques I don't want the balancer active.
So I tried setting it to none as Eugen suggested, and to my
surprise I did not get any degraded messages at all, and the cluster
was in HEALTH_OK the whole time.


Interesting, maybe the balancer works differently now? Or it works  
differently under heavy load?


The logs you provided indeed mention the balancer many times in lines  
like these:


 Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug  
2021-09-17T06:30:01.322+ 7f66afb28700  0 mon.pech-mon-1@0(leader)  
e7 handle_command mon_command({"prefix": "osd pg-upmap-items",  
"format": "json", "pgid": "12.309", "id": [311, 344]} v 0) v1


The only suspicious lines I see are these:

 Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug  
2021-09-17T06:30:01.402+ 7f66b0329700  1 heartbeat_map  
reset_timeout 'Monitor::cpu_tp thread 0x7f66b0329700' had timed out  
after 0.0s


But I'm not sure if this is related. The out OSDs shouldn't have any  
impact on this test.


Did you monitor the network saturation during these tests with iftop  
or something similar?



Zitat von Kai Stian Olstad :


On 16.09.2021 15:51, Josh Baergen wrote:

I assume it's the balancer module. If you write lots of data quickly
into the cluster the distribution can vary and the balancer will try
to even out the placement.


The balancer won't cause degradation, only misplaced objects.


Since I'm trying to test different erasure coding plugins and
techniques I don't want the balancer active.
So I tried setting it to none as Eugen suggested, and to my
surprise I did not get any degraded messages at all, and the cluster
was in HEALTH_OK the whole time.




   Degraded data redundancy: 260/11856050 objects degraded
(0.014%), 1 pg degraded


That status definitely indicates that something is wrong. Check your
cluster logs on your mons (/var/log/ceph/ceph.log) for the cause; my
guess is that you have OSDs flapping (rapidly going down and up again)
due to either overload (disk or network) or some sort of
misconfiguration.


So I enabled the balancer and ran the rados bench again, and the
degraded messages are back.


I guess the equivalent log to /var/log/ceph/ceph.log in Cephadm is
  journalctl -u  
ceph-b321e76e-da3a-11eb-b75c-4f948441...@mon.pech-mon-1.service


There are no messages about osd being marked down, so I don't  
understand why this is happening.

I probably need to raise some verbose value.

I have attached the log from journalctl; it starts at 06:30:00 when I
started the rados bench and includes a few lines after the first
degraded message at 06:31:06.
Just be aware that 15 OSDs are set to out, since I have a problem
with an HBA on one host; all tests have been done with those 15 OSDs
in status out.


--
Kai Stian Olstad




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CentOS Linux 8 EOL

2021-09-17 Thread Konstantin Shalygin
Currently, on CentOS 8 Stream we use the usual Ceph repo:



[root@k8s-prod-worker0 /]# dnf info ceph-osd
Last metadata expiration check: 0:00:06 ago on Fri 17 Sep 2021 08:44:30 PM +07.
Available Packages
Name : ceph-osd
Epoch: 2
Version  : 16.2.5
Release  : 0.el8
Architecture : x86_64
Size : 18 M
Source   : ceph-16.2.5-0.el8.src.rpm
Repository   : ceph
Summary  : Ceph Object Storage Daemon
URL  : http://ceph.com/
License  : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 
and BSD-3-Clause and MIT
Description  : ceph-osd is the object storage daemon for the Ceph distributed 
file
 : system.  It is responsible for storing objects on a local file 
system
 : and providing access to them over the network.

[root@k8s-prod-worker0 /]# uname -a
Linux k8s-prod-worker0 5.13.12-1.el8.elrepo.x86_64 #1 SMP Tue Aug 17 10:51:25 
EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@k8s-prod-worker0 /]# cat /etc/os-release
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
[root@k8s-prod-worker0 /]#


Also, Ceph is almost ready for EL9, AFAIK.



k


> On 7 Sep 2021, at 16:29, Dan van der Ster  wrote:
> 
> We wanted to clarify the plans / expectations for when CentOS Linux 8
> reaches EOL at the end of this year.
> The default plan for our prod clusters is to upgrade servers in place
> from Linux 8.4 to Stream 8. (We already started this a couple months
> ago, ran into a couple minor fixable issues, and will resume these
> upgrades soon).
> 
> Currently the el8 octopus/pacific builds on download.ceph.com are
> built on CentOS Linux 8 (AFAICT) -- which OS will those be built on
> when Linux 8 is no more?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] September Ceph Science Virtual User Group Meeting

2021-09-17 Thread Kevin Hrpcek

Hey all,

We will be having a Ceph science/research/big cluster call on Wednesday 
September 22nd. If anyone wants to discuss something specific they can 
add it to the pad linked below. If you have questions or comments you 
can contact me.


This is an informal open call of community members mostly from 
hpc/htc/research environments where we discuss whatever is on our minds 
regarding ceph. Updates, outages, features, maintenance, etc...there is 
no set presenter but I do attempt to keep the conversation lively.


https://pad.ceph.com/p/Ceph_Science_User_Group_20210922 



We try to keep it to an hour or less.

Ceph calendar event details:

September 22, 2021
14:00 UTC
4pm Central European
9am Central US

Description: Main pad for discussions: 
https://pad.ceph.com/p/Ceph_Science_User_Group_Index

Meetings will be recorded and posted to the Ceph Youtube channel.
To join the meeting on a computer or mobile phone: 
https://bluejeans.com/908675367?src=calendarLink

To join from a Red Hat Deskphone or Softphone, dial: 84336.
Connecting directly from a room system?
    1.) Dial: 199.48.152.152 or bjn.vc
    2.) Enter Meeting ID: 908675367
Just want to dial in on your phone?
    1.) Dial one of the following numbers: 408-915-6466 (US)
    See all numbers: https://www.redhat.com/en/conference-numbers
    2.) Enter Meeting ID: 908675367
    3.) Press #
Want to test your video connection? https://bluejeans.com/111


Kevin

--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.6 Pacific released

2021-09-17 Thread Konstantin Shalygin
Thanks Cory,

Adrian, FYI


k

> On 17 Sep 2021, at 16:15, Cory Snyder  wrote:
> 
> Orchestrator issues don't get their own backport trackers because the team 
> lead handles these backports and does them in batches. This patch did make it 
> into the 16.2.6 release via this batch backport PR:
> 
> https://github.com/ceph/ceph/pull/43029 
> 
> 
> -Cory

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.6 Pacific released

2021-09-17 Thread Cory Snyder
Hi Konstantin,

Orchestrator issues don't get their own backport trackers because the team
lead handles these backports and does them in batches. This patch did make
it into the 16.2.6 release via this batch backport PR:

https://github.com/ceph/ceph/pull/43029

-Cory




On Fri, Sep 17, 2021 at 6:43 AM Konstantin Shalygin  wrote:

> Hi,
>
For some reason the backport bot didn't create a backport issue for this, and the
ticket was just closed without a pacific backport
>
>
> k
>
> > On 17 Sep 2021, at 13:34, Adrian Nicolae 
> wrote:
> >
> > Hi,
> >
> > Does the 16.2.6 version fixed the following bug :
> >
> > https://github.com/ceph/ceph/pull/42690 <
> https://github.com/ceph/ceph/pull/42690>
> >
> > ?
> >
> > It's not listed in the changelog.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Eric Dold
Hi,

I get the same after upgrading to 16.2.6. All mds daemons are standby.

After setting
ceph fs set cephfs max_mds 1
ceph fs set cephfs allow_standby_replay false
the mds still wants to be standby.

2021-09-17T14:40:59.371+0200 7f810a58f600  0 ceph version 16.2.6
(ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable), process
ceph-mds, pid 7113
2021-09-17T14:40:59.371+0200 7f810a58f600  1 main not setting numa affinity
2021-09-17T14:40:59.371+0200 7f810a58f600  0 pidfile_write: ignore empty
--pid-file
2021-09-17T14:40:59.375+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map to
version 226251 from mon.0
2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map to
version 226252 from mon.0
2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Monitors have
assigned me to become a standby.

setting add_incompat 1 does also not work:
# ceph fs compat cephfs add_incompat 1
Error EINVAL: adding a feature requires a feature string

Any ideas?

On Fri, Sep 17, 2021 at 2:19 PM Joshua West  wrote:

> Thanks Patrick,
>
> Similar to Robert, when trying that, I simply receive "Error EINVAL:
> adding a feature requires a feature string" 10x times.
>
> I attempted to downgrade, but wasn't able to successfully get my mons
> to come back up, as they had quincy specific "mon data structure
> changes" or something like that.
> So, I've settled into "17.0.0-6762-g0ff2e281889" on my cluster.
>
> cephfs is still down all this time later. (Good thing this is a
> learning cluster not in production, haha)
>
> I began to feel more and more that the issue was related to a damaged
> cephfs, from a recent set of server malfunctions on a single node
> causing mayhem on the cluster.
> (I went away for a bit, came back and one node had been killing itself
> every hour for 2 weeks, as it went on strike from the heat in the
> garage where it was living.)
>
> Recently went through the cephfs disaster recovery steps per the docs,
> with breaks per the docs to check if things were working in between
> some steps:
> cephfs-journal-tool --rank=cephfs:0 journal inspect
> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
> cephfs-journal-tool --rank=cephfs:0 journal reset
> ceph fs reset cephfs --yes-i-really-mean-it
> #Check if working
> cephfs-table-tool all reset session
> cephfs-table-tool all reset snap
> cephfs-table-tool all reset inode
> #Check if working
> cephfs-data-scan init
>
> for ID in `seq 511`; do cephfs-data-scan scan_extents --worker_n $ID
> --worker_m 512 cephfs_data & done
> for ID in `seq 511`; do cephfs-data-scan scan_inodes --worker_n $ID
> --worker_m 512 cephfs_data & done
> (If anyone here can update the docs: cephfs-data-scan scan_extents
> and scan_inodes could use a for loop with many workers, as I had to
> abandon running with 4 workers per the docs after over a week, but
> running 512 finished in a day)
>
> cephfs-data-scan scan_links
> cephfs-data-scan cleanup cephfs_data
>
> But the MDS daemons still fail to come up, though the error has changed.
>
> ceph fs set cephfs max_mds 1
> ceph fs set cephfs allow_standby_replay false
>
> systemctl start ceph-mds@rog
> SEE ATTACHED LOGS
>
>
>
>
> Any guidance that can be offered would be greatly appreciated, as I've
> been without my cephfs data for almost 3 months now.
>
> Joshua
>
> On Fri, Sep 17, 2021 at 3:53 AM Robert Sander
>  wrote:
> >
> > Hi,
> >
> > I had to run
> >
> > ceph fs set cephfs max_mds 1
> > ceph fs set cephfs allow_standby_replay false
> >
> > and stop all MDS and NFS containers and start one after the other again
> > to clear this issue.
> >
> > Regards
> > --
> > Robert Sander
> > Heinlein Consulting GmbH
> > Schwedter Str. 8/9b, 10119 Berlin
> >
> > https://www.heinlein-support.de
> >
> > Tel: 030 / 405051-43
> > Fax: 030 / 405051-19
> >
> > Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> > Geschäftsführer: Peer Heinlein - Sitz: Berlin
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HEALTH_WARN: failed to probe daemons or devices after upgrade to 16.2.6

2021-09-17 Thread Eugen Block
Was there a MON running previously on that host? Do you see the daemon  
when running 'cephadm ls'? If so, remove it with 'cephadm rm-daemon  
--name mon.s-26-9-17'
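
Spelled out as a rough sketch (run on the affected host; depending on the cephadm
version the cluster fsid may also need to be passed, taken here from the ceph -s
output quoted below):

cephadm ls                                    # look for a leftover mon.s-26-9-17 entry
cephadm rm-daemon --name mon.s-26-9-17 --fsid 1ef45b26-dbac-11eb-a357-616c355f48cb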



Quoting Fyodor Ustinov:


Hi!

After upgrading to version 16.2.6, my cluster is in this state:

root@s-26-9-19-mon-m1:~# ceph -s
  cluster:
id: 1ef45b26-dbac-11eb-a357-616c355f48cb
health: HEALTH_WARN
failed to probe daemons or devices

In logs:

9/17/21 1:30:40 PM[ERR]cephadm exited with an error code: 1,  
stderr:Inferring config  
/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config  
ERROR: [Errno 2] No such file or directory:  
'/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config'  
Traceback (most recent call last): File  
"/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in  
_remote_connection yield (conn, connr) File  
"/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in _run_cephadm  
code, '\n'.join(err))) orchestrator._interface.OrchestratorError:  
cephadm exited with an error code: 1, stderr:Inferring config  
/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config  
ERROR: [Errno 2] No such file or directory:  
'/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config'


9/17/21 1:30:39 PM[WRN]ERROR: [Errno 2] No such file or directory:  
'/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config'


9/17/21 1:30:39 PM[WRN] host s-26-9-17 `cephadm ceph-volume` failed:  
cephadm exited with an error code: 1, stderr:Inferring config  
/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config


9/17/21 1:30:39 PM[WRN][WRN] CEPHADM_REFRESH_FAILED: failed to probe  
daemons or devices


But:
1. There is no mon service on the s-26-9-17 node.
2. All services in the cluster work normally.

What could be causing this and how can it be fixed? Thanks in  
advance for your help!



WBR,
Fyodor.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Joshua West
Thanks Patrick,

Similar to Robert, when trying that, I simply receive "Error EINVAL:
adding a feature requires a feature string" 10 times.

I attempted to downgrade, but wasn't able to successfully get my mons
to come back up, as they had quincy specific "mon data structure
changes" or something like that.
So, I've settled into "17.0.0-6762-g0ff2e281889" on my cluster.

cephfs is still down all this time later. (Good thing this is a
learning cluster not in production, haha)

I began to feel more and more that the issue was related to a damaged
cephfs, from a recent set of server malfunctions on a single node
causing mayhem on the cluster.
(I went away for a bit, came back and one node had been killing itself
every hour for 2 weeks, as it went on strike from the heat in the
garage where it was living.)

Recently went through the cephfs disaster recovery steps per the docs,
with breaks per the docs to check if things were working in between
some steps:
cephfs-journal-tool --rank=cephfs:0 journal inspect
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset
ceph fs reset cephfs --yes-i-really-mean-it
#Check if working
cephfs-table-tool all reset session
cephfs-table-tool all reset snap
cephfs-table-tool all reset inode
#Check if working
cephfs-data-scan init

for ID in `seq 511`; do cephfs-data-scan scan_extents --worker_n $ID
--worker_m 512 cephfs_data & done
for ID in `seq 511`; do cephfs-data-scan scan_inodes --worker_n $ID
--worker_m 512 cephfs_data & done
(If anyone here can update the docs: cephfs-data-scan scan_extents
and scan_inodes could use a for loop with many workers, as I had to
abandon the run with 4 workers per the docs after over a week, but
running 512 workers finished in a day.)
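
For reference, a hedged sketch of that parallel form, assuming --worker_n is
zero-based and must cover 0 through (--worker_m - 1), and that the shell should
wait for all background workers before moving on to the next step:

for ID in $(seq 0 511); do
    cephfs-data-scan scan_extents --worker_n $ID --worker_m 512 cephfs_data &
done
wait    # block until every background worker has finished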

cephfs-data-scan scan_links
cephfs-data-scan cleanup cephfs_data

But the MDS daemons still fail to come up, though the error has changed.

ceph fs set cephfs max_mds 1
ceph fs set cephfs allow_standby_replay false

systemctl start ceph-mds@rog
SEE ATTACHED LOGS




Any guidance that can be offered would be greatly appreciated, as I've
been without my cephfs data for almost 3 months now.

Joshua

On Fri, Sep 17, 2021 at 3:53 AM Robert Sander
 wrote:
>
> Hi,
>
> I had to run
>
> ceph fs set cephfs max_mds 1
> ceph fs set cephfs allow_standby_replay false
>
> and stop all MDS and NFS containers and start one after the other again
> to clear this issue.
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.6 Pacific released

2021-09-17 Thread Francesco Piraneo G.

Hi,

Have you released a procedure to safely upgrade the cluster?

Or does just launching "apt upgrade" do the job?

Thanks.

F.


On 16.09.21 21:48, David Galloway wrote:

We're happy to announce the 6th backport release in the Pacific series.
We recommend users update to this release. For detailed release notes
with links & changelog, please refer to the official blog entry at
https://ceph.io/en/news/blog/2021/v16-2-6-pacific-released

Notable Changes
---

* MGR: The pg_autoscaler has a new default 'scale-down' profile which
provides more performance from the start for new pools (for newly
created clusters). Existing clusters will retain the old behavior, now
called the 'scale-up' profile. For more details, see:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/

* CephFS: the upgrade procedure for CephFS is now simpler. It is no
longer necessary to stop all MDS before upgrading the sole active MDS.
After disabling standby-replay, reducing max_mds to 1, and waiting for
the file systems to become stable (each fs with 1 active and 0 stopping
daemons), a rolling upgrade of all MDS daemons can be performed.
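
In command form this is roughly the following (substitute your file system name;
a sketch of the documented steps, not an authoritative procedure):

ceph fs set <fs_name> allow_standby_replay false
ceph fs set <fs_name> max_mds 1
ceph status    # wait until each fs shows 1 active and 0 stopping MDS daemons
# then restart/upgrade the MDS daemons one at a time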

* Dashboard: now allows users to set up and display a custom message
(MOTD, warning, etc.) in a sticky banner at the top of the page. For
more details, see:
https://docs.ceph.com/en/pacific/mgr/dashboard/#message-of-the-day-motd

* Several fixes in BlueStore, including a fix for the deferred write
regression, which led to excessive RocksDB flushes and compactions.
Previously, when bluestore_prefer_deferred_size_hdd was equal to or more
than bluestore_max_blob_size_hdd (both set to 64K), all the data was
deferred, which led to increased consumption of the column family used
to store deferred writes in RocksDB. Now, the
bluestore_prefer_deferred_size parameter independently controls deferred
writes, and only writes smaller than this size use the deferred write path.
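
A hedged example of inspecting and, if needed, overriding that option at runtime
(65536 is the 64K value mentioned above; treat the exact value as workload-specific):

ceph config get osd bluestore_prefer_deferred_size_hdd
ceph config set osd bluestore_prefer_deferred_size_hdd 65536   # writes below this size take the deferred path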

* The default value of osd_client_message_cap has been set to 256, to
provide better flow control by limiting maximum number of in-flight
client requests.
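
If that cap ever needs tuning for a particular workload, it can presumably be
adjusted the same way, e.g.:

ceph config set osd osd_client_message_cap 256   # the new default; raise or lower as needed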

* PGs no longer show a active+clean+scrubbing+deep+repair state when
osd_scrub_auto_repair is set to true, for regular deep-scrubs with no
repair required.

* ceph-mgr-modules-core debian package does not recommend ceph-mgr-rook
anymore. As the latter depends on python3-numpy which cannot be imported
in different Python sub-interpreters multi-times if the version of
python3-numpy is older than 1.19. Since apt-get installs the Recommends
packages by default, ceph-mgr-rook was always installed along with
ceph-mgr debian package as an indirect dependency. If your workflow
depends on this behavior, you might want to install ceph-mgr-rook
separately.
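
On Debian/Ubuntu hosts that would be something along the lines of:

apt-get install ceph-mgr-rook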

* This is the first release built for Debian Bullseye.


Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-16.2.6.tar.gz
* Containers at https://hub.docker.com/r/ceph/ceph/tags?name=v16.2.6
* For packages, see https://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: ee28fb57e47e9f88813e24bbf4c14496ca299d31

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.6 Pacific released

2021-09-17 Thread Konstantin Shalygin
Hi,

For some reason the backport bot didn't create a backport issue for this, and the ticket
was just closed without a Pacific backport


k

> On 17 Sep 2021, at 13:34, Adrian Nicolae  wrote:
> 
> Hi,
> 
> Does the 16.2.6 version fix the following bug:
> 
> https://github.com/ceph/ceph/pull/42690 
> 
> 
> ?
> 
> It's not listed in the changelog.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] HEALTH_WARN: failed to probe daemons or devices after upgrade to 16.2.6

2021-09-17 Thread Fyodor Ustinov
Hi!

After upgrading to version 16.2.6, my cluster is in this state:

root@s-26-9-19-mon-m1:~# ceph -s
  cluster:
id: 1ef45b26-dbac-11eb-a357-616c355f48cb
health: HEALTH_WARN
failed to probe daemons or devices

In logs:

9/17/21 1:30:40 PM[ERR]cephadm exited with an error code: 1, stderr:Inferring 
config /var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config 
ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config' 
Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/serve.py", 
line 1366, in _remote_connection yield (conn, connr) File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in _run_cephadm code, 
'\n'.join(err))) orchestrator._interface.OrchestratorError: cephadm exited with 
an error code: 1, stderr:Inferring config 
/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config ERROR: 
[Errno 2] No such file or directory: 
'/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config'

9/17/21 1:30:39 PM[WRN]ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config'

9/17/21 1:30:39 PM[WRN] host s-26-9-17 `cephadm ceph-volume` failed: cephadm 
exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/1ef45b26-dbac-11eb-a357-616c355f48cb/mon.s-26-9-17/config

9/17/21 1:30:39 PM[WRN][WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or 
devices

But:
1. There is no mon service on the s-26-9-17 node.
2. All services in the cluster work normally.

What could be causing this and how can it be fixed? Thanks in advance for your 
help!


WBR,
Fyodor.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Robert Sander

Hi,

I had to run

ceph fs set cephfs max_mds 1
ceph fs set cephfs allow_standby_replay false

and stop all MDS and NFS containers and start one after the other again 
to clear this issue.


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread Robert Sander

Hi,

On 20.08.21 23:58, Patrick Donnelly wrote:


Your MDSMap compat is probably what's preventing promotion of
standbys. That's a new change in master (which is also being
backported to Pacific). Did you downgrade back to Pacific?

Try:

for i in $(seq 1 10); do ceph fs compat <fs_name> add_incompat $i; done


I have a similar situation after upgrading to 16.2.6 on my test cluster.

The three MDSs will not become active but stay in standby. The log says:

Sep 17 11:12:02 cephtest23 bash[242706]: starting mds.cephfs.cephtest23.xbrgkk 
at
Sep 17 11:12:02 cephtest23 bash[242706]: debug 2021-09-17T09:12:02.913+ 
7fc3f9dbc700  1 mds.cephfs.cephtest23.xbrgkk Updating MDS map to version 498 
from mon.0
Sep 17 11:12:03 cephtest23 bash[242706]: debug 2021-09-17T09:12:03.585+ 
7fc3f9dbc700  1 mds.cephfs.cephtest23.xbrgkk Updating MDS map to version 499 
from mon.0
Sep 17 11:12:03 cephtest23 bash[242706]: debug 2021-09-17T09:12:03.585+ 
7fc3f9dbc700  1 mds.cephfs.cephtest23.xbrgkk Monitors have assigned me to 
become a standby.

Running your suggestion from above just returns:

# ceph fs compat cephfs add_incompat 1
Error EINVAL: adding a feature requires a feature string

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS optimizated for machine learning workload

2021-09-17 Thread Yan, Zheng
On Fri, Sep 17, 2021 at 12:14 AM Mark Nelson  wrote:
>
>
>
> On 9/15/21 11:05 PM, Yan, Zheng wrote:
> > On Wed, Sep 15, 2021 at 8:36 PM Mark Nelson  wrote:
> >>
> >> Hi Zheng,
> >>
> >>
> >> This looks great!  Have you noticed any slow performance during
> >> directory splitting?  One of the things I was playing around with last
> >> year was pre-fragmenting directories based on a user supplied hint that
> >> the directory would be big (falling back to normal behavior if it grows
> >> beyond the hint size).  That way you can create the dirfrags upfront and
> >> do the migration before they ever have any associated files.  Do you
> >> think that might be worth trying again given your PRs below?
> >>
> >
> > These PRs do not change directory splitting logic. It's unlikely they
> > will improve performance number of mdtest hard test.  But these PRs
> > remove overhead of journaling  subtreemap and distribute metadata more
> > evenly.  They should improve performance number of mdtest easy test.
> > So I think it's worth a retest.
> >
> > Yan, Zheng
>
>
> I was mostly thinking about:
>
> [3] https://github.com/ceph/ceph/pull/43125
>
> Shouldn't this allow workloads like mdtest hard where you have many
> clients performing file writes/reads/deletes inside a single directory
> (that is split into dirfrags randomly distributed across MDSes) to
> parallelize some of the work? (minus whatever needs to be synchronized
> on the authoritative mds)
>

The triggers for dirfrag migration in this PR are mkdir and dirfrag
fetch.  A dirfrag first needs to be split, then gets migrated. I don't
know how often these events happen in mdtest hard or how the pause of
split/migration affects the test result.



> We discussed some of this in the performance standup today.  From what
> I've seen the real meat of the problem still rests in the distributed
> cache, locking, and cap revocation,

For single-thread or single-MDS performance, yes. The purpose of PR
43125 is to distribute metadata more evenly and improve aggregate
performance.

Yan, Zheng

> but it seems like anything we can do
> to reduce the overhead of dirfrag migration is a win.
>



> Mark
>
>
>
>
> >
> >>
> >> Mark
> >>
> >>
> >> On 9/15/21 2:21 AM, Yan, Zheng wrote:
> >>> Following PRs are optimization we (Kuaishou) made for machine learning
> >>> workloads (randomly read billions of small files) .
> >>>
> >>> [1] https://github.com/ceph/ceph/pull/39315
> >>> [2] https://github.com/ceph/ceph/pull/43126
> >>> [3] https://github.com/ceph/ceph/pull/43125
> >>>
> >>> The first PR adds an option that disables dirfrag prefetch. When files
> >>> are accessed randomly, dirfrag prefetch adds lots of useless files to
> >>> cache and causes cache thrash. Performance of MDS can be dropped below
> >>> 100 RPS. When dirfrag prefetch is disabled, MDS sends a getomapval
> >>> request to rados for cache missed lookup.  Single mds can handle about
> >>> 6k cache missed lookup requests per second (all ssd metadata pool).
> >>>
> >>> The second PR optimizes MDS performance for a large number of clients
> >>> and a large number of read-only opened files. It also can greatly
> >>> reduce mds recovery time for read-mostly wordload.
> >>>
> >>> The third PR makes MDS cluster randomly distribute all dirfrags.  MDS
> >>> uses consistent hash to calculate target rank for each dirfrag.
> >>> Compared to dynamic balancer and subtree pin, metadata can be
> >>> distributed among MDSs more evenly. Besides, MDS only migrates single
> >>> dirfrag (instead of big subtree) for load balancing. So MDS has
> >>> shorter pause when doing metadata migration.  The drawbacks of this
> >>> change are:  stat(2) directory can be slow; rename(2) file to
> >>> different directory can be slow. The reason is, with random dirfrag
> >>> distribution, these operations likely involve multiple MDS.
> >>>
> >>> Above three PRs are all merged into an integration branch
> >>> https://github.com/ukernel/ceph/tree/wip-mds-integration.
> >>>
> >>> We (Kuaishou) have run these codes for months, 16 active MDS cluster
> >>> serve billions of small files. In file random read test, single MDS
> >>> can handle about 6k ops,  performance increases linearly with the
> >>> number of active MDS.  In file creation test (mpirun -np 160 -host
> >>> xxx:160 mdtest -F -L -w 4096 -z 2 -b 10 -I 200 -u -d ...), 16 active
> >>> MDS can serve over 100k file creation per second.
> >>>
> >>> Yan, Zheng
> >>>
> >>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-17 Thread Kai Stian Olstad

On 16.09.2021 15:51, Josh Baergen wrote:

I assume it's the balancer module. If you write lots of data quickly
into the cluster the distribution can vary and the balancer will try
to even out the placement.


The balancer won't cause degradation, only misplaced objects.


Since I'm trying to test different erasure coding plugins and techniques,
I don't want the balancer active.
So I tried setting its mode to none as Eugen suggested, and to my surprise I
did not get any degraded messages at all, and the cluster was in
HEALTH_OK the whole time.
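
A hedged sketch of disabling the balancer that way (standard balancer commands
assumed, not verified against this particular cluster):

ceph balancer mode none    # stop generating new optimization plans
ceph balancer off          # or switch the balancer off entirely
ceph balancer status       # confirm the current mode/state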




Degraded data redundancy: 260/11856050 objects degraded
(0.014%), 1 pg degraded


That status definitely indicates that something is wrong. Check your
cluster logs on your mons (/var/log/ceph/ceph.log) for the cause; my
guess is that you have OSDs flapping (rapidly going down and up again)
due to either overload (disk or network) or some sort of
misconfiguration.


So I enabled the balancer and ran the rados bench again, and the degraded
messages are back.


I guess the equivalent log to /var/log/ceph/ceph.log in Cephadm is
  journalctl -u 
ceph-b321e76e-da3a-11eb-b75c-4f948441...@mon.pech-mon-1.service


There are no messages about OSDs being marked down, so I don't understand
why this is happening.

I probably need to increase some verbosity setting.

I have attached the log from journalctl; it starts at 06:30:00 when I
started the rados bench and includes a few lines after the first degraded
message at 06:31:06.
Just be aware that 15 OSDs are set to out, since I have a problem with
an HBA on one host; all tests have been done with those 15 OSDs in
status out.


--
Kai Stian Olstad

Sep 17 06:30:00 pech-mon-1 conmon[1337]: debug 2021-09-17T06:29:59.994+ 
7f66b232d700  0 log_channel(cluster) log [INF] : overall HEALTH_OK
Sep 17 06:30:00 pech-mon-1 conmon[1337]: cluster 
2021-09-17T06:29:59.317530+ mgr.pech-mon-1.ptrsea
Sep 17 06:30:00 pech-mon-1 conmon[1337]:  (mgr.245802) 345745 : cluster [DBG] 
pgmap v347889: 1025 pgs: 1025 active+clean; 0 B data, 73 TiB used, 2.8 PiB / 
2.9 PiB avail
Sep 17 06:30:00 pech-mon-1 conmon[1337]: cluster 
2021-09-17T06:30:00.000143+ mon.pech-mon-1 (mon.0) 1166236 : 
Sep 17 06:30:00 pech-mon-1 conmon[1337]: cluster [INF] overall HEALTH_OK
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.318+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": "12.6d", 
"id": [293, 327]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.318+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.6d", "id": [293, 327]}]: dispatch
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.318+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": 
"12.144", "id": [307, 351]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.318+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.144", "id": [307, 351]}]: 
dispatch
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": 
"12.17d", "id": [144, 136]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.17d", "id": [144, 136]}]: 
dispatch
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": 
"12.1a2", "id": [199, 189]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.1a2", "id": [199, 189]}]: 
dispatch
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": 
"12.1e1", "id": [289, 344]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.1e1", "id": [289, 344]}]: 
dispatch
Sep 17 06:30:01 

[ceph-users] Re: [Ceph-announce] Re: v16.2.6 Pacific released

2021-09-17 Thread Tom Siewert

Hi Fyodor,

> As I understand it, the command
>
> ceph orch upgrade start --ceph-version 16.2.6
>
> is broken and will not be able to update Ceph?

You should be able to use

ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6
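
and then, presumably, follow the progress with:

ceph orch upgrade status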

Greetings,
Tom

On 9/17/21 8:40 AM, Fyodor Ustinov wrote:

Hi!


Correction: Containers live at https://quay.io/repository/ceph/ceph now.



As I understand it, the command

ceph orch upgrade start --ceph-version 16.2.6

is broken and will not be able to update Ceph?

root@s-26-9-19-mon-m1:~# ceph orch upgrade start --ceph-version 16.2.6
Initiating upgrade to docker.io/ceph/ceph:v16.2.6


root@s-26-9-19-mon-m1:~# ceph -s
 health: HEALTH_OK
[...]
   progress:
 Upgrade to docker.io/ceph/ceph:v16.2.6 (0s)
   []

And nothing else happens. I'm trying to upgrade from version 16.2.5

WBR,
 Fyodor.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io