[ceph-users] Re: cephfs-top doesn't work

2022-04-19 Thread Jos Collin
This doesn't break anything, but the current version of cephfs-top 
cannot accommodate a large number of clients. The workaround is to limit 
the number of clients (if that's possible) or to reduce the terminal 
zoom/font size so that 100 clients fit.

We also have a tracker [1] to implement such a limit.

[1] https://tracker.ceph.com/issues/55121

On 19/04/22 20:30, Vladimir Brik wrote:

Yes, `ceph fs perf stats` works.

Reverting to older versions I get "exception: addwstr() returned ERR"

If I manually set self.height and self.width to something large in 
refresh_window_size I can see some data, but there is no way to 
scroll, so I'll probably need to write something myself.


Vlad

On 4/18/22 21:20, Xiubo Li wrote:


On 4/19/22 3:43 AM, Vladimir Brik wrote:
Does anybody know why cephfs-top may only display header lines 
(date, client types, metric names) but no actual data?


When I run it, cephfs-top consumes quite a bit of CPU and 
generates quite a bit of network traffic, but it doesn't actually 
display the data.


I poked around in the source code and it seems like it might be a 
curses issue, but I am not sure.



Is there any data from `ceph fs perf stats`?

I hit the same issue before; it was caused by a curses window-size 
problem that we fixed a long time ago. You can try enlarging your 
terminal size and trying again.


If that still doesn't work, please try reverting some recent commits 
of cephfs-top to see whether it works for you. Several new features 
were added recently.
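
For example, something along these lines should work (a rough sketch, 
assuming cephfs-top still lives at src/tools/cephfs/top in the ceph tree 
and that the client.fstop user from the cephfs-top docs is set up):

git clone https://github.com/ceph/ceph.git && cd ceph
git log --oneline -- src/tools/cephfs/top/cephfs-top   # pick a commit from before the recent feature work
git checkout <older-sha> -- src/tools/cephfs/top/cephfs-top
python3 src/tools/cephfs/top/cephfs-top                # uses client.fstop by default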


-- Xiubo



Vlad



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v17.2.0 Quincy released

2022-04-19 Thread Harry G. Coin
Great news!  Any notion when the many pending bug fixes will show up in 
Pacific?  It's been a while.


On 4/19/22 20:36, David Galloway wrote:
We're very happy to announce the first stable release of the Quincy 
series.


We encourage you to read the full release notes at 
https://ceph.io/en/news/blog/2022/v17-2-0-quincy-released/


Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-17.2.0.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see 
https://docs.ceph.com/docs/master/install/get-packages/

* Release git sha1: 43e2e60a7559d3f46c9d53f1ca875fd499a1e35e


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v17.2.0 Quincy released

2022-04-19 Thread David Galloway

We're very happy to announce the first stable release of the Quincy series.

We encourage you to read the full release notes at 
https://ceph.io/en/news/blog/2022/v17-2-0-quincy-released/


Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-17.2.0.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 43e2e60a7559d3f46c9d53f1ca875fd499a1e35e

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: globally disable radosgw lifecycle processing

2022-04-19 Thread Matt Benjamin
Hi Christopher,

Yes, you will need to restart the rgw instance(s).
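
A rough sketch of what that could look like (assuming the RGWs read the 
centralized config database; the service/unit names are placeholders for 
your deployment):

ceph config set global rgw_lc_max_worker 0   # no lifecycle worker threads are started
ceph orch restart rgw.<service_id>           # cephadm; on package installs: systemctl restart ceph-radosgw@rgw.<instance>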

Matt

On Tue, Apr 19, 2022 at 3:13 PM Christopher Durham  wrote:
>
>
> Hello,
> I am using radosgw with lifecycle processing on multiple buckets. I may need 
> to globally disable lifecycle processing and do some investigation.
> Can I do that by setting rgw_lc_max_worker to 0 on my radosgw server?
> I'd rather not push rules with Status: Disabled for every bucket, or 
> delete them all.
>
> I am using pacific 16.2.7 on Rocky Linux
>
> Thanks
> -Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: df shows wrong size of cephfs share when a subdirectory is mounted

2022-04-19 Thread Ryan Taylor
Thanks for the pointers! It does look like https://tracker.ceph.com/issues/55090
and I am not surprised Dan and I are hitting the same issue...


I am using the latest available Almalinux 8, 4.18.0-348.20.1.el8_5.x86_64

After installing kernel-debuginfo-common-x86_64, I see, for example, the 
following in 
/usr/src/debug/kernel-4.18.0-348.2.1.el8_5/linux-4.18.0-348.2.1.el8_5.x86_64/fs/ceph/quota.c:

static inline bool ceph_has_realms_with_quotas(struct inode *inode)
{
        struct super_block *sb = inode->i_sb;
        struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(sb);
        struct inode *root = d_inode(sb->s_root);

        if (atomic64_read(&mdsc->quotarealms_count) > 0)
                return true;
        /* if root is the real CephFS root, we don't have quota realms */
        if (root && ceph_ino(root) == CEPH_INO_ROOT)
                return false;
        /* otherwise, we can't know for sure */
        return true;
}

So this EL8.5 kernel already has at least some of the patches from 
https://lore.kernel.org/all/20190301175752.17808-1-lhenriq...@suse.com/T/#u
for https://tracker.ceph.com/issues/38482
That does not mention a specific commit, just says "Merged into 5.2-rc1."

So it seems https://tracker.ceph.com/issues/55090  is either a new issue or a 
regression of the previous issue.
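
For the FUSE comparison Ramana suggested, something along these lines should 
work (a sketch; the client name and keyring path are placeholders, and -r 
mounts just the subpath):

sudo ceph-fuse -n client.<manila-user> --keyring=/etc/ceph/ceph.client.<manila-user>.keyring \
    -r /volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2/test1 /mnt/fuse-test
df -h /mnt/fuse-test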

Thanks,
-rt

Ryan Taylor
Research Computing Specialist
Research Computing Services, University Systems
University of Victoria


From: Hendrik Peyerl 
Sent: April 19, 2022 6:05 AM
To: Ramana Venkatesh Raja
Cc: Ryan Taylor; ceph-users@ceph.io
Subject: Re: [ceph-users] df shows wrong size of cephfs share when a 
subdirectory is mounted



I did hit this issue as well: https://tracker.ceph.com/issues/38482

You will need a kernel >= 5.2 that can handle quotas on subdirectories.


> On 19. Apr 2022, at 14:47, Ramana Venkatesh Raja  wrote:
>
> On Sat, Apr 16, 2022 at 10:15 PM Ramana Venkatesh Raja  
> wrote:
>>
>> On Thu, Apr 14, 2022 at 8:07 PM Ryan Taylor  wrote:
>>>
>>> Hello,
>>>
>>>
>>> I am using cephfs via Openstack Manila (Ussuri I think).
>>>
>>> The cephfs cluster is v14.2.22 and my client has kernel  
>>> 4.18.0-348.20.1.el8_5.x86_64
>>>
>>>
>>> I have a Manila share
>>>
>>> /volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2
>>>
>>>
>>> that is 5000 GB in size. When I mount it the size is reported correctly:
>>>
>>>
>>> # df -h /cephfs
>>> Filesystem  
>>>Size  Used Avail Use% Mounted on
>>> 10.30.201.3:6789,10.30.202.3:6789,10.30.203.3:6789:/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2
>>>   4.9T  278G  4.7T   6% /cephfs
>>>
>>>
>>> However when I mount a subpath /test1 of my share, then both the size and 
>>> usage are showing the size of the whole cephfs filesystem rather than my 
>>> private share.
>>>
>>>
>>> # df -h /cephfs
>>> Filesystem  
>>>  Size  Used Avail Use% Mounted on
>>> 10.30.201.3:6789,10.30.202.3:6789,10.30.203.3:6789:/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2/test1
>>>   4.0P  277T  3.7P   7% /cephfs
>>>
>>
>> What are the capabilities of the ceph client user ID that you used to
>> mount "/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2/test1" ?
>> Maybe you're hitting this limitation in
>> https://docs.ceph.com/en/latest/cephfs/quota/#limitations ,
>> "Quotas must be configured carefully when used with path-based mount
>> restrictions. The client needs to have access to the directory inode
>> on which quotas are configured in order to enforce them. If the client
>> has restricted access to a specific path (e.g., /home/user) based on
>> the MDS capability, and a quota is configured on an ancestor directory
>> they do not have access to (e.g., /home), the client will not enforce
>> it. When using path-based access restrictions be sure to configure the
>> quota on the directory the client is restricted too (e.g., /home/user)
>> or something nested beneath it. "
>>
>
> Hi Ryan,
>
> I think you maybe actually hitting this
> https://tracker.ceph.com/issues/55090 . Are you facing this issue with
> the FUSE client?
>
> -Ramana
>
>>>
>>> I tried setting the  ceph.quota.max_bytes  xattr on a subdirectory but it 
>>> did not help.
>>>
>>
>> You can't set quota xattr if your ceph client user ID doesn't have 'p'
>> flag in its MDS capabilities,
>> https://docs.ceph.com/en/latest/cephfs/client-auth/#layout-and-quota-restriction-the-p-flag
>> .
>>
>> -Ramana
>>
>>> I'm not sure if the issue is in cephfs or Manila, but what would be 
>>> required to get the right size and usage stats to be reported by df when a 
>>> subpath of a share is mounted?
>>>
>>>
>>> Thanks!
>>>
>>> -rt
>>>
>>>
>>> Ryan Taylor
>>> Research Compu

[ceph-users] CephFS health warnings after deleting millions of files

2022-04-19 Thread David Turner
A rogue process wrote 38M files into a single CephFS directory that took
about a month to delete. We had to increase MDS cache sizes to handle the
increased file volume, but we've been able to reduce all of our settings
back to default.

Ceph cluster is 15.2.11. CephFS clients are ceph-fuse, either
version 14.2.16 or 15.2.11 depending on whether they've been upgraded yet. Nothing
has changed in the last ~6 months with regard to client versions or the cluster
version.

We are currently dealing with 2 issues now that things seem to be cleaned
up.

1. MDSs report slow requests. [1] Dumping the blocked requests gives the same
output for all of them: they seemingly get stuck AFTER the "acquired locks"
event succeeds. I can't find any information about what happens after
this or why things are getting stuck here.

2. Clients failing to advance oldest client/flush tid. There are 2 clients
that are the worst offenders for this, but a few other clients are having
this same issue. All of the clients having this issue are on 14.2.16, but
we also have a hundred clients on the same version that don't have this
issue at all. [2] The logs make it look like the clients just have a bad
integer/pointer somehow. We can clean up the error by remounting the
filesystem or rebooting the server, but these 2 clients in particular keep
ending up in this state. No other repeat offenders yet, but we've had 4
other servers in this state over the last couple weeks.
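
For reference, this is roughly how we are identifying which clients are behind
the second warning (the mds name matches the dump below; the exact output
fields vary by release):

sudo ceph health detail                # lists the client IDs behind MDS_CLIENT_OLDEST_TID
sudo ceph daemon mds.mon1 session ls   # per-session details for those client IDs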

Are there any ideas what the next steps might be for diagnosing either of
these issues? Thank you.

-David Turner



[1] $ sudo ceph daemon mds.mon1 dump_blocked_ops
{
    "ops": [
        {
            "description": "client_request(client.17709580:39254 open #0x10001c99cd4 2022-02-22T16:25:40.231547+ caller_uid=0, caller_gid=0{})",
            "initiated_at": "2022-04-19T19:07:10.663552+",
            "age": 90.920778446,
            "duration": 90.92080624405,
            "type_data": {
                "flag_point": "acquired locks",
                "reqid": "client.17709580:39254",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.17709580",
                    "tid": 39254
                },
                "events": [
                    {
                        "time": "2022-04-19T19:07:10.663552+",
                        "event": "initiated"
                    },
                    {
                        "time": "2022-04-19T19:07:10.663549+",
                        "event": "throttled"
                    },
                    {
                        "time": "2022-04-19T19:07:10.663552+",
                        "event": "header_read"
                    },
                    {
                        "time": "2022-04-19T19:07:10.663555+",
                        "event": "all_read"
                    },
                    {
                        "time": "2022-04-19T19:07:10.665744+",
                        "event": "dispatched"
                    },
                    {
                        "time": "2022-04-19T19:07:10.773894+",
                        "event": "failed to xlock, waiting"
                    },
                    {
                        "time": "2022-04-19T19:07:10.807249+",
                        "event": "acquired locks"
                    }
                ]
            }
        },


[2] 2022-04-19 06:15:36.108 7fb28b7fe700  0 client.30095002
handle_cap_flush_ack mds.1 got unexpected flush ack tid 338611 expected is 0
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph mon issues

2022-04-19 Thread Ilhaan Rasheed
Hello Ceph users,

I have two issues affecting mon nodes in my ceph cluster.

1) mon store keeps growing
store.db directory (/var/lib/ceph/mon/ceph-v60/store.db/) has grown by
almost 20G the last two days. I've been clearing up space in /var and grew
/var a few times. I have compacted the mon store using ceph-monstore-tool a
few times as well, but after a few hours of running ceph-mon, /var becomes
full and I see that store.db is a bigger size than before.
ceph-monstore-tool compact doesn't show any clearing errors.
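
In case it helps, these are the checks I can run and include in a follow-up
(the mon path is from my setup; I understand the osdmap trim range from
`ceph report` is the usual first thing to look at when the store keeps
growing):

du -sh /var/lib/ceph/mon/ceph-v60/store.db
ceph report 2>/dev/null | grep -E '"osdmap_(first|last)_committed"'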

2) mon ceph_assert_fail
One of my three mon nodes fails to start and shows the following in its logs. I
have tried injecting a monmap from a working mon node, but I still see the same failure.

Apr 19 10:44:33 v62 ceph-mon[1877692]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el7/BUILD/ceph-15.2.6/src/mon/AuthMonitor.cc:
279: FAILED ceph_assert(ret == 0)
Apr 19 10:44:33 v62 ceph-mon[1877692]: ceph version 15.2.6
(cb8c61a60551b72614257d632a574d420064c17a) octopus (stable)
Apr 19 10:44:33 v62 ceph-mon[1877692]: 1: (ceph::__ceph_assert_fail(char
const*, char const*, int, char const*)+0x14c) [0x7f519e1ec665]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 2: (()+0x26882d) [0x7f519e1ec82d]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 3:
(AuthMonitor::update_from_paxos(bool*)+0x2832) [0x559a623d6282]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 4:
(PaxosService::refresh(bool*)+0x103) [0x559a62474cc3]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 5:
(Monitor::refresh_from_paxos(bool*)+0x17c) [0x559a62355e4c]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 6: (Monitor::init_paxos()+0xfc)
[0x559a6235611c]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 7: (Monitor::preinit()+0xd5f)
[0x559a623773ef]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 8: (main()+0x2398) [0x559a6230e908]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 9: (__libc_start_main()+0xf5)
[0x7f519add9555]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 10: (()+0x2305d0) [0x559a6233f5d0]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 0> 2022-04-19T10:44:33.779-0700
7f51a6faa340 -1 *** Caught signal (Aborted) **
Apr 19 10:44:33 v62 ceph-mon[1877692]: in thread 7f51a6faa340
thread_name:ceph-mon
Apr 19 10:44:33 v62 ceph-mon[1877692]: ceph version 15.2.6
(cb8c61a60551b72614257d632a574d420064c17a) octopus (stable)
Apr 19 10:44:33 v62 ceph-mon[1877692]: 1: (()+0xf630) [0x7f519bffa630]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 2: (gsignal()+0x37) [0x7f519aded387]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 3: (abort()+0x148) [0x7f519adeea78]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 4: (ceph::__ceph_assert_fail(char
const*, char const*, int, char const*)+0x19b) [0x7f519e1ec6b4]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 5: (()+0x26882d) [0x7f519e1ec82d]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 6:
(AuthMonitor::update_from_paxos(bool*)+0x2832) [0x559a623d6282]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 7:
(PaxosService::refresh(bool*)+0x103) [0x559a62474cc3]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 8:
(Monitor::refresh_from_paxos(bool*)+0x17c) [0x559a62355e4c]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 9: (Monitor::init_paxos()+0xfc)
[0x559a6235611c]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 10: (Monitor::preinit()+0xd5f)
[0x559a623773ef]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 11: (main()+0x2398) [0x559a6230e908]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 12: (__libc_start_main()+0xf5)
[0x7f519add9555]
Apr 19 10:44:33 v62 ceph-mon[1877692]: 13: (()+0x2305d0) [0x559a6233f5d0]
Apr 19 10:44:33 v62 ceph-mon[1877692]: NOTE: a copy of the executable, or
`objdump -rdS ` is needed to interpret this.
Apr 19 10:44:33 v62 systemd[1]: ceph-mon@v62.service: main process exited,
code=killed, status=6/ABRT
Apr 19 10:44:33 v62 systemd[1]: Unit ceph-mon@v62.service entered failed
state.
Apr 19 10:44:33 v62 systemd[1]: ceph-mon@v62.service failed.
Apr 19 10:44:35 v62 ceph-mgr[1050242]: :::10.8.12.51 - -
[19/Apr/2022:10:44:35] "GET /metrics HTTP/1.1" 200 - "" "Prometheus/2.27.1"
Apr 19 10:44:43 v62 systemd[1]: ceph-mon@v62.service holdoff time over,
scheduling restart.
Apr 19 10:44:43 v62 systemd[1]: start request repeated too quickly for
ceph-mon@v62.service
Apr 19 10:44:43 v62 systemd[1]: Unit ceph-mon@v62.service entered failed
state.
Apr 19 10:44:43 v62 systemd[1]: ceph-mon@v62.service failed.


Please let me know what else I can provide to help debug these issues

Thanks,
Ilhaan Rasheed

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2022-04-19 Thread Kai Stian Olstad

On 18.04.2022 21:35, Wesley Dillingham wrote:
If you mark an osd "out" but not down / you dont stop the daemon do the PGs
go remapped or do they go degraded then as well?


First I made sure the balancer was active, then I marked one OSD out
("ceph osd out 34") and checked the status every 2 seconds for 2 minutes;
there were no degraded messages.
The only new messages in ceph -s were "12 remapped pgs", "11
active+remapped+backfilling" and "1 active+remapped+backfill_wait".
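
For reference, the check loop was essentially:

ceph osd out 34
watch -n 2 ceph -s   # watched for ~2 minutes; only remapped/backfilling states appeared, never "degraded"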


Previously I had to set all OSDs (15 disks) on a host to out, and there was
no issue with PGs in a degraded state.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RGW Multisite Multi Zonegroup Build Problems

2022-04-19 Thread Ulrich Klein
After a bunch of attempts to get multiple zonegroups with RGW multi-site to
work, I have a question:

Has anyone successfully created a working setup with multiple zonegroups with
RGW multi-site using a cephadm/ceph orch installation of Pacific?

Ciao, Uli

> On 19. 04 2022, at 14:33, Ulrich Klein  wrote:
> 
> Hi,
> 
> I'm trying to do the same as Mark. Basically the same problem. Can’t get it 
> to work.
> The --master doesn’t make much of a difference for me.
> 
> 
> Any other idea, maybe?
> 
> Ciao, Uli
> 
> On Cluster #1 ("nceph"):
> 
> radosgw-admin realm create --rgw-realm=acme --default
> radosgw-admin zonegroup create --rgw-zonegroup=us --rgw-realm=acme --master 
> --default --endpoints=http://nceph00.uli.home:8080
> radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-west-1 --master 
> --default --endpoints=http://nceph00.uli.home:8080
> radosgw-admin user create --uid="sysuser" --display-name="System User" 
> --system --access-key=N7Y6CM8KIN45UY2J5NQA 
> --secret=a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX
> radosgw-admin zone modify --rgw-zone=us-west-1 
> --access-key=N7Y6CM8KIN45UY2J5NQA 
> --secret=a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX
> radosgw-admin period update --commit
> ceph orch host label add nceph00 rgw
> ceph orch apply rgw acme --realm=acme --zone=us-west-1 '--placement=label:rgw 
> count-per-host:1' --port=8080
> echo -n "N7Y6CM8KIN45UY2J5NQA" > ac
> echo -n "a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX" > sc
> ceph dashboard set-rgw-api-access-key -i ac
> ceph dashboard set-rgw-api-secret-key -i sc
> 
> radosgw-admin period update --commit
> {
>"id": "5b304997-e3ba-4cc2-9f80-af88a31827c3",
>"epoch": 2,
>"predecessor_uuid": "118ecc9a-3824-4560-afdf-98f901836fb2",
>"sync_status": [],
>"period_map": {
>"id": "5b304997-e3ba-4cc2-9f80-af88a31827c3",
>"zonegroups": [
>{
>"id": "1df9e729-8fa0-47fa-942f-b5159fad8360",
>"name": "us",
>"api_name": "us",
>"is_master": "true",
>"endpoints": [
>"http://nceph00.uli.home:8080";
>],
>"hostnames": [],
>"hostnames_s3website": [],
>"master_zone": "7ac5da6e-ea41-43ce-b6fc-f5b6e794933f",
>"zones": [
>{
>"id": "7ac5da6e-ea41-43ce-b6fc-f5b6e794933f",
>"name": "us-west-1",
>"endpoints": [
>"http://nceph00.uli.home:8080";
>],
>"log_meta": "false",
>"log_data": "false",
>"bucket_index_max_shards": 11,
>"read_only": "false",
>"tier_type": "",
>"sync_from_all": "true",
>"sync_from": [],
>"redirect_zone": ""
>}
>],
>"placement_targets": [
>{
>"name": "default-placement",
>"tags": [],
>"storage_classes": [
>"STANDARD"
>]
>}
>],
>"default_placement": "default-placement",
>"realm_id": "657b514d-be49-45c8-a69e-7ee474276c9a",
>"sync_policy": {
>"groups": []
>}
>}
>],
>"short_zone_ids": [
>{
>"key": "7ac5da6e-ea41-43ce-b6fc-f5b6e794933f",
>"val": 1454718312
>}
>]
>},
>"master_zonegroup": "1df9e729-8fa0-47fa-942f-b5159fad8360",
>"master_zone": "7ac5da6e-ea41-43ce-b6fc-f5b6e794933f",
>"period_config": {
>"bucket_quota": {
>"enabled": false,
>"check_on_raw": false,
>"max_size": -1,
>"max_size_kb": 0,
>"max_objects": -1
>},
>"user_quota": {
>"enabled": false,
>"check_on_raw": false,
>"max_size": -1,
>"max_size_kb": 0,
>"max_objects": -1
>}
>},
>"realm_id": "657b514d-be49-45c8-a69e-7ee474276c9a",
>"realm_name": "acme",
>"realm_epoch": 2
> }
> 
> Dashboard works, too
> 
> 
> On cluster #2 ("ceph")
> --
> radosgw-admin realm pull --url=http://nceph00.uli.home:8080 
> --access-key=N7Y6CM8KIN45UY2J5NQA 
> --secret=a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX
> radosgw-admin zonegroup create --rgw-realm=acme --rgw-zonegroup=eu 
> --endpoints=http://ceph00.uli.home:8080
> radosgw-admin zone create --rgw-zone=eu-west-1 --rgw-zonegroup=eu 
> --endpoints=http://ceph00.uli.home:8080
> (With or without --default makes no difference)
> 
> radosgw-admin zone modify --rgw-zone=eu-west-1 --rgw-zonegroup=

[ceph-users] OSD doesn't get marked out if other OSDs are already out

2022-04-19 Thread Julian Einwag
Hi,

I’m currently playing around with a little Ceph test cluster and I’m trying to 
understand why a down OSD won’t get marked out under certain conditions.
It’s a three node cluster with three OSDs in each node, 
mon_osd_down_out_interval is set to 120 seconds. I’m running version 16.2.7. 
There are only replicated pools with the default CRUSH rules.

When I shut down a server, its OSDs are first marked down and then out after 
two minutes, as expected.
But when I stop another OSD on one of the remaining nodes, it will never be 
marked out.

The tree will look like this:
ID  CLASS  WEIGHT   TYPE NAME                  STATUS  REWEIGHT  PRI-AFF
-1         0.08817  root default
-5         0.02939      host ceph-test-01
 2    hdd  0.00980          osd.2                  up       1.0      1.0
 5    hdd  0.00980          osd.5                  up       1.0      1.0
 6    hdd  0.00980          osd.6                down       1.0      1.0
-3         0.02939      host ceph-test-02
 0    hdd  0.00980          osd.0                down         0      1.0
 3    hdd  0.00980          osd.3                down         0      1.0
 7    hdd  0.00980          osd.7                down         0      1.0
-7         0.02939      host ceph-test-03
 1    hdd  0.00980          osd.1                  up       1.0      1.0
 4    hdd  0.00980          osd.4                  up       1.0      1.0
 8    hdd  0.00980          osd.8                  up       1.0      1.0

When I bring ceph-test-02 up again, osd.6 is marked out immediately.

I also tried changing mon_osd_min_down_reporters to 1, but that didn’t change 
anything.
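
For reference, these are the settings I'm comparing (mon_osd_min_in_ratio is
one I have not changed; as I understand it, it stops the mons from auto-marking
more OSDs out once too large a fraction of the cluster is already out):

ceph config get mon mon_osd_down_out_interval
ceph config get mon mon_osd_min_down_reporters
ceph config get mon mon_osd_min_in_ratio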

I feel like this is working as intended and I’m missing something, so I hope 
somebody can clarify…

Regards,
Julian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: df shows wrong size of cephfs share when a subdirectory is mounted

2022-04-19 Thread Hendrik Peyerl
I did hit this issue as well: https://tracker.ceph.com/issues/38482

You will need a kernel >= 5.2 that can handle quotas on subdirectories.


> On 19. Apr 2022, at 14:47, Ramana Venkatesh Raja  wrote:
> 
> On Sat, Apr 16, 2022 at 10:15 PM Ramana Venkatesh Raja  
> wrote:
>> 
>> On Thu, Apr 14, 2022 at 8:07 PM Ryan Taylor  wrote:
>>> 
>>> Hello,
>>> 
>>> 
>>> I am using cephfs via Openstack Manila (Ussuri I think).
>>> 
>>> The cephfs cluster is v14.2.22 and my client has kernel  
>>> 4.18.0-348.20.1.el8_5.x86_64
>>> 
>>> 
>>> I have a Manila share
>>> 
>>> /volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2
>>> 
>>> 
>>> that is 5000 GB in size. When I mount it the size is reported correctly:
>>> 
>>> 
>>> # df -h /cephfs
>>> Filesystem  
>>>Size  Used Avail Use% Mounted on
>>> 10.30.201.3:6789,10.30.202.3:6789,10.30.203.3:6789:/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2
>>>   4.9T  278G  4.7T   6% /cephfs
>>> 
>>> 
>>> However when I mount a subpath /test1 of my share, then both the size and 
>>> usage are showing the size of the whole cephfs filesystem rather than my 
>>> private share.
>>> 
>>> 
>>> # df -h /cephfs
>>> Filesystem  
>>>  Size  Used Avail Use% Mounted on
>>> 10.30.201.3:6789,10.30.202.3:6789,10.30.203.3:6789:/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2/test1
>>>   4.0P  277T  3.7P   7% /cephfs
>>> 
>> 
>> What are the capabilities of the ceph client user ID that you used to
>> mount "/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2/test1" ?
>> Maybe you're hitting this limitation in
>> https://docs.ceph.com/en/latest/cephfs/quota/#limitations ,
>> "Quotas must be configured carefully when used with path-based mount
>> restrictions. The client needs to have access to the directory inode
>> on which quotas are configured in order to enforce them. If the client
>> has restricted access to a specific path (e.g., /home/user) based on
>> the MDS capability, and a quota is configured on an ancestor directory
>> they do not have access to (e.g., /home), the client will not enforce
>> it. When using path-based access restrictions be sure to configure the
>> quota on the directory the client is restricted too (e.g., /home/user)
>> or something nested beneath it. "
>> 
> 
> Hi Ryan,
> 
> I think you maybe actually hitting this
> https://tracker.ceph.com/issues/55090 . Are you facing this issue with
> the FUSE client?
> 
> -Ramana
> 
>>> 
>>> I tried setting the  ceph.quota.max_bytes  xattr on a subdirectory but it 
>>> did not help.
>>> 
>> 
>> You can't set quota xattr if your ceph client user ID doesn't have 'p'
>> flag in its MDS capabilities,
>> https://docs.ceph.com/en/latest/cephfs/client-auth/#layout-and-quota-restriction-the-p-flag
>> .
>> 
>> -Ramana
>> 
>>> I'm not sure if the issue is in cephfs or Manila, but what would be 
>>> required to get the right size and usage stats to be reported by df when a 
>>> subpath of a share is mounted?
>>> 
>>> 
>>> Thanks!
>>> 
>>> -rt
>>> 
>>> 
>>> Ryan Taylor
>>> Research Computing Specialist
>>> Research Computing Services, University Systems
>>> University of Victoria

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: df shows wrong size of cephfs share when a subdirectory is mounted

2022-04-19 Thread Ramana Venkatesh Raja
On Sat, Apr 16, 2022 at 10:15 PM Ramana Venkatesh Raja  wrote:
>
> On Thu, Apr 14, 2022 at 8:07 PM Ryan Taylor  wrote:
> >
> > Hello,
> >
> >
> > I am using cephfs via Openstack Manila (Ussuri I think).
> >
> > The cephfs cluster is v14.2.22 and my client has kernel  
> > 4.18.0-348.20.1.el8_5.x86_64
> >
> >
> > I have a Manila share
> >
> > /volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2
> >
> >
> > that is 5000 GB in size. When I mount it the size is reported correctly:
> >
> >
> > # df -h /cephfs
> > Filesystem  
> >Size  Used Avail Use% Mounted on
> > 10.30.201.3:6789,10.30.202.3:6789,10.30.203.3:6789:/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2
> >   4.9T  278G  4.7T   6% /cephfs
> >
> >
> > However when I mount a subpath /test1 of my share, then both the size and 
> > usage are showing the size of the whole cephfs filesystem rather than my 
> > private share.
> >
> >
> > # df -h /cephfs
> > Filesystem  
> >  Size  Used Avail Use% Mounted on
> > 10.30.201.3:6789,10.30.202.3:6789,10.30.203.3:6789:/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2/test1
> >   4.0P  277T  3.7P   7% /cephfs
> >
>
> What are the capabilities of the ceph client user ID that you used to
> mount "/volumes/_nogroup/55e46a89-31ff-4878-9e2a-81b4226c3cb2/test1" ?
> Maybe you're hitting this limitation in
> https://docs.ceph.com/en/latest/cephfs/quota/#limitations ,
> "Quotas must be configured carefully when used with path-based mount
> restrictions. The client needs to have access to the directory inode
> on which quotas are configured in order to enforce them. If the client
> has restricted access to a specific path (e.g., /home/user) based on
> the MDS capability, and a quota is configured on an ancestor directory
> they do not have access to (e.g., /home), the client will not enforce
> it. When using path-based access restrictions be sure to configure the
> quota on the directory the client is restricted too (e.g., /home/user)
> or something nested beneath it. "
>

Hi Ryan,

I think you may actually be hitting
https://tracker.ceph.com/issues/55090 . Are you facing this issue with
the FUSE client?

-Ramana

> >
> > I tried setting the  ceph.quota.max_bytes  xattr on a subdirectory but it 
> > did not help.
> >
>
> You can't set quota xattr if your ceph client user ID doesn't have 'p'
> flag in its MDS capabilities,
> https://docs.ceph.com/en/latest/cephfs/client-auth/#layout-and-quota-restriction-the-p-flag
> .
>
> -Ramana
>
> > I'm not sure if the issue is in cephfs or Manila, but what would be 
> > required to get the right size and usage stats to be reported by df when a 
> > subpath of a share is mounted?
> >
> >
> > Thanks!
> >
> > -rt
> >
> >
> > Ryan Taylor
> > Research Computing Specialist
> > Research Computing Services, University Systems
> > University of Victoria

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RGW Multisite Multi Zonegroup Build Problems

2022-04-19 Thread Ulrich Klein
Hi,

I'm trying to do the same as Mark. Basically the same problem. Can’t get it to 
work.
The --master doesn’t make much of a difference for me.


Any other idea, maybe?

Ciao, Uli

On Cluster #1 ("nceph"):

radosgw-admin realm create --rgw-realm=acme --default
radosgw-admin zonegroup create --rgw-zonegroup=us --rgw-realm=acme --master 
--default --endpoints=http://nceph00.uli.home:8080
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-west-1 --master 
--default --endpoints=http://nceph00.uli.home:8080
radosgw-admin user create --uid="sysuser" --display-name="System User" --system 
--access-key=N7Y6CM8KIN45UY2J5NQA 
--secret=a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX
radosgw-admin zone modify --rgw-zone=us-west-1 
--access-key=N7Y6CM8KIN45UY2J5NQA 
--secret=a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX
radosgw-admin period update --commit
ceph orch host label add nceph00 rgw
ceph orch apply rgw acme --realm=acme --zone=us-west-1 '--placement=label:rgw 
count-per-host:1' --port=8080
echo -n "N7Y6CM8KIN45UY2J5NQA" > ac
echo -n "a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX" > sc
ceph dashboard set-rgw-api-access-key -i ac
ceph dashboard set-rgw-api-secret-key -i sc

radosgw-admin period update --commit
{
"id": "5b304997-e3ba-4cc2-9f80-af88a31827c3",
"epoch": 2,
"predecessor_uuid": "118ecc9a-3824-4560-afdf-98f901836fb2",
"sync_status": [],
"period_map": {
"id": "5b304997-e3ba-4cc2-9f80-af88a31827c3",
"zonegroups": [
{
"id": "1df9e729-8fa0-47fa-942f-b5159fad8360",
"name": "us",
"api_name": "us",
"is_master": "true",
"endpoints": [
"http://nceph00.uli.home:8080";
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "7ac5da6e-ea41-43ce-b6fc-f5b6e794933f",
"zones": [
{
"id": "7ac5da6e-ea41-43ce-b6fc-f5b6e794933f",
"name": "us-west-1",
"endpoints": [
"http://nceph00.uli.home:8080";
],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 11,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": [],
"storage_classes": [
"STANDARD"
]
}
],
"default_placement": "default-placement",
"realm_id": "657b514d-be49-45c8-a69e-7ee474276c9a",
"sync_policy": {
"groups": []
}
}
],
"short_zone_ids": [
{
"key": "7ac5da6e-ea41-43ce-b6fc-f5b6e794933f",
"val": 1454718312
}
]
},
"master_zonegroup": "1df9e729-8fa0-47fa-942f-b5159fad8360",
"master_zone": "7ac5da6e-ea41-43ce-b6fc-f5b6e794933f",
"period_config": {
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
},
"realm_id": "657b514d-be49-45c8-a69e-7ee474276c9a",
"realm_name": "acme",
"realm_epoch": 2
}

Dashboard works, too


On cluster #2 ("ceph")
--
radosgw-admin realm pull --url=http://nceph00.uli.home:8080 
--access-key=N7Y6CM8KIN45UY2J5NQA 
--secret=a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX
radosgw-admin zonegroup create --rgw-realm=acme --rgw-zonegroup=eu 
--endpoints=http://ceph00.uli.home:8080
radosgw-admin zone create --rgw-zone=eu-west-1 --rgw-zonegroup=eu 
--endpoints=http://ceph00.uli.home:8080
(With or without —default makes no difference)

radosgw-admin zone modify --rgw-zone=eu-west-1 --rgw-zonegroup=eu 
--access-key=N7Y6CM8KIN45UY2J5NQA 
--secret=a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX
ceph orch host label add ceph00 rgw
ceph orch apply rgw acme --realm=acme --zone=eu-west-1 '--placement=label:rgw 
count-per-host:1' --port=8080
echo -n "N7Y6CM8KIN45UY2J5NQA" > ac
echo -n "a8QvbAMGpDwPBk8E3t3jHTyTSNqMQi4PK04yN9GX" > sc
ceph dashboard set-rgw-api-access-key -i ac
ceph dashboard set-rgw-api-secret-key -i sc

radosgw-admin period update --commit
couldn't init storage provider

ceph orch ps -

[ceph-users] Re: Ceph RGW Multisite Multi Zonegroup Build Problems

2022-04-19 Thread Eugen Block

Hi,

unless there are copy/paste mistakes involved I believe you shouldn't  
specify '--master' for the secondary zone because you did that already  
for the first zone which is supposed to be the master zone. You  
specified '--rgw-zone=us-west-1' as the master zone within your realm,  
but then you run this command on the second cluster:


radosgw-admin zone create --rgw-zone=eu-west-1 \
  --rgw-zonegroup=eu \
  --default \
  --master \
  --endpoints=https://ceph2dev01.acme.com:443

That's the reason for this error when trying to commit on the second cluster:

 2022-04-16T09:16:20.345-0700 7faf98ab6380  1 Cannot find zone  
id=9f8a06eb-5a1c-4052-b04d-359f21c95371 (name=eu-west-1), switching to  
local zonegroup configuration


Because your master zone is this one:

"master_zone": "d7ceaa4f-06c0-4c21-bcec-efe90f55ecfd"

I'd recommend purging the secondary zone and starting over. If this is
not the root cause, please update your post.
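
A rough sketch of the purge on the second cluster (zone/zonegroup names taken
from your commands; please double-check before deleting anything):

radosgw-admin zonegroup remove --rgw-zonegroup=eu --rgw-zone=eu-west-1
radosgw-admin zone delete --rgw-zone=eu-west-1
radosgw-admin zonegroup delete --rgw-zonegroup=eu   # only if you want to recreate the zonegroup as well
# then recreate the eu zonegroup/zone without --master and run 'radosgw-admin period update --commit'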


Regards,
Eugen



Zitat von Mark Selby :


I have been trying to build a multisite ceph rgw installation with a
single realm, multiple zonegroups, and a single zone per zonegroup. This
models 3 locations spread over long distances. I have
successfully created an installation with a single realm, a single
zonegroup and multiple zones in that one zonegroup.

I have had no luck even getting my multiple-zonegroup installation set
up. I have read the docs over and over, but I still think that I am
doing something incorrect (or possibly hitting a bug?)

I am running Pacific 16.2.7 in a containerized environment

I have created a github gist of all of the commands and output show
below as that may be easier to read for some.

https://gist.github.com/tokenrain/4edf85b0060ce5004f2003aa8a66e67d

Cluster 1 and Cluster 2 are separate ceph clusters. Cluster 1 commands
were run on a node in cluster1 and Cluster 2 commands were run on a node
in cluster2

All and any help is greatly appreciated.


# TOPOLOGY #


realm = acme.com
  zonegroup = us
zone = us-west-1
  zonegroup = eu
zone = eu-west-1
  zonegroup = as
zone = as-west-1

##
# CLUSTER 1 COMMANDS #
##

radosgw-admin realm create --rgw-realm=acme --default

radosgw-admin zonegroup create --rgw-zonegroup=us --rgw-realm=acme  
--master --default --endpoints=https://ceph1dev01.acme.com:443


radosgw-admin zone create --rgw-zonegroup=us \
  --rgw-zone=us-west-1 \
  --master \
  --default \
  --endpoints=https://ceph1dev01.acme.com:443

radosgw-admin user create --uid="sync-user"  
--display-name="Synchronization User" --system


radosgw-admin zone modify --rgw-zonegroup=us --rgw-zone=us-west-1  
--access-key= --secret=


radosgw-admin period update --commit

ceph orch apply -i rgw-us-west-1.yml

#
# Cluster 1 -- RGW Spec #
#
---
service_type: rgw
service_id: us-west-1
placement:
  hosts:
- ceph1dev01.acme.com
- ceph1dev02.acme.com
- ceph1dev03.acme.com
spec:
  ssl: true
  rgw_realm: acme
  rgw_zone: us-west-1
  rgw_frontend_port: 443
  rgw_frontend_type: beast
  rgw_frontend_ssl_certificate: |

##
# CLUSTER 2 COMMANDS #
##

radosgw-admin realm pull --rgw-realm=acme  
--url=https://ceph1dev01.acme.com:443 --access-key=  
--secret= --default


radosgw-admin zonegroup create --rgw-realm=acme --rgw-zonegroup=eu  
--endpoints=https://ceph2dev01.acme.com:443


radosgw-admin zone create --rgw-zone=eu-west-1 \
  --rgw-zonegroup=eu \
  --default \
  --master \
  --endpoints=https://ceph2dev01.acme.com:443

radosgw-admin zone modify --rgw-zone=eu-west-1 --rgw-zonegroup=eu  
--access-key= --secret=

radosgw-admin period update

radosgw-admin period update --commit

ceph orch apply -i rgw-eu-west1-2.yml

##
# CLUSTER 1 OUTPUT OF period update --commit #
##

{
"id": "b153187a-1d91-4bbf-a674-d3cad9fd23da",
"epoch": 1,
"predecessor_uuid": "740d6999-ce83-47ff-81f5-615a3a441a96",
"sync_status": [],
"period_map": {
"id": "b153187a-1d91-4bbf-a674-d3cad9fd23da",
"zonegroups": [
{
"id": "e39e0b42-43a8-47eb-b6cd-6c2524ff51d2",
"name": "us",
"api_name": "us",
"is_master": "true",
"endpoints": [
"https://ceph1dev01.acme.com:443";
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "d7ceaa4f-06c0-4c21-bcec-efe90f55ecfd",
"zones": [
{
"id": "d7c

[ceph-users] Re: Ceph Multisite Cloud Sync Module

2022-04-19 Thread Soumya Koduri

Hi,

On 4/19/22 09:47, Mark Selby wrote:

I am trying to get the Ceph multisite cloud sync module working with Amazon S3.
The docs are not clear on how the sync module is actually configured. I just
want a PoC of the simplest possible config. Can anyone share the config and
radosgw-admin commands that were used to create a simple sync setup? The
closest docs that I have seen are
https://croit.io/blog/setting-up-ceph-cloud-sync-module and, to be honest, they
do not make a lot of sense.


I presume (from your last email) that you are familiar with setting up
multisite between zones in a single zonegroup [1]. Creating a cloud sync zone
is similar to creating any other zone [2], but with the additional option
'--tier-type=cloud' specified. Below are the minimal steps you may need
to get started with it -


1) To create the cloud sync zone -
# radosgw-admin zone create --tier-type=cloud --rgw-zone=<zone-name> \
    --rgw-zonegroup=<zonegroup-name> --rgw-realm=<realm-name> \
    --access-key=<access-key> --secret=<secret> \
    --endpoints=http://{fqdn}:80


eg., radosgw-admin zone create --rgw-zone=cloud-zone 
--rgw-zonegroup=default --rgw-realm=realm_default --tier-type=cloud 
--access-key= --secret= 
--endpoints=http://localhost:8002


2) Update the cloud zone with the remote endpoint connection details -
# radosgw-admin zone modify --rgw-zone=<zone-name> \
    --rgw-zonegroup=<zonegroup-name> --rgw-realm=<realm-name> \
    --tier-config=connection.access_key=<access-key>,connection.secret=<secret>,connection.endpoint=<endpoint>,connection.host_style=path,acls.source_id=<source-user-id>,acls.dest_id=<dest-user-id>,target_path=<target-path>

note: host_style, acls and target_path are optional. If not specified, 
the zone will create a default bucket on the cloud endpoint to sync 
entries to, and the same rgw user is mapped to the remote endpoint.


eg.,   radosgw-admin zone modify --rgw-zone=cloud-zone 
--rgw-zonegroup=default --rgw-realm=realm_default 
--tier-config=connection.access_key=L#R,connection.secret=Y##t,connection.endpoint=https://10.xx.xx.xx:80,acls.source_id=testid,acls.dest_id=aws-user,target_path=cloud-sync



3) radosgw-admin period update --commit

4) radosgw-admin sync status  // to fetch sync status across zones 
including cloud sync zone.



Hope this helps!

-Soumya

[1] https://docs.ceph.com/en/latest/radosgw/multisite/#multisite
[2] https://docs.ceph.com/en/latest/radosgw/cloud-sync-module/








___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io