[ceph-users] Ceph Octopus 15.2.11 - rbd diff --from-snap lists all objects

2021-05-12 Thread David Herselman
Hi,

Has something changed with 'rbd diff' in Octopus, or have I hit a bug? I am no 
longer able to obtain the list of objects that have changed between two 
snapshots of an image; it always lists all allocated regions of the RBD image. 
This behaviour, however, only occurs when I add the '--whole-object' switch.

I'm using a KRBD client with kernel 5.11.7 and Ceph Octopus 15.2.11 as part of 
Proxmox PVE 6.4, which is based on Debian 10. Images have the features shown below 
and I've performed offline object map checks and rebuilds (no errors reported).

To reproduce my issue I first create a new RBD image (default features are 63), 
map it using KRBD, write some data, create the first snapshot, write a single 
object (4 MiB), create a second snapshot and then list the differences:

[admin@kvm1a ~]# rbd create rbd_hdd/test --size 40G
[admin@kvm1a ~]# rbd info rbd_hdd/test
rbd image 'test':
size 40 GiB in 10240 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 73363f8443987b
block_name_prefix: rbd_data.73363f8443987b
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Wed May 12 23:01:11 2021
access_timestamp: Wed May 12 23:01:11 2021
modify_timestamp: Wed May 12 23:01:11 2021
[admin@kvm1a ~]# rbd map rbd_hdd/test
/dev/rbd18
[admin@kvm1a ~]# dd if=/dev/zero of=/dev/rbd18 bs=64M count=1
1+0 records in
1+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.668701 s, 100 MB/s
[admin@kvm1a ~]# sync
[admin@kvm1a ~]# rbd snap create rbd_hdd/test@snap1
[admin@kvm1a ~]# dd if=/dev/zero of=/dev/rbd18 bs=4M count=1
1+0 records in
1+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.265691 s, 15.8 MB/s
[admin@kvm1a ~]# sync
[admin@kvm1a ~]# rbd snap create rbd_hdd/test@snap2
[admin@kvm1a ~]# rbd diff --from-snap snap1 rbd_hdd/test@snap2 --format=json
[{"offset":0,"length":4194304,"exists":"true"}]
[admin@kvm1b ~]# rbd diff --from-snap snap1 rbd_hdd/test@snap2 --format=json 
--whole-object
[{"offset":0,"length":4194304,"exists":"true"},{"offset":4194304,"length":4194304,"exists":"true"},{"offset":8388608,"length":4194304,"exists":"true"},{"offset":12582912,"length":4194304,"exists":"true"},{"offset":16777216,"length":4194304,"exists":"true"},{"offset":20971520,"length":4194304,"exists":"true"},{"offset":25165824,"length":4194304,"exists":"true"},{"offset":29360128,"length":4194304,"exists":"true"},{"offset":33554432,"length":4194304,"exists":"true"},{"offset":37748736,"length":4194304,"exists":"true"},{"offset":41943040,"length":4194304,"exists":"true"},{"offset":46137344,"length":4194304,"exists":"true"},{"offset":50331648,"length":4194304,"exists":"true"},{"offset":54525952,"length":4194304,"exists":"true"},{"offset":58720256,"length":4194304,"exists":"true"},{"offset":62914560,"length":4194304,"exists":"true"}]
[admin@kvm1a ~]# rbd du rbd_hdd/test
NAME        PROVISIONED  USED
test@snap1       40 GiB   64 MiB
test@snap2       40 GiB   64 MiB
test             40 GiB    4 MiB
<TOTAL>          40 GiB  132 MiB

My tests appear to confirm that adding the '--whole-object' option to rbd diff 
results in it listing every allocated extent instead of only the changed ones...
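
If the fast-diff/whole-object path really is misbehaving, a possible interim 
workaround (a sketch, assuming 4 MiB objects and that each reported extent stays 
within a single object) is to take the fine-grained diff without '--whole-object' 
and round the extents to object boundaries yourself:

rbd diff --from-snap snap1 rbd_hdd/test@snap2 --format=json \
  | jq '[.[] | {offset: (.offset - (.offset % 4194304)), length: 4194304, exists: .exists}] | unique_by(.offset)'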


Regards
David Herselman
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Manager carries wrong information until killing it

2021-05-12 Thread Nico Schottelius



Reed Dier  writes:

> I don't have a solution to offer, but I've seen this for years with no 
> solution.
> Any time a MGR bounces, be it for upgrades, or a new daemon coming online, 
> etc, I'll see a scale spike like is reported below.

Interesting to read that we are not the only ones.

> Just out of curiosity, which MGR plugins are you using?

[22:11:05] black2.place6:~# ceph mgr module ls
{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator_cli",
"progress",
"rbd_support",
"status",
"volumes"
],
"enabled_modules": [
"iostat",
"pg_autoscaler",
"prometheus",
"restful"
],

> I have historically used the influx plugin for stats exports, and it shows up 
> in those values as well, throwing everything off.

So the problem is unlikely to be related to the prometheus plugin; it looks more
like a statistics error somewhere else.

> I don't see it in my Zabbix stats, albeit those are scraped at a
> longer interval that may not catch this.

For prometheus, we scrape every 10 or 15 seconds. But I wonder if this
really flattens the spike out or whether the logic is actually different.

Out of curiosity from my side: the manager is a binary, but the plugins
are actually python modules. I had a quick look at
/usr/share/ceph/mgr/prometheus/module.py, which seems to get its data
from a monitor - so I wonder if the problem lies more in the
architecture of Ceph than in the actual data export.
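
For what it's worth, instead of killing the mgr process, the built-in failover 
should give the same bounce (a sketch; substitute the name of your active mgr):

ceph mgr fail server2   # ask the active mgr to step down so a standby takes over
ceph -s                 # then check whether the bogus recovery/io numbers clear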

Cheers,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CRUSH rule for EC 6+2 on 6-node cluster

2021-05-12 Thread Bryan Stillwell
I was able to figure out the solution with this rule:

step take default
step choose indep 0 type host
step chooseleaf indep 1 type osd
step emit
step take default
step choose indep 0 type host
step chooseleaf indep 1 type osd
step emit

Now the data is spread how I want it to be:

# for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r 
'.pg_stats[].pgid'); do
>   echo $pg
>   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> ceph osd find $osd | jq -r '.host'
>   done | sort | uniq -c | sort -n -k1
> done
8.0
  1 excalibur
  1 harrahs
  1 mandalaybay
  1 mirage
  2 aladdin
  2 paris
8.1
  1 aladdin
  1 excalibur
  1 harrahs
  1 mirage
  2 mandalaybay
  2 paris
8.2
  1 aladdin
  1 excalibur
  1 harrahs
  1 mirage
  2 mandalaybay
  2 paris
...
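
In case it's useful, one way to apply a rule like this (a sketch; the rule name 
below is a placeholder) is to edit a decompiled CRUSH map and inject it back:

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# add the rule above to crushmap.txt, e.g. as "rule ec62_2_per_host { ... }"
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
# then point the EC pool at the new rule
ceph osd pool set cephfs_data_ec62 crush_rule ec62_2_per_host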

Hopefully someone else will find this useful.

Bryan

> On May 12, 2021, at 9:58 AM, Bryan Stillwell  wrote:
> 
> I'm trying to figure out a CRUSH rule that will spread data out across my 
> cluster as much as possible, but not more than 2 chunks per host.
> 
> If I use the default rule with an osd failure domain like this:
> 
> step take default
> step choose indep 0 type osd
> step emit
> 
> I get clustering of 3-4 chunks on some of the hosts:
> 
> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r 
> '.pg_stats[].pgid'); do
>>  echo $pg
>>  for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>ceph osd find $osd | jq -r '.host'
>>  done | sort | uniq -c | sort -n -k1
> 8.0
>  1 harrahs
>  3 paris
>  4 aladdin
> 8.1
>  1 aladdin
>  1 excalibur
>  2 mandalaybay
>  4 paris
> 8.2
>  1 harrahs
>  2 aladdin
>  2 mirage
>  3 paris
> ...
> 
> However, if I change the rule to use:
> 
> step take default
> step choose indep 0 type host
> step chooseleaf indep 2 type osd
> step emit
> 
> I get the data spread across 4 hosts with 2 chunks per host:
> 
> # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r 
> '.pg_stats[].pgid'); do
>>  echo $pg
>>  for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
>>ceph osd find $osd | jq -r '.host'
>>  done | sort | uniq -c | sort -n -k1
>> done
> 8.0
>  2 aladdin
>  2 harrahs
>  2 mandalaybay
>  2 paris
> 8.1
>  2 aladdin
>  2 harrahs
>  2 mandalaybay
>  2 paris
> 8.2
>  2 harrahs
>  2 mandalaybay
>  2 mirage
>  2 paris
> ...
> 
> Is it possible to get the data to spread out over more hosts?  I plan on 
> expanding the cluster in the near future and would like to see more hosts get 
> 1 chunk instead of 2.
> 
> Also, before you recommend adding two more hosts and switching to a 
> host-based failure domain, the cluster is on a variety of hardware with 
> between 2-6 drives per host and drives that are 4TB-12TB in size (it's part 
> of my home lab).
> 
> Thanks,
> Bryan
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CRUSH rule for EC 6+2 on 6-node cluster

2021-05-12 Thread Bryan Stillwell
I'm trying to figure out a CRUSH rule that will spread data out across my 
cluster as much as possible, but not more than 2 chunks per host.

If I use the default rule with an osd failure domain like this:

step take default
step choose indep 0 type osd
step emit

I get clustering of 3-4 chunks on some of the hosts:

# for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r 
'.pg_stats[].pgid'); do
>   echo $pg
>   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> ceph osd find $osd | jq -r '.host'
>   done | sort | uniq -c | sort -n -k1
8.0
  1 harrahs
  3 paris
  4 aladdin
8.1
  1 aladdin
  1 excalibur
  2 mandalaybay
  4 paris
8.2
  1 harrahs
  2 aladdin
  2 mirage
  3 paris
...

However, if I change the rule to use:

step take default
step choose indep 0 type host
step chooseleaf indep 2 type osd
step emit

I get the data spread across 4 hosts with 2 chunks per host:

# for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r 
'.pg_stats[].pgid'); do
>   echo $pg
>   for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> ceph osd find $osd | jq -r '.host'
>   done | sort | uniq -c | sort -n -k1
> done
8.0
  2 aladdin
  2 harrahs
  2 mandalaybay
  2 paris
8.1
  2 aladdin
  2 harrahs
  2 mandalaybay
  2 paris
8.2
  2 harrahs
  2 mandalaybay
  2 mirage
  2 paris
...

Is it possible to get the data to spread out over more hosts?  I plan on 
expanding the cluster in the near future and would like to see more hosts get 1 
chunk instead of 2.

Also, before you recommend adding two more hosts and switching to a host-based 
failure domain, the cluster is on a variety of hardware with between 2-6 drives 
per host and drives that are 4TB-12TB in size (it's part of my home lab).

Thanks,
Bryan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] May 10 Upstream Lab Outage

2021-05-12 Thread David Galloway
Hi all,

I wanted to provide an RCA for the outage you may have been affected by 
yesterday.  Some services that went down:

- All CI/testing
- quay.ceph.io
- telemetry.ceph.com (your cluster may have gone into HEALTH_WARN if you report 
telemetry data)
- lists.ceph.io (so all mailing lists)

All of our critical infra is running in a Red Hat Virtualization (RHV) instance 
backed by Red Hat Gluster Storage (RHGS) as the storage.  Before you go, 
"wait.. Gluster?"  Yes, this cluster was set up before Ceph was supported as 
backend storage for RHV/RHEV.

The root cause of the outage is that the Gluster volumes got 100% full.  Once no 
writes were possible, RHV paused all the VMs.

Why didn't monitoring catch this?  I honestly don't know.

# grep ssdstore01 nagios-05-*2021* | grep Disk
nagios-05-01-2021-00.log:[1619740800] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-02-2021-00.log:[1619827200] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-03-2021-00.log:[1619913600] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-04-2021-00.log:[1620000000] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-05-2021-00.log:[1620086400] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-06-2021-00.log:[1620172800] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-07-2021-00.log:[1620259200] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-08-2021-00.log:[1620345600] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-09-2021-00.log:[1620432000] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-10-2021-00.log:[1620518400] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now
nagios-05-11-2021-00.log:[1620604800] CURRENT SERVICE STATE: ssdstore01;Disk 
Space;OK;HARD;1;Disks are OK now

Yet RHV knew we were running out of space.  I don't have e-mail notifications 
set up in RHV, however.

# zgrep "disk space" engine*202105*.gz | cut -d ',' -f4 | head -n 10
 Low disk space. hosted_storage domain has 24 GB of free space.
 Low disk space. hosted_storage domain has 24 GB of free space.
 Low disk space. hosted_storage domain has 23 GB of free space.
 Low disk space. hosted_storage domain has 23 GB of free space.
 Low disk space. hosted_storage domain has 23 GB of free space.
 Low disk space. hosted_storage domain has 23 GB of free space.
 Low disk space. hosted_storage domain has 23 GB of free space.
 Low disk space. hosted_storage domain has 21 GB of free space.
 Low disk space. hosted_storage domain has 20 GB of free space.
 Low disk space. hosted_storage domain has 11 GB of free space.

Our nagios instances run this to check disk space: 
https://github.com/ceph/ceph-cm-ansible/blob/master/roles/common/files/libexec/diskusage.pl
You can ignore the comment about it only working for EXT2.

[root@ssdstore01 ~]# /usr/libexec/diskusage.pl 90 95
Disks are OK now

I ran this manually on one of the storage hosts and intentionally set the WARN 
level to a number lower than the current usage percentage.

[root@ssdstore01 ~]# df -h | grep 'Size\|gluster'
Filesystem  Size  Used Avail Use% Mounted on
/dev/md124  8.8T  6.7T  2.1T  77% /gluster

[root@ssdstore01 ~]# /usr/libexec/diskusage.pl 95 70
/gluster is at 77%
[root@ssdstore01 ~]# echo $?
2

When I logged in to the storage hosts yesterday morning, the /gluster mount was 
at 100%.  So nagios should have known.

How'd it get fixed?  I happened to have some large-capacity drives lying around 
that fit the storage nodes.  They're due to be installed in a different project 
soon.  However, I was able to add these drives, add "bricks" to the Gluster 
storage, and then rebalance the data.  Once that was done, I was able to restart 
all the VMs and delete old VMs and snapshots I no longer needed.

How do we keep this from happening again?  Well, as you may have been able to 
deduce... we were running out of space at a rate of 1-10 GB/day.  As you can 
see now, the Gluster volume has 2.1TB of space left.  So even if we grew by 
10GB/day again, we'd be okay for 200ish days.
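
Until then, a second, simpler disk check could be bolted on next to the perl one 
(a sketch, not something we currently run):

#!/bin/sh
# complain if any local filesystem is at or above the threshold (default 90%)
THRESH=${1:-90}
df -P --local | awk -v t="$THRESH" \
  'NR>1 { use=$5; sub("%","",use); if (use+0 >= t) { print $6" is at "use"%"; bad=1 } }
   END  { exit bad ? 2 : 0 }'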

I aim to have some (if not all) of these services moved off this platform and 
into an OpenShift cluster backed by Ceph this year.  Sadly, I just don't think 
I have enough logging enabled to nail down exactly what happened.

-- 
David Galloway
Senior Systems Administrator
Ceph Engineering
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitor connection error

2021-05-12 Thread Tuffli, Chuck
> -Original Message-
> From: Eugen Block [mailto:ebl...@nde.ag]
> Sent: Tuesday, May 11, 2021 11:39 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: monitor connection error
> 
> Hi,
> 
> > What is this error trying to tell me? TIA
> 
> it tells you that the cluster is not reachable to the client, this can have 
> various
> reasons.
> 
> Can you show the output of your conf file?
> 
> cat /etc/ceph/es-c1.conf

[centos@cnode-01 ~]$ cat /etc/ceph/es-c1.conf
[global]
fsid = 3c5da069-2a03-4a5a-8396-53776286c858
mon_initial_members = cnode-01,cnode-02,cnode-03
mon_host = 192.168.122.39
public_network = 192.168.122.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_journal_size = 1024
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 333
osd_pool_default_pgp_num = 333
osd_crush_chooseleaf_type = 1
[centos@cnode-01 ~]$

> Is the monitor service up and running? I take it you don't use cephadm yet, so
> it's not a containerized environment?

Correct, this is bare metal and not a containerized environment. And I believe 
it is running:
[centos@cnode-01 ~]$ sudo systemctl --all | grep ceph
  ceph-crash.service           loaded   active   running   Ceph crash dump collector
  ceph-mon@cnode-01.service    loaded   active   running   Ceph cluster monitor daemon
  system-ceph\x2dmon.slice     loaded   active   active    system-ceph\x2dmon.slice
  ceph-mon.target              loaded   active   active    ceph target allowing to start/stop all ceph-mon@.service instances at once
  ceph.target                  loaded   active   active    ceph target allowing to start/stop all ceph*@.service instances at once
[centos@cnode-01 ~]$
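
As a next step it may be worth checking that the admin keyring for the 
non-default cluster name exists, and asking the monitor directly over its admin 
socket (a sketch; paths assume the default layout for a cluster named es-c1 with 
mon id cnode-01):

[centos@cnode-01 ~]$ ls -l /etc/ceph/es-c1.client.admin.keyring
[centos@cnode-01 ~]$ sudo ceph --cluster es-c1 --admin-daemon /var/run/ceph/es-c1-mon.cnode-01.asok mon_status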

> Regards,
> Eugen
> 
> 
> Zitat von "Tuffli, Chuck" :
> 
> > Hi
> >
> > I'm new to ceph and have been following the Manual Deployment document
> > [1]. The process seems to work correctly until step 18 ("Verify that
> > the monitor is running"):
> >
> > [centos@cnode-01 ~]$ uname -a
> > Linux cnode-01 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50
> > UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> > [centos@cnode-01 ~]$ ceph -v
> > ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb)
> > octopus (stable)
> > [centos@cnode-01 ~]$ sudo ceph --cluster es-c1 -s [errno 2] RADOS
> > object not found (error connecting to the cluster)
> > [centos@cnode-01 ~]$
> >
> > What is this error trying to tell me? TIA
> >
> > [1]
> > https://docs.ceph.com/en/latest/install/manual-deployment/
> ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to
> ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Manager carries wrong information until killing it

2021-05-12 Thread Reed Dier
I don't have a solution to offer, but I've seen this for years without one.
Any time a MGR bounces, be it for upgrades, a new daemon coming online, etc., 
I'll see a scale spike like the one reported below.

Just out of curiosity, which MGR plugins are you using?
I have historically used the influx plugin for stats exports, and it shows up 
in those values as well, throwing everything off.

I don't see it in my Zabbix stats, though those are scraped at a longer 
interval that may not catch this.

Just looking for any common threads.

Reed

> On May 4, 2021, at 3:46 AM, Nico Schottelius  
> wrote:
> 
> 
> Hello,
> 
> we have a recurring, funky problem with managers on Nautilus (and
> probably also earlier versions): the manager displays incorrect
> information.
> 
> This is a recurring pattern and it also breaks the prometheus graphs, as
> the I/O is described insanely incorrectly: "recovery: 43 TiB/s, 3.62k
> keys/s, 11.40M objects/s" - which basically changes the scale of any
> related graph to unusable.
> 
> The latest example from today shows slow ops for an OSD
> that has been down for 17h:
> 
> 
> [09:50:31] black2.place6:~# ceph -s
>  cluster:
>id: 1ccd84f6-e362-4c50-9ffe-59436745e445
>health: HEALTH_WARN
>18 slow ops, oldest one blocked for 975 sec, osd.53 has slow ops
> 
>  services:
>mon: 5 daemons, quorum server9,server2,server8,server6,server4 (age 2w)
>mgr: server2(active, since 2w), standbys: server8, server4, server9, 
> server6, ciara3
>osd: 108 osds: 107 up (since 17h), 107 in (since 17h)
> 
>  data:
>pools:   4 pools, 2624 pgs
>objects: 42.52M objects, 162 TiB
>usage:   486 TiB used, 298 TiB / 784 TiB avail
>pgs: 2616 active+clean
> 8active+clean+scrubbing+deep
> 
>  io:
>client:   522 MiB/s rd, 22 MiB/s wr, 8.18k op/s rd, 689 op/s wr
> 
> 
> Killing the manager on server2 changes the status to another temporary
> incorrect status, because the rebalance finished hours ago, paired with
> the incorrect rebalance speed that we see from time to time:
> 
> 
> [09:51:59] black2.place6:~# ceph -s
>  cluster:
>id: 1ccd84f6-e362-4c50-9ffe-59436745e445
>health: HEALTH_OK
> 
>  services:
>mon: 5 daemons, quorum server9,server2,server8,server6,server4 (age 2w)
>mgr: server8(active, since 11s), standbys: server4, server9, server6, 
> ciara3
>osd: 108 osds: 107 up (since 17h), 107 in (since 17h)
> 
>  data:
>pools:   4 pools, 2624 pgs
>objects: 42.52M objects, 162 TiB
>usage:   486 TiB used, 298 TiB / 784 TiB avail
>pgs: 2616 active+clean
> 8active+clean+scrubbing+deep
> 
>  io:
>client:   214 TiB/s rd, 54 TiB/s wr, 4.86G op/s rd, 1.06G op/s wr
>recovery: 43 TiB/s, 3.62k keys/s, 11.40M objects/s
> 
>  progress:
>Rebalancing after osd.53 marked out
>  [..]
> 
> 
> Then a bit later, the status on the newly started manager is correct:
> 
> 
> [09:52:18] black2.place6:~# ceph -s
>  cluster:
>id: 1ccd84f6-e362-4c50-9ffe-59436745e445
>health: HEALTH_OK
> 
>  services:
>mon: 5 daemons, quorum server9,server2,server8,server6,server4 (age 2w)
>mgr: server8(active, since 47s), standbys: server4, server9, server6, 
> server2, ciara3
>osd: 108 osds: 107 up (since 17h), 107 in (since 17h)
> 
>  data:
>pools:   4 pools, 2624 pgs
>objects: 42.52M objects, 162 TiB
>usage:   486 TiB used, 298 TiB / 784 TiB avail
>pgs: 2616 active+clean
> 8active+clean+scrubbing+deep
> 
>  io:
>client:   422 MiB/s rd, 39 MiB/s wr, 7.91k op/s rd, 752 op/s wr
> 
> 
> Question: is this a known bug, is anyone else seeing it or are we doing
> something wrong?
> 
> Best regards,
> 
> Nico
> 
> --
> Sustainable and modern Infrastructures by ungleich.ch
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Write Ops on CephFS Increasing exponentially

2021-05-12 Thread Kyle Dean
Hi Patrick,

Thanks for getting back to me. Looks like I found the issue. It's due to the 
fact that I thought I had increased max_file_size on CephFS to 20 TB; it turns 
out I missed a zero and set it to 1.89 TB.

I had originally tried to fallocate the space for the 8 TB volume, which kept 
erroring. I then tried dd and wrote the entire space needed without errors. What 
I don't understand is what happens to CephFS when you do this.

The files I'm writing into the pre-allocated volume in Ceph are still there, 
"luckily", but I thought that Ceph would stop you from writing to CephFS once a 
file hit the upper limit of max_file_size.
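
(For anyone who finds this later: checking and raising the limit looks roughly 
like this - 'cephfs' is a placeholder for the filesystem name, and 21990232555520 
bytes is 20 TiB.)

ceph fs get cephfs | grep max_file_size
ceph fs set cephfs max_file_size 21990232555520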

Kind regards,

Kyle

From: Patrick Donnelly 
Sent: 11 May 2021 03:14
To: Kyle Dean 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Write Ops on CephFS Increasing exponentially

Hi Kyle,

On Thu, May 6, 2021 at 7:56 AM Kyle Dean  wrote:
>
> Hi, hoping someone could help me get to the bottom of this particular issue 
> I'm having.
>
> I have ceph octopus installed using ceph-ansible.
>
> Currently, I have 3 MDS servers running, and one client connected to the 
> active MDS. I'm currently storing a very large encrypted container on the 
> CephFS file system, 8TB worth, and I'm writing data into it from the client 
> host.
>
> Recently I have noticed a severe impact on performance, and the time taken to
> process files within the container has increased from 1 minute to 11
> minutes.
>
> in the ceph dashboard, when I take a look at the performance tab on the file 
> system page, the Write Ops are increasing exponentially over time.
>
> At the end of April around the 22nd I had 49 write Ops on the performance 
> page for the MDS deamons. This is now at 266467 Write Ops and increasing.
>
> Also the client requests have gone from 14 to 67 to 117 and is now at 283
>
> Would someone be able to help me make sense of why the performance has 
> decreased and what is going on with the client requests and write operations?

I suggest you look at the "perf dump" statistics from the MDS (via
ceph tell or the admin socket) over a period of time to get an idea of what
operations it's performing. It's probable your workload changed
somehow and that is the cause.
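
A sketch of that: snapshot the counters twice and diff them (replace mds.<id> 
with the name of your active MDS).

ceph tell mds.<id> perf dump > perf1.json
sleep 60
ceph tell mds.<id> perf dump > perf2.json
diff <(jq .mds_server perf1.json) <(jq .mds_server perf2.json)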

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Month June 2021 Event

2021-05-12 Thread Mike Perez
Hi everyone,

Today is the last day to get your proposal in for the Ceph Month June 2021
event! The types of talks include:

* Lightning talk - 5 minutes
* Presentation - 20 minutes with Q&A
* Unconference (BoF) - 40 minutes

We will be confirming with speakers for the date/time by May 16th.

https://ceph.io/events/ceph-month-june-2021/cfp

On Wed, Apr 21, 2021 at 6:30 AM Mike Perez  wrote:
>
> Hi everyone,
>
> We're looking for presentations, lightning talks, and BoFs to schedule
> for Ceph Month in June 2021. Please submit your proposals before May
> 12th:
>
> https://ceph.io/events/ceph-month-june-2021/cfp
>
> On Wed, Apr 14, 2021 at 12:35 PM Mike Perez  wrote:
> >
> > Hi everyone,
> >
> > In June 2021, we're hosting a month of Ceph presentations, lightning
> > talks, and unconference sessions such as BOFs. There is no
> > registration or cost to attend this event.
> >
> > The CFP is now open until May 12th.
> >
> > https://ceph.io/events/ceph-month-june-2021/cfp
> >
> > Speakers will receive confirmation that their presentation is accepted
> > and further instructions for scheduling by May 16th.
> >
> > The schedule will be available on May 19th.
> >
> > Join the Ceph community as we discuss how Ceph, the massively
> > scalable, open-source, software-defined storage system, can radically
> > improve the economics and management of data storage for your
> > enterprise.
> >
> > --
> > Mike Perez
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW federated user cannot access created bucket

2021-05-12 Thread Pritha Srivastava
The federated user will be allowed to perform only those s3 actions that
are explicitly allowed by the role's permission policy. The permission
policy is there so that someone can exercise finer-grained control over which
s3 actions are allowed and which are not, hence it differs from what regular
users are allowed to do.
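
For example, a permission policy along these lines would let the federated user 
list and read objects in the bucket it created (a sketch; the role and bucket 
names are placeholders, and the exact radosgw-admin syntax may differ by release):

radosgw-admin role-policy put --role-name=MyRole --policy-name=Policy1 \
  --policy-doc='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:ListBucket","s3:GetObject","s3:PutObject"],"Resource":["arn:aws:s3:::my-bucket","arn:aws:s3:::my-bucket/*"]}]}'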

Thanks,
Pritha

On Wed, May 12, 2021 at 4:04 PM Daniel Iwan  wrote:

> Hi all
>
> Scenario is as follows
> Federated user assumes a role via AssumeRoleWithWebIdentity, which gives
> permission to create a bucket.
> User creates a bucket and becomes an owner (this is visible in Ceph's web
> ui as Owner $oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b).
> User cannot list the content of the bucket however, because role's policy
> does not give access to the bucket.
> Later on, the user re-authenticates and assumes the same role again.
> At this point the user cannot access a bucket it owns, for the reason above
> I'm assuming.
> Bucket's ACL after creation
>
> radosgw-admin policy --bucket my-bucket
> {
> "acl": {
> "acl_user_map": [
> {
> "user": "$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b",
> "acl": 15
> }
> ],
> "acl_group_map": [],
> "grant_map": [
> {
> "id": "$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b",
> "grant": {
> "type": {
> "type": 0
> },
> "id": "$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b",
> "email": "",
> "permission": {
> "flags": 15
> },
> "name": "",
> "group": 0,
> "url_spec": ""
> }
> }
> ]
> },
> "owner": {
> "id": "$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b",
> "display_name": ""
> }
> }
>
> This seems inconsistent with buckets created by regular users.
> Is this expected behaviour?
>
> Regards
> Daniel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using ID of a federated user in a bucket policy in RGW

2021-05-12 Thread Pritha Srivastava
Hi,

Can you try with the following ARN:

arn:aws:iam:::user/oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b

The format of the user id is: <tenant>$<user-namespace>$<sub>, and in
$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b, the '$' before oidc is a
separator for a tenant which is empty here, and the ARN for a user is of the
format: arn:aws:iam::<tenant>:user/<user-id>, and hence the ARN here will
be arn:aws:iam:::user/oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b
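
For illustration, a bucket policy using that principal might look like this 
(a sketch; the bucket name and endpoint are placeholders):

cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b"]},
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
  }]
}
EOF
aws --endpoint=https://rgw.example.com s3api put-bucket-policy --bucket my-bucket --policy file://policy.json
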
Thanks,
Pritha

On Wed, May 12, 2021 at 4:02 PM Daniel Iwan  wrote:

> Hi all
>
> I'm working on the following scenario
> User is authenticated with OIDC and tries to access a bucket which it does
> not own.
> How to specify user ID etc. to give access to such a user?
>
> By trial and error I found out that principal can be specified as
> "Principal": {"Federated":["arn:aws:sts:::assumed-role/MySession"]},
>
> but I want to use shadow user ID or something similar as the principal
>
> Docs
> https://docs.ceph.com/en/latest/radosgw/STS/
> states:
> 'A shadow user is created corresponding to every federated user. The user
> id is derived from the ‘sub’ field of the incoming web token. The user is
> created in a separate namespace - ‘oidc’ such that the user id doesn’t
> clash with any other user ids in rgw. The format of the user id is -
> <tenant>$<user-namespace>$<sub> where user-namespace is ‘oidc’ for users
> that authenticate with oidc providers.'
>
> I see a shadow user in Web UI as e.g. 7f71c7c5-c24f-418e-87ac-aa8fe271289b
> but I cannot work out the syntax of a user id, I was expecting something
> like
>
> "arn:aws:iam:::user/$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b"
>
> but when trying to list content of a bucket I get AccessDenied.
> If the bucket policy has Principal "*" then my authenticated user can access the
> bucket.
>
> Is this possible?
> Regards
> Daniel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph stretch mode enabling

2021-05-12 Thread Eugen Block

Hi,

I just deployed a test cluster to try that out, too. I only deployed  
three MONs, but this should still apply.



I tried to create the third datacenter and put the tiebreaker there but got
the following error:

root@ceph-node-01:/home/clouduser# ceph mon enable_stretch_mode
ceph-node-05 stretch_rule datacenter
Error EINVAL: there are 3datacenter's in the cluster but stretch mode
currently only works with 2!


You don't create a third datacenter within the OSD tree, you just tell  
Ceph that your tie-breaker is in a different DC. For me it worked: I  
have two DCs and put the third MON (the tie-breaker) into a (virtual) dc3:


pacific1:~ # ceph mon set_location pacific3 datacenter=dc3
pacific1:~ # ceph mon enable_stretch_mode pacific3 stretch_rule datacenter

This automatically triggered pool size 4 and distributed the PGs  
evenly across both DCs.
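
Applied to your naming that would be something like (a sketch, untested on your 
cluster - site3 just needs to be a location different from site1/site2):

ceph mon set_location ceph-node-05 datacenter=site3
ceph mon enable_stretch_mode ceph-node-05 stretch_rule datacenter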


Regards,
Eugen


Zitat von Felix O :


Hello,

I'm trying to deploy my test ceph cluster and enable stretch mode (
https://docs.ceph.com/en/latest/rados/operations/stretch-mode/). My problem
is enabling the stretch mode.

$ ceph mon enable_stretch_mode ceph-node-05 stretch_rule datacenter
Error EINVAL: Could not find location entry for datacenter on monitor
ceph-node-05

ceph-node-5 is the tiebreaker monitor

I tried to create the third datacenter and put the tiebreaker there but got
the following error:

root@ceph-node-01:/home/clouduser# ceph mon enable_stretch_mode
ceph-node-05 stretch_rule datacenter
Error EINVAL: there are 3datacenter's in the cluster but stretch mode
currently only works with 2!

An additional info:

Setup method: cephadm (https://docs.ceph.com/en/latest/cephadm/install/)

# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME  STATUS  REWEIGHT  PRI-AFF
 -1 0.03998  root default
-11 0.01999  datacenter site1
 -5 0.00999  host ceph-node-01
  0hdd  0.00999  osd.0  up   1.0  1.0
 -3 0.00999  host ceph-node-02
  1hdd  0.00999  osd.1  up   1.0  1.0
-12 0.01999  datacenter site2
 -9 0.00999  host ceph-node-03
  3hdd  0.00999  osd.3  up   1.0  1.0
 -7 0.00999  host ceph-node-04
  2hdd  0.00999  osd.2  up   1.0  1.0

stretch_rule is added to the crush

# ceph mon set_location ceph-node-01 datacenter=site1
# ceph mon set_location ceph-node-02 datacenter=site1
# ceph mon set_location ceph-node-03 datacenter=site2
# ceph mon set_location ceph-node-04 datacenter=site2

# ceph versions
{
"mon": {
"ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10)
pacific (stable)": 5
},
"mgr": {
"ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10)
pacific (stable)": 2
},
"osd": {
"ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10)
pacific (stable)": 4
},
"mds": {},
"overall": {
"ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10)
pacific (stable)": 11
}
}

Thank you for your support.

--
Best regards,
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW segmentation fault on Pacific 16.2.1 with multipart upload

2021-05-12 Thread Daniel Iwan
Hi
I have started to see segfaults during multipart upload to one of the
buckets. The file is about 60 MB in size.
Upload of the same file to a brand new bucket works OK.

Command used
aws --profile=tester --endpoint=$HOST_S3_API --region="" s3 cp
./pack-a9201afb4682b74c7c5a5d6070e661662bdfea1a.pack
s3://tester-bucket/pack-a9201afb4682b74c7c5a5d6070e661662bdfea1a.pack

For some reason the log shows the upload going to tester-bucket-2 ???
Bucket tester-bucket-2 is owned by the same user TESTER.

I'm using Ceph 16.2.1 (recently upgraded from Octopus).
Installed with cephadm in Docker
OS Ubuntu 18.04.5 LTS
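
(If it helps with triage, the crash module should have captured a backtrace; a 
sketch of retrieving it, where the crash id is a placeholder:)

ceph crash ls
ceph crash info <crash-id>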

Logs show as below

May 11 11:00:46 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:46.891+ 7ffb0e25e700  1 == starting new request
req=0x7ffa8e15d620 =
May 11 11:00:46 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:46.907+ 7ffb0b258700  1 == req done
req=0x7ffa8e15d620 op status=0 http_status=200 latency=0.011999841s ==
May 11 11:00:46 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:46.907+ 7ffb0b258700  1 beast: 0x7ffa8e15d620:
11.1.150.14 - TESTER [11/May/2021:11:00:46.891 +] "POST
/tester-bucket-2/pack-a9201afb4682b74c7c5a5d6070e661662bdfea1a.pack?uploads
HTTP/1.1" 200 296 - "aws-cli/2.1.23 Python/3.7.3
Linux/4.19.128-microsoft-standard exe/x86_64.ubuntu.18 p
May 11 11:00:47 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:47.055+ 7ffb09254700  1 == starting new request
req=0x7ffa8e15d620 =
May 11 11:00:47 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:47.355+ 7ffb51ae5700  1 == starting new request
req=0x7ffa8e0dc620 =
May 11 11:00:47 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:47.355+ 7ffb4eadf700  1 == starting new request
req=0x7ffa8e05b620 =
May 11 11:00:47 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:47.355+ 7ffb46acf700  1 == starting new request
req=0x7ffa8df59620 =
May 11 11:00:47 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:47.355+ 7ffb44acb700  1 == starting new request
req=0x7ffa8ded8620 =
May 11 11:00:47 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:47.355+ 7ffb3dabd700  1 == starting new request
req=0x7ffa8dfda620 =
May 11 11:00:47 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:47.359+ 7ffb1d27c700  1 == starting new request
req=0x7ffa8de57620 =
May 11 11:00:47 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:47.359+ 7ffb22a87700  1 == starting new request
req=0x7ffa8ddd6620 =
May 11 11:00:48 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:48.275+ 7ffb2d29c700  1 == req done
req=0x7ffa8e15d620 op status=0 http_status=200 latency=1.219983697s ==
May 11 11:00:48 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:48.275+ 7ffb2d29c700  1 beast: 0x7ffa8e15d620:
11.1.150.14 - TESTER [11/May/2021:11:00:47.055 +] "PUT
/tester-bucket-2/pack-a9201afb4682b74c7c5a5d6070e661662bdfea1a.pack?uploadId=2~JhGavMwngl_FH6-LcE2vFxMRjcf4qTF&partNumber=8
HTTP/1.1" 200 2485288 - "aws-cli/2.1.23 Python/3.7.3 Linux
May 11 11:00:54 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:54.695+ 7ffad89f3700  1 == req done
req=0x7ffa8ddd6620 op status=0 http_status=200 latency=7.335902214s ==
May 11 11:00:54 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:54.695+ 7ffad89f3700  1 beast: 0x7ffa8ddd6620:
11.1.150.14 - TESTER [11/May/2021:11:00:47.359 +] "PUT
/tester-bucket-2/pack-a9201afb4682b74c7c5a5d6070e661662bdfea1a.pack?uploadId=2~JhGavMwngl_FH6-LcE2vFxMRjcf4qTF&partNumber=6
HTTP/1.1" 200 8388608 - "aws-cli/2.1.23 Python/3.7.3 Linux
May 11 11:00:56 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:56.871+ 7ffb11a65700  1 == req done
req=0x7ffa8e0dc620 op status=0 http_status=200 latency=9.515872955s ==
May 11 11:00:56 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:56.871+ 7ffb11a65700  1 beast: 0x7ffa8e0dc620:
11.1.150.14 - TESTER [11/May/2021:11:00:47.355 +] "PUT
/tester-bucket-2/pack-a9201afb4682b74c7c5a5d6070e661662bdfea1a.pack?uploadId=2~JhGavMwngl_FH6-LcE2vFxMRjcf4qTF&partNumber=7
HTTP/1.1" 200 8388608 - "aws-cli/2.1.23 Python/3.7.3 Linux
May 11 11:00:59 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:59.491+ 7ffac89d3700  1 == req done
req=0x7ffa8dfda620 op status=0 http_status=200 latency=12.135838509s ==
May 11 11:00:59 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:00:59.491+ 7ffac89d3700  1 beast: 0x7ffa8dfda620:
11.1.150.14 - TESTER [11/May/2021:11:00:47.355 +] "PUT
/tester-bucket-2/pack-a9201afb4682b74c7c5a5d6070e661662bdfea1a.pack?uploadId=2~JhGavMwngl_FH6-LcE2vFxMRjcf4qTF&partNumber=2
HTTP/1.1" 200 8388608 - "aws-cli/2.1.23 Python/3.7.3 Linux
May 11 11:01:02 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:01:02.891+ 7ffb68312700  1 == req done
req=0x7ffa8e05b620 op status=0 http_status=200 latency=15.535793304s ==
May 11 11:01:02 ceph-om-vm-node1 bash[27881]: debug
2021-05-11T11:01:02.891+ 7ffb68312700  1 

[ceph-users] RGW federated user cannot access created bucket

2021-05-12 Thread Daniel Iwan
Hi all

Scenario is as follows
Federated user assumes a role via AssumeRoleWithWebIdentity, which gives
permission to create a bucket.
User creates a bucket and becomes an owner (this is visible in Ceph's web
ui as Owner $oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b).
User cannot list the content of the bucket however, because role's policy
does not give access to the bucket.
Later on, the user re-authenticates and assumes the same role again.
At this point the user cannot access a bucket it owns, for the reason above
I'm assuming.
Bucket's ACL after creation

radosgw-admin policy --bucket my-bucket
{
"acl": {
"acl_user_map": [
{
"user": "$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b",
"acl": 15
}
],
"acl_group_map": [],
"grant_map": [
{
"id": "$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b",
"grant": {
"type": {
"type": 0
},
"id": "$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b",
"email": "",
"permission": {
"flags": 15
},
"name": "",
"group": 0,
"url_spec": ""
}
}
]
},
"owner": {
"id": "$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b",
"display_name": ""
}
}

This seems inconsistent with buckets created by regular users.
Is this expected behaviour?

Regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Using ID of a federated user in a bucket policy in RGW

2021-05-12 Thread Daniel Iwan
Hi all

I'm working on the following scenario
User is authenticated with OIDC and tries to access a bucket which it does
not own.
How to specify user ID etc. to give access to such a user?

By trial and error I found out that principal can be specified as
"Principal": {"Federated":["arn:aws:sts:::assumed-role/MySession"]},

but I want to use shadow user ID or something similar as the principal

Docs
https://docs.ceph.com/en/latest/radosgw/STS/
states:
'A shadow user is created corresponding to every federated user. The user
id is derived from the ‘sub’ field of the incoming web token. The user is
created in a separate namespace - ‘oidc’ such that the user id doesn’t
clash with any other user ids in rgw. The format of the user id is -
<tenant>$<user-namespace>$<sub> where user-namespace is ‘oidc’ for users
that authenticate with oidc providers.'

I see a shadow user in Web UI as e.g. 7f71c7c5-c24f-418e-87ac-aa8fe271289b
but I cannot work out the syntax of a user id, I was expecting something
like

"arn:aws:iam:::user/$oidc$7f71c7c5-c24f-418e-87ac-aa8fe271289b"

but when trying to list content of a bucket I get AccessDenied.
If the bucket policy has Principal "*" then my authenticated user can access the
bucket.

Is this possible?
Regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io