[ceph-users] Re: ceph-dashboard python warning with new pyo3 0.17 lib (debian12)

2023-07-03 Thread David Fojtík
Hello.

Updating to the latest version of Ceph solves that.
See https://docs.ceph.com/en/quincy/install/get-packages/ and 
https://download.ceph.com
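
For reference, on Debian 12 (bookworm) the update boils down to something like the
following. This is only a minimal sketch based on the get-packages docs; the
release name (quincy) and package names are assumptions, so adjust them to what
you actually run:

wget -qO- https://download.ceph.com/keys/release.asc | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/ceph.gpg
echo "deb https://download.ceph.com/debian-quincy/ bookworm main" | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt update && sudo apt install --only-upgrade ceph ceph-mgr-dashboard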
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Delete or move files from lost+found in cephfs

2023-07-03 Thread Thomas Widhalm

Hi,

I had some trouble in the past with my CephFS which I was able to 
resolve - mostly with your help.


Now I have about 150GB of data in lost+found in my CephFS. No matter 
what I try and how I change permissions, every time I try to delete 
or move something from there I only get the reply: "mv: cannot remove 
'lost+found/12b047c': Read-only file system".


I searched the web and the configuration items but I didn't find a way to 
get rid of these files. I copied most of them to another place, 
identified them, and already have them back. So what is left in 
lost+found is mostly useless copies.


Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-03 Thread Ilya Dryomov
On Mon, Jul 3, 2023 at 6:58 PM Mark Nelson  wrote:
>
>
> On 7/3/23 04:53, Matthew Booth wrote:
> > On Thu, 29 Jun 2023 at 14:11, Mark Nelson  wrote:
> > This container runs:
> >  fio --rw=write --ioengine=sync --fdatasync=1
> > --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf
> > --output-format=json --runtime=60 --time_based=1
> >
> > And extracts sync.lat_ns.percentile["99.00"]
>  Matthew, do you have the rest of the fio output captured?  It would be 
>  interesting to see if it's just the 99th percentile that is bad or the 
>  PWL cache is worse in general.
> >>> Sure.
> >>>
> >>> With PWL cache: https://paste.openstack.org/show/820504/
> >>> Without PWL cache: https://paste.openstack.org/show/b35e71zAwtYR2hjmSRtR/
> >>> With PWL cache, 'rbd_cache'=false:
> >>> https://paste.openstack.org/show/byp8ZITPzb3r9bb06cPf/
> >>
> >> Also, how's the CPU usage client side?  I would be very curious to see
> >> if unwindpmp shows anything useful (especially lock contention):
> >>
> >>
> >> https://github.com/markhpc/uwpmp
> >>
> >>
> >> Just attach it to the client-side process and start out with something
> >> like 100 samples (more are better but take longer).  You can run it like:
> >>
> >>
> >> ./unwindpmp -n 100 -p <pid>
> > I've included the output in this gist:
> > https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d
> >
> > That gist contains 4 runs: 2 with PWL enabled and 2 without, and also
> > a markdown file explaining the collection method.
> >
> > Matt
>
>
> Thanks Matt!  I looked through the output.  Looks like the symbols might
> have gotten mangled.  I'm not an expert on the RBD client, but I don't
> think we would really be calling into
> rbd_group_snap_rollback_with_progress from
> librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl.  Was it possible
> you used the libdw backend for unwindpmp?  libdw sometimes gives
> strange/mangled callgraphs, but I haven't seen it before with
> libunwind.  Hopefully Congmin Yin or Ilya can confirm if it's garbage.

>
> So with that said, assuming we can trust these callgraphs at all, it
> looks like it might be worth looking at the latency of the
> AbstractWriteLog, librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl,
> and possibly usage of librados::v14_2_0::IoCtx::object_list.  On the

Hi Mark,

Both rbd_group_snap_rollback_with_progress and
librados::v14_2_0::IoCtx::object_list entries don't make sense to me,
so I'd say it's garbage.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Quarterly (CQ) - Issue #1

2023-07-03 Thread Zac Dover
The first issue of "Ceph Quarterly" is attached to this email. Ceph Quarterly 
(or "CQ") is an overview of the past three months of upstream Ceph development. 
We provide CQ in three formats: A4, letter, and plain text wrapped at 80 
columns.

Zac Dover
Upstream Documentation
Ceph Foundation

Ceph Quarterly
July 2023
Issue #1

An overview of the past three months of Ceph upstream development. 

Pull request (PR) numbers are provided for many of the items in the list below.
To see the PR associated with a list item, append the PR number to the string
https://github.com/ceph/ceph/pull/. For example, to see the PR for the first
item in the left column below, append the string 48720 to the string
https://github.com/ceph/ceph/pull/ to make this string:
https://github.com/ceph/ceph/pull/48720.


CephFS

1. New option added—clients can be blocked from connecting to the MDS in order
to help in recovery scenarios where you want the MDS running without any client
workload: 48720

2. Clients are prevented from exceeding the xattrs key-value limits: 46357

3. snapdiff for cephfs—a building block for efficient disaster recovery: 43546


Cephadm

1. Support for NFS backed by virtual IP address: 47199

2. Ingress service for nfs (haproxy/keepalived) is now redeployed when its
host(s) go offline: 51120

3. Switch from scp to sftp for better security when transferring files: 50846

4. Ability added to preserve VG / LV for DB devices when replacing data
devices: 50838


Crimson 

1. Much bug squashing and test suite expansion, successful rbd and rgw
tests.

2. Groundwork for SMR HDD support for SeaStore: 48717

3. Full snapshot support: snapshot trimming, the last outstanding piece of
functionality needed to provide full snapshot support, has been merged.

4. Simple multi-core design for SeaStore: 48717

5. Memory usage improved by linking BlueStore with tcmalloc: 46062


Dashboard

cephx user management

1. import/export users: 50927

2. delete users: 50918

3. edit users: 50183

RGW multisite setup / config 

1. multisite config creation: 49953 

2. editing zonegroups: 50557 

3. editing zones: 50643 

4. editing realms: 50529 

5. migrate to multisite: 50806 

6. delete multisite: 50600 

7. RGW role creation: 50426

RADOS 

1. mClock improvements—a number of settings have been adjusted to fix
cases in which backfill or recovery ran slow. This means faster recovery for
restoring redundancy than with wpq while preserving the same client
performance. PRs are summarized in the reef backport: 51263

2. Scrub costs have been adjusted similarly for mclock: 51656

3. rocksdb updated to 7.9.2: 51737

4. rocksdb range deletes have been optimized and made tunable online: 49748 and
49870. These improve the performance of large omap deletes, e.g. backfilling
for RGW bucket indexes, or bucket deletion.

5. osd_op_thread_timeout and suicide_timeout can now be adjusted on the fly:
49628
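
As an illustration of item 5, such timeouts can now be changed at runtime; the
values below are only examples:

ceph config set osd osd_op_thread_timeout 90
ceph config set osd osd_op_thread_suicide_timeout 300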


RBD 

1. Alternate userspace block device (out-of-tree kernel module) support
added—like a better-performing rbd-nbd: 50341

2. A number of rbd-mirror–related bug fixes. Better handling of the 
rbd_support
mgr module being blocklisted. Failed connections to remote clusters are now
handled more gracefully. Specifically:

a. blocklist handling in the rbd_support mgr module: 49742 and 51454

b. Perform mirror snap removal from the local, not remote cluster: 51166

c. Remove previous incomplete primary snapshot after successfully creating a
new one: 50324

3. Switch to labeled perf counters (per-image stats to monitor) for rbd-mirror:
50302

4. rbd-wnbd: optionally handle wnbd adapter restart events: 49302

5. RADOS object map corruption in snapshots taken under I/O: 52109. See the
e4b1e0466354942c935e9eca2ab2858e75049415 commit message for a summary. Note
that this affects snap-diff operations, which means that incremental backups
and snapshot-based mirroring are affected.


RGW 

1. When RGW creates a new data pool, it now sets the bulk flag so that the
autoscaler sets an appropriate number of PGs for good parallelism: 51497 (see
also:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/#automated-scaling).
A manual example for existing pools is sketched after this list.

2. Bucket policies can now allow access to notifications for users who do not
own a bucket: 50684

3. Improved Trino interoperability with S3 Select and RGW: 50471

4. An optional/zero API for benchmarking RGW was added: 50507

5. Experimental radosgw-admin command for recreating a lost bucket index: 50348

6. Cache for improving bucket notification CPU efficiency: 49807

7. D4N shared cache via Redis—initial version merged: 48879

8. Hardware cryptography acceleration improved with a batch mode: 47040

9. Read-only role for OpenStack Keystone integration added: 45469

10. Multisite sync speed increased by involving and coordinating work among
multiple RGW daemons: 45958
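
Regarding RGW item 1 above: for a data pool that already exists, the same flag
can be set by hand. A minimal sketch, where the pool name is an assumption and
should be replaced with your actual bucket data pool:

ceph osd pool set default.rgw.buckets.data bulk true
ceph osd pool get default.rgw.buckets.data bulk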


Akamai has joined the Ceph Foundation, taking over Linode’s membership because
Akamai acquired Linode.


Send all inquiries and comments to Zac Dover at 

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-03 Thread Mark Nelson


On 7/3/23 04:53, Matthew Booth wrote:

On Thu, 29 Jun 2023 at 14:11, Mark Nelson  wrote:

This container runs:
 fio --rw=write --ioengine=sync --fdatasync=1
--directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf
--output-format=json --runtime=60 --time_based=1

And extracts sync.lat_ns.percentile["99.00"]

Matthew, do you have the rest of the fio output captured?  It would be 
interesting to see if it's just the 99th percentile that is bad or the PWL 
cache is worse in general.

Sure.

With PWL cache: https://paste.openstack.org/show/820504/
Without PWL cache: https://paste.openstack.org/show/b35e71zAwtYR2hjmSRtR/
With PWL cache, 'rbd_cache'=false:
https://paste.openstack.org/show/byp8ZITPzb3r9bb06cPf/


Also, how's the CPU usage client side?  I would be very curious to see
if unwindpmp shows anything useful (especially lock contention):


https://github.com/markhpc/uwpmp


Just attach it to the client-side process and start out with something
like 100 samples (more are better but take longer).  You can run it like:


./unwindpmp -n 100 -p <pid>

I've included the output in this gist:
https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d

That gist contains 4 runs: 2 with PWL enabled and 2 without, and also
a markdown file explaining the collection method.

Matt



Thanks Matt!  I looked through the output.  Looks like the symbols might 
have gotten mangled.  I'm not an expert on the RBD client, but I don't 
think we would really be calling into 
rbd_group_snap_rollback_with_progress from 
librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl.  Was it possible 
you used the libdw backend for unwindpmp?  libdw sometimes gives 
strange/mangled callgraphs, but I haven't seen it before with 
libunwind.  Hopefully Congmin Yin or Ilya can confirm if it's garbage.


So with that said, assuming we can trust these callgraphs at all, it 
looks like it might be worth looking at the latency of the 
AbstractWriteLog, librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl, 
and possibly usage of librados::v14_2_0::IoCtx::object_list.  On the 
QEMU side, possibly the latency of rbd_aio_flush in both cases.  Also 
it's possible we might have md_config_t get_val/set_val in the hot path 
somewhere though it looks minor.  If the 
rbd_group_snap_rollback_with_progress usage is real, it's significantly 
more prevalent in the PWL callgraphs.  Without knowing more about how 
the PWL cache works, I'm not sure if any of this is meaningful or not 
though.


Mark


Best Regards,
Mark Nelson
Head of R&D (USA)

Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Quincy osd bench in order to define osd_mclock_max_capacity_iops_[hdd|ssd]

2023-07-03 Thread Rafael Diaz Maurin

Hello,

I've just upgraded a Pacific cluster to Quincy, and all my OSDs have 
the low value osd_mclock_max_capacity_iops_hdd : 315.00.


The manual does not explain how to benchmark the OSDs with fio or ceph 
bench using good options.
Does someone have the right ceph bench or fio options to use in order to 
configure osd_mclock_max_capacity_iops_hdd for each OSD?


I ran this bench several times on the same OSD (class hdd) and I obtained 
different results each time.

ceph tell ${osd} cache drop
ceph tell ${osd} bench 12288000 4096 4194304 100

example :
osd.21 (hdd): osd_mclock_max_capacity_iops_hdd = 315.00
    bench 1 : 3006.2271379745534
    bench 2 : 819.503206458996
    bench 3 : 946.5406320134085

How can I obtain good values for the 
osd_mclock_max_capacity_iops_[hdd|ssd] options?
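
(For reference, once a representative IOPS figure is known, it can be applied per
OSD with something like the following; the value here is purely illustrative:)

ceph config set osd.21 osd_mclock_max_capacity_iops_hdd 950
ceph config show osd.21 osd_mclock_max_capacity_iops_hdd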


Thank you for your help,

Rafael




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-03 Thread Yin, Congmin
Hi Matthew,

Due to the latency of rbd layers, the write latency of the pwl cache is more 
than ten times that of the Raw device.
I replied directly below the 2 questions.

Best regards.
Congmin Yin


-Original Message-
From: Matthew Booth  
Sent: Thursday, June 29, 2023 7:23 PM
To: Ilya Dryomov 
Cc: Giulio Fidente ; Yin, Congmin ; 
Tang, Guifeng ; Vikhyat Umrao ; 
Jdurgin ; John Fulton ; Francesco 
Pantano ; ceph-users@ceph.io
Subject: Re: [ceph-users] RBD with PWL cache shows poor performance compared to 
cache device

On Wed, 28 Jun 2023 at 22:44, Ilya Dryomov  wrote:
>> ** TL;DR
>>
>> In testing, the write latency performance of a PWL-cache backed RBD 
>> disk was 2 orders of magnitude worse than the disk holding the PWL 
>> cache.



PWL cache can use pmem or SSD as the cache device. Using pmem, based on my test 
environment at the time, I can give specific numbers: the write latency of the 
raw pmem device is about 10+ us, the write latency of the pwl cache is about 
100+ us (from the latency of the rbd layers), and the write latency of the ceph 
cluster is about 1000+ us (from the messengers and the network). For SSDs there 
are many types, and I cannot give a specific value, but it will definitely be 
worse than pmem. So a difference of 2 orders of magnitude is worse than 
expected. Can you provide detailed values for the three for analysis? (SSD, pwl 
cache, ceph cluster)
==

>>
>> ** Summary
>>
>> I was hoping that PWL cache might be a good solution to the problem 
>> of write latency requirements of etcd when running a kubernetes 
>> control plane on ceph. Etcd is extremely write latency sensitive and 
>> becomes unstable if write latency is too high. The etcd workload can 
>> be characterised by very small (~4k) writes with a queue depth of 1.
>> Throughput, even on a busy system, is normally very low. As etcd is 
>> distributed and can safely handle the loss of un-flushed data from a 
>> single node, a local ssd PWL cache for etcd looked like an ideal 
>> solution.
>
>
> Right, this is exactly the use case that the PWL cache is supposed to address.

Good to know!

>> My expectation was that adding a PWL cache on a local SSD to an 
>> RBD-backed VM would improve write latency to something approaching the 
>> write latency performance of the local SSD. However, in my testing 
>> adding a PWL cache to an rbd-backed VM increased write latency by 
>> approximately 4x over not using a PWL cache. This was over 100x more 
>> than the write latency performance of the underlying SSD.




When using an image as the VM's disk, you may have used a command like the 
following. In many cases, parameters such as cache=writeback force-enable the 
rbd cache, which is an in-memory cache, and it is normal for the pwl cache to be 
several times slower than that. Please confirm. 
There is currently no parameter to use only the pwl cache without the rbd cache. 
I have tested the latency of the pwl cache (pmem) by modifying the code myself; 
it is about twice that of the rbd cache.

qemu -m 1024 -drive 
format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback
==


>>
>> My expectation was based on the documentation here:
>> https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/
>>
>> “The cache provides two different persistence modes. In 
>> persistent-on-write mode, the writes are completed only when they are 
>> persisted to the cache device and will be readable after a crash. In 
>> persistent-on-flush mode, the writes are completed as soon as it no 
>> longer needs the caller’s data buffer to complete the writes, but 
>> does not guarantee that writes will be readable after a crash. The 
>> data is persisted to the cache device when a flush request is received.”
>>
>> ** Method
>>
>> 2 systems, 1 running single-node Ceph Quincy (17.2.6), the other 
>> running libvirt and mounting a VM’s disk with librbd (also 17.2.6) 
>> from the first node.
>>
>> All performance testing is from the libvirt system. I tested write 
>> latency performance:
>>
>> * Inside the VM without a PWL cache
>> * Of the PWL device directly from the host (direct to filesystem, no 
>> VM)
>> * Inside the VM with a PWL cache
>>
>> I am testing with fio. Specifically I am running a containerised 
>> test, executed with:
>>podman run --volume .:/var/lib/etcd:Z 
>> quay.io/openshift-scale/etcd-perf
>>
>> This container runs:
>>fio --rw=write --ioengine=sync --fdatasync=1 
>> --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf 
>> --output-format=json --runtime=60 --time_based=1
>>
>> And extracts sync.lat_ns.percentile["99.00"]
>
>
> Matthew, do you have the rest of the fio output captured?  It would be 
> interesting to see if it's just the 99th percentile that is bad or the PWL 
> cache is worse in general.

Sure.

With PWL cache: 

[ceph-users] Re: db/wal pvmoved ok, but gui show old metadatas

2023-07-03 Thread Christophe BAILLON
Up

I tried, for example:

ceph orch daemon reconfig osd.26

The cephadm GUI continues to show me the old NVMe as part of this OSD:

device_ids
nvme1n1=SAMSUNG_MZVLW1T0HMLH-0_S2U3NX0JB00438,sdc=SEAGATE_ST18000NM004J_ZR52TT83C148JFSJ
device_paths
nvme1n1=/dev/disk/by-path/pci-:3b:00.0-nvme-1,sdc=/dev/disk/by-path/pci-:3d:00.0-sas-exp0x500304802094193f-phy0-lun-0
but
root@store-par2-node02:/# ceph daemon osd.26 list_devices
[
{
"device": "/dev/nvme0n1",
"device_id": "INTEL_SSDPEDME016T4S_CVMD516500851P6KGN"
},
{
"device": "/dev/sdc",
"device_id": "SEAGATE_ST18000NM004J_ZR52TT83C148JFSJ"
}
]


- Mail original -
> De: "Christophe BAILLON" 
> À: "ceph-users" 
> Envoyé: Vendredi 30 Juin 2023 15:33:41
> Objet: [ceph-users] db/wal pvmoved ok, but gui show old metadatas

> Hello,
> 
> we have a Ceph 17.2.5 cluster with a total of 26 nodes; 15 of those nodes have
> faulty NVMe drives where the db/wal resides (one NVMe for the first 6 OSDs and
> another for the remaining 6).
> 
> We replaced them with new drives and pvmoved it to avoid losing the OSDs.
> 
> So far, there are no issues, and the OSDs are functioning properly.
> 
> ceph sees the correct new disks
> root@node02:/# ceph daemon osd.26 list_devices
> [
> {
> "device": "/dev/nvme0n1",
> "device_id": "INTEL_SSDPEDME016T4S_CVMD516500851P6KGN"
> },
> {
> "device": "/dev/sdc",
> "device_id": "SEAGATE_ST18000NM004J_ZR52TT83C148JFSJ"
> }
> ]
> 
> However, the Cephadm GUI still shows the old NVMe drives and hasn't recognized
> the device change.
> 
> How can we make the GUI and Cephadm recognize the new devices?
> 
> I tried restarting the managers, thinking that it would rescan the OSDs during
> startup, but it didn't work.
> 
> If you have any ideas, I would appreciate it.
> 
> Should I perform something like that: ceph orch daemon reconfig osd.*
> 
> Thank you for your help.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Christophe BAILLON
Mobile :: +336 16 400 522
Work :: https://eyona.com
Twitter :: https://twitter.com/ctof
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] What is the best way to use disks with different sizes

2023-07-03 Thread wodel youchi
Hi,

I will be deploying a Proxmox HCI cluster with 3 nodes. Each node has 3
NVMe disks of 3.8 TB each and a 4th NVMe disk of 7.6 TB. Technically I need
one pool.

Is it good practice to use all disks to create the one pool I need, or is
it better to create two pools, one on each group of disks?

If the former is good (use all disks and create one pool), should I take
into account the difference in disk size?

Regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Get bucket placement target

2023-07-03 Thread Casey Bodley
On Mon, Jul 3, 2023 at 6:52 AM mahnoosh shahidi  wrote:
>
> I think this part of the doc shows that LocationConstraint can override the
> placement and I can change the placement target with this field.
>
> When creating a bucket with the S3 protocol, a placement target can be
> > provided as part of the LocationConstraint to override the default
> > placement targets from the user and zonegroup.
>
>
>  I just want to get the value that I had set in the create bucket request.

thanks Mahnoosh, i opened a feature request for this at
https://tracker.ceph.com/issues/61887

>
> Best Regards,
> Mahnoosh
>
> On Mon, Jul 3, 2023 at 1:19 PM Konstantin Shalygin  wrote:
>
> > Hi,
> >
> > On 3 Jul 2023, at 12:23, mahnoosh shahidi  wrote:
> >
> > So clients can not get the value which they set in the LocationConstraint
> > field in the create bucket request as in this doc
> > ?
> >
> >
> > LocationConstraint in this case is an AZ [1], not the placement in Ceph
> > (OSD pool, compression settings)
> >
> >
> > [1] https://docs.openstack.org/neutron/rocky/admin/config-az.html
> > k
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: list of rgw instances in ceph status

2023-07-03 Thread Boris Behrens
Hi Mahnoosh,
that helped. Thanks a lot!

On Mon, 3 Jul 2023 at 13:46, mahnoosh shahidi <
mahnooosh@gmail.com> wrote:

> Hi Boris,
>
> You can list your rgw daemons with the following command
>
> ceph service dump -f json-pretty | jq '.services.rgw.daemons'
>
>
> The following command extracts all their IDs
>
> ceph service dump -f json-pretty | jq '.services.rgw.daemons' | egrep -e
>> 'gid' -e '\"id\"'
>>
>
> Best Regards,
> Mahnoosh
>
> On Mon, Jul 3, 2023 at 3:00 PM Boris Behrens  wrote:
>
>> Hi,
>> might be a dumb question, but is there a way to list the rgw instances
>> that
>> are running in a ceph cluster?
>>
>> Before pacific it showed up in `ceph status` but now it only tells me how
>> many daemons are active, not which daemons are active.
>>
>> ceph orch ls tells me that I need to configure a backend but we are not at
>> the stage that we are going to implement the orchestrator yet.
>>
>> Cheers
>>  Boris
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>

-- 
The self-help group "UTF-8 problems" will meet this time, as an exception, in
the large hall.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: list of rgw instances in ceph status

2023-07-03 Thread mahnoosh shahidi
Hi Boris,

You can list your rgw daemons with the following command

ceph service dump -f json-pretty | jq '.services.rgw.daemons'


The following command extracts all their IDs

ceph service dump -f json-pretty | jq '.services.rgw.daemons' | egrep -e
> 'gid' -e '\"id\"'
>

Best Regards,
Mahnoosh

On Mon, Jul 3, 2023 at 3:00 PM Boris Behrens  wrote:

> Hi,
> might be a dumb question, but is there a way to list the rgw instances that
> are running in a ceph cluster?
>
> Before pacific it showed up in `ceph status` but now it only tells me how
> many daemons are active, not which daemons are active.
>
> ceph orch ls tells me that I need to configure a backend but we are not at
> the stage that we are going to implement the orchestrator yet.
>
> Cheers
>  Boris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] list of rgw instances in ceph status

2023-07-03 Thread Boris Behrens
Hi,
might be a dumb question, but is there a way to list the rgw instances that
are running in a ceph cluster?

Before pacific it showed up in `ceph status` but now it only tells me how
many daemons are active, not which daemons are active.

ceph orch ls tells me that I need to configure a backend but we are not at
the stage that we are going to implement the orchestrator yet.

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Get bucket placement target

2023-07-03 Thread mahnoosh shahidi
I think this part of the doc shows that LocationConstraint can override the
placement and I can change the placement target with this field.

When creating a bucket with the S3 protocol, a placement target can be
> provided as part of the LocationConstraint to override the default
> placement targets from the user and zonegroup.


 I just want to get the value that I had set in the create bucket request.

Best Regards,
Mahnoosh

On Mon, Jul 3, 2023 at 1:19 PM Konstantin Shalygin  wrote:

> Hi,
>
> On 3 Jul 2023, at 12:23, mahnoosh shahidi  wrote:
>
> So clients can not get the value which they set in the LocationConstraint
> field in the create bucket request as in this doc
> ?
>
>
> LocationConstraint in this case is an AZ [1], not the placement in Ceph
> (OSD pool, compression settings)
>
>
> [1] https://docs.openstack.org/neutron/rocky/admin/config-az.html
> k
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] dashboard for rgw NoSuchKey

2023-07-03 Thread farhad kh
I deployed the rgw service and the default pool is created automatically, but
I get an error in the dashboard:
``
Error connecting to Object Gateway: RGW REST API request failed with
default 404 status code","HostId":"736528-default-default"}')

``
There is a dashboard user but I created the bucket manually

# radosgw-admin user info --uid=dashboard
{
    "user_id": "dashboard",
    "display_name": "Ceph Dashboard",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "dashboard",
            "access_key": "C8YG708VBA3M3AAJW2U2",
            "secret_key": "NpkmIZ5JJVnu3EFa0ytv5vO64NGttK9ks7A3gEQP"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "system": "true",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}

# radosgw-admin buckets list
[
"dashboard"
]

How can I solve the problem?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-03 Thread Matthew Booth
On Fri, 30 Jun 2023 at 08:50, Yin, Congmin  wrote:
>
> Hi Matthew,
>
> Due to the latency of rbd layers, the write latency of the pwl cache is more 
> than ten times that of the Raw device.
> I replied directly below the 2 questions.
>
> Best regards.
> Congmin Yin
>
>
> -Original Message-
> From: Matthew Booth 
> Sent: Thursday, June 29, 2023 7:23 PM
> To: Ilya Dryomov 
> Cc: Giulio Fidente ; Yin, Congmin 
> ; Tang, Guifeng ; Vikhyat 
> Umrao ; Jdurgin ; John Fulton 
> ; Francesco Pantano ; 
> ceph-users@ceph.io
> Subject: Re: [ceph-users] RBD with PWL cache shows poor performance compared 
> to cache device
>
> On Wed, 28 Jun 2023 at 22:44, Ilya Dryomov  wrote:
> >> ** TL;DR
> >>
> >> In testing, the write latency performance of a PWL-cache backed RBD
> >> disk was 2 orders of magnitude worse than the disk holding the PWL
> >> cache.
>
>
>
> PWL cache can use pmem or SSD as cache devices. Using PMEM, based on my test 
> environment at that time, I can give specific data as follows: the write 
> latency of the pmem Raw device is about 10+us, the write latency of the pwl 
> cache is about 100us+(from the latency of the rbd layers), and the write 
> latency of the ceph cluster is about 1000+us(from messengers and network). 
> But for SSDs, there are many types, and I cannot provide a specific value, 
> but it will definitely be worse than pmem. So, for a phenomenon that is 2 
> orders of magnitude lower, it is worse than expected. Can you provide 
> detailed values of the three for analysis. (SSD, pwl cache, ceph cluster)

I'm not entirely sure what you're asking for. Which values are you looking for?

I did provide 3 sets of test results below, is that what you mean?
* rbd no cache: 1417216 ns
* pwl cache device: 44288 ns
* rbd with pwl cache: 5210112 ns

These are all outputs from the benchmarking test. The first is
executing in the VM writing to a ceph RBD disk *without* PWL. The
second is executing on the host writing directly to the SSD which is
being used for the PWL cache. The third is executing in the VM writing
to the same ceph RBD disk, but this time *with* PWL.

Incidentally, the client and server machines are identical, and the
SSD used by the client for PWL is the same model used on the server as
the OSDs. The SSDs are SAMSUNG MZ7KH480HAHQ0D3 SSDs attached to PERC
H730P Mini (Embedded).

> ==
>
> >>
> >> ** Summary
> >>
> >> I was hoping that PWL cache might be a good solution to the problem
> >> of write latency requirements of etcd when running a kubernetes
> >> control plane on ceph. Etcd is extremely write latency sensitive and
> >> becomes unstable if write latency is too high. The etcd workload can
> >> be characterised by very small (~4k) writes with a queue depth of 1.
> >> Throughput, even on a busy system, is normally very low. As etcd is
> >> distributed and can safely handle the loss of un-flushed data from a
> >> single node, a local ssd PWL cache for etcd looked like an ideal
> >> solution.
> >
> >
> > Right, this is exactly the use case that the PWL cache is supposed to 
> > address.
>
> Good to know!
>
> >> My expectation was that adding a PWL cache on a local SSD to an
> >> RBD-backed VM would improve write latency to something approaching the
> >> write latency performance of the local SSD. However, in my testing
> >> adding a PWL cache to an rbd-backed VM increased write latency by
> >> approximately 4x over not using a PWL cache. This was over 100x more
> >> than the write latency performance of the underlying SSD.
>
>
>
>
> When using image as the VM's disk, you may have used commands like the 
> following. In many cases, using parameters such as writeback will force the 
> start of rbd cache, which is a memory cache. It is normal for pwl cache to be 
> several times slower than it. Please confirm.
> There is currently no parameter support for using only pwl cache instead of 
> rbd cache. I have tested the latency of using pwl cache (pmem) by modifying 
> the code myself, which is about twice as high as using rbd cache.
>
> qemu -m 1024 -drive 
> format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback

I created the rbd disk by first installing the VM on a local qcow2
file, then copying the data from the qcow2 to rbd, converting to raw.
The command I used was:

`qemu-img convert -f qcow2 -O raw
/var/lib/libvirt/images/pwl-test.qcow2
rbd:libvirt-pool/pwl-test:id=libvirt`

I am configuring rbd options from the server by setting options on the
pool. I have been confirming that options are being set correctly with
`rbd status libvirt-pool/pwl-test` on the server.
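
(For reference, pool-level settings of this kind can be applied with `rbd config
pool set`. The following is only an illustrative sketch; the option values and
cache path are assumptions, not necessarily what was used here:)

rbd config pool set libvirt-pool rbd_plugins pwl_cache
rbd config pool set libvirt-pool rbd_persistent_cache_mode ssd
rbd config pool set libvirt-pool rbd_persistent_cache_path /var/lib/libvirt/pwl
rbd config pool set libvirt-pool rbd_persistent_cache_size 1G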

The latest set of profiling data requested by Mark were generated
entirely with `rbd_cache=false`:
https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d
-- 
Matthew Booth
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-03 Thread Matthew Booth
On Thu, 29 Jun 2023 at 14:11, Mark Nelson  wrote:
> >>> This container runs:
> >>> fio --rw=write --ioengine=sync --fdatasync=1
> >>> --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf
> >>> --output-format=json --runtime=60 --time_based=1
> >>>
> >>> And extracts sync.lat_ns.percentile["99.00"]
> >>
> >> Matthew, do you have the rest of the fio output captured?  It would be 
> >> interesting to see if it's just the 99th percentile that is bad or the PWL 
> >> cache is worse in general.
> > Sure.
> >
> > With PWL cache: https://paste.openstack.org/show/820504/
> > Without PWL cache: https://paste.openstack.org/show/b35e71zAwtYR2hjmSRtR/
> > With PWL cache, 'rbd_cache'=false:
> > https://paste.openstack.org/show/byp8ZITPzb3r9bb06cPf/
>
>
> Also, how's the CPU usage client side?  I would be very curious to see
> if unwindpmp shows anything useful (especially lock contention):
>
>
> https://github.com/markhpc/uwpmp
>
>
> Just attach it to the client-side process and start out with something
> like 100 samples (more are better but take longer).  You can run it like:
>
>
> ./unwindpmp -n 100 -p <pid>

I've included the output in this gist:
https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d

That gist contains 4 runs: 2 with PWL enabled and 2 without, and also
a markdown file explaining the collection method.

Matt
-- 
Matthew Booth
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Get bucket placement target

2023-07-03 Thread Konstantin Shalygin
Hi,

> On 3 Jul 2023, at 12:23, mahnoosh shahidi  wrote:
> 
> So clients can not get the value which they set in the LocationConstraint
> field in the create bucket request as in this doc
> ?

LocationConstraint in this case is an AZ [1], not the placement in Ceph (OSD 
pool, compression settings)


[1] https://docs.openstack.org/neutron/rocky/admin/config-az.html
k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Get bucket placement target

2023-07-03 Thread mahnoosh shahidi
Thanks for your response,

So clients can not get the value which they set in the LocationConstraint
field in the create bucket request as in this doc
?

Best Regards,
Mahnoosh

On Mon, Jul 3, 2023 at 12:35 PM Konstantin Shalygin  wrote:

> Hi,
>
> On 2 Jul 2023, at 17:17, mahnoosh shahidi  wrote:
>
> Is there any way for clients (without rgw-admin access) to get the
> placement target of their S3 buckets? The "GetBucketLocation'' api returns
> "default" for all placement targets and I couldn't find any other S3 api
> for this purpose.
>
>
> From the S3 side you can't know the internals of the API. The client can operate
> with S3 STORAGE CLASS, for example the output
>
> √ ~ % s5cmd ls s3://restic/snapshots/ | head
> 2023/06/30 17:10:03   263
>  09c68acfe0cd536c3a21273a7adeee2911f370aa4f12fb9de5d13e1b8a93a7ef
> 2023/06/18 10:30:05   285
>  0e6d2982310da04e9c003087457286f5526481facfcd24b617604353af6a00fb
> 2023/06/30 01:00:02   270
>  133002c58d01afd187184bf4b25024d4247c173c01971f8f83409fb1bef8321a
> 2023/06/09 10:30:05   283
>  18a75ad87f240ad3e26c337f0e2b5b43008153c2a3e525c99a3f5cca404ba369
> 2023/06/28 17:05:06   264
>  19ad146ee7d6075d9450800f8b9bb920b30911c1812590409129eb5fcaa0aba5
> 2023/07/02 10:10:11   272
>  1d3adb612e90d6e6eef88d9f2d0d496f231be7dc6befd1da870966da22b42a8a
> 2023/06/07 10:30:05   282
>  1e676be243d7dd58bc39182ebb9767ffc8f8b9d49c8d812d343ed838fae76f4e
> 2023/06/05 01:00:03   268
>  226adc2d95c43a5c88c894fa93a93f263e1ae80a31b40e4b6f1ce28d50c64979
> 2023/07/02 15:10:12   274
>  2541bd2e646a78ab238675d8dc2eec6673cf4eb8354a7e6294e303c470facd07
> 2023/07/01 10:30:05   282
>  28d272ef897c18a8baf545a426b48121a085e458dc78f76989200567ce05739d
>
>
> You can add -s flag, to see the S3 STORAGE CLASS
>
> √ ~ % s5cmd ls -s s3://restic/snapshots/ | head
> 2023/06/30 17:10:03 STANDARD263
>  09c68acfe0cd536c3a21273a7adeee2911f370aa4f12fb9de5d13e1b8a93a7ef
> 2023/06/18 10:30:05 STANDARD285
>  0e6d2982310da04e9c003087457286f5526481facfcd24b617604353af6a00fb
> 2023/06/30 01:00:02 STANDARD270
>  133002c58d01afd187184bf4b25024d4247c173c01971f8f83409fb1bef8321a
> 2023/06/09 10:30:05 STANDARD283
>  18a75ad87f240ad3e26c337f0e2b5b43008153c2a3e525c99a3f5cca404ba369
> 2023/06/28 17:05:06 STANDARD264
>  19ad146ee7d6075d9450800f8b9bb920b30911c1812590409129eb5fcaa0aba5
> 2023/07/02 10:10:11 STANDARD272
>  1d3adb612e90d6e6eef88d9f2d0d496f231be7dc6befd1da870966da22b42a8a
> 2023/06/07 10:30:05 STANDARD282
>  1e676be243d7dd58bc39182ebb9767ffc8f8b9d49c8d812d343ed838fae76f4e
> 2023/06/05 01:00:03 STANDARD268
>  226adc2d95c43a5c88c894fa93a93f263e1ae80a31b40e4b6f1ce28d50c64979
> 2023/07/02 15:10:12 STANDARD274
>  2541bd2e646a78ab238675d8dc2eec6673cf4eb8354a7e6294e303c470facd07
> 2023/07/01 10:30:05 STANDARD282
>  28d272ef897c18a8baf545a426b48121a085e458dc78f76989200567ce05739d
>
> And, there we can see that S3 STORAGE CLASS is STANDARD
>
>
> k
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Get bucket placement target

2023-07-03 Thread Konstantin Shalygin
Hi,

> On 2 Jul 2023, at 17:17, mahnoosh shahidi  wrote:
> 
> Is there any way for clients (without rgw-admin access) to get the
> placement target of their S3 buckets? The "GetBucketLocation'' api returns
> "default" for all placement targets and I couldn't find any other S3 api
> for this purpose.

From the S3 side you can't know the internals of the API. The client can operate with 
S3 STORAGE CLASS, for example the output

√ ~ % s5cmd ls s3://restic/snapshots/ | head
2023/06/30 17:10:03   263  
09c68acfe0cd536c3a21273a7adeee2911f370aa4f12fb9de5d13e1b8a93a7ef
2023/06/18 10:30:05   285  
0e6d2982310da04e9c003087457286f5526481facfcd24b617604353af6a00fb
2023/06/30 01:00:02   270  
133002c58d01afd187184bf4b25024d4247c173c01971f8f83409fb1bef8321a
2023/06/09 10:30:05   283  
18a75ad87f240ad3e26c337f0e2b5b43008153c2a3e525c99a3f5cca404ba369
2023/06/28 17:05:06   264  
19ad146ee7d6075d9450800f8b9bb920b30911c1812590409129eb5fcaa0aba5
2023/07/02 10:10:11   272  
1d3adb612e90d6e6eef88d9f2d0d496f231be7dc6befd1da870966da22b42a8a
2023/06/07 10:30:05   282  
1e676be243d7dd58bc39182ebb9767ffc8f8b9d49c8d812d343ed838fae76f4e
2023/06/05 01:00:03   268  
226adc2d95c43a5c88c894fa93a93f263e1ae80a31b40e4b6f1ce28d50c64979
2023/07/02 15:10:12   274  
2541bd2e646a78ab238675d8dc2eec6673cf4eb8354a7e6294e303c470facd07
2023/07/01 10:30:05   282  
28d272ef897c18a8baf545a426b48121a085e458dc78f76989200567ce05739d


You can add -s flag, to see the S3 STORAGE CLASS

√ ~ % s5cmd ls -s s3://restic/snapshots/ | head
2023/06/30 17:10:03 STANDARD263  
09c68acfe0cd536c3a21273a7adeee2911f370aa4f12fb9de5d13e1b8a93a7ef
2023/06/18 10:30:05 STANDARD285  
0e6d2982310da04e9c003087457286f5526481facfcd24b617604353af6a00fb
2023/06/30 01:00:02 STANDARD270  
133002c58d01afd187184bf4b25024d4247c173c01971f8f83409fb1bef8321a
2023/06/09 10:30:05 STANDARD283  
18a75ad87f240ad3e26c337f0e2b5b43008153c2a3e525c99a3f5cca404ba369
2023/06/28 17:05:06 STANDARD264  
19ad146ee7d6075d9450800f8b9bb920b30911c1812590409129eb5fcaa0aba5
2023/07/02 10:10:11 STANDARD272  
1d3adb612e90d6e6eef88d9f2d0d496f231be7dc6befd1da870966da22b42a8a
2023/06/07 10:30:05 STANDARD282  
1e676be243d7dd58bc39182ebb9767ffc8f8b9d49c8d812d343ed838fae76f4e
2023/06/05 01:00:03 STANDARD268  
226adc2d95c43a5c88c894fa93a93f263e1ae80a31b40e4b6f1ce28d50c64979
2023/07/02 15:10:12 STANDARD274  
2541bd2e646a78ab238675d8dc2eec6673cf4eb8354a7e6294e303c470facd07
2023/07/01 10:30:05 STANDARD282  
28d272ef897c18a8baf545a426b48121a085e458dc78f76989200567ce05739d

And, there we can see that S3 STORAGE CLASS is STANDARD


k


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Transmit rate metric based per bucket

2023-07-03 Thread Ondřej Kukla
Well in fact it does.

For example, in our setup we are parsing the bucket name from the URL. It's a 
bit tricky as a client could use both the domain-name and path-based styles, but 
that is not an issue for us.

Alternatively you can parse and analyse logs directly from the RGWs, which have 
the bucket information, so no URL parsing is needed.
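
A rough sketch of that kind of aggregation; the log format and field positions
below are assumptions, so adjust them to your own LB/RGW log layout:

# sum response bytes per bucket, assuming the request path is field 7 and the
# response size is field 10 of the access log
awk '{ split($7, p, "/"); bytes[p[2]] += $10 }
     END { for (b in bytes) print b, bytes[b] }' access.log | sort -k2 -rn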

Ondrej

> On 3. 7. 2023, at 9:58, Szabo, Istvan (Agoda)  wrote:
> 
> Hi,
> 
> Yeah, I'm using haproxy but that doesn't have the bucket information :/ 
> 
> Istvan Szabo
> Staff Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
> 
> -Original Message-
> From: Ondřej Kukla  
> Sent: Monday, July 3, 2023 2:25 PM
> To: Szabo, Istvan (Agoda) 
> Subject: Re: [ceph-users] Transmit rate metric based per bucket
> 
> Email received from the internet. If in doubt, don't click any link nor open 
> any attachment !
> 
> 
> Hello, Istvan,
> 
> As far as I’m aware there is no way to do it directly on the RGW. What you 
> can do is using a LB in front of the RGWs something like Nginx or HAProxy.
> 
> Then just collect the LB logs using ELK stack, Loki etc. and sum the body 
> bytes sent per request type and bucket.
> 
> Ondrej
> 
>> On 20. 6. 2023, at 7:01, Szabo, Istvan (Agoda)  
>> wrote:
>> 
>> Hello,
>> 
>> I'd like to know whether there is a way to query some metrics/logs in octopus 
>> (or in a newer version, I'm interested for the future too) about the bandwidth 
>> used in the bucket for put/get operations?
>> 
>> Thank you
>> 
>> 
>> This message is confidential and is for the sole use of the intended 
>> recipient(s). It may also be privileged or otherwise protected by copyright 
>> or other legal rules. If you have received it by mistake please let us know 
>> by reply email and delete it from your system. It is prohibited to copy this 
>> message or disclose its content to anyone. Any confidentiality or privilege 
>> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
>> the message. All messages sent to and from Agoda may be monitored to ensure 
>> compliance with company policies, to protect the company's interests and to 
>> remove potential malware. Electronic messages may be intercepted, amended, 
>> lost or deleted, or contain viruses.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
>> email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io