[ceph-users] Re: RGW: max number of shards per bucket index

2022-04-25 Thread Konstantin Shalygin
Hi,

This is not true, you don't. Indexless, compressed or STANDARD classes may be
used in the same pool simultaneously.
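
For example (a sketch only, not verified against your zonegroup; the
placement IDs, pool names and compression choice are placeholders), a
compressed storage class and the STANDARD class can share one data pool,
and an indexless placement target can point at the same pool as well:

# a COLD storage class beside STANDARD, reusing the same data pool, compressed
radosgw-admin zonegroup placement add --rgw-zonegroup default \
    --placement-id default-placement --storage-class COLD
radosgw-admin zone placement add --rgw-zone default \
    --placement-id default-placement --storage-class COLD \
    --data-pool default.rgw.buckets.data --compression lz4

# an indexless placement target on the same data pool
radosgw-admin zonegroup placement add --rgw-zonegroup default \
    --placement-id indexless-placement
radosgw-admin zone placement add --rgw-zone default \
    --placement-id indexless-placement \
    --data-pool default.rgw.buckets.data \
    --index-pool default.rgw.buckets.index \
    --placement-index-type indexless

# in a multisite setup, commit the period afterwards
radosgw-admin period update --commit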


k

Sent from my iPhone

> On 26 Apr 2022, at 04:08, Szabo, Istvan (Agoda)  
> wrote:
> 
> you need 2 different data pools, one for normal buckets and one for indexless.



[ceph-users] Re: cephfs hangs on writes

2022-04-25 Thread Xiubo Li



On 4/26/22 2:06 AM, Vladimir Brik wrote:

> a), max_mds > 1 ?
No, but I had tried it in the past (i.e. set max_mds to 2, and then 
reverted back to 1)


> b), inline_data enabled ?
No


Okay, this is a different bug.



> c), how to reproduce it, could you provide the detailed steps?
Sometimes, but not always, something like this will hang:
dd if=/dev/zero of=zero bs=100M count=1

I am using the upstream code and have created thousands of files using the 
dd command, but couldn't reproduce it.


Could you try kernel-4.18.0-376.el8, which has recently been synced with 
upstream? Maybe this bug only exists in old versions.


-- Xiubo


We use cephfs as the shared storage for our cluster, and another way to 
reproduce it is to start many jobs that execute something like

date > /path_to_some_dir/$RANDOM

In this case there is no hanging, but all files in path_to_some_dir 
are empty.


> d), could you enable the kernel debug log and set the
> debug_mds to 25 on the MDSes and share the logs?
As of this morning we began experiencing OSDs cyclically crashing with 
"heartbeat_map is_healthy ... had suicide timed out", so the logs will 
probably have a lot of unrelated stuff until we fix that issue. 
I'll let you know when that happens.



Vlad


On 4/24/22 23:40, Xiubo Li wrote:

Hi Vladimir,

This issue looks like the one I am working on now in [1], which is 
also a bug where the client gets stuck indefinitely after creating a new 
file and then writing something to it.


The issue [1] was triggered by setting max_mds > 1, enabling inline_data, 
and then creating a file and writing to it. It seems to be a deadlock 
between the MDS and the kernel client.



BTW, what's your setup for:

a), max_mds > 1 ?

b), inline_data enabled ?

c), how to reproduce it, could you provide the detailed steps?

d), could you enable the kernel debug log and set the debug_mds to 25 
on the MDSes and share the logs?
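
Something like this should do it (a sketch; adjust the targets to your
deployment and revert the debug levels afterwards):

# raise MDS debugging
ceph config set mds debug_mds 25
ceph config set mds debug_ms 1

# enable kernel client debugging on the client node
# (needs debugfs mounted and a kernel built with dynamic debug support)
echo "module ceph +p"    > /sys/kernel/debug/dynamic_debug/control
echo "module libceph +p" > /sys/kernel/debug/dynamic_debug/control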



[1] https://tracker.ceph.com/issues/55377

Thanks

BRs

-- Xiubo



On 4/25/22 5:25 AM, Vladimir Brik wrote:

Hello

We are experiencing an issue where, sometimes, when users write to 
cephfs, an empty file is created and then the application hangs, 
seemingly indefinitely. I am sometimes able to reproduce it with dd.


Does anybody know what might be going on?

Some details:
- ceph health complains about 100+ slow metadata IOs
- CPU utilization of ceph-mds is low
- We have almost 200 kernel cephfs clients
- Cephfs metadata is stored on 3 OSDs that use NVMe flash AICs


Vlad


[ceph-users] Re: Any suggestion for convert a small cluster to cephadm

2022-04-25 Thread Yu Changyuan
After carefully reading the code of cephadm, I think the adoption
process from the official document can be applied to my cluster with some
small changes, and I have successfully converted my cluster.

First of all, I changed the docker image, used lsync to sync changed
service files to an external volume, and did not share /var/lib/containers
with the host. Anyone interested can find the code in the repo below:

https://github.com/yuchangyuan/cephadm-container

I generally followed the adoption process, but before adopting any ceph
daemon I had to stop the daemon on the host manually, because the "cephadm"
command cannot stop the daemon from inside the container. The adoption
command will also refuse to start the generated ceph daemon service due to
the lack of firewalld.service, so I had to start the service manually.

For OSD adoption, I updated /etc/ceph/ceph.conf and removed any 'osd data'
lines. The original docker image I used is 'ceph/ceph-daemon', which does
not create a '/var/lib/ceph/osd/ceph-{ID}' directory on the host, so I had
to create this directory manually.

I do not use cephx auth; I have the below 3 lines of config in my ceph.conf:
```
auth client required = none
auth cluster required = none
auth service required = none
```
but "ceph config generate-minimal-conf" output only include first line,
which will cause mgr or other daemon fail to start. so run command
"ceph cephadm set-extra-ceph-conf", to insert later 2 line.

And finally, there are some questions: the newly deployed MDS uses
'docker.io/ceph/daemon-base:latest-pacific-devel', but the other daemons
(mon, mgr & osd) use 'quay.io/ceph/ceph:v16'. Why are different images
used? And how can I make the MDS use the
'quay.io/ceph/daemon-base:latest-pacific-devel' image?
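
My assumption (not verified) is that the image can be pinned via the
container_image config option and the MDS then redeployed; note this would
pin it cluster-wide, and the daemon name below is only a placeholder:
```
ceph config set global container_image quay.io/ceph/daemon-base:latest-pacific-devel
ceph orch daemon redeploy mds.<fs_name>.<host>.<id>
```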

Yu Changyuan  writes:

> I run a small ceph cluster (3 mons on 3 nodes, 7 osds on 2 nodes) at home
> with a custom setup, and I think cephadm is the future, so I want to
> convert this cluster to cephadm.
>
> My cluster setup is complex compared to a standard deployment. The cluster
> was created in the early days, so it was deployed manually, and later I
> made all ceph daemons run inside containers (using ceph/daemon) with
> podman to decouple them from the host system (which is NixOS), and managed
> container startup with NixOS using systemd services (the service files are
> generated with nix expressions).
>
> I think some OS files need to be mutable to make cephadm work properly;
> for example, /etc/ceph/ceph.conf needs to be writable by cephadm.  This
> is how most Linux distros are configured, but not NixOS, where basically
> all system files are immutable, including /etc.
>
> So I plan to run cephadm in a container with "--privileged=true" and
> "--net=host", with ssh listening on port '23' to avoid a conflict with the
> host, and with a dummy 'ntp.service' that only runs 'sleep inf' to satisfy
> cephadm, because I have chrony on the host system. Maybe /dev needs to be
> bind mounted from the host.
>
> I have already built the image and successfully run 'cephadm check-host'
> in the container. The official document for the cephadm adoption
> process (https://docs.ceph.com/en/latest/cephadm/adoption/) lacks details,
> so I am not sure whether my unusual cluster setup can be converted
> successfully or not, so I need some suggestions for the further steps of
> the conversion.
>
> Below is some details of what I have already done:
>
> Dockerfile:
> ```
> FROM fedora:36
>
> RUN dnf -y install \
> systemd openssh-server openssh-clients cephadm podman containernetworking-plugins && \
> dnf clean all
>
> RUN ssh-keygen -f /etc/ssh/ssh_host_rsa_key -N '' -t rsa && \
> ssh-keygen -f /etc/ssh/ssh_host_ed25519_key -N '' -t ed25519 && \
> sed -i -e 's/^.*pam_loginuid.so.*$/session optional pam_loginuid.so/' /etc/pam.d/sshd && \
> sed -i -e 's/^.*Port 22/Port 23/' /etc/ssh/sshd_config
>
> EXPOSE 23
>
> RUN (for i in \
>   systemd-network-generator.service \
>   rpmdb-migrate.service \
>   rpmdb-rebuild.service \
>   getty@tty1.service \
>   remote-fs.target \
>   systemd-resolved.service \
>   systemd-oomd.service \
>   systemd-network-generator.service \
>   dnf-makecache.timer \
>   fstrim.timer; do \
>   rm -f /etc/systemd/system/*.wants/$i; \
>   done)
>
> COPY ./ntp.service /etc/systemd/system
>
> RUN (cd /etc/systemd/system/multi-user.target.wants; ln -s ../ntp.service)
>
> RUN mkdir -p /etc/ceph && \
> mkdir -p /var/lib/containers && \
> mkdir -p /var/lib/ceph && \
> mkdir -p /var/log/ceph && \
> mkdir -p /root/.ssh && chmod 700 /root/.ssh
>
> VOLUME /etc/ceph
> VOLUME /var/lib/containers
> VOLUME /var/lib/ceph
> VOLUME /var/log/ceph
> VOLUME /root/.ssh
>
> CMD ["/sbin/init"]
> ```
>
> and below is ntp.service file:
> ```
> [Unit]
> After=network.target
>
> [Service]
> ExecStart=/bin/sleep inf
> Restart=always
> Type=simple
> ```
>
> I tag the image built from the above Dockerfile with the name 'cephadm',
> and the "--security-opt=seccomp=unconfined" option is necessary for the
> podman build to work.
>
> Then I start container with below script:
> ```
> #!/bin/sh
>
> mkdir -p /var/log/ceph
> mkdir -p /etc/ceph/ssh
>
> podman run --rm

[ceph-users] rbd mirror between clusters with private "public" network

2022-04-25 Thread Tony Liu
Hi,

I understand that, for rbd-mirror to work, the rbd-mirror service requires
connectivity to all nodes of both clusters.

In my case, for security purposes, the "public" network is actually a private
network, which is not externally routable. All internal RBD clients are on
that private network. I also put HAProxy there for accessing the dashboard
and radosgw from outside.

I wonder if there is any way to use rbd-mirror in this case?
Using some sort of proxy?


Thanks!
Tony


[ceph-users] Re: Problem with recreating OSD with disk that died previously

2022-04-25 Thread Rainer Krienke

Hello Josh,

the osd is "down", it was down since it experienced read/write errors on 
the old disk which I as said removed after the rebalance of the cluster. 
And its still now in "down" state.


root@ceph4:~# ceph osd tree down
ID CLASS WEIGHTTYPE NAME  STATUS REWEIGHT PRI-AFF
-1   523.97095 root default
-958.21899 host ceph4
49   hdd   3.63899 osd.49   down0 1.0

I am unsure if a destroy would work because of the old pv that is still
around, which is visible in the output of pvs (look at the UUID
"5d1acce2-ba98-4b4c-81bd-f52a3309161f"). Of course this pv no longer
actually exists, because I pulled the old disk and inserted a new one.


Rainer

On 25.04.22 at 18:26, Josh Baergen wrote:

On Mon, Apr 25, 2022 at 10:22 AM Rainer Krienke  wrote:

Hello,


Hi!


-->  RuntimeError: The osd ID 49 is already in use or does not exist.


This error indicates that the issue is with the osd ID itself, not
with the disk or lvm state. Do you need to run a "ceph osd destroy 49"
first? (You could check "ceph osd tree down" to see if the osd is in a
"down" or "destroyed" state.)

Josh


--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 
1001312



[ceph-users] Problem with recreating OSD with disk that died previously

2022-04-25 Thread Rainer Krienke

Hello,

I run a ceph Nautilus cluster with 9 hosts and a total of 144 OSDs.
Recently osd.49 on host ceph4 died because of disk errors. Ceph tried
to restart the osd 4 times and then gave up. The cluster then
rebalanced successfully. ceph -s now says:


  cluster:
id: 1234567
health: HEALTH_WARN
4 daemons have recently crashed

  services:
mon: 3 daemons, quorum ceph2,ceph5,ceph8 (age 9h)
mgr: ceph8(active, since 9w), standbys: ceph2, ceph5, ceph-admin
mds: cephfsrz:1 {0=ceph1=up:active} 2 up:standby
osd: 144 osds: 143 up (since 3d), 143 in (since 3d)
...

So I got a brand new disk and swapped it in for the one that died.
The old disk was /dev/sdb; after inserting it, the new one is /dev/sds.
Next I ran ceph-volume, which had worked without problems after previous
disk-failure situations. This time it does not work:


# ceph-volume lvm create --bluestore --osd-id 49 --data /dev/sds
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring 
osd tree -f json

-->  RuntimeError: The osd ID 49 is already in use or does not exist.

The ceph-osd.49 log shows:
2022-04-25 17:23:52.698 7f27d3211c00 -1  ** ERROR: unable to open OSD
superblock on /var/lib/ceph/osd/ceph-49: (2) No such file or directory


There is no osd.49 process running. Running ls shows the following:

# ls -l  /var/lib/ceph/osd/ceph-49/
lrwxrwxrwx 1 ceph ceph 93 Feb 21 08:52 block -> 
/dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3

-rw--- 1 ceph ceph 37 Feb 21 08:52 ceph_fsid
-rw--- 1 ceph ceph 37 Feb 21 08:52 fsid
-rw--- 1 ceph ceph 56 Feb 21 08:52 keyring
-rw--- 1 ceph ceph  6 Feb 21 08:52 ready
-rw--- 1 ceph ceph 10 Feb 21 08:52 type
-rw--- 1 ceph ceph  3 Feb 21 08:52 whoami

The block link seems to be wrong, probably stale, since I get an I/O error
if I run:

# dd if=/var/lib/ceph/osd/ceph-49/block of=/dev/null bs=1024k count=1
dd: error reading '/var/lib/ceph/osd/ceph-49/block': Input/output error

However a dd from /dev/sds works just fine.

Running pvs yields:

root@ceph4:~# pvs

  /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3: read failed after 0 of 4096 at 0: Input/output error
  /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3: read failed after 0 of 4096 at 4000761970688: Input/output error
  /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3: read failed after 0 of 4096 at 4000762028032: Input/output error
  /dev/ceph-5d1acce2-ba98-4b4c-81bd-f52a3309161f/osd-block-f9443ebd-d004-45d0-ade0-d8ef0df2c0d3: read failed after 0 of 4096 at 4096: Input/output error

  PV         VG                                        Fmt  Attr PSize    PFree
  /dev/md1   system                                    lvm2 a--  <110.32g <72.38g
  /dev/sda   ceph-8dfa3747-9e1b-4eb7-adfe-1ed6ee76dfb5 lvm2 a--  <3.64t   0
  /dev/sdc   ceph-f2c81a29-1968-4c7c-9354-2d1ac71b361f lvm2 a--  <3.64t   0
  ...
  /dev/sdq   ceph-ac6dc03b-422d-49cd-978e-71d2585cdd24 lvm2 a--  <3.64t   0
  /dev/sdr   ceph-7b28ddc2-8500-487b-a693-51d711d26d40 lvm2 a--  <3.64t   0



So how could I proceed? It seems somehow lvm has a dangling pv, which 
was the old disk. How could I solve this issue?


Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 
1001312



[ceph-users] Re: RGW: max number of shards per bucket index

2022-04-25 Thread Casey Bodley
On Fri, Apr 22, 2022 at 3:20 PM Cory Snyder  wrote:
>
> Hi all,
>
> Does anyone have any guidance on the maximum number of bucket index shards
> to optimize performance for buckets with a huge number of objects? It seems
> like there is probably a threshold where performance starts to decrease
> with an increased number of shards (particularly bucket listings). More
> specifically, if I have N OSDs in the bucket index pool, does it make sense
> to allow a bucket to have more than N index shards?

with respect to write parallelism, i think the most interesting limit
is the PG count of the index pool. my understanding is that the OSDs
can only handle a single write at a time per PG due to the rados
recovery model. so you'd expect to see index write performance
increase as you raise the shard count, but level off as you get closer
to that PG count
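
for example, to compare a bucket's shard count against the index pool's PG
count (the pool and bucket names below are just placeholders):

# PG count of the bucket index pool
ceph osd pool get default.rgw.buckets.index pg_num

# current shard count of the bucket
radosgw-admin bucket stats --bucket=mybucket | grep num_shards

# reshard if needed
radosgw-admin bucket reshard --bucket=mybucket --num-shards=101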

> Perhaps some multiple
> of N makes sense, with the value of the multiplier influenced by
> osd_op_num_threads_per_shard and osd_op_num_shards?

i'm less familiar with these OSD configurables, but it's possible that
they'd impose limits on parallelism below the PG count

>
> Thanks in advance for any theoretical or empirical insights!

if you need to list these huge buckets, you'll want to strike a
balance between write parallelism and the latency of bucket listing
requests. once that request latency reaches the client's retry timer,
you'll really start to see listing performance fall off a cliff

>
> Cory Snyder



[ceph-users] Re: RGW: max number of shards per bucket index

2022-04-25 Thread Anthony D'Atri



> Hi,
> 
> I have a bucket which has been sized for 2.4B objects with a prime number
> (24xxx) of index shards.
> You can't list this bucket, but you don't have any large omap issue.

Why not make it indexless then?

> 



[ceph-users] calculate rocksdb size

2022-04-25 Thread Boris Behrens
Hi,
is there a way to show the utilization of cache.db devices?

Cheers
 Boris

-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.


[ceph-users] Bad CRC in data messages logging out to syslog

2022-04-25 Thread Chris Page
Hi,

Every now and then I am getting the following logs -

pve01 2022-04-25T16:41:03.109+0100 7ff35b6da700  0 bad crc in data
3860390385 != exp 919468086 from v1:10.0.0.111:0/873787122
pve01 2022-04-25T16:41:04.361+0100 7fb0e2feb700  0 bad crc in data
1141834112 != exp 797386370 from v1:10.0.0.111:0/873787122
pve03 2022-04-25T16:42:03.988+0100 7ffbb02d1700  0 bad crc in data
2357454667 != exp 1757772123 from v1:10.0.0.111:0/873787122
pve01 2022-04-25T16:42:04.394+0100 7faf49cad700  0 bad crc in data
329941200 != exp 2275667382 from v1:10.0.0.111:0/873787122

The messages look fairly consistent and all come in at a similar time
across nodes. Is this anything to worry about? Ceph doesn't seem to be
highlighting any consistency issues and has no problems performing scrubs.

Chris.


[ceph-users] Re: Cephadm Deployment with io_uring OSD

2022-04-25 Thread Gene Kuo
Hi Mark,

Sorry I somehow missed this email.

I'm currently running Debian 11 with the following kernel version.
Linux ceph01 5.10.0-12-amd64 #1 SMP Debian 5.10.103-1 (2022-03-07) x86_64
GNU/Linux

I've tried upgrading to 17.2.0 and the issue still exists.

Regards,
Gene Kuo
Co-organizer Cloud Native Taiwan User Group


Mark Nelson wrote on Tue, 11 Jan 2022 at 3:15 AM:

> Hi Gene,
>
>
> Unfortunately when the io_uring code was first implemented there were no
> stable centos kernels in our test lab that included io_uring support so
> it hasn't gotten a ton of testing.  I agree that your issue looks
> similar to what was reported in issue #47661, but it looks like you are
> running pacific so should have the patch that was included in octopus to
> fix that issue?
>
> What OS/Kernel is this?  FWIW our initial testing was on CentOS 8 with a
> custom EPEL kernel build.
>
> Mark
>
>
> On 1/7/22 7:27 AM, Kuo Gene wrote:
> > Hi,
> >
> > I've recently been trying to enable the OSDs to use io_uring with our
> > Cephadm deployment via the commands below.
> >
> > ceph config set osd bdev_ioring true
> > ceph config set osd bdev_ioring_hipri true
> > ceph config set osd bdev_ioring_sqthread_poll true
> >
> > However, I've run into an issue similar to this bug:
> > Bug #47661: Cannot allocate memory appears when using io_uring osd -
> > bluestore - Ceph 
> >
> > I've tried adding "--ulimit memlock=-1:-1" to the docker run line in the
> > unit.run file that cephadm created for the OSD service.
> > I can confirm that "max locked memory" is set to unlimited in the
> > container when running ulimit -a in the container.
> > The osd still fails to start when io_uring is enabled.
> >
> > Any suggestions?
> >
> > OSD logs:
> > Using recent ceph image
> >
> quay.io/ceph/ceph@sha256:bb6a71f7f481985f6d3b358e3b9ef64c6755b3db5aa53198e0aac38be5c8ae54
> >
> > debug 2022-01-05T18:34:38.878+ 7f06ffaee080  0 set uid:gid to 167:167
> > (ceph:ceph)
> > debug 2022-01-05T18:34:38.878+ 7f06ffaee080  0 ceph version 16.2.7
> > (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable), process
> > ceph-osd, pid 7
> > debug 2022-01-05T18:34:38.878+ 7f06ffaee080  0 pidfile_write: ignore
> > empty --pid-file
> > debug 2022-01-05T18:34:38.878+ 7f06ffaee080  1 bdev(0x55f113f5c800
> > /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
> > debug 2022-01-05T18:34:38.882+ 7f06ffaee080 -1 bdev(0x55f113f5c800
> > /var/lib/ceph/osd/ceph-2/block) _aio_start io_setup(2) failed: (12)
> Cannot
> > allocate memory
> > debug 2022-01-05T18:34:38.882+ 7f06ffaee080  0 starting osd.2
> osd_data
> > /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> >
> > ulimit -a output (container is started when io_uring is disabled):
> > core file size  (blocks, -c) unlimited
> > data seg size   (kbytes, -d) unlimited
> > scheduling priority (-e) 0
> > file size   (blocks, -f) unlimited
> > pending signals (-i) 1030203
> > max locked memory   (kbytes, -l) unlimited
> > max memory size (kbytes, -m) unlimited
> > open files  (-n) 1048576
> > pipe size(512 bytes, -p) 8
> > POSIX message queues (bytes, -q) 819200
> > real-time priority  (-r) 0
> > stack size  (kbytes, -s) 8192
> > cpu time   (seconds, -t) unlimited
> > max user processes  (-u) unlimited
> > virtual memory  (kbytes, -v) unlimited
> > file locks  (-x) unlimited
> >
> >
> > Regards,
> > Gene Kuo


[ceph-users] Re: config/mgr/mgr/dashboard/GRAFANA_API_URL vs fqdn

2022-04-25 Thread Robert Sander

On 21.04.22 at 10:46, cephl...@drop.grin.hu wrote:


The hosts do have their short name as hostname and they also possess
FQDNs to be accessible from "outside". That's according to the doc.


To set the Grafana URL that the browser should use, run this command:

ceph dashboard set-grafana-frontend-api-url https://f.q.d.n:3000/

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin


[ceph-users] Re: How I disable DB and WAL for an OSD for improving 8K performance

2022-04-25 Thread Boris Behrens
I wouldn't create a separate partition for the DB; I'd let the OSD handle it.
First, you don't waste space (even a couple of GB begins to pile up across a
lot of OSDs), and you have a lot less to manage.
Just create the OSD on the raw device (I use `ceph-volume lvm create
--bluestore --data /dev/sdXY`).

Regarding the WAL I can't give any advice; I have never thought about
tuning this option.

Cheers
 Boris

On Mon, 25 Apr 2022 at 12:59, huxia...@horebdata.cn <
huxia...@horebdata.cn> wrote:

> Thanks a lot, Boris.
>
> Do you mean that the best practice would be to create a DB partition on
> the SSD OSD itself, and disable the WAL by setting
> bluestore_prefer_deferred_size=0 and bluestore_prefer_deferred_size_ssd=0?
>
> Or is there no need to create a DB partition on the SSD at all, and just
> let the OSD manage everything including data and metadata?
>
> I do not know which is the best strategy in terms of performance...
>
>
> --
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.


[ceph-users] Re: How I disable DB and WAL for an OSD for improving 8K performance

2022-04-25 Thread huxia...@horebdata.cn
Thanks a lot, Boris.

Do you mean that the best practice would be to create a DB partition on the 
SSD OSD itself, and disable the WAL by setting bluestore_prefer_deferred_size=0 
and bluestore_prefer_deferred_size_ssd=0?

Or is there no need to create a DB partition on the SSD at all, and just let 
the OSD manage everything including data and metadata?

I do not know which is the best strategy in terms of performance...
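
For reference, if disabling deferred writes turns out to be the way to go,
the knobs I mean would be set like this (just a sketch):

# disable deferred small writes (staged through the RocksDB WAL) on SSD OSDs
ceph config set osd bluestore_prefer_deferred_size_ssd 0
ceph config set osd bluestore_prefer_deferred_size 0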

Samuel



huxia...@horebdata.cn
 
From: Boris Behrens
Date: 2022-04-25 10:26
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] How I disable DB and WAL for an OSD for improving 8K 
performance
Hi Samuel,

IIRC at least the DB (I am not sure if flash drives use the 1GB WAL) is always 
located on the same device as the OSD, when it is not configured somewhere 
else. On SSDs/NVMEs people tend to not separate the DB/WAL on other devices.

Cheers
 Boris

On Mon, 25 Apr 2022 at 10:09, huxia...@horebdata.cn wrote:
Dear Ceph folks,

When setting up an all-flash Ceph cluster with 8 nodes, I am wondering whether 
I should disable (or turn off) the DB and WAL for SSD-based OSDs for better 8K 
IO performance.

Normally for HDD OSDs, I used to create 30GB+ partitions on separate SSDs as 
DB/WAL for them. For (enterprise-level) SSD-based OSDs, one way is to create a 
partition on every SSD OSD as DB/WAL, and then use the rest as the data 
partition of the OSD. However, I am wondering whether such a setup would 
improve performance or degrade it, since the WAL is just pure write buffering; 
it could cause double writes on the same SSD and thus hurt performance...

Any comments, suggestions are highly appreciated,

Samuel



huxia...@horebdata.cn


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im 
groüen Saal.


[ceph-users] Re: How I disable DB and WAL for an OSD for improving 8K performance

2022-04-25 Thread Boris Behrens
Hi Samuel,

IIRC at least the DB (I am not sure if flash drives use the 1GB WAL) is
always located on the same device as the OSD, when it is not configured
somewhere else. On SSDs/NVMEs people tend to not separate the DB/WAL on
other devices.

Cheers
 Boris

On Mon, 25 Apr 2022 at 10:09, huxia...@horebdata.cn <
huxia...@horebdata.cn> wrote:

> Dear Ceph folks,
>
> When setting up an all-flash Ceph cluster with 8 nodes, I am wondering
> whether I should disable (or turn off) the DB and WAL for SSD-based OSDs
> for better 8K IO performance.
>
> Normally for HDD OSDs, I used to create 30GB+ partitions on separate
> SSDs as DB/WAL for them. For (enterprise-level) SSD-based OSDs, one way is
> to create a partition on every SSD OSD as DB/WAL, and then use the rest as
> the data partition of the OSD. However, I am wondering whether such a
> setup would improve performance or degrade it, since the WAL is just
> pure write buffering; it could cause double writes on the same SSD
> and thus hurt performance...
>
> Any comments, suggestions are highly appreciated,
>
> Samuel
>
>
>
> huxia...@horebdata.cn


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.


[ceph-users] How I disable DB and WAL for an OSD for improving 8K performance

2022-04-25 Thread huxia...@horebdata.cn
Dear Ceph folks,

When setting up an all-flash Ceph cluster with 8 nodes, I am wondering whether 
I should disable (or turn off) the DB and WAL for SSD-based OSDs for better 8K 
IO performance.

Normally for HDD OSDs, I used to create 30GB+ partitions on separate SSDs as 
DB/WAL for them. For (enterprise-level) SSD-based OSDs, one way is to create a 
partition on every SSD OSD as DB/WAL, and then use the rest as the data 
partition of the OSD. However, I am wondering whether such a setup would 
improve performance or degrade it, since the WAL is just pure write buffering; 
it could cause double writes on the same SSD and thus hurt performance...

Any comments, suggestions are highly appreciated,

Samuel



huxia...@horebdata.cn