[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-22 Thread Venky Shankar
On Wed, Mar 22, 2023 at 1:36 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The reruns were in the queue for 4 days because of some slowness issues.
> The core team (Neha, Radek, Laura, and others) are trying to narrow
> down the root cause.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> the core)
> rgw - Casey
> fs - Venky (the fs suite has an unusually high amount of failed jobs,
> any reason to suspect it in the observed slowness?)

Some of those failures can be attributed to slowness, but not all.

Can we have a re-run?

> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/octopus-x - Laura is looking into failures
> upgrade/pacific-x - Laura is looking into failures
> upgrade/quincy-p2p - Laura is looking into failures
> client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> is looking into it
> powercycle - Brad
> ceph-volume - needs a rerun on merged
> https://github.com/ceph/ceph-ansible/pull/7409
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Also, share any findings or hypothesis about the slowness in the
> execution of the suite.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph performance problems

2023-03-22 Thread Alex Gorbachev
Hi Dominik,

RADOS bench will perform parallel IOs, which stresses the internal
configuration, but it does not reflect the speed of an individual client.  Ceph
is inherently designed for fairness, due to the pseudo-random distribution
of data and the sharded storage design.  Kernel mounts are going to be
fastest, and you can play with caching parameters and things like
readahead.
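
For example, on a kernel CephFS mount the readahead window can be raised with the
rasize mount option; roughly (MON addresses, client name, secret file and the value
itself are placeholders):

  mount -t ceph MON-IPs:/ /mnt/cephfs -o name=CLIENT,secretfile=/etc/ceph/secret,rasize=67108864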

When reading, you read from one OSD at a time; when writing, you write to
all redundant OSDs (three if you use 3x replication, or however
many with an erasure-coded setup).  A write is acknowledged only when all
participating OSDs have completed (hardened) their writes, so you will have
network latency + OS overhead + Ceph overhead + drive latency for each
write.

We have used tricks, such as parallelizing IO across multiple RBD images,
and increasing queue depth, but that's not cephfs, rather ZFS or XFS on top
of RBD.  With proper block alignment, we have seen reasonable performance
from such setups.
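
As a rough sketch (the directory, job count and sizes below are only examples), something
like this fio run drives several files in parallel with a deeper queue and gives a much
better picture of aggregate throughput than a single dd stream:

  fio --name=parallel-write --directory=/mnt/xfs-on-rbd --ioengine=libaio --direct=1 \
      --rw=write --bs=1M --iodepth=16 --numjobs=4 --size=4G --group_reporting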

For your network links, are you using 802.3ad aggregation and have your MTU
correctly set across the board - client, OSD nodes, MONs?  You will ideally want
the MTU to be the same (1500, 9216, 9000 etc.) across the entire
cluster.  Check your hashing algorithm (we use layer 2+3 for most setups).
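
A quick way to sanity-check that (bond0 and the 9000 MTU below are assumptions,
adjust to your setup):

  ip link show bond0 | grep mtu
  grep -i -e mode -e hash /proc/net/bonding/bond0
  ping -M do -s 8972 OSD-NODE-IP    # 9000 minus 28 bytes of IP/ICMP headers; must pass unfragmented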

I would also focus on benchmarking what you would use this cluster for; if
there's one thing I learned from the storage industry, it's that there are
lies, damn lies, and benchmarks.  If your workload goes to 5000 IOPS max,
you do not need a million IOPS.  If you need good latency response, buy the
best NVMe drives possible for your use case, because latency always goes
all the way down to the drive itself.

Hope this helps, and others can likely address cephfs aspects for you.
--
Alex Gorbachev
https://alextelescope.blogspot.com



On Wed, Mar 22, 2023 at 10:16 AM Dominik Baack <
dominik.ba...@cs.uni-dortmund.de> wrote:

> Hi,
>
> we are currently testing out ways to increase Ceph performance because
> what we experience so far is very close to unusable.
>
> For the test cluster we are utilizing 4 nodes with the following hardware data:
>
> Dual 200GbE Mellanox Ethernet
> 2x EPYC Rome 7302
> 16x 32GB 3200MHz ECC
> 9x 15.36TB Micron 9300 Pro
>
> For production this will be extended to all 8 nodes if it shows
> promising results.
>
> - Ceph was installed with cephadm.
> - MDS and OSDs are located on the same nodes.
> - Mostly using stock config
>
> - Network performance tested with iperf3 seems fine, 26Gbits/s with -P4
> on single port (details below).
> Close to 200Gbits with 10 parallel instances and servers.
>
> When testing a mounted CephFS on the working nodes in various
> configurations I only got <50MB/s for fuse mount and <270MB/s for kernel
> mounts. (dd command and output attached below)
> In addition, the Ceph dashboard and our Grafana monitoring report packet
> loss on all relevant interfaces during load, which does not occur during
> the normal iperf load tests or rsync/scp file transfers.
>
> Rados Bench shows performance around 2000MB/s which is not max
> performance of the SSDs but fine for us (details below).
>
>
> Why is the filesystem so slow compared to the individual components?
>
> Cheers
> Dominik
>
>
>
>
> Test details:
>
>
> --
>
> Some tests done on working nodes:
>
> Ceph mounted with ceph-fuse
>
> root@ml2ran10:/mnt/cephfs/backup# dd if=/dev/zero of=testfile bs=1M
> count=4096 oflag=direct
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4,3 GB, 4,0 GiB) copied, 88,2933 s, 48,6 MB/s
>
>
> Ceph mounted with kernel driver:
>
> root@ml2ran06:/mnt/cephfs/backup# dd if=/dev/zero of=testfile bs=1M
> count=4096 oflag=direct
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.0989 s, 267 MB/s
>
>
> Storage Node
>
> With fuse
>
> root@ml2rsn05:/mnt/ml2r_storage/backup# dd if=/dev/zero of=testfile
> bs=1M count=4096 oflag=direct
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 53.9977 s, 79.5 MB/s
>
> Kernel mount:
>
> dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.6726 s, 243 MB/s
>
> ___
>
> Iperf3
>
> iperf3 --zerocopy  -n 10240M -P4 -c ml2ran08s0 -p 4701 -i 15 -b
> 2000
> Connecting to host ml2ran08s0, port 4701
> [  5] local 129.217.31.180 port 43958 connected to 129.217.31.218 port 4701
> [  7] local 129.217.31.180 port 43960 connected to 129.217.31.218 port 4701
> [  9] local 129.217.31.180 port 43962 connected to 129.217.31.218 port 4701
> [ 11] local 129.217.31.180 port 43964 connected to 129.217.31.218 port 4701
> [ ID] Interval   Transfer Bitrate Retr  Cwnd
> [  5]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0    632 KBytes
> [  7]   0.00-3.21   

[ceph-users] Re: MDS host in OSD blacklist

2023-03-22 Thread Xiubo Li


On 22/03/2023 15:54, Frank Schilder wrote:

Dear Xiubo,

thanks for that link. It seems like it's a harmless issue. I believe I have seen 
a blocked OP in the ceph warnings for this MDS and was quite happy it restarted 
by itself. It looks like a very rare race condition and does not lead to data 
loss or corruption.

In a situation like this, is it normal that the MDS host is blacklisted? The 
MDS reconnected just fine. Is it the MDS client ID of the crashed MDS that is 
blocked? I can't see anything that is denied access.


Yeah, I think so. The OSDs should block any further, possibly corrupting, 
writes from it.


The blocklist items are in the format {IP:Port/Nonce} of the last 
crashed MDS instance, not of the clients. After the new MDS starts, it will 
get a new nonce, so you won't see the new MDS being denied access.
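
If you want to double-check or clear such an entry by hand (on Octopus the command is 
still spelled "blacklist"; the address below is just the one from your output, and the 
entries expire on their own at the listed time anyway):

  ceph osd blacklist ls
  ceph osd blacklist rm 192.168.32.87:6801/3841823949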


Thanks

- Xiubo


Thanks for your reply and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: 22 March 2023 07:27:08
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] MDS host in OSD blacklist

Hi Frank,

This should be the same issue with
https://tracker.ceph.com/issues/49132, which has been fixed.

Thanks

- Xiubo

On 21/03/2023 23:32, Frank Schilder wrote:

Hi all,

we have an octopus v15.2.17 cluster and observe that one of our MDS hosts 
showed up in the OSD blacklist:

# ceph osd blacklist ls
192.168.32.87:6801/3841823949 2023-03-22T10:08:02.589698+0100
192.168.32.87:6800/3841823949 2023-03-22T10:08:02.589698+0100

I see an MDS restart that might be related; see log snippets below. There are 
no clients running on this host, only OSDs and one MDS. What could be the 
reason for the blacklist entries?

Thanks!

Log snippets:

Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 
7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
(8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, 
LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: 
(C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) 
[0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) 
[0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) 
[0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) 
[0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: *** Caught signal (Aborted) **
Mar 21 10:07:54 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:54 ceph-23 journal: 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 
7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
(8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, 
LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: 
(C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) 

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Xiubo Li

Hi Frank,

Could you reproduce it again by enabling the kclient debug logs and also 
the mds debug logs?


I need to know what exactly happened on the kclient and mds side. 
Locally I couldn't reproduce it.
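
Roughly, the logs can be enabled like this (the debug levels are just common values, 
and dynamic debug must be available in the kernel):

On the client node:

  echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control

On the cluster:

  ceph config set mds debug_mds 20
  ceph config set mds debug_ms 1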


Thanks

- Xiubo

On 22/03/2023 23:27, Frank Schilder wrote:

Hi Gregory,

thanks for your reply. First a quick update. Here is how I get ln to work after 
it failed; there seems to be no timeout:

$ ln envs/satwindspy/include/ffi.h 
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
ln: failed to create hard link 
'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h': Read-only file system
$ ls -l envs/satwindspy/include mambaforge/pkgs/libffi-3.3-h58526e2_2
envs/satwindspy/include:
total 7664
-rw-rw-r--.   1 rit rit959 Mar  5  2021 ares_build.h
[...]
$ ln envs/satwindspy/include/ffi.h 
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h

After an ls -l on both directories ln works.

To the question: How can I pull out a log from the nfs server? There is nothing 
in /var/log/messages.

I can't reproduce it with simple commands on the NFS client. It seems to occur 
only when a large number of files/dirs is created. I can make the archive 
available to you if this helps.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Gregory Farnum 
Sent: Wednesday, March 22, 2023 4:14 PM
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': 
Read-only file system

Do you have logs of what the nfs server is doing?
Managed to reproduce it in terms of direct CephFS ops?


On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder 
mailto:fr...@dtu.dk>> wrote:
I have to correct myself. It also fails on an export with "sync" mode. Here is 
an strace on the client (strace ln envs/satwindspy/include/ffi.h 
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h):

[...]
stat("mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0x7ffdc5c32820) = 
-1 ENOENT (No such file or directory)
lstat("envs/satwindspy/include/ffi.h", {st_mode=S_IFREG|0664, st_size=13934, 
...}) = 0
linkat(AT_FDCWD, "envs/satwindspy/include/ffi.h", AT_FDCWD, 
"mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0) = -1 EROFS (Read-only file 
system)
[...]
write(2, "ln: ", 4ln: ) = 4
write(2, "failed to create hard link 'mamb"..., 80failed to create hard link 
'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h') = 80
[...]
write(2, ": Read-only file system", 23: Read-only file system) = 23
write(2, "\n", 1
)   = 1
lseek(0, 0, SEEK_CUR)   = -1 ESPIPE (Illegal seek)
close(0)= 0
close(1)= 0
close(2)= 0
exit_group(1)   = ?
+++ exited with 1 +++

Has anyone advice?

Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder mailto:fr...@dtu.dk>>
Sent: Wednesday, March 22, 2023 2:44 PM
To: ceph-users@ceph.io
Subject: [ceph-users] ln: failed to create hard link 'file name': Read-only 
file system

Hi all,

on an NFS re-export of a ceph-fs (kernel client) I observe a very strange 
error. I'm un-taring a larger package (1.2G) and after some time I get these 
errors:

ln: failed to create hard link 'file name': Read-only file system

The strange thing is that this seems only temporary. When I used "ln src dst" for manual 
testing, the command failed as above. However, after that I tried "ln -v src dst" and 
this command created the hard link with exactly the same path arguments. During the period when the 
error occurs, I can't see any FS in read-only mode, neither on the NFS client nor the NFS server. 
Funny thing is that file creation and write still works, its only the hard-link creation that fails.

For details, the set-up is:

file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to 
other server
other server: mount /shares/path as NFS

More precisely, on the file-server:

fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph 
defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev 0 0
exports: /shares/nfs/folder 
-no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP

On the host at DEST-IP:

fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev 0 0

Both, the file server and the client server are virtual machines. The file 
server is on Centos 8 stream (4.18.0-338.el8.x86_64) and the client machine is 
on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).

When I change the NFS export from "async" to "sync" everything works. However, 
that's a rather bad workaround and not a solution. Although this looks like an NFS issue, I'm 
afraid it is a problem with hard links and ceph-fs. It looks like a race with scheduling and 
executing operations on the ceph-fs kernel mount.

Has anyone seen 

[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-22 Thread Adam King
orch approved.

After reruns, the only failed jobs in the orch run were orch/rook tests
(which are broken currently) and 2 instances of "Test failure:
test_non_existent_cluster". That failure is just a command expecting a zero
return code and an error message instead of a nonzero return code in a
failure case. I think the test got backported without the change of the
error code; either way, it's not a big deal.

I also took a brief look at the orchestrator failure from the upgrade
tests (https://tracker.ceph.com/issues/59121) that Laura saw. In the
instance of it I looked at, it seems like the test is running "orch upgrade
start" and then not running "orch upgrade pause" until about 20 minutes
later, at which point the upgrade has already completed (and I can see all
the daemons got upgraded to the new image in the logs). It looks like it
was waiting on a loop to see a minority of the mons had been upgraded
before pausing the upgrade, but even starting that loop took over 2
minutes, despite the only actions in between being a "ceph orch ps" call
and echoing out a value. Really not sure why it was so slow in running
those commands or why it happened 3 times in the initial run but never in
the reruns, but the failure came from that, and the upgrade itself seems to
still work fine.
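
For reference, the sequence that test exercises boils down to roughly this (the image 
name is only an example):

  ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.6
  ceph orch ps            # poll until a minority of the mons report the new version
  ceph orch upgrade pause
  ceph orch upgrade status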

 - Adam King
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd cp vs. rbd clone + rbd flatten

2023-03-22 Thread Tony Liu
Hi,

I want
1) copy a snapshot to an image,
2) no need to copy snapshots,
3) no dependency after copy,
4) all same image format 2.
In that case, is rbd cp the same as rbd clone + rbd flatten?
I ran some tests and it seems like it, but I want to confirm, in case I'm missing 
anything.
Also, it seems cp is a bit faster than clone + flatten, is that true?
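
For context, the two sequences I am comparing look roughly like this (pool, image and 
snapshot names are placeholders):

  rbd cp pool/src@snap1 pool/dst

vs.

  rbd snap protect pool/src@snap1   # only needed when clone v2 is not enabled
  rbd clone pool/src@snap1 pool/dst
  rbd flatten pool/dst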


Thanks!
Tony

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-22 Thread Boris Behrens
Hey Igor,

sadly we do not have the data from the time when C1 was on Nautilus.
The RocksDB warning persisted through the recreation.

Here are the measurements.
I've picked the same SSD models from the clusters to have some comparability.
For the 8TB disks it's even the same chassis configuration
(CPU/Memory/Board/Network)

The IOPS seem VERY low to me. Or are these normal values for SSDs? After
the recreation the IOPS are a lot better on the Pacific cluster.

I also blkdiscarded the SSDs before recreating them.

Nautilus Cluster
osd.22  = 8TB
osd.343 = 2TB
https://pastebin.com/EfSSLmYS

Pacific Cluster before recreating OSDs
osd.40  = 8TB
osd.162 = 2TB
https://pastebin.com/wKMmSW9T

Pacific Cluster after recreating OSDs
osd.40  = 8TB
osd.162 = 2TB
https://pastebin.com/80eMwwBW

On Wed, 22 Mar 2023 at 11:09, Igor Fedotov <igor.fedo...@croit.io> wrote:

> Hi Boris,
>
> first of all I'm not sure if it's valid to compare two different clusters
> (Pacific vs. Nautilus, C1 vs. C2 respectively). The perf numbers
> difference might be caused by a bunch of other factors: different H/W, user
> load, network etc... I can see that you got ~2x latency increase after
> Octopus to Pacific upgrade at C1 but Octopus numbers had been much above
> Nautilus at C2 before the upgrade. Did you observe even lower numbers at C1
> when it was running Nautilus if any?
>
>
> You might want to try "ceph tell osd.N bench" to compare OSDs performance
> for both C1 and C2. Would it be that different?
>
>
> Then redeploy a single OSD at C1, wait till rebalance completion and
> benchmark it again. What would be the new numbers? Please also collect perf
> counters from the to-be-redeployed OSD beforehand.
>
> W.r.t. rocksdb warning - I presume this might be caused by newer RocksDB
> version running on top of DB with a legacy format.. Perhaps redeployment
> would fix that...
>
>
> Thanks,
>
> Igor
> On 3/21/2023 5:31 PM, Boris Behrens wrote:
>
> Hi Igor,
> i've offline compacted all the OSDs and reenabled the bluefs_buffered_io
>
> It didn't change anything, and the commit and apply latencies are around
> 5-10 times higher than on our Nautilus cluster. The Pacific cluster has a 5
> minute mean over all OSDs of 2.2 ms, while the Nautilus cluster is around 0.2 -
> 0.7 ms.
>
> I also see these kind of logs. Google didn't really help:
> 2023-03-21T14:08:22.089+ 7efe7b911700  3 rocksdb:
> [le/block_based/filter_policy.cc:579] Using legacy Bloom filter with high
> (20) bits/key. Dramatic filter space and/or accuracy improvement is
> available with format_version>=5.
>
>
>
>
> On Tue, 21 Mar 2023 at 10:46, Igor Fedotov wrote:
>
>
> Hi Boris,
>
> additionally you might want to manually compact RocksDB for every OSD.
>
>
> Thanks,
>
> Igor
> On 3/21/2023 12:22 PM, Boris Behrens wrote:
>
> Disabling the write cache and the bluefs_buffered_io did not change
> anything.
> What we see is that the larger disks seem to be the leaders in terms of
> slowness (we have 70% 2TB, 20% 4TB and 10% 8TB SSDs in the cluster), but
> removing some of the 8TB disks and replacing them with 2TB disks (because it's by
> far the majority and we have a lot of them) also did not change
> anything.
>
> Are there any other ideas I could try? Customers are starting to complain about the
> slower performance, and our k8s team mentions problems with etcd because the
> latency is too high.
>
> Would it be an option to recreate every OSD?
>
> Cheers
>  Boris
>
> On Tue, 28 Feb 2023 at 22:46, Boris Behrens wrote:
>
>
> Hi Josh,
> thanks a lot for the breakdown and the links.
> I disabled the write cache but it didn't change anything. Tomorrow I will
> try to disable bluefs_buffered_io.
>
> It doesn't sound that I can mitigate the problem with more SSDs.
>
>
> On Tue, 28 Feb 2023 at 15:42, Josh Baergen wrote:
>
>
> Hi Boris,
>
> OK, what I'm wondering is whether https://tracker.ceph.com/issues/58530 is 
> involved. There are two
> aspects to that ticket:
> * A measurable increase in the number of bytes written to disk in
> Pacific as compared to Nautilus
> * The same, but for IOPS
>
> Per the current theory, both are due to the loss of rocksdb log
> recycling when using default recovery options in rocksdb 6.8; Octopus
> uses version 6.1.2, Pacific uses 6.8.1.
>
> 16.2.11 largely addressed the bytes-written amplification, but the
> IOPS amplification remains. In practice, whether this results in a
> write performance degradation depends on the speed of the underlying
> media and the workload, and thus the things I mention in the next
> paragraph may or may not be applicable to you.
>
> There's no known workaround or solution for this at this time. In some
> cases I've seen that disabling bluefs_buffered_io (which itself can
> cause IOPS amplification in some cases) can help; I think most folks
> do this by setting it in local conf and then restarting OSDs in order
> to gain the config change. Something else to consider 
> 

[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-22 Thread Ilya Dryomov
On Tue, Mar 21, 2023 at 9:06 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The reruns were in the queue for 4 days because of some slowness issues.
> The core team (Neha, Radek, Laura, and others) are trying to narrow
> down the root cause.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> the core)
> rgw - Casey
> fs - Venky (the fs suite has an unusually high amount of failed jobs,
> any reason to suspect it in the observed slowness?)
> orch - Adam King
> rbd - Ilya
> krbd - Ilya

rbd and krbd approved.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Gregory Farnum
On Wed, Mar 22, 2023 at 8:27 AM Frank Schilder  wrote:

> Hi Gregory,
>
> thanks for your reply. First a quick update. Here is how I get ln to work
> after it failed, there seems no timeout:
>
> $ ln envs/satwindspy/include/ffi.h
> mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
> ln: failed to create hard link
> 'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h': Read-only file system
> $ ls -l envs/satwindspy/include mambaforge/pkgs/libffi-3.3-h58526e2_2
> envs/satwindspy/include:
> total 7664
> -rw-rw-r--.   1 rit rit959 Mar  5  2021 ares_build.h
> [...]
> $ ln envs/satwindspy/include/ffi.h
> mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
>
> After an ls -l on both directories ln works.
>
> To the question: How can I pull out a log from the nfs server? There is
> nothing in /var/log/messages.


So you’re using the kernel server and re-exporting, right?

I’m not very familiar with its implementation; I wonder if it’s doing
something strange via the kernel vfs.
AFAIK this isn’t really supportable for general use because nfs won’t
respect the CephFS file consistency protocol. But maybe it’s trying a bit
and that’s causing trouble?
-Greg



>
> I can't reproduce it with simple commands on the NFS client. It seems to
> occur only when a large number of files/dirs is created. I can make the
> archive available to you if this helps.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Gregory Farnum 
> Sent: Wednesday, March 22, 2023 4:14 PM
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name':
> Read-only file system
>
> Do you have logs of what the nfs server is doing?
> Managed to reproduce it in terms of direct CephFS ops?
>
>
> On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder  fr...@dtu.dk>> wrote:
> I have to correct myself. It also fails on an export with "sync" mode.
> Here is an strace on the client (strace ln envs/satwindspy/include/ffi.h
> mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h):
>
> [...]
> stat("mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h",
> 0x7ffdc5c32820) = -1 ENOENT (No such file or directory)
> lstat("envs/satwindspy/include/ffi.h", {st_mode=S_IFREG|0664,
> st_size=13934, ...}) = 0
> linkat(AT_FDCWD, "envs/satwindspy/include/ffi.h", AT_FDCWD,
> "mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0) = -1 EROFS
> (Read-only file system)
> [...]
> write(2, "ln: ", 4ln: ) = 4
> write(2, "failed to create hard link 'mamb"..., 80failed to create hard
> link 'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h') = 80
> [...]
> write(2, ": Read-only file system", 23: Read-only file system) = 23
> write(2, "\n", 1
> )   = 1
> lseek(0, 0, SEEK_CUR)   = -1 ESPIPE (Illegal seek)
> close(0)= 0
> close(1)= 0
> close(2)= 0
> exit_group(1)   = ?
> +++ exited with 1 +++
>
> Has anyone advice?
>
> Thanks!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Frank Schilder mailto:fr...@dtu.dk>>
> Sent: Wednesday, March 22, 2023 2:44 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] ln: failed to create hard link 'file name':
> Read-only file system
>
> Hi all,
>
> on an NFS re-export of a ceph-fs (kernel client) I observe a very strange
> error. I'm un-taring a larger package (1.2G) and after some time I get
> these errors:
>
> ln: failed to create hard link 'file name': Read-only file system
>
> The strange thing is that this seems only temporary. When I used "ln src
> dst" for manual testing, the command failed as above. However, after that I
> tried "ln -v src dst" and this command created the hard link with exactly
> the same path arguments. During the period when the error occurs, I can't
> see any FS in read-only mode, neither on the NFS client nor the NFS server.
> Funny thing is that file creation and write still works, its only the
> hard-link creation that fails.
>
> For details, the set-up is:
>
> file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to
> other server
> other server: mount /shares/path as NFS
>
> More precisely, on the file-server:
>
> fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph
> defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev
> 0 0
> exports: /shares/nfs/folder
> -no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP
>
> On the host at DEST-IP:
>
> fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev
> 0 0
>
> Both, the file server and the client server are virtual machines. The file
> server is on Centos 8 stream (4.18.0-338.el8.x86_64) and the client machine
> is on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).
>
> When I 

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Frank Schilder
Hi Gregory,

thanks for your reply. First a quick update. Here is how I get ln to work after 
it failed; there seems to be no timeout:

$ ln envs/satwindspy/include/ffi.h 
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
ln: failed to create hard link 
'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h': Read-only file system
$ ls -l envs/satwindspy/include mambaforge/pkgs/libffi-3.3-h58526e2_2
envs/satwindspy/include:
total 7664
-rw-rw-r--.   1 rit rit959 Mar  5  2021 ares_build.h
[...]
$ ln envs/satwindspy/include/ffi.h 
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h

After an ls -l on both directories ln works.

To the question: How can I pull out a log from the nfs server? There is nothing 
in /var/log/messages.

I can't reproduce it with simple commands on the NFS client. It seems to occur 
only when a large number of files/dirs is created. I can make the archive 
available to you if this helps.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Gregory Farnum 
Sent: Wednesday, March 22, 2023 4:14 PM
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': 
Read-only file system

Do you have logs of what the nfs server is doing?
Managed to reproduce it in terms of direct CephFS ops?


On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder 
mailto:fr...@dtu.dk>> wrote:
I have to correct myself. It also fails on an export with "sync" mode. Here is 
an strace on the client (strace ln envs/satwindspy/include/ffi.h 
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h):

[...]
stat("mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0x7ffdc5c32820) = 
-1 ENOENT (No such file or directory)
lstat("envs/satwindspy/include/ffi.h", {st_mode=S_IFREG|0664, st_size=13934, 
...}) = 0
linkat(AT_FDCWD, "envs/satwindspy/include/ffi.h", AT_FDCWD, 
"mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0) = -1 EROFS (Read-only 
file system)
[...]
write(2, "ln: ", 4ln: ) = 4
write(2, "failed to create hard link 'mamb"..., 80failed to create hard link 
'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h') = 80
[...]
write(2, ": Read-only file system", 23: Read-only file system) = 23
write(2, "\n", 1
)   = 1
lseek(0, 0, SEEK_CUR)   = -1 ESPIPE (Illegal seek)
close(0)= 0
close(1)= 0
close(2)= 0
exit_group(1)   = ?
+++ exited with 1 +++

Has anyone advice?

Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder mailto:fr...@dtu.dk>>
Sent: Wednesday, March 22, 2023 2:44 PM
To: ceph-users@ceph.io
Subject: [ceph-users] ln: failed to create hard link 'file name': Read-only 
file system

Hi all,

on an NFS re-export of a ceph-fs (kernel client) I observe a very strange 
error. I'm un-taring a larger package (1.2G) and after some time I get these 
errors:

ln: failed to create hard link 'file name': Read-only file system

The strange thing is that this seems only temporary. When I used "ln src dst" 
for manual testing, the command failed as above. However, after that I tried 
"ln -v src dst" and this command created the hard link with exactly the same 
path arguments. During the period when the error occurs, I can't see any FS in 
read-only mode, neither on the NFS client nor the NFS server. Funny thing is 
that file creation and write still works, its only the hard-link creation that 
fails.

For details, the set-up is:

file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to 
other server
other server: mount /shares/path as NFS

More precisely, on the file-server:

fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph 
defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev 0 0
exports: /shares/nfs/folder 
-no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP

On the host at DEST-IP:

fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev 0 0

Both, the file server and the client server are virtual machines. The file 
server is on Centos 8 stream (4.18.0-338.el8.x86_64) and the client machine is 
on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).

When I change the NFS export from "async" to "sync" everything works. However, 
that's a rather bad workaround and not a solution. Although this looks like an 
NFS issue, I'm afraid it is a problem with hard links and ceph-fs. It looks 
like a race with scheduling and executing operations on the ceph-fs kernel 
mount.

Has anyone seen something like that?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Gregory Farnum
Do you have logs of what the nfs server is doing?
Managed to reproduce it in terms of direct CephFS ops?


On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder  wrote:

> I have to correct myself. It also fails on an export with "sync" mode.
> Here is an strace on the client (strace ln envs/satwindspy/include/ffi.h
> mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h):
>
> [...]
> stat("mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h",
> 0x7ffdc5c32820) = -1 ENOENT (No such file or directory)
> lstat("envs/satwindspy/include/ffi.h", {st_mode=S_IFREG|0664,
> st_size=13934, ...}) = 0
> linkat(AT_FDCWD, "envs/satwindspy/include/ffi.h", AT_FDCWD,
> "mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0) = -1 EROFS
> (Read-only file system)
> [...]
> write(2, "ln: ", 4ln: ) = 4
> write(2, "failed to create hard link 'mamb"..., 80failed to create hard
> link 'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h') = 80
> [...]
> write(2, ": Read-only file system", 23: Read-only file system) = 23
> write(2, "\n", 1
> )   = 1
> lseek(0, 0, SEEK_CUR)   = -1 ESPIPE (Illegal seek)
> close(0)= 0
> close(1)= 0
> close(2)= 0
> exit_group(1)   = ?
> +++ exited with 1 +++
>
> Has anyone advice?
>
> Thanks!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Frank Schilder 
> Sent: Wednesday, March 22, 2023 2:44 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] ln: failed to create hard link 'file name':
> Read-only file system
>
> Hi all,
>
> on an NFS re-export of a ceph-fs (kernel client) I observe a very strange
> error. I'm un-taring a larger package (1.2G) and after some time I get
> these errors:
>
> ln: failed to create hard link 'file name': Read-only file system
>
> The strange thing is that this seems only temporary. When I used "ln src
> dst" for manual testing, the command failed as above. However, after that I
> tried "ln -v src dst" and this command created the hard link with exactly
> the same path arguments. During the period when the error occurs, I can't
> see any FS in read-only mode, neither on the NFS client nor the NFS server.
> Funny thing is that file creation and write still works, its only the
> hard-link creation that fails.
>
> For details, the set-up is:
>
> file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to
> other server
> other server: mount /shares/path as NFS
>
> More precisely, on the file-server:
>
> fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph
> defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev
> 0 0
> exports: /shares/nfs/folder
> -no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP
>
> On the host at DEST-IP:
>
> fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev
> 0 0
>
> Both, the file server and the client server are virtual machines. The file
> server is on Centos 8 stream (4.18.0-338.el8.x86_64) and the client machine
> is on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).
>
> When I change the NFS export from "async" to "sync" everything works.
> However, that's a rather bad workaround and not a solution. Although this
> looks like an NFS issue, I'm afraid it is a problem with hard links and
> ceph-fs. It looks like a race with scheduling and executing operations on
> the ceph-fs kernel mount.
>
> Has anyone seen something like that?
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph performance problems

2023-03-22 Thread Joachim Kraftmayer

Hi Dominik,

if you need performance, the default configuration is not designed for that.
When tuning for performance, please pay attention to your sources: if a 
benchmark or something similar is described there, you must check 
whether it is suitable for production operation.


Regards, Joachim

___
Clyso GmbH - Ceph Foundation Member

On 21.03.23 at 10:40, Dominik Baack wrote:

Hi,

we are currently testing out ways to increase Ceph performance because 
what we experience so far is very close to unusable.


For the test cluster we are utilizing 4 nodes with the following hardware 
data:


Dual 200GbE Mellanox Ethernet
2x EPYC Rome 7302
16x 32GB 3200MHz ECC
9x 15.36TB Micron 9300 Pro

For production this will be extended to all 8 nodes if it shows 
promising results.


- Ceph was installed with cephadm.
- MDS and OSDs are located on the same nodes.
- Mostly using stock config

- Network performance tested with iperf3 seems fine, 26Gbits/s with 
-P4 on single port (details below).

   Close to 200Gbits with 10 parallel instances and servers.

When testing a mounted CephFS on the working nodes in various 
configurations I only got <50MB/s for fuse mount and <270MB/s for 
kernel mounts. (dd command and output attached below)
In addition, the Ceph dashboard and our Grafana monitoring report packet 
loss on all relevant interfaces during load, which does not occur 
during the normal iperf load tests or rsync/scp file transfers.


Rados Bench shows performance around 2000MB/s which is not max 
performance of the SSDs but fine for us (details below).



Why is the filesystem so slow compared to the individual components?

Cheers
Dominik




Test details:

-- 



Some tests done on working nodes:

Ceph mounted with ceph-fuse

root@ml2ran10:/mnt/cephfs/backup# dd if=/dev/zero of=testfile bs=1M 
count=4096 oflag=direct

4096+0 records in
4096+0 records out
4294967296 bytes (4,3 GB, 4,0 GiB) copied, 88,2933 s, 48,6 MB/s


Ceph mounted with kernel driver:

root@ml2ran06:/mnt/cephfs/backup# dd if=/dev/zero of=testfile bs=1M 
count=4096 oflag=direct

4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.0989 s, 267 MB/s


Storage Node

With fuse

root@ml2rsn05:/mnt/ml2r_storage/backup# dd if=/dev/zero of=testfile 
bs=1M count=4096 oflag=direct

4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 53.9977 s, 79.5 MB/s

Kernel mount:

dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.6726 s, 243 MB/s

___

Iperf3

iperf3 --zerocopy  -n 10240M -P4 -c ml2ran08s0 -p 4701 -i 15 -b 
2000

Connecting to host ml2ran08s0, port 4701
[  5] local 129.217.31.180 port 43958 connected to 129.217.31.218 port 
4701
[  7] local 129.217.31.180 port 43960 connected to 129.217.31.218 port 
4701
[  9] local 129.217.31.180 port 43962 connected to 129.217.31.218 port 
4701
[ 11] local 129.217.31.180 port 43964 connected to 129.217.31.218 port 
4701

[ ID] Interval   Transfer Bitrate Retr  Cwnd
[  5]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0    632 KBytes
[  7]   0.00-3.21   sec  2.50 GBytes  6.70 Gbits/sec    0    522 KBytes
[  9]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0    612 KBytes
[ 11]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0    430 KBytes
[SUM]   0.00-3.21   sec  10.0 GBytes  26.8 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bitrate Retr
[  5]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0   sender
[  5]   0.00-3.21   sec  2.50 GBytes  6.67 Gbits/sec        receiver
[  7]   0.00-3.21   sec  2.50 GBytes  6.70 Gbits/sec    0   sender
[  7]   0.00-3.21   sec  2.50 GBytes  6.67 Gbits/sec        receiver
[  9]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0   sender
[  9]   0.00-3.21   sec  2.49 GBytes  6.67 Gbits/sec        receiver
[ 11]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0   sender
[ 11]   0.00-3.21   sec  2.50 GBytes  6.67 Gbits/sec        receiver
[SUM]   0.00-3.21   sec  10.0 GBytes  26.8 Gbits/sec    0   sender
[SUM]   0.00-3.21   sec  9.98 GBytes  26.7 Gbits/sec        receiver




_

Rados Bench on storage node:

# rados bench -p testbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 
4194304 for up to 10 seconds or 0 objects

Object prefix: benchmark_data_ml2rsn05_2829244
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0   0 0 0 0 0 - 0
    1  16   747   731   2923.84  2924 0.0178757 0.0216262
    2  16  1506  1490   

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Frank Schilder
I have to correct myself. It also fails on an export with "sync" mode. Here is 
an strace on the client (strace ln envs/satwindspy/include/ffi.h 
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h):

[...]
stat("mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0x7ffdc5c32820) = 
-1 ENOENT (No such file or directory)
lstat("envs/satwindspy/include/ffi.h", {st_mode=S_IFREG|0664, st_size=13934, 
...}) = 0
linkat(AT_FDCWD, "envs/satwindspy/include/ffi.h", AT_FDCWD, 
"mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0) = -1 EROFS (Read-only 
file system)
[...]
write(2, "ln: ", 4ln: ) = 4
write(2, "failed to create hard link 'mamb"..., 80failed to create hard link 
'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h') = 80
[...]
write(2, ": Read-only file system", 23: Read-only file system) = 23
write(2, "\n", 1
)   = 1
lseek(0, 0, SEEK_CUR)   = -1 ESPIPE (Illegal seek)
close(0)= 0
close(1)= 0
close(2)= 0
exit_group(1)   = ?
+++ exited with 1 +++

Has anyone advice?

Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Wednesday, March 22, 2023 2:44 PM
To: ceph-users@ceph.io
Subject: [ceph-users] ln: failed to create hard link 'file name': Read-only 
file system

Hi all,

on an NFS re-export of a ceph-fs (kernel client) I observe a very strange 
error. I'm un-taring a larger package (1.2G) and after some time I get these 
errors:

ln: failed to create hard link 'file name': Read-only file system

The strange thing is that this seems only temporary. When I used "ln src dst" 
for manual testing, the command failed as above. However, after that I tried 
"ln -v src dst" and this command created the hard link with exactly the same 
path arguments. During the period when the error occurs, I can't see any FS in 
read-only mode, neither on the NFS client nor the NFS server. Funny thing is 
that file creation and write still works, its only the hard-link creation that 
fails.

For details, the set-up is:

file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to 
other server
other server: mount /shares/path as NFS

More precisely, on the file-server:

fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph 
defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev 0 0
exports: /shares/nfs/folder 
-no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP

On the host at DEST-IP:

fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev 0 0

Both, the file server and the client server are virtual machines. The file 
server is on Centos 8 stream (4.18.0-338.el8.x86_64) and the client machine is 
on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).

When I change the NFS export from "async" to "sync" everything works. However, 
that's a rather bad workaround and not a solution. Although this looks like an 
NFS issue, I'm afraid it is a problem with hard links and ceph-fs. It looks 
like a race with scheduling and executing operations on the ceph-fs kernel 
mount.

Has anyone seen something like that?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] S3 notification for backup

2023-03-22 Thread Olivier Audry
Hello

I would like to know if using the bucket notification system to HTTP for 
backing up S3 buckets is a good move or not.

Has someone already done it?

I have around 100 TB of small documents to back up and archive for legal purposes 
and DR each day.
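
For reference, what I have in mind is roughly the standard RGW bucket-notification flow 
(endpoint URLs, topic and bucket names are placeholders, and the exact attribute names 
should be checked against the bucket-notification docs):

  # create a topic pointing at the HTTP endpoint, via RGW's SNS-compatible API
  aws --endpoint-url http://RGW-HOST:8000 sns create-topic --name backup-topic \
      --attributes push-endpoint=http://BACKUP-HOST:8080,persistent=true

  # attach it to the bucket for object-creation events
  aws --endpoint-url http://RGW-HOST:8000 s3api put-bucket-notification-configuration \
      --bucket mybucket --notification-configuration \
      '{"TopicConfigurations":[{"Id":"backup","TopicArn":"arn:aws:sns:default::backup-topic","Events":["s3:ObjectCreated:*"]}]}'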

Many thanks for your help.

oau
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Days India 2023 - Call for proposals

2023-03-22 Thread Gaurav Sitlani
Hello Cephers,

We're happy to share that we're organizing Ceph Day India on 5th May 2023
this year.

The event is now sold out! If you missed getting a ticket, consider
submitting the CFP and we'll provide a ticket if accepted.

https://ceph.io/en/community/events/2023/ceph-days-india/

Please reach out to us if you need any help regarding the submissions.

Thanks and regards,

Gaurav Sitlani
Ceph Community Ambassador
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph performance problems

2023-03-22 Thread Dominik Baack

Hi,

we are currently testing out ways to increase Ceph performance because 
what we experience so far is very close to unusable.


For the test cluster we are utilizing 4 nodes with the following hardware data:

Dual 200GbE Mellanox Ethernet
2x EPYC Rome 7302
16x 32GB 3200MHz ECC
9x 15.36TB Micron 9300 Pro

For production this will be extended to all 8 nodes if it shows 
promising results.


- Ceph was installed with cephadm.
- MDS and OSDs are located on the same nodes.
- Mostly using stock config

- Network performance tested with iperf3 seems fine, 26Gbits/s with -P4 
on single port (details below).

   Close to 200Gbits with 10 parallel instances and servers.

When testing a mounted CephFS on the working nodes in various 
configurations I only got <50MB/s for fuse mount and <270MB/s for kernel 
mounts. (dd command and output attached below)
In addition, the Ceph dashboard and our Grafana monitoring report packet 
loss on all relevant interfaces during load, which does not occur during 
the normal iperf load tests or rsync/scp file transfers.


Rados Bench shows performance around 2000MB/s which is not max 
performance of the SSDs but fine for us (details below).



Why is the filesystem so slow compared to the individual components?

Cheers
Dominik




Test details:

--

Some tests done on working nodes:

Ceph mounted with ceph-fuse

root@ml2ran10:/mnt/cephfs/backup# dd if=/dev/zero of=testfile bs=1M 
count=4096 oflag=direct

4096+0 records in
4096+0 records out
4294967296 bytes (4,3 GB, 4,0 GiB) copied, 88,2933 s, 48,6 MB/s


Ceph mounted with kernel driver:

root@ml2ran06:/mnt/cephfs/backup# dd if=/dev/zero of=testfile bs=1M 
count=4096 oflag=direct

4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.0989 s, 267 MB/s


Storage Node

With fuse

root@ml2rsn05:/mnt/ml2r_storage/backup# dd if=/dev/zero of=testfile 
bs=1M count=4096 oflag=direct

4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 53.9977 s, 79.5 MB/s

Kernel mount:

dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.6726 s, 243 MB/s

___

Iperf3

iperf3 --zerocopy  -n 10240M -P4 -c ml2ran08s0 -p 4701 -i 15 -b 2000
Connecting to host ml2ran08s0, port 4701
[  5] local 129.217.31.180 port 43958 connected to 129.217.31.218 port 4701
[  7] local 129.217.31.180 port 43960 connected to 129.217.31.218 port 4701
[  9] local 129.217.31.180 port 43962 connected to 129.217.31.218 port 4701
[ 11] local 129.217.31.180 port 43964 connected to 129.217.31.218 port 4701
[ ID] Interval   Transfer Bitrate Retr  Cwnd
[  5]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0    632 KBytes
[  7]   0.00-3.21   sec  2.50 GBytes  6.70 Gbits/sec    0    522 KBytes
[  9]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0    612 KBytes
[ 11]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0    430 KBytes
[SUM]   0.00-3.21   sec  10.0 GBytes  26.8 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bitrate Retr
[  5]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0   sender
[  5]   0.00-3.21   sec  2.50 GBytes  6.67 Gbits/sec        receiver
[  7]   0.00-3.21   sec  2.50 GBytes  6.70 Gbits/sec    0   sender
[  7]   0.00-3.21   sec  2.50 GBytes  6.67 Gbits/sec        receiver
[  9]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0   sender
[  9]   0.00-3.21   sec  2.49 GBytes  6.67 Gbits/sec        receiver
[ 11]   0.00-3.21   sec  2.50 GBytes  6.69 Gbits/sec    0   sender
[ 11]   0.00-3.21   sec  2.50 GBytes  6.67 Gbits/sec        receiver
[SUM]   0.00-3.21   sec  10.0 GBytes  26.8 Gbits/sec    0   sender
[SUM]   0.00-3.21   sec  9.98 GBytes  26.7 Gbits/sec        receiver




_

Rados Bench on storage node:

# rados bench -p testbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 
4194304 for up to 10 seconds or 0 objects

Object prefix: benchmark_data_ml2rsn05_2829244
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0   0 0 0 0 0 -   0
    1  16   747   731   2923.84  2924 0.0178757   0.0216262
    2  16  1506  1490   2979.71  3036 0.0308664   0.0213685
    3  16  2267  2251   3000.99  3044 0.0259053   0.0212556
    4  16  3058  3042   3041.62  3164 0.0227621   0.0209792
    5  16  3850  3834    3066.8  3168 0.0130519   0.0208148
    6  16  4625  4609   3072.26  3100 0.151371   0.0207904
    7  16  

[ceph-users] Advices for the best way to move db/wal lv from old nvme to new one

2023-03-22 Thread Christophe BAILLON
Hello,

We have a cluster with 26 nodes, and 15 nodes have a bad batch of 2 NVMes where 
each carries 6 LVs for db/wal.  We have to change them, because they fail one 
by one...
The defective NVMes are M.2 Samsung enterprise drives.
When they fail, we get sense errors and the NVMe disappears; if we power off the 
server and power it on, it comes back... If we just do a soft reboot, the NVMe 
doesn't come back...
So we have decided to replace all of them with Intel PCIe SSDPEDME016T4S.
The original ones are 1 TB, and the new ones are 1.6 TB.

What is the best method to do that?

Put the node in maintenance mode, do a pvmove of each, and afterwards an lvresize 
of each?

Or is there an easier way to do that, like what I found in the mailing list 
archives?
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD}
bluefs-bdev-new-db --dev-target /dev/bluesfs_db/db-osd${OSD}

and

ceph-bluestore-tool --path dev/osd1/ --devs-source dev/osd1/block
--dev-target dev/osd1/block.db bluefs-bdev-migrate
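
For the pvmove route, I imagine something like this per node (VG/LV names, the OSD id 
and the systemd unit are placeholders; with cephadm, ceph-bluestore-tool would be run 
from a cephadm shell for that OSD):

  ceph osd set noout
  systemctl stop ceph-FSID@osd.N.service
  vgextend DB-VG /dev/NEW-NVME
  pvmove /dev/OLD-NVME /dev/NEW-NVME
  vgreduce DB-VG /dev/OLD-NVME
  lvextend -L +100G DB-VG/db-osdN      # optional, one way to spread the extra 0.6 TB over the 6 LVs
  ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-N
  systemctl start ceph-FSID@osd.N.service
  ceph osd unset noout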

For your understanding, we are on the latest Quincy, and the hardware is:
12 x 18tb sas
2 x nvme for db/wal

Thanks in advance for your insights
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Frank Schilder
Hi all,

on an NFS re-export of a ceph-fs (kernel client) I observe a very strange 
error. I'm un-taring a larger package (1.2G) and after some time I get these 
errors:

ln: failed to create hard link 'file name': Read-only file system

The strange thing is that this seems only temporary. When I used "ln src dst" 
for manual testing, the command failed as above. However, after that I tried 
"ln -v src dst" and this command created the hard link with exactly the same 
path arguments. During the period when the error occurs, I can't see any FS in 
read-only mode, neither on the NFS client nor the NFS server. The funny thing is 
that file creation and writing still work; it's only the hard-link creation that 
fails.

For details, the set-up is:

file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to 
other server
other server: mount /shares/path as NFS

More precisely, on the file-server:

fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph 
defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev 0 0
exports: /shares/nfs/folder 
-no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP

On the host at DEST-IP:

fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev 0 0

Both the file server and the client server are virtual machines. The file 
server is on Centos 8 stream (4.18.0-338.el8.x86_64) and the client machine is 
on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).

When I change the NFS export from "async" to "sync" everything works. However, 
that's a rather bad workaround and not a solution. Although this looks like an 
NFS issue, I'm afraid it is a problem with hard links and ceph-fs. It looks 
like a race with scheduling and executing operations on the ceph-fs kernel 
mount.

Has anyone seen something like that?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-22 Thread Casey Bodley
On Tue, Mar 21, 2023 at 4:06 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The reruns were in the queue for 4 days because of some slowness issues.
> The core team (Neha, Radek, Laura, and others) are trying to narrow
> down the root cause.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> the core)
> rgw - Casey

there were some java_s3test failures related to
https://tracker.ceph.com/issues/58554. i've added the fix to
https://github.com/ceph/java_s3tests/commits/ceph-quincy, so a rerun
should resolve those failures.
there were also some 'Failed to fetch package version' failures in the
rerun that warranted another rerun anyway

there's also an urgent priority bug fix in
https://github.com/ceph/ceph/pull/50625 that i'd really like to add to
this release; sorry for the late notice

> fs - Venky (the fs suite has an unusually high amount of failed jobs,
> any reason to suspect it in the observed slowness?)
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/octopus-x - Laura is looking into failures
> upgrade/pacific-x - Laura is looking into failures
> upgrade/quincy-p2p - Laura is looking into failures
> client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> is looking into it
> powercycle - Brad
> ceph-volume - needs a rerun on merged
> https://github.com/ceph/ceph-ansible/pull/7409
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Also, share any findings or hypothesis about the slowness in the
> execution of the suite.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-03-22 Thread Daniel Gryniewicz
Yes, the POSIXDriver will support that.  If you want NFS access, we'd 
suggest you use Ganesha's FSAL_RGW to access through RGW (because 
multipart uploads are not fun), but it will work.
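
For anyone curious what that looks like, here is a minimal nfs-ganesha export
sketch using FSAL_RGW (the user id, keys, bucket path and RGW instance name
are placeholders, and the exact options depend on your Ganesha version):

  RGW {
      ceph_conf = "/etc/ceph/ceph.conf";
      name = "client.rgw.nfs";          # RGW instance to attach to (placeholder)
  }

  EXPORT {
      Export_ID = 1;
      Path = "/mybucket";               # bucket to export (placeholder)
      Pseudo = "/mybucket";             # where clients see it in the pseudo-FS
      Access_Type = RW;
      Squash = No_Root_Squash;
      FSAL {
          Name = RGW;
          User_Id = "nfsuser";          # an existing RGW user (placeholder)
          Access_Key_Id = "ACCESSKEY";
          Secret_Access_Key = "SECRETKEY";
      }
  }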


Daniel

On 3/21/23 15:48, Fox, Kevin M wrote:

Will either the file store or the posix/gpfs filter support the underlying 
files changing underneath so you can access the files either through s3 or by 
other out of band means (smb, nfs, etc)?

Thanks,
Kevin


From: Matt Benjamin 
Sent: Monday, March 20, 2023 5:27 PM
To: Chris MacNaughton
Cc: ceph-users@ceph.io; Kyle Bader
Subject: [ceph-users] Re: s3 compatible interface



Hi Chris,

This looks useful.  Note for this thread:  this *looks like* it's using the
zipper dbstore backend?  Yes, that's coming in Reef.  We think of dbstore
as mostly the zipper reference driver, but it can be useful as a standalone
setup, potentially.

But there's now a prototype of a posix file filter that can be stacked on
dbstore (or rados, I guess)--not yet merged, and iiuc post-Reef.  That's
the project Daniel was describing.  The posix/gpfs filter is aiming for
being thin and fast and horizontally scalable.

The s3gw project that Clyso and folks were writing about is distinct from
both of these.  I *think* it's truthful to say that s3gw is its own
thing--a hybrid backing store with objects in files, but also metadata
atomicity from an embedded db--plus interesting orchestration.

Matt

On Mon, Mar 20, 2023 at 3:45 PM Chris MacNaughton <
chris.macnaugh...@canonical.com> wrote:


On 3/20/23 12:02, Frank Schilder wrote:

Hi Marc,

I'm also interested in an S3 service that uses a file system as a back-end. I
looked at the documentation of https://github.com/aquarist-labs/s3gw and have
to say that it doesn't make much sense to me. I don't see this kind of gateway
anywhere there. What I see is a build of a rados gateway that can be pointed at
a ceph cluster. That's not a gateway to an FS.

Did I misunderstand your actual request or can you point me to the part of the 
documentation where it says how to spin up an S3 interface using a file system 
for user data?

The only thing I found is
https://s3gw-docs.readthedocs.io/en/latest/helm-charts/#local-storage, but it
sounds to me that this is not where the user data will be going.

Thanks for any hints and best regards,


for testing you can try: https://github.com/aquarist-labs/s3gw

Yes indeed, that looks like it can be used with a simple fs backend.

Hey,

(Re-sending this email from a mailing-list subscribed email)

I was playing around with RadosGW's file backend (coming in Reef, zipper)
a few months back and ended up making this docker container that just works
to set things up: https://github.com/ChrisMacNaughton/ceph-rgw-docker;
published (still, maybe for a while?) at
https://hub.docker.com/r/iceyec/ceph-rgw-zipper

Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-22 Thread Igor Fedotov

Hi Boris,

first of all, I'm not sure it's valid to compare two different
clusters (Pacific vs. Nautilus, C1 vs. C2 respectively). The difference in
perf numbers might be caused by a bunch of other factors:
different H/W, user load, network etc. I can see that you got a ~2x
latency increase after the Octopus to Pacific upgrade at C1, but the Octopus
numbers had already been well above Nautilus at C2 before the upgrade. Did you
observe even lower numbers at C1 when it was running Nautilus, if it ever did?



You might want to try "ceph tell osd.N bench" to compare OSD
performance for both C1 and C2. Would it be that different?
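
A small sketch for collecting those numbers across all OSDs of a cluster (by
default the bench should write 1 GB in 4 MB blocks; the exact output fields
can differ a bit between releases):

  # run the built-in OSD bench on every OSD and keep only the throughput line
  for id in $(ceph osd ls); do
      echo -n "osd.$id: "
      ceph tell osd.$id bench | grep bytes_per_sec
  done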



Then redeploy a single OSD at C1, wait for rebalance completion and
benchmark it again. What would be the new numbers? Please also collect
perf counters from the to-be-redeployed OSD beforehand.
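
Something along these lines could be used for the counters and the redeploy
(a sketch for a non-containerized OSD; N and /dev/sdX are placeholders):

  # grab the perf counters on the OSD host before touching anything
  ceph daemon osd.N perf dump > osd.N-perf-before.json

  # redeploy the OSD, keeping its id
  ceph osd out N
  # ... wait until "ceph osd safe-to-destroy osd.N" reports it is safe ...
  systemctl stop ceph-osd@N
  ceph osd destroy N --yes-i-really-mean-it
  ceph-volume lvm zap --destroy /dev/sdX
  ceph-volume lvm create --osd-id N --data /dev/sdX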


W.r.t. the RocksDB warning - I presume this might be caused by a newer RocksDB
version running on top of a DB with a legacy format. Perhaps redeployment
would fix that.



Thanks,

Igor

On 3/21/2023 5:31 PM, Boris Behrens wrote:

Hi Igor,
I've offline-compacted all the OSDs and re-enabled bluefs_buffered_io.

It didn't change anything, and the commit and apply latencies are around
5-10 times higher than on our Nautilus cluster. The Pacific cluster shows a
5-minute mean of 2.2 ms across all OSDs, while the Nautilus cluster is around
0.2 - 0.7 ms.

I also see this kind of log message. Google didn't really help:
2023-03-21T14:08:22.089+ 7efe7b911700  3 rocksdb:
[le/block_based/filter_policy.cc:579] Using legacy Bloom filter with high
(20) bits/key. Dramatic filter space and/or accuracy improvement is
available with format_version>=5.




On Tue, 21 Mar 2023 at 10:46, Igor Fedotov <
igor.fedo...@croit.io> wrote:


Hi Boris,

additionally you might want to manually compact RocksDB for every OSD.
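
A sketch of both compaction variants for a non-containerized deployment (N is
a placeholder for the OSD id):

  # online compaction, one OSD at a time
  ceph tell osd.N compact

  # offline compaction - the OSD has to be stopped first
  systemctl stop ceph-osd@N
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-N compact
  systemctl start ceph-osd@N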


Thanks,

Igor
On 3/21/2023 12:22 PM, Boris Behrens wrote:

Disabling the write cache and the bluefs_buffered_io did not change
anything.
What we see is that larger disks seem to be the leaders in terms of
slowness (we have 70% 2 TB, 20% 4 TB and 10% 8 TB SSDs in the cluster), but
removing some of the 8 TB disks and replacing them with 2 TB disks (because
that's by far the majority and we have a lot of them) did not change
anything either.

Are there any other ideas I could try? Customers are starting to complain about
the slower performance, and our k8s team mentions problems with etcd because
the latency is too high.

Would it be an option to recreate every OSD?

Cheers
  Boris

On Tue, 28 Feb 2023 at 22:46, Boris Behrens wrote:


Hi Josh,
thanks a lot for the breakdown and the links.
I disabled the write cache but it didn't change anything. Tomorrow I will
try to disable bluefs_buffered_io.

It doesn't sound that I can mitigate the problem with more SSDs.


On Tue, 28 Feb 2023 at 15:42, Josh Baergen wrote:


Hi Boris,

OK, what I'm wondering is whether https://tracker.ceph.com/issues/58530 is
involved. There are two aspects to that ticket:
* A measurable increase in the number of bytes written to disk in
Pacific as compared to Nautilus
* The same, but for IOPS

Per the current theory, both are due to the loss of rocksdb log
recycling when using default recovery options in rocksdb 6.8; Octopus
uses version 6.1.2, Pacific uses 6.8.1.

16.2.11 largely addressed the bytes-written amplification, but the
IOPS amplification remains. In practice, whether this results in a
write performance degradation depends on the speed of the underlying
media and the workload, and thus the things I mention in the next
paragraph may or may not be applicable to you.

There's no known workaround or solution for this at this time. In some
cases I've seen that disabling bluefs_buffered_io (which itself can
cause IOPS amplification in some cases) can help; I think most folks
do this by setting it in local conf and then restarting OSDs in order
to gain the config change. Something else to consider is
https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
,
as sometimes disabling these write caches can improve the IOPS
performance of SSDs.
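
A sketch of both knobs for a non-containerized deployment (N and /dev/sdX are
placeholders; this uses the config database instead of a local ceph.conf,
which works the same way once the OSDs are restarted):

  # turn off bluefs_buffered_io for OSDs, then restart them to pick it up
  ceph config set osd bluefs_buffered_io false
  systemctl restart ceph-osd@N

  # disable the volatile write cache on a SATA/SAS SSD, per the
  # hardware-recommendations page linked above
  hdparm -W 0 /dev/sdX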

Josh

On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens   
 wrote:

Hi Josh,
we upgraded from 15.2.17 to 16.2.11, and we only use RBD workloads.



On Tue, 28 Feb 2023 at 15:00, Josh Baergen <
jbaer...@digitalocean.com> wrote:

Hi Boris,

Which version did you upgrade from and to, specifically? And what
workload are you running (RBD, etc.)?

Josh

On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens   
 wrote:

Hi,
today I did the first update from octopus to pacific, and it looks like the
avg apply latency went up from 1ms to 2ms.

All 36 OSDs are 4TB SSDs and nothing else changed.
Does someone know if this is an issue, or am I just missing a config
value?

Cheers
  Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

--
The self-help group "UTF-8-Probleme" is meeting this time, as an exception,
in the

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-22 Thread Boris Behrens
Might be. Josh also pointed in that direction. I'm currently searching for
ways to mitigate it.

On Wed, 22 Mar 2023 at 10:30, Konstantin Shalygin <
k0...@k0ste.ru> wrote:

> Hi,
>
>
> Maybe [1] ?
>
>
> [1] https://tracker.ceph.com/issues/58530
> k
>
> On 22 Mar 2023, at 16:20, Boris Behrens  wrote:
>
> Are there any other ideas?
>
>
>

-- 
The self-help group "UTF-8-Probleme" is meeting in the large hall this time,
as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-22 Thread Konstantin Shalygin
Hi,


Maybe [1] ?


[1] https://tracker.ceph.com/issues/58530
k

> On 22 Mar 2023, at 16:20, Boris Behrens  wrote:
> 
> Are there any other ideas?
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Moving From BlueJeans to Jitsi for Ceph meetings

2023-03-22 Thread Rok Jaklič
We deployed Jitsi for the public sector during COVID and it is still free
to use.

https://vid.arnes.si/

---

However, the landing page is in Slovene, and for future reservations you need
an AAI (SSO) account (which you get if you are part of a public organization:
school, faculty, ...).

We've noticed that for conferences of more than 20 people it used to not work
well for some participants (those with a slow internet connection), but we do
not know whether this problem still exists.

Rok

On Wed, Mar 22, 2023 at 6:24 AM Alvaro Soto  wrote:

> +1 jitsi
>
> ---
> Alvaro Soto.
>
> Note: My work hours may not be your work hours. Please do not feel the need
> to respond during a time that is not convenient for you.
> --
> Great people talk about ideas,
> ordinary people talk about things,
> small people talk... about other people.
>
> On Tue, Mar 21, 2023, 2:02 PM Federico Lucifredi 
> wrote:
>
> > Jitsi is really good, and getting better — we have been using it with my
> > local User’s Group for the last couple of years.
> >
> > My only observation is to find out the maximum allowable number of guests
> > in advance if this is not already known - we had a fairly generous
> > allowance in the BlueJeans accounts for Red Hat; Jitsi community accounts
> > may not be as large.
> >
> > Best-F
> >
> > > On Mar 21, 2023, at 12:26, Mike Perez  wrote:
> > >
> > > I'm not familiar with BBB myself. Are there any objections to Jitsi? I
> > > want to update the calendar invites this week.
> > >
> > >> On Thu, Mar 16, 2023 at 6:16 PM huxia...@horebdata.cn
> > >>  wrote:
> > >>
> > >> Besides Jitsi, another option would be BigBlueButton (BBB). Does anyone
> > know how BBB compares with Jitsi?
> > >>
> > >>
> > >>
> > >>
> > >> huxia...@horebdata.cn
> > >>
> > >> From: Mike Perez
> > >> Date: 2023-03-16 21:54
> > >> To: ceph-users
> > >> Subject: [ceph-users] Moving From BlueJeans to Jitsi for Ceph meetings
> > >> Hi everyone,
> > >>
> > >> We have been using BlueJeans to meet and record some of our meetings
> > >> that later get posted to our YouTube channel. Unfortunately, we have
> > >> to figure out a new meeting platform due to Red Hat discontinuing
> > >> BlueJeans by the end of this month.
> > >>
> > >> Google Meet is an option, but some users in other countries have
> > >> trouble using Google's services.
> > >>
> > >> For some meetings, we have tried out Jitsi, and it works well, meets
> > >> our requirements, and is free.
> > >>
> > >> Does anyone else have suggestions for another free meeting platform
> > >> that provides recording capabilities?
> > >>
> > >> --
> > >> Mike Perez
> > >> ___
> > >> ceph-users mailing list -- ceph-users@ceph.io
> > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>
> > >> ___
> > >> ceph-users mailing list -- ceph-users@ceph.io
> > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>
> > >
> > >
> > > --
> > > Mike Perez
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephalocon Amsterdam 2023 Photographer Volunteer + tld common sense

2023-03-22 Thread Marc
I just forwarded your message to a photographer in the Amsterdam area who might 
be able to help you out. 

Then I noticed your .foundation email address. I know the marketing people just
love all the new weird extensions being released, but think about this for a
second.

If you signal in your communication to users that you use multiple TLDs,
these users will assume you use multiple TLDs and will not have a clue which
TLDs are yours and which are not.
If you signal in your communication to users that your organisation only uses
one TLD, e.g. ceph.com, then it is clear to users that everything else is not
related to your company.
Being able to distinguish which company/people are behind ceph.idiots and
ceph.com could be handy in some cases.

Thus, logical reasons to stick to one domain:
1. Operating one TLD is clearest and safest. You can always register more to
protect your trademark, but just don't use them (not even for forwarding).

2. Your company will look smarter and more professional, because you
understood the concept mentioned in 1.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS host in OSD blacklist

2023-03-22 Thread Frank Schilder
Dear Xiubo,

thanks for that link. It seems like it's a harmless issue. I believe I have
seen a blocked OP in the ceph warnings for this MDS and was quite happy that it
restarted by itself. It looks like a very rare race condition that does not
lead to data loss or corruption.

In a situation like this, is it normal that the MDS host is blacklisted? The 
MDS reconnected just fine. Is it the MDS client ID of the crashed MDS that is 
blocked? I can't see anything that is denied access.
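
For what it's worth, the entries normally just expire at the timestamp shown,
but they can also be inspected and removed by hand (a sketch for Octopus, where
the subcommand is still called "blacklist"; newer releases call it
"blocklist", and the address must match the listed entry exactly):

  ceph osd blacklist ls
  ceph osd blacklist rm 192.168.32.87:6801/3841823949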

Thanks for your reply and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: 22 March 2023 07:27:08
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] MDS host in OSD blacklist

Hi Frank,

This should be the same issue with
https://tracker.ceph.com/issues/49132, which has been fixed.

Thanks

- Xiubo

On 21/03/2023 23:32, Frank Schilder wrote:
> Hi all,
>
> we have an octopus v15.2.17 cluster and observe that one of our MDS hosts 
> showed up in the OSD blacklist:
>
> # ceph osd blacklist ls
> 192.168.32.87:6801/3841823949 2023-03-22T10:08:02.589698+0100
> 192.168.32.87:6800/3841823949 2023-03-22T10:08:02.589698+0100
>
> I see an MDS restart that might be related; see log snippets below. There are 
> no clients running on this host, only OSDs and one MDS. What could be the 
> reason for the blacklist entries?
>
> Thanks!
>
> Log snippets:
>
> Mar 21 10:07:54 ceph-23 journal: 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
>  In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 
> 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
> Mar 21 10:07:54 ceph-23 journal: 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
>  59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
> Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
> (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
> Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, 
> char const*, int, char const*)+0x158) [0x7f99f4a25b92]
> Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
> Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, 
> LogSegment*)+0x32c) [0x561bd623962c]
> Mar 21 10:07:54 ceph-23 journal: 4: 
> (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
> Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) 
> [0x561bd6422656]
> Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) 
> [0x561bd6422b5c]
> Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) 
> [0x561bd6422cb4]
> Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) 
> [0x7f99f4ab6a95]
> Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
> Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
> Mar 21 10:07:54 ceph-23 journal: *** Caught signal (Aborted) **
> Mar 21 10:07:54 ceph-23 journal: in thread 7f99e63d5700 
> thread_name:MR_Finisher
> Mar 21 10:07:54 ceph-23 journal: 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
>  In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 
> 7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
> Mar 21 10:07:54 ceph-23 journal: 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
>  59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
> Mar 21 10:07:54 ceph-23 journal:
> Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
> (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
> Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, 
> char const*, int, char const*)+0x158) [0x7f99f4a25b92]
> Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
> Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, 
> LogSegment*)+0x32c) [0x561bd623962c]
> Mar 21 10:07:54 ceph-23 journal: 4: 
> (C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
> Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) 
> [0x561bd6422656]
> Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) 
> [0x561bd6422b5c]
> Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) 
> [0x561bd6422cb4]
> Mar 21 10:07:54 ceph-23 journal: 8: 

[ceph-users] Re: MDS host in OSD blacklist

2023-03-22 Thread Xiubo Li

Hi Frank,

This should be the same issue with 
https://tracker.ceph.com/issues/49132, which has been fixed.


Thanks

- Xiubo

On 21/03/2023 23:32, Frank Schilder wrote:

Hi all,

we have an octopus v15.2.17 cluster and observe that one of our MDS hosts 
showed up in the OSD blacklist:

# ceph osd blacklist ls
192.168.32.87:6801/3841823949 2023-03-22T10:08:02.589698+0100
192.168.32.87:6800/3841823949 2023-03-22T10:08:02.589698+0100

I see an MDS restart that might be related; see log snippets below. There are 
no clients running on this host, only OSDs and one MDS. What could be the 
reason for the blacklist entries?

Thanks!

Log snippets:

Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 
7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
(8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, 
LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: 
(C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) 
[0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) 
[0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) 
[0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) 
[0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal: *** Caught signal (Aborted) **
Mar 21 10:07:54 ceph-23 journal: in thread 7f99e63d5700 thread_name:MR_Finisher
Mar 21 10:07:54 ceph-23 journal: 2023-03-21T10:07:54.980+0100 7f99e63d5700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 
7f99e63d5700 time 2023-03-21T10:07:54.967936+0100
Mar 21 10:07:54 ceph-23 journal: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/mds/ScatterLock.h:
 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
(8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*)+0x158) [0x7f99f4a25b92]
Mar 21 10:07:54 ceph-23 journal: 2: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 3: (MDCache::truncate_inode(CInode*, 
LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 4: 
(C_MDS_inode_update_finish::finish(int)+0x133) [0x561bd6210a83]
Mar 21 10:07:54 ceph-23 journal: 5: (MDSContext::complete(int)+0x56) 
[0x561bd6422656]
Mar 21 10:07:54 ceph-23 journal: 6: (MDSIOContextBase::complete(int)+0x39c) 
[0x561bd6422b5c]
Mar 21 10:07:54 ceph-23 journal: 7: (MDSLogContextBase::complete(int)+0x44) 
[0x561bd6422cb4]
Mar 21 10:07:54 ceph-23 journal: 8: (Finisher::finisher_thread_entry()+0x1a5) 
[0x7f99f4ab6a95]
Mar 21 10:07:54 ceph-23 journal: 9: (()+0x81ca) [0x7f99f35fb1ca]
Mar 21 10:07:54 ceph-23 journal: 10: (clone()+0x43) [0x7f99f204ddd3]
Mar 21 10:07:54 ceph-23 journal:
Mar 21 10:07:54 ceph-23 journal: ceph version 15.2.17 
(8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Mar 21 10:07:54 ceph-23 journal: 1: (()+0x12ce0) [0x7f99f3605ce0]
Mar 21 10:07:54 ceph-23 journal: 2: (gsignal()+0x10f) [0x7f99f2062a9f]
Mar 21 10:07:54 ceph-23 journal: 3: (abort()+0x127) [0x7f99f2035e05]
Mar 21 10:07:54 ceph-23 journal: 4: (ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*)+0x1a9) [0x7f99f4a25be3]
Mar 21 10:07:54 ceph-23 journal: 5: (()+0x27ddac) [0x7f99f4a25dac]
Mar 21 10:07:54 ceph-23 journal: 6: (MDCache::truncate_inode(CInode*, 
LogSegment*)+0x32c) [0x561bd623962c]
Mar 21 10:07:54 ceph-23 journal: 7: