[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR
The things in "ceph orch ps" output are gathered by checking the contents of the /var/lib/ceph/<fsid>/ directory on the host. Those "cephadm." files get deployed normally though, and aren't usually reported in "ceph orch ps", as it should only report things that are directories rather than files. You could maybe try removing them anyway to see what happens (cephadm should just deploy another one, though).

I would be interested in what the contents of /var/lib/ceph/<fsid>/ are on that srvcephprod07 node, and also in what "cephadm ls" spits out on that node (you would have to put a copy of the cephadm tool on the host to run that).

As for the logs, the "cephadm.log" on the host is only a log of what the cephadm tool has done on that host, not of what the cephadm mgr module is running. You could try "ceph mgr fail; ceph -W cephadm" and let it sit for a bit to see if you get a traceback printout that way.

On Fri, Mar 10, 2023 at 10:41 AM wrote:
> looking at "ceph orch upgrade check" I find:
>
> },
> "cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2": {
>     "current_id": null,
>     "current_name": null,
>     "current_version": null
> },
>
> Could this lead to the issue?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
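For anyone following the same debugging path, here is a minimal sketch of the inspection steps suggested above. The download URL assumes a Pacific-era cluster (pick the branch matching your release), and looking up the fsid via "ceph fsid" is just one way to fill in the directory name:

    # Identify the cluster fsid and list the daemon directory on the host.
    fsid=$(ceph fsid)
    ls -l /var/lib/ceph/${fsid}/

    # Fetch a standalone copy of the cephadm tool and list what it sees
    # on this node.
    curl -sLO https://raw.githubusercontent.com/ceph/ceph/pacific/src/cephadm/cephadm
    chmod +x cephadm
    ./cephadm ls

    # Fail over the mgr and watch the cephadm module for a traceback.
    ceph mgr fail
    ceph -W cephadm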
[ceph-users] Re: CephFS thrashing through the page cache
Also, I am able to reproduce the network read amplification when I try to do very small reads from larger files, e.g.:

for i in $(seq 1 10000); do
    dd if=test_${i} of=/dev/null bs=5k count=10
done

This piece of code generates 3.3 GB of network traffic while it actually reads approx 500 MB of data.

Thanks and Regards,
Ashu Pachauri

On Fri, Mar 10, 2023 at 9:22 PM Ashu Pachauri wrote:
> We have an internal use case where we back the storage of a proprietary
> database with a shared file system. We noticed something very odd when
> testing some workloads on a local block-device-backed file system vs.
> CephFS: the amount of network IO done by CephFS is almost double the IO
> done in the case of a local file system backed by an attached block
> device.
>
> We also noticed that CephFS thrashes through the page cache very quickly
> compared to the amount of data being read, and we think the two issues
> might be related. So, I wrote a simple test:
>
> 1. I wrote 10k files, 400 KB each, using dd (approx 4 GB of data).
> 2. I dropped the page cache completely.
> 3. I then read these files serially, again using dd. The page cache usage
>    shot up to 39 GB for reading such a small amount of data.
>
> Following is the code used to reproduce this in bash:
>
> for i in $(seq 1 10000); do
>     dd if=/dev/zero of=test_${i} bs=4k count=100
> done
>
> sync; echo 1 > /proc/sys/vm/drop_caches
>
> for i in $(seq 1 10000); do
>     dd if=test_${i} of=/dev/null bs=4k count=100
> done
>
> The Ceph version being used is:
> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
>
> The Ceph configs being overridden:
>
> WHO     MASK  LEVEL     OPTION                                  VALUE        RO
> mon           advanced  auth_allow_insecure_global_id_reclaim   false
> mgr           advanced  mgr/balancer/mode                       upmap
> mgr           advanced  mgr/dashboard/server_addr               127.0.0.1    *
> mgr           advanced  mgr/dashboard/server_port               8443         *
> mgr           advanced  mgr/dashboard/ssl                       false        *
> mgr           advanced  mgr/prometheus/server_addr              0.0.0.0      *
> mgr           advanced  mgr/prometheus/server_port              9283         *
> osd           advanced  bluestore_compression_algorithm         lz4
> osd           advanced  bluestore_compression_mode              aggressive
> osd           advanced  bluestore_throttle_bytes                536870912
> osd           advanced  osd_max_backfills                       3
> osd           advanced  osd_op_num_threads_per_shard_ssd        8            *
> osd           advanced  osd_scrub_auto_repair                   true
> mds           advanced  client_oc                               false
> mds           advanced  client_readahead_max_bytes              4096
> mds           advanced  client_readahead_max_periods            1
> mds           advanced  client_readahead_min                    0
> mds           basic     mds_cache_memory_limit                  21474836480
> client        advanced  client_oc                               false
> client        advanced  client_readahead_max_bytes              4096
> client        advanced  client_readahead_max_periods            1
> client        advanced  client_readahead_min                    0
> client        advanced  fuse_disable_pagecache                  false
>
> The CephFS mount options (note that readahead was disabled for this test):
> /mnt/cephfs type ceph (rw,relatime,name=cephfs,secret=,acl,rasize=0)
>
> Any help or pointers are appreciated; this is a major performance issue
> for us.
>
> Thanks and Regards,
> Ashu Pachauri

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
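To put a number on the amplification, one approach (a sketch, assuming the NIC carrying Ceph traffic is eth0; substitute your actual interface) is to sample the kernel's receive counter around the read loop:

    # Record bytes received before and after the small-read workload.
    iface=eth0
    rx_before=$(cat /sys/class/net/${iface}/statistics/rx_bytes)

    for i in $(seq 1 10000); do
        dd if=test_${i} of=/dev/null bs=5k count=10
    done

    rx_after=$(cat /sys/class/net/${iface}/statistics/rx_bytes)
    echo "network bytes received: $((rx_after - rx_before))"

Comparing that figure against the ~500 MB actually read makes the amplification factor explicit.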
[ceph-users] CephFS thrashing through the page cache
We have an internal use case where we back the storage of a proprietary database with a shared file system. We noticed something very odd when testing some workloads on a local block-device-backed file system vs. CephFS: the amount of network IO done by CephFS is almost double the IO done in the case of a local file system backed by an attached block device.

We also noticed that CephFS thrashes through the page cache very quickly compared to the amount of data being read, and we think the two issues might be related. So, I wrote a simple test:

1. I wrote 10k files, 400 KB each, using dd (approx 4 GB of data).
2. I dropped the page cache completely.
3. I then read these files serially, again using dd. The page cache usage shot up to 39 GB for reading such a small amount of data.

Following is the code used to reproduce this in bash:

for i in $(seq 1 10000); do
    dd if=/dev/zero of=test_${i} bs=4k count=100
done

sync; echo 1 > /proc/sys/vm/drop_caches

for i in $(seq 1 10000); do
    dd if=test_${i} of=/dev/null bs=4k count=100
done

The Ceph version being used is:
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)

The Ceph configs being overridden:

WHO     MASK  LEVEL     OPTION                                  VALUE        RO
mon           advanced  auth_allow_insecure_global_id_reclaim   false
mgr           advanced  mgr/balancer/mode                       upmap
mgr           advanced  mgr/dashboard/server_addr               127.0.0.1    *
mgr           advanced  mgr/dashboard/server_port               8443         *
mgr           advanced  mgr/dashboard/ssl                       false        *
mgr           advanced  mgr/prometheus/server_addr              0.0.0.0      *
mgr           advanced  mgr/prometheus/server_port              9283         *
osd           advanced  bluestore_compression_algorithm         lz4
osd           advanced  bluestore_compression_mode              aggressive
osd           advanced  bluestore_throttle_bytes                536870912
osd           advanced  osd_max_backfills                       3
osd           advanced  osd_op_num_threads_per_shard_ssd        8            *
osd           advanced  osd_scrub_auto_repair                   true
mds           advanced  client_oc                               false
mds           advanced  client_readahead_max_bytes              4096
mds           advanced  client_readahead_max_periods            1
mds           advanced  client_readahead_min                    0
mds           basic     mds_cache_memory_limit                  21474836480
client        advanced  client_oc                               false
client        advanced  client_readahead_max_bytes              4096
client        advanced  client_readahead_max_periods            1
client        advanced  client_readahead_min                    0
client        advanced  fuse_disable_pagecache                  false

The CephFS mount options (note that readahead was disabled for this test):
/mnt/cephfs type ceph (rw,relatime,name=cephfs,secret=,acl,rasize=0)

Any help or pointers are appreciated; this is a major performance issue for us.

Thanks and Regards,
Ashu Pachauri
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
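A sketch of one way to quantify the page-cache growth described in step 3, by sampling the Cached field of /proc/meminfo around the read pass (dropping caches requires root):

    # Helper: current page cache size in KiB.
    cached_kb() { awk '/^Cached:/ {print $2}' /proc/meminfo; }

    sync; echo 1 > /proc/sys/vm/drop_caches
    before=$(cached_kb)

    for i in $(seq 1 10000); do
        dd if=test_${i} of=/dev/null bs=4k count=100
    done

    after=$(cached_kb)
    echo "page cache grew by $(( (after - before) / 1024 )) MiB"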
[ceph-users] pg wait too long when osd restart
Hi all,

osd_heartbeat_grace = 20 and osd_pool_default_read_lease_ratio = 0.8 by default, so a PG will wait 16 s in the worst case when an OSD restarts. This wait time is too long, and the resulting client I/O stall is unacceptable. I think lowering osd_pool_default_read_lease_ratio is a good approach. Does anyone have good suggestions for reducing the PG wait time?

Best Regards,
Yite Gu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
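For illustration, a sketch of how the ratio could be lowered at runtime. The arithmetic behind the worst case is read lease = osd_heartbeat_grace * osd_pool_default_read_lease_ratio = 20 * 0.8 = 16 s by default; whether existing pools pick up a new value immediately should be verified on your cluster:

    # Halve the read lease: 20 * 0.4 = 8 s worst case.
    ceph config set osd osd_pool_default_read_lease_ratio 0.4

    # Confirm the value an OSD is actually running with.
    ceph config show osd.0 osd_pool_default_read_lease_ratio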
[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR
Looking at "ceph orch upgrade check" I find:

},
"cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2": {
    "current_id": null,
    "current_name": null,
    "current_version": null
},

Could this lead to the issue?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR
With "ceph orch ps" I find:

cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2  srvcephprod04  stopped  4m ago  -
cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2  srvcephprod06  stopped  4m ago  -
cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2  srvcephprod07  stopped  4m ago  -

And I cannot remove them.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
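For reference, a sketch of removal attempts one might try in this situation. The --force flag and the direct file removal are assumptions based on the advice in the first reply of this thread, and cephadm may simply redeploy the file afterwards:

    # Ask the orchestrator to remove the stray entry.
    ceph orch daemon rm cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2 --force

    # Or remove the file directly on an affected host, then refresh.
    ssh srvcephprod04 'rm -f /var/lib/ceph/*/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2'
    ceph orch ps --refresh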
[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR
I cannot find anything interesting in the cephadm.log. Now the error is:

HEALTH_ERR Module 'cephadm' has failed: 'cephadm'

Any idea how to fix it?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
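A sketch of the standard CLI commands that usually surface the traceback behind a failed mgr module; there is no guarantee they pinpoint this particular failure:

    ceph health detail       # full text of the HEALTH_ERR
    ceph mgr fail            # fail over to a standby mgr
    ceph -W cephadm          # watch the cephadm module log live
    ceph log last cephadm    # recent entries from the cephadm channel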
[ceph-users] Re: pg wait too long when osd restart
Hello,

When you say "osd restart", what sort of restart are you referring to - planned (e.g. for upgrades or maintenance) or unplanned (OSD hang/crash, host issue, etc.)?

If it's the former, then these parameters shouldn't matter provided that you're running a recent enough Ceph with default settings - it's supposed to handle planned restarts with little I/O wait time. There were some issues with this mechanism before Octopus 15.2.17 / Pacific 16.2.8 that could cause planned restarts to wait for the read lease timeout in some circumstances.

Josh

On Fri, Mar 10, 2023 at 1:31 AM yite gu wrote:
>
> Hi all,
>
> osd_heartbeat_grace = 20 and osd_pool_default_read_lease_ratio = 0.8 by
> default, so a PG will wait 16 s in the worst case when an OSD restarts.
> This wait time is too long, and the resulting client I/O stall is
> unacceptable. I think lowering osd_pool_default_read_lease_ratio is a
> good approach. Does anyone have good suggestions for reducing the PG
> wait time?
>
> Best Regards,
> Yite Gu
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
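For completeness, a sketch of a planned-restart procedure that should avoid the long wait on those newer releases. OSD id 12 is a placeholder, and the systemd unit name assumes a non-cephadm deployment (cephadm clusters would use "ceph orch daemon restart osd.12" instead):

    ceph versions                    # confirm daemons are on >= 15.2.17 / 16.2.8
    ceph osd set noout               # avoid rebalancing during the restart
    systemctl restart ceph-osd@12    # restart the target OSD
    ceph osd unset noout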