[ceph-users] Re: data corruption after rbd migration

2023-11-03 Thread Nikola Ciprich
Hello Jaroslav,

thank you for your reply.

> I found your info a bit confusing. The first command suggests that the VM
> is shut down, and later you are talking about live migration. So how are you
> migrating the data: online or offline?
online, the VM is started after migration prepare command:

..
> > rbd migration prepare ssd/D1 sata/D1Z
> > virsh create xml_new.xml
..

> 
> In the case of live migration, I would suggest looking at the
> fsfreeze(proxmox use it) command.

I don't think this is related. I'm not concerned about FS consistency
during snapshotting, but about the fact that checksums of snapshots which
should be identical differ after migration.. To clarify the use case: we've
noticed backup corruption for volumes which were migrated between pools
while backups were running (the backups use snapshots, which are then moved
to another (SATA) pool).

BR

nik




> 
> Hope it helps!
> 
> Best Regards,
> 
> Jaroslav Shejbal
> 
> 
> 
> pá 3. 11. 2023 v 9:08 odesílatel Nikola Ciprich 
> napsal:
> 
> > Dear ceph users and developers,
> >
> > we're struggling with a strange issue which I think might be a bug
> > causing snapshot data corruption while migrating an RBD image.
> >
> > we've tracked it down to a minimal set of steps to reproduce, using a VM
> > with one 32G drive:
> >
> > rbd create --size 32768 sata/D2
> > virsh create xml_orig.xml
> > rbd snap create ssd/D1@snap1
> > rbd export-diff ssd/D1@snap1 - | rbd import-diff - sata/D2
> > rbd export --export-format 1 --no-progress ssd/D1@snap1 - | xxh64sum
> > 505dde3c49775773
> > rbd export --export-format 1 --no-progress sata/D2@snap1 - | xxh64sum
> > 505dde3c49775773  # <- checksums match - OK
> >
> > virsh shutdown VM
> > rbd migration prepare ssd/D1 sata/D1Z
> > virsh create xml_new.xml
> > rbd snap create sata/D1Z@snap2
> > rbd export-diff --from-snap snap1 sata/D1Z@snap2 - | rbd import-diff -
> > sata/D2
> > rbd migration execute sata/D1Z
> > rbd migration commit sata/D1Z
> > rbd export --export-format 1 --no-progress sata/D1Z@snap2 - | xxh64sum
> > 19892545c742c1de
> > rbd export --export-format 1 --no-progress sata/D2@snap2 - | xxh64sum
> > cc045975baf74ba8 # <- snapshots differ
> >
> > The OS is AlmaLinux 9 based, kernel 5.15.105, Ceph 17.2.6, QEMU 8.0.3.
> > We tried disabling VM disk caches as well as discard, to no avail.
> >
> > My first question is: is it correct to assume that creating snapshots while
> > live-migrating data is safe? If so, any ideas on where the problem could be?
> >
> > If I can provide more info, please let me know.
> >
> > with regards
> >
> > nikola ciprich
> >
> >
> >
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:    +420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] data corruption after rbd migration

2023-11-03 Thread Nikola Ciprich
Dear ceph users and developers,

we're struggling with a strange issue which I think might be a bug
causing snapshot data corruption while migrating an RBD image.

we've tracked it down to a minimal set of steps to reproduce, using a VM
with one 32G drive:

rbd create --size 32768 sata/D2
virsh create xml_orig.xml
rbd snap create ssd/D1@snap1
rbd export-diff ssd/D1@snap1 - | rbd import-diff - sata/D2
rbd export --export-format 1 --no-progress ssd/D1@snap1 - | xxh64sum
505dde3c49775773
rbd export --export-format 1 --no-progress sata/D2@snap1 - | xxh64sum
505dde3c49775773  # <- checksums match - OK

virsh shutdown VM
rbd migration prepare ssd/D1 sata/D1Z
virsh create xml_new.xml
rbd snap create sata/D1Z@snap2
rbd export-diff --from-snap snap1 sata/D1Z@snap2 - | rbd import-diff - sata/D2
rbd migration execute sata/D1Z
rbd migration commit sata/D1Z
rbd export --export-format 1 --no-progress sata/D1Z@snap2 - | xxh64sum
19892545c742c1de
rbd export --export-format 1 --no-progress sata/D2@snap2 - | xxh64sum
cc045975baf74ba8 # <- snapshots differ

The OS is AlmaLinux 9 based, kernel 5.15.105, Ceph 17.2.6, QEMU 8.0.3.
We tried disabling VM disk caches as well as discard, to no avail.
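
In case it helps with reproducing this, a rough way to localize where the two
snapshots diverge (image names as in the reproducer above; both exports are
32G, so use a scratch directory):

rbd export --export-format 1 --no-progress sata/D1Z@snap2 /tmp/a.img
rbd export --export-format 1 --no-progress sata/D2@snap2 /tmp/b.img
cmp -l /tmp/a.img /tmp/b.img | head          # byte offsets that differ
rbd diff sata/D1Z@snap2 > /tmp/a.extents     # allocated/changed extents
rbd diff sata/D2@snap2 > /tmp/b.extents
diff /tmp/a.extents /tmp/b.extents | head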

My first question is: is it correct to assume that creating snapshots while
live-migrating data is safe? If so, any ideas on where the problem could be?

If I can provide more info, please let me know.

with regards

nikola ciprich



-- 
---------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-23 Thread Nikola Ciprich
Hello Igor,

just reporting that since the last restart (after reverting the changed values
to their defaults) the performance hasn't decreased (and it's been over
two weeks now). So either it helped after all, or the drop is caused
by something else I have yet to figure out.. We've automated the test,
so once the performance drops below a threshold, I'll know about it and
investigate further (and report).
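
A minimal sketch of such an automated check (not our exact script; pool/image
name, threshold and alerting are placeholders, and the fio rbd engine options
are from memory, so double-check them):

#!/bin/bash
# run a short 4k randwrite benchmark against a dedicated RBD image and alert
# if the measured IOPS fall below an arbitrary threshold
THRESHOLD=30000
IOPS=$(fio --name=bench --ioengine=rbd --clientname=admin --pool=ssd \
           --rbdname=benchimg --rw=randwrite --bs=4k --iodepth=64 \
           --numjobs=1 --direct=1 --runtime=60 --time_based \
           --output-format=json \
       | jq '.jobs[0].write.iops | floor')
if [ "$IOPS" -lt "$THRESHOLD" ]; then
    echo "4k randwrite dropped to ${IOPS} IOPS" | mail -s "ceph perf alert" root
fi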

cheers

with regards

nik



On Wed, May 10, 2023 at 07:36:06AM +0200, Nikola Ciprich wrote:
> Hello Igor,
> > You didn't reset the counters every hour, do you? So having average
> > subop_w_latency growing that way means the current values were much higher
> > than before.
> 
> bummer, I didn't.. I've updated the gather script to reset the stats, wait 10
> minutes and then gather the perf data, each hour. It's running since yesterday,
> so now we'll have to wait about one week for the problem to appear again..
> 
> 
> > 
> > Curious if subop latencies were growing for every OSD or just a subset (may
> > be even just a single one) of them?
> since I only have long-term averages, it's not easy to say, but based on what
> we have:
> 
> only two OSDs got avg sub_w_lat > 0.0006, with no clear relation between them.
> 19 OSDs got avg sub_w_lat > 0.0005 - this is more interesting - 15 of them
> are on those later-installed nodes (note that those nodes have almost no VMs
> running, so they are much less used!), 4 are on other nodes. but also note that
> not all OSDs on the suspicious nodes are over the threshold, it's 6, 6 and 3
> out of 7 OSDs per node. but still it's strange..
> 
> > 
> > 
> > Next time you reach the bad state please do the following if possible:
> > 
> > - reset perf counters for every OSD
> > 
> > -  leave the cluster running for 10 mins and collect perf counters again.
> > 
> > - Then start restarting OSDs one by one, starting with the worst OSD (in terms
> > of subop_w_lat from the previous step). Wouldn't it be sufficient to restart
> > just a few OSDs before the cluster is back to normal?
> 
> will do once it slows down again.
> 
> 
> > > 
> > > I see very similar crash reported here: https://tracker.ceph.com/issues/56346
> > > so I'm not reporting it..
> > > 
> > > Do you think this might somehow be the cause of the problem? Anything else
> > > I should check in perf dumps or elsewhere?
> > 
> > Hmm... don't know yet. Could you please send the last 20K lines prior to the
> > crash from e.g. two sample OSDs?
> 
> https://storage.linuxbox.cz/index.php/s/o5bMaGMiZQxWadi
> 
> > 
> > And the crash isn't permanent, OSDs are able to start after the second(?)
> > shot, aren't they?
> yes, actually they start after issuing systemctl restart ceph-osd@xx, it just
> takes a long time performing log recovery..
> 
> If I can provide more info, please let me know
> 
> BR
> 
> nik
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-09 Thread Nikola Ciprich
Hello Igor,
> You didn't reset the counters every hour, do you? So having average
> subop_w_latency growing that way means the current values were much higher
> than before.

bummer, I didn't.. I've updated the gather script to reset the stats, wait 10
minutes and then gather the perf data, each hour. It's running since yesterday,
so now we'll have to wait about one week for the problem to appear again..
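
A minimal sketch of what such a gather script can look like, run hourly from
cron (socket and output paths are placeholders, not our exact script):

#!/bin/bash
# reset all perf counters on every local OSD, wait 10 minutes, then dump them
for sock in /var/run/ceph/ceph-osd.*.asok; do
    ceph daemon "$sock" perf reset all
done
sleep 600
ts=$(date +%Y%m%d-%H%M)
mkdir -p /var/log/ceph-perf
for sock in /var/run/ceph/ceph-osd.*.asok; do
    id=$(basename "$sock" .asok | cut -d. -f2)   # ceph-osd.NN.asok -> NN
    ceph daemon "$sock" perf dump > "/var/log/ceph-perf/${ts}-osd.${id}.json"
done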


> 
> Curious if subop latencies were growing for every OSD or just a subset (may
> be even just a single one) of them?
since I only have long-term averages, it's not easy to say, but based on what we
have:

only two OSDs got avg sub_w_lat > 0.0006, with no clear relation between them.
19 OSDs got avg sub_w_lat > 0.0005 - this is more interesting - 15 of them
are on those later-installed nodes (note that those nodes have almost no VMs
running, so they are much less used!), 4 are on other nodes. but also note that
not all OSDs on the suspicious nodes are over the threshold, it's 6, 6 and 3
out of 7 OSDs per node. but still it's strange..
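
With the hourly dumps in place, ranking OSDs by subop write latency is a small
jq loop over the collected files (counter names as they appear in a quincy perf
dump, and this assumes the JSON layout from the sketch above):

for f in /var/log/ceph-perf/*-osd.*.json; do
    printf '%s %s\n' "$f" "$(jq '.osd.subop_w_latency.avgtime' "$f")"
done | sort -k2 -g | tail   # worst OSDs last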

> 
> 
> Next time you reach the bad state please do the following if possible:
> 
> - reset perf counters for every OSD
> 
> -  leave the cluster running for 10 mins and collect perf counters again.
> 
> - Then start restarting OSDs one by one, starting with the worst OSD (in terms
> of subop_w_lat from the previous step). Wouldn't it be sufficient to restart
> just a few OSDs before the cluster is back to normal?

will do once it slows down again.


> > 
> > I see very similar crash reported here: https://tracker.ceph.com/issues/56346
> > so I'm not reporting it..
> > 
> > Do you think this might somehow be the cause of the problem? Anything else
> > I should check in perf dumps or elsewhere?
> 
> Hmm... don't know yet. Could you please send the last 20K lines prior to the
> crash from e.g. two sample OSDs?

https://storage.linuxbox.cz/index.php/s/o5bMaGMiZQxWadi

> 
> And the crash isn't permanent, OSDs are able to start after the second(?)
> shot, aren't they?
yes, actually they start after issuing systemctl restart ceph-osd@xx, it just
takes a long time performing log recovery..

If I can provide more info, please let me know

BR

nik

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-08 Thread Nikola Ciprich
Hello Igor,

so I've been checking the performance every day since Tuesday.. every day it
seemed to be the same - ~60-70 kOPS on random writes from a single VM.
yesterday it finally dropped to 20 kOPS,
today to 10 kOPS. I also tried with a newly created volume, and the result
(after prefill) is the same, so it doesn't make any difference..

so I reverted all mentioned options to their defaults and restarted all OSDs.
performance immediately returned to better values (I suppose this is again
caused by the restart only)

good news is that setting osd_fast_shutdown_timeout to 0 really helped with
the OSD crashes during restarts, which speeds things up a lot.. but I have some
new crashes, more on this later..
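
For completeness, that tweak itself is a single config change (option name from
memory, so please double-check it):

ceph config set osd osd_fast_shutdown_timeout 0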

> > I'd suggest to start monitoring perf counters for your osds.
> > op_w_lat/subop_w_lat ones specifically. I presume they raise eventually,
> > don't they?
> OK, starting collecting those for all OSDs..
I have hourly samples of all OSDs' perf dumps loaded in a DB, so I can easily
examine, sort, whatever..


> 
> currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002
> for op_w_lat
OK, so there is no visible trend on op_w_lat, still between 0.001 and 0.002

subop_w_lat seems to have increased since yesterday though! I see values from
0.0004 to as high as 0.001

If some other perf data might be interesting, please let me know..

During the OSD restarts, I noticed a strange thing - restarts on the first 6
machines went smoothly, but on another 3 I saw rocksdb log recovery on all SSD
OSDs, and at first there was no mention of any daemon crash in ceph -s.

later, crash info appeared, but only for 3 daemons (at least 20 of them crashed
in total though)

crash report was similar for all three OSDs:

[root@nrbphav4a ~]# ceph crash info 2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
{
"backtrace": [
"/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
"(BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, std::__cxx11::list >*, boost::intrusive_ptr)+0x413) [0x55a1c9d07c43]",
"(BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector >&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x22b) [0x55a1c9d27e9b]",
"(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr >&&, eversion_t const&, eversion_t const&, std::vector >&&, std::optional&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr)+0x8ad) [0x55a1c9bbcfdd]",
"(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
"(PrimaryLogPG::simple_opc_submit(std::unique_ptr >)+0x57) [0x55a1c99d6777]",
"(PrimaryLogPG::handle_watch_timeout(std::shared_ptr)+0xb73) [0x55a1c99da883]",
"/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
"(CommonSafeTimer::timer_thread()+0x11a) [0x55a1c9e226aa]",
"/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
"/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
"/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
],
"ceph_version": "17.2.6",
"crash_id": "2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
"entity_name": "osd.98",
"os_id": "almalinux",
"os_name": "AlmaLinux",
"os_version": "9.0 (Emerald Puma)",
"os_version_id": "9.0",
"process_name": "ceph-osd",
"stack_sig": "b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
"timestamp": "2023-05-08T17:45:47.056675Z",
"utsname_hostname": "nrbphav4h",
"utsname_machine": "x86_64",
"utsname_release": "5.15.90lb9.01",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
}


I was trying to figure out why these particular 3 nodes could behave differently
and found out from colleagues that those 3 nodes were added to the cluster later,
with a direct install of 17.2.5 (the others were installed with 15.2.16 and later
upgraded).

not sure whether this is related to our problem though..

I see a very similar crash reported here: https://tracker.ceph.com/issues/56346
so I'm not reporting it..

Do you think this might somehow be the cause of the problem? Anything else I
should check in perf dumps or elsewhere?

with best regards

nik






-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Nikola Ciprich
Hello Igor,

On Tue, May 02, 2023 at 05:41:04PM +0300, Igor Fedotov wrote:
> Hi Nikola,
> 
> I'd suggest to start monitoring perf counters for your osds.
> op_w_lat/subop_w_lat ones specifically. I presume they raise eventually,
> don't they?
OK, I've started collecting those for all OSDs..

currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002
for op_w_lat

I guess it'll need some time to find some trend, so I'll check tomorrow


> 
> Does subop_w_lat grow for every OSD or just a subset of them? How large is
> the delta between the best and the worst OSDs after a one week period? How
> many "bad" OSDs are at this point?
I'll see and report

> 
> 
> And some more questions:
> 
> How large are space utilization/fragmentation for your OSDs?
OSD usage is around 16-18%. fragmentation should not be very bad, this
cluster has only been deployed for a few months


> 
> Is the same performance drop observed for artificial benchmarks, e.g. 4k
> random writes to a fresh RBD image using fio?
will check again when the slowdown occurs and report


> 
> Is there any RAM utilization growth for OSD processes over time? Or may be
> any suspicious growth in mempool stats?
nope, RAM usage seems to be pretty constant.

however, it's probably worth noting that historically we're using the following OSD options:
ceph config set osd bluestore_rocksdb_options 
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
ceph config set osd bluestore_cache_autotune 0
ceph config set osd bluestore_cache_size_ssd 2G
ceph config set osd bluestore_cache_kv_ratio 0.2
ceph config set osd bluestore_cache_meta_ratio 0.8
ceph config set osd osd_min_pg_log_entries 10
ceph config set osd osd_max_pg_log_entries 10
ceph config set osd osd_pg_log_dups_tracked 10
ceph config set osd osd_pg_log_trim_min 10

so maybe I'll start resetting those to defaults (i.e. enabling cache autotune etc.)
as a first step..
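
A sketch of what that revert could look like - removing the overrides so the
defaults apply again (most of these only take effect after an OSD restart):

for opt in bluestore_rocksdb_options bluestore_cache_autotune \
           bluestore_cache_size_ssd bluestore_cache_kv_ratio \
           bluestore_cache_meta_ratio osd_min_pg_log_entries \
           osd_max_pg_log_entries osd_pg_log_dups_tracked osd_pg_log_trim_min; do
    ceph config rm osd "$opt"
done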


> 
> 
> As a blind and brute force approach you might also want to compact RocksDB
> through ceph-kvstore-tool and switch the bluestore allocator to bitmap
> (presuming the default hybrid one is in effect right now). Please do one
> modification at a time to see which action is actually helpful, if any.
will do..
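
For reference, my understanding of the two suggestions in command form (offline
compaction needs the OSD stopped; commands from memory, so double-check them):

ID=42                                            # hypothetical OSD id
systemctl stop ceph-osd@$ID
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$ID compact
systemctl start ceph-osd@$ID
# (online alternative: ceph tell osd.$ID compact)

ceph config set osd bluestore_allocator bitmap   # applied on OSD restart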

thanks again for your hints

BR

nik


> 
> 
> Thanks,
> 
> Igor
> 
> On 5/2/2023 11:32 AM, Nikola Ciprich wrote:
> > Hello dear CEPH users and developers,
> > 
> > we're dealing with strange problems.. we're having 12 node alma linux 9 
> > cluster,
> > initially installed CEPH 15.2.16, then upgraded to 17.2.5. It's running 
> > bunch
> > of KVM virtual machines accessing volumes using RBD.
> > 
> > everything is working well, but there is strange and for us quite serious 
> > issue
> >   - speed of write operations (both sequential and random) is constantly 
> > degrading
> >   drastically to almost unusable numbers (in ~1week it drops from ~70k 4k 
> > writes/s
> >   from 1 VM  to ~7k writes/s)
> > 
> > When I restart all OSD daemons, numbers immediately return to normal..
> > 
> > volumes are stored on replicated pool of 4 replicas, on top of 7*12 = 84
> > INTEL SSDPE2KX080T8 NVMEs.
> > 
> > I've updated cluster to 17.2.6 some time ago, but the problem persists. 
> > This is
> > especially annoying in connection with https://tracker.ceph.com/issues/56896
> > as restarting OSDs is quite painfull when half of them crash..
> > 
> > I don't see anything suspicious, nodes load is quite low, no logs errors,
> > network latency and throughput is OK too
> > 
> > Anyone having simimar issue?
> > 
> > I'd like to ask for hints on what should I check further..
> > 
> > we're running lots of 14.2.x and 15.2.x clusters, none showing similar
> > issue, so I'm suspecting this is something related to quincy
> > 
> > thanks a lot in advance
> > 
> > with best regards
> > 
> > nikola ciprich
> > 
> > 
> > 
> -- 
> Igor Fedotov
> Ceph Lead Developer
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https:/

[ceph-users] quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Nikola Ciprich
Hello dear CEPH users and developers,

we're dealing with strange problems.. we have a 12-node AlmaLinux 9 cluster,
initially installed with Ceph 15.2.16, then upgraded to 17.2.5. It's running a
bunch of KVM virtual machines accessing volumes using RBD.

everything is working well, but there is a strange and, for us, quite serious
issue - the speed of write operations (both sequential and random) is constantly
degrading, drastically, to almost unusable numbers (in ~1 week it drops from
~70k 4k writes/s from 1 VM to ~7k writes/s)

When I restart all OSD daemons, the numbers immediately return to normal..

volumes are stored on a replicated pool with 4 replicas, on top of 7*12 = 84
INTEL SSDPE2KX080T8 NVMes.

I updated the cluster to 17.2.6 some time ago, but the problem persists. This is
especially annoying in connection with https://tracker.ceph.com/issues/56896,
as restarting OSDs is quite painful when half of them crash..

I don't see anything suspicious, node load is quite low, there are no log errors,
and network latency and throughput are OK too.

Anyone having a similar issue?

I'd like to ask for hints on what I should check further..

we're running lots of 14.2.x and 15.2.x clusters, none showing a similar
issue, so I suspect this is something related to quincy

thanks a lot in advance

with best regards

nikola ciprich



-- 
---------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: EC pool OSDs getting erroneously "full" (15.2.15)

2022-04-20 Thread Nikola Ciprich
thanks for the tip on the alternative balancer, I'll have a look at it.
however, I don't think the root of the problem is improper balancing,
those 3 OSDs just should not be that full. I'll see how it goes when the
snaptrims finish, usage seems to go down by 0.01%/minute now..

I'll report the results later..


> If your clients allow (understand upmaps) you might yield better results
> with the balancer in upmap mode. Jonas Jelten made a nice balancer as well
> [1].
> 
> Gr. Stefan
> 
> [1]: https://github.com/TheJJ/ceph-balancer
> 

-- 
-----
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: EC pool OSDs getting erroneously "full" (15.2.15)

2022-04-20 Thread Nikola Ciprich
Hi Stefan,

all daemons are 15.2.15 (I'm considering doing update to 15.2.16 today)

> What do you have set as neafull ratio? ceph osd dump |grep nearfull.
nearfull is 0.87
> 
> Do you have the ceph balancer enabled? ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.000538",
"last_optimize_started": "Wed Apr 20 13:02:26 2022",
"mode": "crush-compat",
"optimize_result": "Some objects (0.130412) are degraded; try again later",
"plans": []
}

> What kind of maintenance was going on?
we were replacing a failing memory module (according to the IPMI log, all errors
were corrected though..)

> 
> Are the PGs on those OSDs *way* bigger than on those of the other nodes?
> ceph pg ls-by-osd $osd-id and check for bytes (and OMAP bytes). Only
> accurate information when PGs have been recently deep-scrubbed.
sizes seem to be ~similar (each pg is between 65-75GB); if I sum them up,
the total is almost twice as big for osd.5 as for osd.53-osd.55.
they haven't been scrubbed due to the ongoing recovery though.. but the OMAP
sizes shouldn't make such a difference..

> 
> In this case the PG backfilltoofull warning(s) might have been correct.
> Yesterday though, I had no OSDs close to near full ratio and was getting the
> same PG backfilltoofull message ... previously seen due to this bug [1]. I
> could fix that by setting upmaps for the affacted PGs to another OSD.
warning is correct, but the usage value seems to be wrong..

one thing that comes to my mind: there seem to be a lot of pgs waiting for snaptrims..
I'll let it keep snaptrimming for some time and see if the usage goes down...

> 
> > 
> > any idea on why could this be happening or what to check?
> 
> I helps to know what kind of maintenance was going on. Sometimes Ceph PG
> mappings are not what you want. There are ways to do maintenance in a more
> controlled fashion.

the maintenance itself wasn't ceph related, it shouldn't cause any PG
movements..
one thing to note: I added an SSD volume for all OSD DBs to speed up recovery, but
we've had this problem before that, so I don't think this should be the
culprit..

BR

nik

> 
> > 
> > thanks a lot in advance for hints..
> 
> Gr. Stefan
> 
> [1]: https://tracker.ceph.com/issues/39555
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] EC pool OSDs getting erroneously "full" (15.2.15)

2022-04-20 Thread Nikola Ciprich
Hi fellow ceph users and developers,

we've got into a quite strange situation which I'm not sure isn't
a ceph bug..

we have a 4-node Ceph cluster with multiple pools. one of them
is a SATA EC 2+2 pool containing 4x3 10TB drives (one of them
is actually 12TB)

one day, after a planned downtime of the fourth node, we got into a strange
state where there seemed to be a large amount of degraded PGs
to recover (we had noout set for the duration of the downtime though)

the weird thing was that the OSDs of that node seemed to be almost full (i.e.
80%) while there were almost no PGs on them according to osd df tree,
leading to backfilltoofull..

after some experimenting, I dropped and recreated those OSDs, but during
the recovery we got into the same state:


-31 120.0 -  112 TiB   81 TiB   80 TiB   36 GiB  456 GiB   31 TiB  72.58  1.06-  root sata-archive
-32  30.0 -   29 TiB   20 TiB   20 TiB   10 GiB  133 GiB  9.5 TiB  67.48  0.99-  host v1a-sata-archive
  5hdd   10.0   1.0  9.2 TiB  6.2 TiB  6.1 TiB  3.5 GiB   47 GiB  3.0 TiB  67.78  0.99  171  up  osd.5
 10hdd   10.0   1.0  9.2 TiB  6.2 TiB  6.2 TiB  3.6 GiB   48 GiB  2.9 TiB  68.06  1.00  171  up  osd.10
 13hdd   10.0   1.0   11 TiB  7.3 TiB  7.3 TiB  3.2 GiB   38 GiB  3.6 TiB  66.73  0.98  170  up  osd.13
-33  30.0 -   27 TiB   19 TiB   18 TiB   11 GiB  139 GiB  9.0 TiB  67.39  0.99-  host v1b-sata-archive
 19hdd   10.0   1.0  9.2 TiB  6.1 TiB  6.1 TiB  3.5 GiB   46 GiB  3.0 TiB  67.11  0.98  171  up  osd.19
 28hdd   10.0   1.0  9.2 TiB  6.1 TiB  6.0 TiB  3.5 GiB   46 GiB  3.1 TiB  66.44  0.97  170  up  osd.28
 29hdd   10.0   1.0  9.2 TiB  6.3 TiB  6.2 TiB  3.6 GiB   48 GiB  2.9 TiB  68.61  1.00  171  up  osd.29
-34  30.0 -   27 TiB   19 TiB   19 TiB   11 GiB  143 GiB  8.6 TiB  68.65  1.00-  host v1c-sata-archive
 30hdd   10.0   1.0  9.2 TiB  6.3 TiB  6.2 TiB  3.8 GiB   48 GiB  2.8 TiB  68.91  1.01  171  up  osd.30
 31hdd   10.0   1.0  9.1 TiB  6.3 TiB  6.3 TiB  3.6 GiB   48 GiB  2.8 TiB  69.20  1.01  171  up  osd.31
 52hdd   10.0   1.0  9.1 TiB  6.2 TiB  6.1 TiB  3.4 GiB   46 GiB  2.9 TiB  67.84  0.99  170  up  osd.52
-35  30.0 -   27 TiB   24 TiB   24 TiB  4.0 GiB   41 GiB  3.5 TiB  87.13  1.27-  host v1d-sata-archive
 53hdd   10.0   1.0  9.2 TiB  8.1 TiB  8.0 TiB  1.3 GiB   14 GiB  1.0 TiB  88.54  1.29   81  up  osd.53
 54hdd   10.0   1.0  9.2 TiB  8.3 TiB  8.2 TiB  1.4 GiB   14 GiB  897 GiB  90.44  1.32   79  up  osd.54
 55hdd   10.0   1.0  9.1 TiB  7.5 TiB  7.5 TiB  1.3 GiB   13 GiB  1.6 TiB  82.39  1.21   62  up  osd.55

the count of pgs on osd 53..55 is less than 1/2 of the other OSDs, but they are
almost full. according to the weights, this should not happen..
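
A quick cross-check is to sum the per-PG bytes reported for each OSD and compare
that with what osd df shows (JSON field names are from memory, so treat this as
a sketch):

for o in 5 53 54 55; do
    echo -n "osd.$o: "
    ceph pg ls-by-osd $o -f json | jq '[.pg_stats[].stat_sum.num_bytes] | add'
done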

any idea why this could be happening, or what to check?

thanks a lot in advance for hints..

with best regards

nikola ciprich





-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd daemons still reading disks at full speed while there is no pool activity

2021-11-09 Thread Nikola Ciprich
Hello Josh,

just wanted to confirm that setting bluefs_buffered_io immediately
helped hotfix the problem. I've also updated to 14.2.22, and we'll
discuss adding more NVMe modules to move the OSD databases off the spinners
to prevent further occurrences
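
For anyone hitting the same thing, the change itself is just the following (and
verify the running value afterwards; on older releases an OSD restart may be
needed for it to apply):

ceph config set osd bluefs_buffered_io true
ceph daemon osd.NN config get bluefs_buffered_io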

thanks a lot for your time!

with best regards

nikola ciprich

On Wed, Nov 03, 2021 at 09:11:20AM -0600, Josh Baergen wrote:
> Hi Nikola,
> 
> > yes, some nodes have stray pgs (1..5). shall I do something about those?
> 
> No need to do anything - Ceph will clean those up itself (and is doing
> so right now). I just wanted to confirm my hunch.
> 
> Enabling buffered I/O should have an immediate effect on the read rate
> to your disks. I would recommend upgrading to 14.2.17+, though, as the
> improvements to PG cleaning are pretty substantial.
> 
> Josh
> 
> On Wed, Nov 3, 2021 at 8:13 AM Nikola Ciprich
>  wrote:
> >
> > Hello Josh,
> > >
> > > Was there PG movement (backfill) happening in this cluster recently?
> > > Do the OSDs have stray PGs (e.g. 'ceph daemon osd.NN perf dump | grep
> > > numpg_stray' - run this against an affected OSD from the housing
> > > node)?
> > yes, some nodes have stray pgs (1..5). shall I do something about those?
> >
> >
> > >
> > > I'm wondering if you're running into
> > > https://tracker.ceph.com/issues/45765, where cleaning of PGs from OSDs
> > hmm, yes, this seems very familiar, problems started with using balancer,
> > forgot to mention that!
> >
> > > leads to a high read rate from disk due to a combination of rocksdb
> > > behaviour and caching issues. Turning on bluefs_buffered_io (on by
> > > default in 14.2.22) is a mitigation for this problem, but has some
> > > side effects to watch out for (write IOPS amplification, for one).
> > > Fixes for that linked issue went into 14.2.17, 14.2.22, and then
> > > Pacific; we found the 14.2.17 changes to be quite effective by
> > > themselves.
> > >
> > > Even if you don't have stray PGs, trying bluefs_buffered_io might be
> > > an interesting experiment. An alternative would be to compact rocksdb
> > > for each of your OSDs and see if that helps; compacting eliminates the
> > > tombstoned data that can cause problems during iteration, but if you
> > > have a workload that generates a lot of rocksdb tombstones (like PG
> > > cleaning does), then the problem will return a while after compaction.
> > >
> >
> > hmm, I'll try enabling bluefs_buffered_io (it was indeed false) and do the
> > compaction as well anyways..
> >
> > I'll report back, thanks for the hints!
> >
> > BR
> >
> > nik
> >
> >
> > > Josh
> > >
> >
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd daemons still reading disks at full speed while there is no pool activity

2021-11-03 Thread Nikola Ciprich
Hello Josh,
> 
> Was there PG movement (backfill) happening in this cluster recently?
> Do the OSDs have stray PGs (e.g. 'ceph daemon osd.NN perf dump | grep
> numpg_stray' - run this against an affected OSD from the housing
> node)?
yes, some nodes have stray pgs (1..5). shall I do something about those?


> 
> I'm wondering if you're running into
> https://tracker.ceph.com/issues/45765, where cleaning of PGs from OSDs
hmm, yes, this seems very familiar, problems started with using balancer,
forgot to mention that!

> leads to a high read rate from disk due to a combination of rocksdb
> behaviour and caching issues. Turning on bluefs_buffered_io (on by
> default in 14.2.22) is a mitigation for this problem, but has some
> side effects to watch out for (write IOPS amplification, for one).
> Fixes for that linked issue went into 14.2.17, 14.2.22, and then
> Pacific; we found the 14.2.17 changes to be quite effective by
> themselves.
> 
> Even if you don't have stray PGs, trying bluefs_buffered_io might be
> an interesting experiment. An alternative would be to compact rocksdb
> for each of your OSDs and see if that helps; compacting eliminates the
> tombstoned data that can cause problems during iteration, but if you
> have a workload that generates a lot of rocksdb tombstones (like PG
> cleaning does), then the problem will return a while after compaction.
> 

hmm, I'll try enabling bluefs_buffered_io (it was indeed false) and do the
compaction as well anyways..

I'll report back, thanks for the hints!

BR

nik


> Josh
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd daemons still reading disks at full speed while there is no pool activity

2021-11-03 Thread Nikola Ciprich
Hello Eugen,

thank you for your reply. Yes, I tried restarting all OSDs and monitors, and also
increasing osd_map_cache_size to 5000 (this helped us in a previous case
of a problem with OSD maps not being pruned). none of this helped..

with best regards

nik

On Wed, Nov 03, 2021 at 11:41:28AM +, Eugen Block wrote:
> Hi,
> 
> I don't have an explanation but I remember having a similar issue a
> year ago or so. IIRC a simple OSD restart fixed that, so I never got
> to the bottom of it. Have you tried to restart OSD daemons?
> 
> 
> Zitat von Nikola Ciprich :
> 
> >Hello fellow ceph users,
> >
> >I'm trying to catch ghost here.. On one of our clusters, 6 nodes,
> >14.2.15, EC pool 4+2, 6*32 SATA bluestore OSDs we got into very strange
> >state.
> >
> >The cluster is clean (except for pgs not deep-scrubbed in time warning,
> >since we've disabled scrubbing while investigating), there is absolutely
> >no activity on EC pool, but according to atop, all OSDs are still reading
> >furiously, without any apparent reason. even when increasing osd loglevel,
> >I don't see anything interesting, except for occasional
> >2021-11-03 12:04:52.664 7fb8652e3700  5 osd.0 9347 heartbeat
> >osd_stat(store_statfs(0xb80056c/0x26b57/0xe8d7fc0,
> >data 0x2f0ddd813e8/0x30b0ee6, compress 0x0/0x0/0x0, omap
> >0x98b706, meta 0x26abe48fa), peers 
> >[1,26,27,34,36,40,44,49,52,55,57,65,69,75,76,78,82,83,87,93,96,97,104,105,107,108,111,112,114,120,121,122,123,135,136,137,143,147,154,156,157,169,171,187,192,196,200,204,208,212,217,218,220,222,224,226,227]
> >op hist [])
> >and also compactions stats.
> >
> >trying to sequentially read data from the pool leads to very poor
> >performance (ie 8MB/s)
> >
> >We've had very similar problem on different cluster (replicated, no EC), when
> >osdmaps were not pruned correctly, but I checked and those seem to
> >be OK, it's just
> >OSD are still reading something and I'm unable to find out what.
> >
> >here's output of crush for one node, others are pretty similar:
> >
> > -1   2803.19824- 2.7 PiB  609 TiB  607 TiB 1.9 GiB
> >1.9 TiB 2.1 PiB 21.78 1.01   -root sata
> > -2467.19971- 466 TiB  102 TiB  101 TiB 320 MiB
> >328 GiB 364 TiB 21.83 1.01   -host spbstdv1a-sata
> >  0   hdd   14.5  1.0  15 TiB  3.1 TiB  3.0 TiB 9.5 MiB
> >9.7 GiB  12 TiB 20.98 0.97  51 up osd.0
> >  1   hdd   14.5  1.0  15 TiB  2.4 TiB  2.4 TiB 7.4 MiB
> >7.7 GiB  12 TiB 16.34 0.76  50 up osd.1
> >  2   hdd   14.5  1.0  15 TiB  3.5 TiB  3.5 TiB  11 MiB
> >11 GiB  11 TiB 24.33 1.13  51 up osd.2
> >  3   hdd   14.5  1.0  15 TiB  2.9 TiB  2.8 TiB 9.3 MiB
> >9.1 GiB  12 TiB 19.58 0.91  48 up osd.3
> >  4   hdd   14.5  1.0  15 TiB  3.3 TiB  3.3 TiB  11 MiB
> >11 GiB  11 TiB 22.94 1.06  51 up osd.4
> >  5   hdd   14.5  1.0  15 TiB  3.5 TiB  3.5 TiB  12 MiB
> >12 GiB  11 TiB 23.94 1.11  50 up osd.5
> >  6   hdd   14.5  1.0  15 TiB  2.8 TiB  2.8 TiB 9.6 MiB
> >9.6 GiB  12 TiB 19.11 0.89  49 up osd.6
> >  7   hdd   14.5  1.0  15 TiB  3.4 TiB  3.4 TiB 4.9 MiB
> >11 GiB  11 TiB 23.68 1.10  50 up osd.7
> >  8   hdd   14.59998  1.0  15 TiB  3.2 TiB  3.2 TiB  10 MiB
> >10 GiB  11 TiB 22.18 1.03  51 up osd.8
> >  9   hdd   14.5  1.0  15 TiB  3.4 TiB  3.4 TiB 4.9 MiB
> >11 GiB  11 TiB 23.52 1.09  50 up osd.9
> > 10   hdd   14.5  1.0  15 TiB  2.7 TiB  2.6 TiB 8.5 MiB
> >8.5 GiB  12 TiB 18.25 0.85  50 up osd.10
> > 11   hdd   14.5  1.0  15 TiB  3.4 TiB  3.3 TiB  10 MiB
> >11 GiB  11 TiB 23.02 1.07  51 up osd.11
> > 12   hdd   14.5  1.0  15 TiB  2.8 TiB  2.8 TiB  10 MiB
> >9.7 GiB  12 TiB 19.53 0.91  49 up osd.12
> > 13   hdd   14.5  1.0  15 TiB  3.7 TiB  3.7 TiB  11 MiB
> >12 GiB  11 TiB 25.62 1.19  49 up osd.13
> > 14   hdd   14.5  1.0  15 TiB  2.6 TiB  2.6 TiB 8.2 MiB
> >8.3 GiB  12 TiB 17.65 0.82  53 up osd.14
> > 15   hdd   14.5  1.0  15 TiB  2.5 TiB  2.5 TiB 7.6 MiB
> >7.8 GiB  12 TiB 17.42 0.81  50 up osd.15
> > 16   hdd   14.5  1.0  15 TiB  3.5 TiB  3.5 TiB  11 MiB
> >11 GiB  11 TiB 24.37 1.13  50 up osd.16
> > 17   hdd   14.5  1.0  15 TiB  3.5 TiB  3.5 TiB  12 MiB
> >12 GiB  11 TiB 24.09 1.12  52 up osd.17
> > 18   hdd   14.5  1.0  15 TiB  2.4 TiB  2.4 TiB 6.9 M

[ceph-users] osd daemons still reading disks at full speed while there is no pool activity

2021-11-03 Thread Nikola Ciprich
 GiB  11 TiB 23.04 1.07  50 up osd.24
 25   hdd   14.5  1.0  15 TiB  3.1 TiB  3.1 TiB  10 MiB  9.9 GiB  11 TiB 21.61 1.00  50 up osd.25
162   hdd   14.5  1.0  15 TiB  3.2 TiB  3.2 TiB  10 MiB   10 GiB  11 TiB 21.76 1.01  50 up osd.162
163   hdd   14.5  1.0  15 TiB  3.4 TiB  3.4 TiB  11 MiB   11 GiB  11 TiB 23.60 1.09  50 up osd.163
164   hdd   14.5  1.0  15 TiB  3.5 TiB  3.5 TiB  12 MiB   11 GiB  11 TiB 24.38 1.13  51 up osd.164
165   hdd   14.5  1.0  15 TiB  2.9 TiB  2.9 TiB 9.1 MiB  9.5 GiB  12 TiB 20.18 0.94  50 up osd.165
166   hdd   14.5  1.0  15 TiB  3.3 TiB  3.3 TiB  11 MiB   11 GiB  11 TiB 22.62 1.05  50 up osd.166
167   hdd   14.5  1.0  15 TiB  3.5 TiB  3.5 TiB  12 MiB   12 GiB  11 TiB 24.36 1.13  52 up osd.167

most of OSD settings are defaults, cache autotune, memory_target 4GB etc.

there is absolutely no activity on this (or any related) pool, just on one
replicated pool on different drives, where there are about 30MB/s of writes. all
boxes are almost idle and have enough RAM. unfortunately the OSDs do not use any
fast storage for WAL or DB.

has anyone met a similar problem? Or does somebody have a hint on how to debug
what the OSDs are reading all the time?

I'd be very grateful

with best regards

nikola ciprich


-- 
---------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: can't get healthy cluster to trim osdmaps (13.2.8)

2020-03-23 Thread Nikola Ciprich
Hi Jan,

yes, I'm watching this TT as well, I'll post an update there
(together with a quick & dirty patch to get more debugging info)

BR

nik


On Mon, Mar 23, 2020 at 12:12:43PM +0100, Jan Fajerski wrote:
> https://tracker.ceph.com/issues/44184
> Looks similar, maybe you're also seeing other symptoms listed there?
> In any case would be good to track this in one place.
> 
> On Mon, Mar 23, 2020 at 11:29:53AM +0100, Nikola Ciprich wrote:
> >OK, so after some debugging, I've pinned the problem down to
> >OSDMonitor::get_trim_to:
> >
> >   std::lock_guard l(creating_pgs_lock);
> >   if (!creating_pgs.pgs.empty()) {
> > return 0;
> >   }
> >
> >apparently creating_pgs.pgs.empty() is not true, do I understand it
> >correctly that cluster thinks the list of creating pgs is not empty?
> >
> >all pgs are in clean+active state, so maybe there's something malformed
> >in the db? How can I check?
> >
> >I tried dumping list of creating_pgs according to
> >http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030297.html
> >but to no avail
> >
> >On Tue, Mar 17, 2020 at 12:25:29PM +0100, Nikola Ciprich wrote:
> >>Hello dear cephers,
> >>
> >>lately, there's been some discussion about slow requests hanging
> >>in "wait for new map" status. At least in my case, it's being caused
> >>by osdmaps not being properly trimmed. I tried all possible steps
> >>to force osdmap pruning (restarting mons, restarting everything,
> >>poking crushmap), to no avail. Still all OSDs keep min osdmap version
> >>1, while newest is 4734. Otherwise cluster is healthy, with no down
> >>OSDs, network communication works flawlessly, all seems to be fine.
> >>Just can't get old osdmaps to go away.. It's a very small cluster and I've
> >>moved all production traffic elsewhere, so I'm free to investigate
> >>and debug, however I'm out of ideas on what to try or where to look.
> >>
> >>Any ideas somebody please?
> >>
> >>The cluster is running 13.2.8
> >>
> >>I'd be very grateful for any tips
> >>
> >>with best regards
> >>
> >>nikola ciprich
> >>
> >>--
> >>-
> >>Ing. Nikola CIPRICH
> >>LinuxBox.cz, s.r.o.
> >>28.rijna 168, 709 00 Ostrava
> >>
> >>tel.:   +420 591 166 214
> >>fax:+420 596 621 273
> >>mobil:  +420 777 093 799
> >>www.linuxbox.cz
> >>
> >>mobil servis: +420 737 238 656
> >>email servis: ser...@linuxbox.cz
> >>-
> >>
> >
> >-- 
> >-
> >Ing. Nikola CIPRICH
> >LinuxBox.cz, s.r.o.
> >28.rijna 168, 709 00 Ostrava
> >
> >tel.:   +420 591 166 214
> >fax:+420 596 621 273
> >mobil:  +420 777 093 799
> >www.linuxbox.cz
> >
> >mobil servis: +420 737 238 656
> >email servis: ser...@linuxbox.cz
> >-
> >___
> >ceph-users mailing list -- ceph-users@ceph.io
> >To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> -- 
> Jan Fajerski
> Senior Software Engineer Enterprise Storage
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: can't get healthy cluster to trim osdmaps (13.2.8)

2020-03-23 Thread Nikola Ciprich
OK, to reply myself :-)

I wasn't very smart about decoding the output of "ceph-kvstore-tool get ...",
so I added a dump of creating_pgs.pgs into the get_trim_to function instead.
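
For reference, the kvstore route should be roughly the following (the prefix/key
names are my assumption based on OSDMonitor.cc, and the mon has to be stopped to
open its rocksdb, so please correct me if this is wrong):

ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname)/store.db \
    get osd_pg_creating creating out /tmp/creating.bin
ceph-dencoder type creating_pgs_t import /tmp/creating.bin decode dump_json  # if dencoder knows the type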

now I have the list of PGs which seem to be stuck in the creating state
in the monitors' DB. If I query them, they're active+clean, as I wrote.

I suppose I could remove them using ceph-kvstore-tool, right?

however I'd rather ask before I proceed:

is it safe to remove them from DB, if they all seem to be already created?

how do I do it? Stop all monitors, use the tool and start them again?
(I've moved all the services to another cluster, so this won't cause any outage)

I'd be very grateful for guidance here..

thanks in advance

BR

nik


On Mon, Mar 23, 2020 at 11:29:53AM +0100, Nikola Ciprich wrote:
> OK, so after some debugging, I've pinned the problem down to
> OSDMonitor::get_trim_to:
> 
> std::lock_guard l(creating_pgs_lock);
> if (!creating_pgs.pgs.empty()) {
>   return 0;
> }
> 
> apparently creating_pgs.pgs.empty() is not true, do I understand it
> correctly that cluster thinks the list of creating pgs is not empty?
> 
> all pgs are in clean+active state, so maybe there's something malformed
> in the db? How can I check?
> 
> I tried dumping list of creating_pgs according to
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030297.html
> but to no avail
> 
> On Tue, Mar 17, 2020 at 12:25:29PM +0100, Nikola Ciprich wrote:
> > Hello dear cephers,
> > 
> > lately, there's been some discussion about slow requests hanging
> > in "wait for new map" status. At least in my case, it's being caused
> > by osdmaps not being properly trimmed. I tried all possible steps
> > to force osdmap pruning (restarting mons, restarting everyging,
> > poking crushmap), to no avail. Still all OSDs keep min osdmap version
> > 1, while newest is 4734. Otherwise cluster is healthy, with no down
> > OSDs, network communication works flawlessly, all seems to be fine.
> > Just can't get old osdmaps to go away.. It's a very small cluster and I've
> > moved all production traffic elsewhere, so I'm free to investigate
> > and debug, however I'm out of ideas on what to try or where to look.
> > 
> > Any ideas somebody please?
> > 
> > The cluster is running 13.2.8
> > 
> > I'd be very grateful for any tips
> > 
> > with best regards
> > 
> > nikola ciprich
> > 
> > -- 
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> > 
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> > 
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> > 
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: can't get healthy cluster to trim osdmaps (13.2.8)

2020-03-23 Thread Nikola Ciprich
OK, so after some debugging, I've pinned the problem down to
OSDMonitor::get_trim_to:

std::lock_guard l(creating_pgs_lock);
if (!creating_pgs.pgs.empty()) {
  return 0;
}

apparently creating_pgs.pgs.empty() is not true - do I understand it
correctly that the cluster thinks the list of creating PGs is not empty?

all pgs are in the active+clean state, so maybe there's something malformed
in the db? How can I check?

I tried dumping the list of creating_pgs according to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030297.html
but to no avail.

On Tue, Mar 17, 2020 at 12:25:29PM +0100, Nikola Ciprich wrote:
> Hello dear cephers,
> 
> lately, there's been some discussion about slow requests hanging
> in "wait for new map" status. At least in my case, it's being caused
> by osdmaps not being properly trimmed. I tried all possible steps
> to force osdmap pruning (restarting mons, restarting everyging,
> poking crushmap), to no avail. Still all OSDs keep min osdmap version
> 1, while newest is 4734. Otherwise cluster is healthy, with no down
> OSDs, network communication works flawlessly, all seems to be fine.
> Just can't get old osdmaps to go away.. It's a very small cluster and I've
> moved all production traffic elsewhere, so I'm free to investigate
> and debug, however I'm out of ideas on what to try or where to look.
> 
> Any ideas somebody please?
> 
> The cluster is running 13.2.8
> 
> I'd be very grateful for any tips
> 
> with best regards
> 
> nikola ciprich
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] can't get healthy cluster to trim osdmaps (13.2.8)

2020-03-17 Thread Nikola Ciprich
Hello dear cephers,

lately, there's been some discussion about slow requests hanging
in "wait for new map" status. At least in my case, it's being caused
by osdmaps not being properly trimmed. I tried all possible steps
to force osdmap pruning (restarting mons, restarting everything,
poking the crushmap), to no avail. Still all OSDs keep min osdmap version
1, while the newest is 4734. Otherwise the cluster is healthy, with no down
OSDs, network communication works flawlessly, all seems to be fine.
I just can't get the old osdmaps to go away.. It's a very small cluster and I've
moved all production traffic elsewhere, so I'm free to investigate
and debug, however I'm out of ideas on what to try or where to look.
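
For context, the stored osdmap range can be checked like this, on an OSD and on
the mons respectively (field names as printed by mimic, from memory):

ceph daemon osd.0 status | jq '{oldest_map, newest_map}'
ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'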

Any ideas somebody please?

The cluster is running 13.2.8

I'd be very grateful for any tips

with best regards

nikola ciprich

-- 
-----
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-03-12 Thread Nikola Ciprich
Hi Dan,

nope, osdmap_first_committed is still 1, it must be some different issue..

I'll report when I have something..

n.


On Thu, Mar 12, 2020 at 04:07:26PM +0100, Dan van der Ster wrote:
> You have to wait 5 minutes or so after restarting the mon before it
> starts trimming.
> Otherwise, hmm, I'm not sure.
> 
> -- dan
> 
> On Thu, Mar 12, 2020 at 3:55 PM Nikola Ciprich
>  wrote:
> >
> > Hi Dan,
> >
> > # ceph report 2>/dev/null | jq .osdmap_first_committed
> > 1
> > # ceph report 2>/dev/null | jq .osdmap_last_committed
> > 4646
> >
> > seems like osdmap_first_committed doesn't change at all, restarting mons
> > doesn't help.. I don't have any down OSDs, everything seems to be healthy..
> >
> > BR
> >
> > nik
> >
> >
> >
> >
> > On Thu, Mar 12, 2020 at 03:23:25PM +0100, Dan van der Ster wrote:
> > > If untrimed osdmaps is related, then you should check:
> > > https://tracker.ceph.com/issues/37875, particularly #note6
> > >
> > > You can see what the mon thinks the valid range of osdmaps is:
> > >
> > > # ceph report | jq .osdmap_first_committed
> > > 113300
> > > # ceph report | jq .osdmap_last_committed
> > > 113938
> > >
> > > Then the workaround to start trimming is to restart the leader.
> > > This shrinks the range on the mon, which then starts telling the osds
> > > to trim range.
> > > Note that the OSDs will only trim 30 osdmaps for each new osdmap
> > > generated -- so if you have a lot of osdmaps to trim, you need to
> > > generate more.
> > >
> > > -- dan
> > >
> > >
> > > On Thu, Mar 12, 2020 at 11:02 AM Nikola Ciprich
> > >  wrote:
> > > >
> > > > OK,
> > > >
> > > > so I can confirm that at least in my case, the problem is caused
> > > > by old osd maps not being pruned for some reason, and thus not fitting
> > > > into cache. When I increased osd map cache to 5000 the problem is gone.
> > > >
> > > > The question is why they're not being pruned, even though the cluster 
> > > > is in
> > > > healthy state. But you can try checking:
> > > >
> > > > ceph daemon osd.X status to see how many maps are your OSDs storing
> > > > and ceph daemon osd.X perf dump | grep osd_map_cache_miss
> > > >
> > > > to see if you're experiencing similar problem..
> > > >
> > > > so I'm going to debug further..
> > > >
> > > > BR
> > > >
> > > > nik
> > > >
> > > > On Thu, Mar 12, 2020 at 09:16:58AM +0100, Nikola Ciprich wrote:
> > > > > Hi Paul and others,
> > > > >
> > > > > while digging deeper, I noticed that when the cluster gets into this
> > > > > state, osd_map_cache_miss on OSDs starts growing rapidly.. even when
> > > > > I increased osd map cache size to 500 (which was the default at least
> > > > > for luminous) it behaves the same..
> > > > >
> > > > > I think this could be related..
> > > > >
> > > > > I'll try playing more with cache settings..
> > > > >
> > > > > BR
> > > > >
> > > > > nik
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Mar 11, 2020 at 03:40:04PM +0100, Paul Emmerich wrote:
> > > > > > Encountered this one again today, I've updated the issue with new
> > > > > > information: https://tracker.ceph.com/issues/44184
> > > > > >
> > > > > >
> > > > > > Paul
> > > > > >
> > > > > > --
> > > > > > Paul Emmerich
> > > > > >
> > > > > > Looking for help with your Ceph cluster? Contact us at 
> > > > > > https://croit.io
> > > > > >
> > > > > > croit GmbH
> > > > > > Freseniusstr. 31h
> > > > > > 81247 München
> > > > > > www.croit.io
> > > > > > Tel: +49 89 1896585 90
> > > > > >
> > > > > > On Sat, Feb 29, 2020 at 10:21 PM Nikola Ciprich
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I just wa

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-03-12 Thread Nikola Ciprich
Hi Dan,

# ceph report 2>/dev/null | jq .osdmap_first_committed
1
# ceph report 2>/dev/null | jq .osdmap_last_committed
4646

seems like osdmap_first_committed doesn't change at all, restarting mons
doesn't help.. I don't have any down OSDs, everything seems to be healthy..

BR

nik




On Thu, Mar 12, 2020 at 03:23:25PM +0100, Dan van der Ster wrote:
> If untrimed osdmaps is related, then you should check:
> https://tracker.ceph.com/issues/37875, particularly #note6
> 
> You can see what the mon thinks the valid range of osdmaps is:
> 
> # ceph report | jq .osdmap_first_committed
> 113300
> # ceph report | jq .osdmap_last_committed
> 113938
> 
> Then the workaround to start trimming is to restart the leader.
> This shrinks the range on the mon, which then starts telling the osds
> to trim range.
> Note that the OSDs will only trim 30 osdmaps for each new osdmap
> generated -- so if you have a lot of osdmaps to trim, you need to
> generate more.
> 
> -- dan
> 
> 
> On Thu, Mar 12, 2020 at 11:02 AM Nikola Ciprich
>  wrote:
> >
> > OK,
> >
> > so I can confirm that at least in my case, the problem is caused
> > by old osd maps not being pruned for some reason, and thus not fitting
> > into cache. When I increased osd map cache to 5000 the problem is gone.
> >
> > The question is why they're not being pruned, even though the cluster is in
> > healthy state. But you can try checking:
> >
> > ceph daemon osd.X status to see how many maps are your OSDs storing
> > and ceph daemon osd.X perf dump | grep osd_map_cache_miss
> >
> > to see if you're experiencing similar problem..
> >
> > so I'm going to debug further..
> >
> > BR
> >
> > nik
> >
> > On Thu, Mar 12, 2020 at 09:16:58AM +0100, Nikola Ciprich wrote:
> > > Hi Paul and others,
> > >
> > > while digging deeper, I noticed that when the cluster gets into this
> > > state, osd_map_cache_miss on OSDs starts growing rapidly.. even when
> > > I increased osd map cache size to 500 (which was the default at least
> > > for luminous) it behaves the same..
> > >
> > > I think this could be related..
> > >
> > > I'll try playing more with cache settings..
> > >
> > > BR
> > >
> > > nik
> > >
> > >
> > >
> > > On Wed, Mar 11, 2020 at 03:40:04PM +0100, Paul Emmerich wrote:
> > > > Encountered this one again today, I've updated the issue with new
> > > > information: https://tracker.ceph.com/issues/44184
> > > >
> > > >
> > > > Paul
> > > >
> > > > --
> > > > Paul Emmerich
> > > >
> > > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > > >
> > > > croit GmbH
> > > > Freseniusstr. 31h
> > > > 81247 München
> > > > www.croit.io
> > > > Tel: +49 89 1896585 90
> > > >
> > > > On Sat, Feb 29, 2020 at 10:21 PM Nikola Ciprich
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I just wanted to report we've just hit very similar problem.. on mimic
> > > > > (13.2.6). Any manipulation with OSD (ie restart) causes lot of slow
> > > > > ops caused by waiting for new map. It seems those are slowed by SATA
> > > > > OSDs which keep being 100% busy reading for long time until all ops 
> > > > > are gone,
> > > > > blocking OPS on unrelated NVME pools - SATA pools are completely 
> > > > > unused now.
> > > > >
> > > > > is this possible that those maps are being requested from slow SATA 
> > > > > OSDs
> > > > > and it takes such a long time for some reason? why could it take so 
> > > > > long?
> > > > > the cluster is very small with very light load..
> > > > >
> > > > > BR
> > > > >
> > > > > nik
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Feb 19, 2020 at 10:03:35AM +0100, Wido den Hollander wrote:
> > > > > >
> > > > > >
> > > > > > On 2/19/20 9:34 AM, Paul Emmerich wrote:
> > > > > > > On Wed, Feb 19, 2020 at 7:26 AM Wido den Hollander 
> > > > > > >  wrote:
> > > > > > >>
> > > > > > >>
> > > > > >

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-03-12 Thread Nikola Ciprich
OK,

so I can confirm that at least in my case, the problem is caused
by old osd maps not being pruned for some reason, and thus not fitting
into the cache. When I increased the osd map cache to 5000, the problem went away.

The question is why they're not being pruned, even though the cluster is in
a healthy state. But you can try checking:

ceph daemon osd.X status to see how many maps your OSDs are storing
and ceph daemon osd.X perf dump | grep osd_map_cache_miss

to see if you're experiencing a similar problem..
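
for example, something like this quick loop on each OSD host (just a sketch -
it assumes the default admin socket paths under /var/run/ceph and that jq is
installed; oldest_map/newest_map come from the status output):

for sock in /var/run/ceph/ceph-osd.*.asok; do
    id=${sock##*/ceph-osd.}; id=${id%.asok}
    echo "== osd.$id =="
    # how many maps this OSD is keeping around
    ceph daemon osd.$id status | jq '{oldest_map, newest_map}'
    # and whether the osdmap cache is being missed a lot
    ceph daemon osd.$id perf dump | grep osd_map_cache_miss
done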

so I'm going to debug further..

BR

nik

On Thu, Mar 12, 2020 at 09:16:58AM +0100, Nikola Ciprich wrote:
> Hi Paul and others,
> 
> while digging deeper, I noticed that when the cluster gets into this
> state, osd_map_cache_miss on OSDs starts growing rapidly.. even when
> I increased osd map cache size to 500 (which was the default at least
> for luminous) it behaves the same..
> 
> I think this could be related..
> 
> I'll try playing more with cache settings..
> 
> BR
> 
> nik
> 
> 
> 
> On Wed, Mar 11, 2020 at 03:40:04PM +0100, Paul Emmerich wrote:
> > Encountered this one again today, I've updated the issue with new
> > information: https://tracker.ceph.com/issues/44184
> > 
> > 
> > Paul
> > 
> > -- 
> > Paul Emmerich
> > 
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > 
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90
> > 
> > On Sat, Feb 29, 2020 at 10:21 PM Nikola Ciprich
> >  wrote:
> > >
> > > Hi,
> > >
> > > I just wanted to report we've just hit very similar problem.. on mimic
> > > (13.2.6). Any manipulation with OSD (ie restart) causes lot of slow
> > > ops caused by waiting for new map. It seems those are slowed by SATA
> > > OSDs which keep being 100% busy reading for long time until all ops are 
> > > gone,
> > > blocking OPS on unrelated NVME pools - SATA pools are completely unused 
> > > now.
> > >
> > > is this possible that those maps are being requested from slow SATA OSDs
> > > and it takes such a long time for some reason? why could it take so long?
> > > the cluster is very small with very light load..
> > >
> > > BR
> > >
> > > nik
> > >
> > >
> > >
> > > On Wed, Feb 19, 2020 at 10:03:35AM +0100, Wido den Hollander wrote:
> > > >
> > > >
> > > > On 2/19/20 9:34 AM, Paul Emmerich wrote:
> > > > > On Wed, Feb 19, 2020 at 7:26 AM Wido den Hollander  
> > > > > wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 2/18/20 6:54 PM, Paul Emmerich wrote:
> > > > >>> I've also seen this problem on Nautilus with no obvious reason for 
> > > > >>> the
> > > > >>> slowness once.
> > > > >>
> > > > >> Did this resolve itself? Or did you remove the pool?
> > > > >
> > > > > I've seen this twice on the same cluster, it fixed itself the first
> > > > > time (maybe with some OSD restarts?) and the other time I removed the
> > > > > pool after a few minutes because the OSDs were running into heartbeat
> > > > > timeouts. There unfortunately seems to be no way to reproduce this :(
> > > > >
> > > >
> > > > Yes, that's the problem. I've been trying to reproduce it, but I can't.
> > > > It works on all my Nautilus systems except for this one.
> > > >
> > > > As you saw it, Bryan saw it, I expect others to encounter this at some
> > > > point as well.
> > > >
> > > > I don't have any extensive logging as this cluster is in production and
> > > > I can't simply crank up the logging and try again.
> > > >
> > > > > In this case it wasn't a new pool that caused problems but a very old 
> > > > > one.
> > > > >
> > > > >
> > > > > Paul
> > > > >
> > > > >>
> > > > >>> In my case it was a rather old cluster that was upgraded all the way
> > > > >>> from firefly
> > > > >>>
> > > > >>>
> > > > >>
> > > > >> This cluster has also been installed with Firefly. It was installed 
> > > > >> in
> > > > >>

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-03-12 Thread Nikola Ciprich
Hi Paul and others,

while digging deeper, I noticed that when the cluster gets into this
state, osd_map_cache_miss on the OSDs starts growing rapidly.. even when
I increased the osd map cache size to 500 (which was the default at least
for luminous), it behaves the same..

I think this could be related..

I'll try playing more with cache settings..
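
for anyone wanting to try the same, a rough way to bump the cache at runtime
and then watch whether the miss counter keeps climbing (the value and osd.0 are
just placeholders, injectargs may warn that a restart is needed for this
particular option, and it should be persisted in ceph.conf under [osd] as well):

ceph tell 'osd.*' injectargs '--osd_map_cache_size 5000'
watch -n 10 "ceph daemon osd.0 perf dump | grep osd_map_cache_miss"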

BR

nik



On Wed, Mar 11, 2020 at 03:40:04PM +0100, Paul Emmerich wrote:
> Encountered this one again today, I've updated the issue with new
> information: https://tracker.ceph.com/issues/44184
> 
> 
> Paul
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
> 
> On Sat, Feb 29, 2020 at 10:21 PM Nikola Ciprich
>  wrote:
> >
> > Hi,
> >
> > I just wanted to report we've just hit very similar problem.. on mimic
> > (13.2.6). Any manipulation with OSD (ie restart) causes lot of slow
> > ops caused by waiting for new map. It seems those are slowed by SATA
> > OSDs which keep being 100% busy reading for long time until all ops are 
> > gone,
> > blocking OPS on unrelated NVME pools - SATA pools are completely unused now.
> >
> > is this possible that those maps are being requested from slow SATA OSDs
> > and it takes such a long time for some reason? why could it take so long?
> > the cluster is very small with very light load..
> >
> > BR
> >
> > nik
> >
> >
> >
> > On Wed, Feb 19, 2020 at 10:03:35AM +0100, Wido den Hollander wrote:
> > >
> > >
> > > On 2/19/20 9:34 AM, Paul Emmerich wrote:
> > > > On Wed, Feb 19, 2020 at 7:26 AM Wido den Hollander  
> > > > wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 2/18/20 6:54 PM, Paul Emmerich wrote:
> > > >>> I've also seen this problem on Nautilus with no obvious reason for the
> > > >>> slowness once.
> > > >>
> > > >> Did this resolve itself? Or did you remove the pool?
> > > >
> > > > I've seen this twice on the same cluster, it fixed itself the first
> > > > time (maybe with some OSD restarts?) and the other time I removed the
> > > > pool after a few minutes because the OSDs were running into heartbeat
> > > > timeouts. There unfortunately seems to be no way to reproduce this :(
> > > >
> > >
> > > Yes, that's the problem. I've been trying to reproduce it, but I can't.
> > > It works on all my Nautilus systems except for this one.
> > >
> > > As you saw it, Bryan saw it, I expect others to encounter this at some
> > > point as well.
> > >
> > > I don't have any extensive logging as this cluster is in production and
> > > I can't simply crank up the logging and try again.
> > >
> > > > In this case it wasn't a new pool that caused problems but a very old 
> > > > one.
> > > >
> > > >
> > > > Paul
> > > >
> > > >>
> > > >>> In my case it was a rather old cluster that was upgraded all the way
> > > >>> from firefly
> > > >>>
> > > >>>
> > > >>
> > > >> This cluster has also been installed with Firefly. It was installed in
> > > >> 2015, so a while ago.
> > > >>
> > > >> Wido
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:    +420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-02-29 Thread Nikola Ciprich
Hi,

I just wanted to report we've just hit a very similar problem.. on mimic
(13.2.6). Any manipulation with an OSD (i.e. a restart) causes a lot of slow
ops caused by waiting for a new map. It seems those are slowed down by SATA
OSDs which keep being 100% busy reading for a long time until all ops are gone,
blocking ops on unrelated NVME pools - the SATA pools are completely unused now.

is it possible that those maps are being requested from slow SATA OSDs
and that's why it takes such a long time? why could it take so long?
the cluster is very small with very light load..
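
if it helps, this is roughly how we're checking where the slow ops are stuck
and which disks are saturated (osd.12 is just a placeholder for one of the busy
SATA OSDs, and the exact event text in the ops dump may differ between releases):

# see whether the in-flight ops on a busy SATA OSD are waiting on maps
ceph daemon osd.12 dump_ops_in_flight | grep -i -B3 map
# and confirm which devices are saturated while the slow ops pile up
iostat -x 5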

BR

nik



On Wed, Feb 19, 2020 at 10:03:35AM +0100, Wido den Hollander wrote:
> 
> 
> On 2/19/20 9:34 AM, Paul Emmerich wrote:
> > On Wed, Feb 19, 2020 at 7:26 AM Wido den Hollander  wrote:
> >>
> >>
> >>
> >> On 2/18/20 6:54 PM, Paul Emmerich wrote:
> >>> I've also seen this problem on Nautilus with no obvious reason for the
> >>> slowness once.
> >>
> >> Did this resolve itself? Or did you remove the pool?
> > 
> > I've seen this twice on the same cluster, it fixed itself the first
> > time (maybe with some OSD restarts?) and the other time I removed the
> > pool after a few minutes because the OSDs were running into heartbeat
> > timeouts. There unfortunately seems to be no way to reproduce this :(
> > 
> 
> Yes, that's the problem. I've been trying to reproduce it, but I can't.
> It works on all my Nautilus systems except for this one.
> 
> As you saw it, Bryan saw it, I expect others to encounter this at some
> point as well.
> 
> I don't have any extensive logging as this cluster is in production and
> I can't simply crank up the logging and try again.
> 
> > In this case it wasn't a new pool that caused problems but a very old one.
> > 
> > 
> > Paul
> > 
> >>
> >>> In my case it was a rather old cluster that was upgraded all the way
> >>> from firefly
> >>>
> >>>
> >>
> >> This cluster has also been installed with Firefly. It was installed in
> >> 2015, so a while ago.
> >>
> >> Wido
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io