[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-07-24 Thread stachecki . tyler
Mark,

I ran into something similar to this recently while testing Quincy... I believe 
I see what happened here.

Based on the user's information, the following non-default option was in use:
ceph config set osd bluestore_rocksdb_options
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB

This overridden option does not include the cap on WAL sizing, which is needed 
since column family sharding was added to RocksDB:
https://github.com/ceph/ceph/pull/35277

Without that setting specified as part of bluestore_rocksdb_options, Quincy will 
use ~100GiB WALs.

Everything works great until the WALs fill, and then the cluster begins caving 
in on itself progressively.
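
If that is indeed the case, one possible remedy - a sketch only, assuming RocksDB's 
max_total_wal_size is the option the stock Quincy defaults rely on to cap the WAL - 
is to append that cap to the overridden string (1 GiB below is just an example value) 
and restart the OSDs:

# sketch: re-add a WAL cap to the overridden RocksDB options
ceph config set osd bluestore_rocksdb_options "<existing option string>,max_total_wal_size=1073741824"

Here <existing option string> stands for the long value quoted above; alternatively, 
simply drop the override with "ceph config rm osd bluestore_rocksdb_options" to fall 
back to the defaults.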

Cheers,
Tyler
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-07-18 Thread Gabriel Benhanokh
Greg,

You are correct - the timeout was used during debugging to make sure the fast 
shutdown is indeed fast, but even if shutdown completed after that timeout, 
everything should be perfectly fine.

The timeout was set to 15 seconds which is more than enough to complete 
shutdown on a valid system (in reality we only need 1-2 seconds).

We can simply remove the assert (or maybe increase the timeout from 15s to 
30s), but that won't explain what we were doing for so long during shutdown.
I suggest analyzing the shutdown process to find the root cause (which might be 
related to the overall system slowdown).

The OP describes a 10x slowdown in writes, which might translate into a 10x 
longer fast shutdown - from 2 seconds to 20 seconds - and so exceed the 15 
second timeout.
Another possibility is that the system was running with an extremely high log 
level, which is something I can address in the code by checking the log level 
before asserting.
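
For anyone hitting the assert in the meantime, the timeout is an ordinary config 
option, so it can be relaxed or disabled cluster-wide; a sketch only, values are 
illustrative:

# relax the fast-shutdown timeout (seconds)
ceph config set osd osd_fast_shutdown_timeout 30
# or, as suggested elsewhere in this thread, disable the check entirely:
ceph config set osd osd_fast_shutdown_timeout 0
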
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-23 Thread Mark Nelson

Hi Nikola,

Just to be clear, these were the settings that you changed back to the 
defaults?



Non-default settings are:

"bluestore_cache_size_hdd": {
"default": "1073741824",
"mon": "4294967296",
"final": "4294967296"
},
"bluestore_cache_size_ssd": {
"default": "3221225472",
"mon": "4294967296",
"final": "4294967296"
},
...
"osd_memory_cache_min": {
"default": "134217728",
"mon": "2147483648",
"final": "2147483648"
},
"osd_memory_target": {
"default": "4294967296",
"mon": "17179869184",
"final": "17179869184"
},
"osd_scrub_sleep": {
"default": 0,
"mon": 0.10001,
"final": 0.10001
},
"rbd_balance_parent_reads": {
"default": false,
"mon": true,
"final": true
},



Thanks,
Mark


On 5/23/23 12:17, Nikola Ciprich wrote:

Hello Igor,

just reporting that since the last restart (after reverting the changed values
to their defaults) the performance hasn't decreased (and it's been over
two weeks now). So either it helped after all, or the drop is caused
by something else I have yet to figure out.. we've automated the test,
so once the performance drops below the threshold, I'll know and
investigate further (and report)

cheers

with regards

nik



On Wed, May 10, 2023 at 07:36:06AM +0200, Nikola Ciprich wrote:

Hello Igor,

You didn't reset the counters every hour, did you? So having the average
subop_w_latency grow that way means the current values were much higher
than before.


bummer, I didn't.. I've updated the gather script to reset stats, wait 10m and then
gather perf data, every hour. It's been running since yesterday, so now we'll have
to wait about a week for the problem to appear again..




Curious if subop latencies were growing for every OSD or just a subset (may
be even just a single one) of them?

since I only have long-term averages, it's not easy to say, but based on what we
have:

only two OSDs got avg subop_w_lat > 0.0006, with no clear relation between them.
19 OSDs got avg subop_w_lat > 0.0005 - this is more interesting - 15 of them
are on the later-installed nodes (note that those nodes have almost no VMs running,
so they are much less used!), 4 are on other nodes. But also note that not all
OSDs on the suspicious nodes are over the threshold; it's 6, 6 and 3 out of 7 OSDs
per node. But still, it's strange..




Next time you reach the bad state please do the following if possible:

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf counters again.

- Then start restarting OSD one-by-one starting with the worst OSD (in terms
of subop_w_lat from the prev step). Wouldn't be sufficient to reset just a
few OSDs before the cluster is back to normal?


will do once it slows down again.




I see very similar crash reported here:https://tracker.ceph.com/issues/56346
so I'm not reporting..

Do you think this might somehow be the cause of the problem? Anything else I 
should
check in perf dumps or elsewhere?


Hmm... don't know yet. Could you please share the last 20K lines prior to the crash
from e.g. two sample OSDs?


https://storage.linuxbox.cz/index.php/s/o5bMaGMiZQxWadi



And the crash isn't permanent, OSDs are able to start after the second(?)
shot, aren't they?

yes, actually they start after issuing systemctl ceph-osd@xx restart, it just 
takes
long time performing log recovery..

If I can provide more info, please let me know

BR

nik

--
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-





--
Best Regards,
Mark Nelson
Head of R&D (USA)

Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-23 Thread Nikola Ciprich
Hello Igor,

just reporting that since the last restart (after reverting the changed values
to their defaults) the performance hasn't decreased (and it's been over
two weeks now). So either it helped after all, or the drop is caused
by something else I have yet to figure out.. we've automated the test,
so once the performance drops below the threshold, I'll know and
investigate further (and report)

cheers

with regards

nik



On Wed, May 10, 2023 at 07:36:06AM +0200, Nikola Ciprich wrote:
> Hello Igor,
> > You didn't reset the counters every hour, do you? So having average
> > subop_w_latency growing that way means the current values were much higher
> > than before.
> 
> bummer, I didn't.. I've updated gather script to reset stats, wait 10m and 
> then
> gather perf data, each hour. It's running since yesterday, so now we'll have 
> to wait
> about one week for the problem to appear again..
> 
> 
> > 
> > Curious if subop latencies were growing for every OSD or just a subset (may
> > be even just a single one) of them?
> since I only have long time averaga, it's not easy to say, but based on what 
> we have:
> 
> only two OSDs avg got sub_w_lat > 0.0006. no clear relation between them
> 19 OSDs got avg sub_w_lat > 0.0005 - this is more interesting - 15 out of them
> are on those later installed nodes (note that those nodes have almost no VMs 
> running
> so they are much less used!) 4 are on other nodes. but also note, that not all
> of OSDs on suspicious nodes are over the threshold, it's 6, 6 and 3 out of 7 
> OSDs
> on the node. but still it's strange..
> 
> > 
> > 
> > Next time you reach the bad state please do the following if possible:
> > 
> > - reset perf counters for every OSD
> > 
> > -  leave the cluster running for 10 mins and collect perf counters again.
> > 
> > - Then start restarting OSD one-by-one starting with the worst OSD (in terms
> > of subop_w_lat from the prev step). Wouldn't be sufficient to reset just a
> > few OSDs before the cluster is back to normal?
> 
> will do once it slows down again.
> 
> 
> > > 
> > > I see very similar crash reported 
> > > here:https://tracker.ceph.com/issues/56346
> > > so I'm not reporting..
> > > 
> > > Do you think this might somehow be the cause of the problem? Anything 
> > > else I should
> > > check in perf dumps or elsewhere?
> > 
> > Hmm... don't know yet. Could you please last 20K lines prior the crash from
> > e.g two sample OSDs?
> 
> https://storage.linuxbox.cz/index.php/s/o5bMaGMiZQxWadi
> 
> > 
> > And the crash isn't permanent, OSDs are able to start after the second(?)
> > shot, aren't they?
> yes, actually they start after issuing systemctl ceph-osd@xx restart, it just 
> takes
> long time performing log recovery..
> 
> If I can provide more info, please let me know
> 
> BR
> 
> nik
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-10 Thread Igor Fedotov

Hey Zakhar,

You do need to restart OSDs to bring performance back to normal anyway, 
don't you? So yeah, we're not aware of a better way so far - all the 
information I have is from you and Nikola. And you both tell us about 
the need for a restart.


Apparently there is no need to restart every OSD, only the "degraded/slow" 
ones. We actually need to verify that, though. So please identify the 
slowest OSDs (in terms of subop_w_lat) and restart them first. 
Hopefully just a fraction of your OSDs will require this.
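
As a sketch (the exact command depends on how the OSDs are deployed; both variants 
below are illustrative), restarting a single suspect OSD rather than the whole set 
would look like:

# plain systemd-managed OSD, as used by Nikola elsewhere in this thread:
systemctl restart ceph-osd@<id>
# or, on a cephadm/orchestrator-managed cluster:
ceph orch daemon restart osd.<id>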



Thanks,
Igor

On 5/10/2023 6:01 AM, Zakhar Kirpichenko wrote:
Thank you, Igor. I will try to see how to collect the perf values. Not 
sure about restarting all OSDs as it's a production cluster, is there 
a less invasive way?


/Z

On Tue, 9 May 2023 at 23:58, Igor Fedotov  wrote:

Hi Zakhar,

Let's leave questions regarding cache usage/tuning to a different
topic for now. And concentrate on performance drop.

Could you please do the same experiment I asked from Nikola once
your cluster reaches "bad performance" state (Nikola, could you
please use this improved scenario as well?):

- collect perf counters for every OSD

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf counters
again.

- Then restart OSDs one-by-one starting with the worst OSD (in
terms of subop_w_lat from the prev step). Wouldn't be sufficient
to reset just a few OSDs before the cluster is back to normal?

- if partial OSD restart is sufficient - please leave the
remaining OSDs run as-is without reboot.

- after the restart (no matter partial or complete one - the key
thing it's should successful) reset all the perf counters and
leave the cluster run for 30 mins and collect perf counters again.

- wait 24 hours and collect the counters one more time

- share all four counters snapshots.


Thanks,

Igor

On 5/8/2023 11:31 PM, Zakhar Kirpichenko wrote:

Don't mean to hijack the thread, but I may be observing something
similar with 16.2.12: OSD performance noticeably peaks after OSD
restart and then gradually reduces over 10-14 days, while commit
and apply latencies increase across the board.

Non-default settings are:

        "bluestore_cache_size_hdd": {
            "default": "1073741824",
            "mon": "4294967296",
            "final": "4294967296"
        },
        "bluestore_cache_size_ssd": {
            "default": "3221225472",
            "mon": "4294967296",
            "final": "4294967296"
        },
...
        "osd_memory_cache_min": {
            "default": "134217728",
            "mon": "2147483648",
            "final": "2147483648"
        },
        "osd_memory_target": {
            "default": "4294967296",
            "mon": "17179869184",
            "final": "17179869184"
        },
        "osd_scrub_sleep": {
            "default": 0,
            "mon": 0.10001,
            "final": 0.10001
        },
        "rbd_balance_parent_reads": {
            "default": false,
            "mon": true,
            "final": true
        },

All other settings are default, the usage is rather simple
Openstack / RBD.

I also noticed that OSD cache usage doesn't increase over time
(see my message "Ceph 16.2.12, bluestore cache doesn't seem to be
used much" dated 26 April 2023, which received no comments),
despite OSDs are being used rather heavily and there's plenty of
host and OSD cache / target memory available. It may be worth
checking if available memory is being used in a good way.

/Z

On Mon, 8 May 2023 at 22:35, Igor Fedotov 
wrote:

Hey Nikola,

On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
> OK, starting collecting those for all OSDs..
> I have hour samples of all OSDs perf dumps loaded in DB, so
I can easily examine,
> sort, whatever..
>
You didn't reset the counters every hour, do you? So having
average
subop_w_latency growing that way means the current values
were much
higher than before.

Curious if subop latencies were growing for every OSD or just
a subset
(may be even just a single one) of them?


Next time you reach the bad state please do the following if
possible:

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf
counters again.

- Then start restarting OSD one-by-one starting with the
worst OSD (in
terms of subop_w_lat from the prev step). Wouldn't be
sufficient to
reset just a few OSDs before the cluster is back to normal?

>> currently values for avgtime are around 0.0003 for
subop_w_lat 

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-09 Thread Nikola Ciprich
Hello Igor,
> You didn't reset the counters every hour, do you? So having average
> subop_w_latency growing that way means the current values were much higher
> than before.

bummer, I didn't.. I've updated the gather script to reset stats, wait 10m and then
gather perf data, every hour. It's been running since yesterday, so now we'll have
to wait about a week for the problem to appear again..
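
For reference, a minimal sketch of what one iteration of such a gather step can look
like (run on each OSD host; the admin-socket paths and output directory are
assumptions, adjust to your deployment):

#!/bin/bash
# reset all perf counters, wait 10 minutes, then dump them with a timestamp
ts=$(date +%Y%m%d-%H%M)
mkdir -p /var/log/ceph-perf
for sock in /var/run/ceph/ceph-osd.*.asok; do
    ceph daemon "$sock" perf reset all
done
sleep 600
for sock in /var/run/ceph/ceph-osd.*.asok; do
    id=$(basename "$sock" .asok)
    ceph daemon "$sock" perf dump > "/var/log/ceph-perf/${id}-${ts}.json"
done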


> 
> Curious if subop latencies were growing for every OSD or just a subset (may
> be even just a single one) of them?
since I only have long-term averages, it's not easy to say, but based on what we
have:

only two OSDs got avg subop_w_lat > 0.0006, with no clear relation between them.
19 OSDs got avg subop_w_lat > 0.0005 - this is more interesting - 15 of them
are on the later-installed nodes (note that those nodes have almost no VMs running,
so they are much less used!), 4 are on other nodes. But also note that not all
OSDs on the suspicious nodes are over the threshold; it's 6, 6 and 3 out of 7 OSDs
per node. But still, it's strange..

> 
> 
> Next time you reach the bad state please do the following if possible:
> 
> - reset perf counters for every OSD
> 
> -  leave the cluster running for 10 mins and collect perf counters again.
> 
> - Then start restarting OSD one-by-one starting with the worst OSD (in terms
> of subop_w_lat from the prev step). Wouldn't be sufficient to reset just a
> few OSDs before the cluster is back to normal?

will do once it slows down again.


> > 
> > I see very similar crash reported here:https://tracker.ceph.com/issues/56346
> > so I'm not reporting..
> > 
> > Do you think this might somehow be the cause of the problem? Anything else 
> > I should
> > check in perf dumps or elsewhere?
> 
> Hmm... don't know yet. Could you please last 20K lines prior the crash from
> e.g two sample OSDs?

https://storage.linuxbox.cz/index.php/s/o5bMaGMiZQxWadi

> 
> And the crash isn't permanent, OSDs are able to start after the second(?)
> shot, aren't they?
yes, actually they start after issuing "systemctl restart ceph-osd@xx"; it just
takes a long time performing log recovery..

If I can provide more info, please let me know

BR

nik

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-09 Thread Zakhar Kirpichenko
Thank you, Igor. I will try to see how to collect the perf values. I'm not sure
about restarting all OSDs as it's a production cluster - is there a less
invasive way?

/Z

On Tue, 9 May 2023 at 23:58, Igor Fedotov  wrote:

> Hi Zakhar,
>
> Let's leave questions regarding cache usage/tuning to a different topic
> for now. And concentrate on performance drop.
>
> Could you please do the same experiment I asked from Nikola once your
> cluster reaches "bad performance" state (Nikola, could you please use this
> improved scenario as well?):
>
> - collect perf counters for every OSD
>
> - reset perf counters for every OSD
>
> -  leave the cluster running for 10 mins and collect perf counters again.
>
> - Then restart OSDs one-by-one starting with the worst OSD (in terms of
> subop_w_lat from the prev step). Wouldn't be sufficient to reset just a few
> OSDs before the cluster is back to normal?
>
> - if partial OSD restart is sufficient - please leave the remaining OSDs
> run as-is without reboot.
>
> - after the restart (no matter partial or complete one - the key thing
> it's should successful) reset all the perf counters and leave the cluster
> run for 30 mins and collect perf counters again.
>
> - wait 24 hours and collect the counters one more time
>
> - share all four counters snapshots.
>
>
> Thanks,
>
> Igor
>
> On 5/8/2023 11:31 PM, Zakhar Kirpichenko wrote:
>
> Don't mean to hijack the thread, but I may be observing something similar
> with 16.2.12: OSD performance noticeably peaks after OSD restart and then
> gradually reduces over 10-14 days, while commit and apply latencies
> increase across the board.
>
> Non-default settings are:
>
> "bluestore_cache_size_hdd": {
> "default": "1073741824",
> "mon": "4294967296",
> "final": "4294967296"
> },
> "bluestore_cache_size_ssd": {
> "default": "3221225472",
> "mon": "4294967296",
> "final": "4294967296"
> },
> ...
> "osd_memory_cache_min": {
> "default": "134217728",
> "mon": "2147483648",
> "final": "2147483648"
> },
> "osd_memory_target": {
> "default": "4294967296",
> "mon": "17179869184",
> "final": "17179869184"
> },
> "osd_scrub_sleep": {
> "default": 0,
> "mon": 0.10001,
> "final": 0.10001
> },
> "rbd_balance_parent_reads": {
> "default": false,
> "mon": true,
> "final": true
> },
>
> All other settings are default, the usage is rather simple Openstack /
> RBD.
>
> I also noticed that OSD cache usage doesn't increase over time (see my
> message "Ceph 16.2.12, bluestore cache doesn't seem to be used much" dated
> 26 April 2023, which received no comments), despite OSDs are being used
> rather heavily and there's plenty of host and OSD cache / target memory
> available. It may be worth checking if available memory is being used in a
> good way.
>
> /Z
>
> On Mon, 8 May 2023 at 22:35, Igor Fedotov  wrote:
>
>> Hey Nikola,
>>
>> On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
>> > OK, starting collecting those for all OSDs..
>> > I have hour samples of all OSDs perf dumps loaded in DB, so I can
>> easily examine,
>> > sort, whatever..
>> >
>> You didn't reset the counters every hour, do you? So having average
>> subop_w_latency growing that way means the current values were much
>> higher than before.
>>
>> Curious if subop latencies were growing for every OSD or just a subset
>> (may be even just a single one) of them?
>>
>>
>> Next time you reach the bad state please do the following if possible:
>>
>> - reset perf counters for every OSD
>>
>> -  leave the cluster running for 10 mins and collect perf counters again.
>>
>> - Then start restarting OSD one-by-one starting with the worst OSD (in
>> terms of subop_w_lat from the prev step). Wouldn't be sufficient to
>> reset just a few OSDs before the cluster is back to normal?
>>
>> >> currently values for avgtime are around 0.0003 for subop_w_lat and
>> 0.001-0.002
>> >> for op_w_lat
>> > OK, so there is no visible trend on op_w_lat, still between 0.001 and
>> 0.002
>> >
>> > subop_w_lat seems to have increased since yesterday though! I see
>> values from
>> > 0.0004 to as high as 0.001
>> >
>> > If some other perf data might be interesting, please let me know..
>> >
>> > During OSD restarts, I noticed strange thing - restarts on first 6
>> machines
>> > went smooth, but then on another 3, I saw rocksdb logs recovery on all
>> SSD
>> > OSDs. but first didn't see any mention of daemon crash in ceph -s
>> >
>> > later, crash info appeared, but only about 3 daemons (in total, at
>> least 20
>> > of them crashed though)
>> >
>> > crash report was similar for all three OSDs:
>> >
>> > [root@nrbphav4a ~]# ceph crash info
>> 2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-09 Thread Igor Fedotov

Hi Zakhar,

Let's leave questions regarding cache usage/tuning to a different topic 
for now. And concentrate on performance drop.


Could you please run the same experiment I asked of Nikola once your 
cluster reaches the "bad performance" state (Nikola, could you please use 
this improved scenario as well?):


- collect perf counters for every OSD

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf counters again.

- Then restart OSDs one-by-one, starting with the worst OSD (in terms of 
subop_w_lat from the prev step). Wouldn't it be sufficient to restart just a 
few OSDs to get the cluster back to normal?


- if a partial OSD restart is sufficient, please leave the remaining OSDs 
running as-is without a restart.


- after the restart (no matter whether partial or complete - the key thing 
is that it should be successful) reset all the perf counters, leave the 
cluster running for 30 mins and collect perf counters again.


- wait 24 hours and collect the counters one more time

- share all four counters snapshots.


Thanks,

Igor

On 5/8/2023 11:31 PM, Zakhar Kirpichenko wrote:
Don't mean to hijack the thread, but I may be observing something 
similar with 16.2.12: OSD performance noticeably peaks after OSD 
restart and then gradually reduces over 10-14 days, while commit and 
apply latencies increase across the board.


Non-default settings are:

        "bluestore_cache_size_hdd": {
            "default": "1073741824",
            "mon": "4294967296",
            "final": "4294967296"
        },
        "bluestore_cache_size_ssd": {
            "default": "3221225472",
            "mon": "4294967296",
            "final": "4294967296"
        },
...
        "osd_memory_cache_min": {
            "default": "134217728",
            "mon": "2147483648",
            "final": "2147483648"
        },
        "osd_memory_target": {
            "default": "4294967296",
            "mon": "17179869184",
            "final": "17179869184"
        },
        "osd_scrub_sleep": {
            "default": 0,
            "mon": 0.10001,
            "final": 0.10001
        },
        "rbd_balance_parent_reads": {
            "default": false,
            "mon": true,
            "final": true
        },

All other settings are default, the usage is rather simple Openstack / 
RBD.


I also noticed that OSD cache usage doesn't increase over time (see my 
message "Ceph 16.2.12, bluestore cache doesn't seem to be used much" 
dated 26 April 2023, which received no comments), despite OSDs are 
being used rather heavily and there's plenty of host and OSD cache / 
target memory available. It may be worth checking if available memory 
is being used in a good way.


/Z

On Mon, 8 May 2023 at 22:35, Igor Fedotov  wrote:

Hey Nikola,

On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
> OK, starting collecting those for all OSDs..
> I have hour samples of all OSDs perf dumps loaded in DB, so I
can easily examine,
> sort, whatever..
>
You didn't reset the counters every hour, do you? So having average
subop_w_latency growing that way means the current values were much
higher than before.

Curious if subop latencies were growing for every OSD or just a
subset
(may be even just a single one) of them?


Next time you reach the bad state please do the following if possible:

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf counters
again.

- Then start restarting OSD one-by-one starting with the worst OSD
(in
terms of subop_w_lat from the prev step). Wouldn't be sufficient to
reset just a few OSDs before the cluster is back to normal?

>> currently values for avgtime are around 0.0003 for subop_w_lat
and 0.001-0.002
>> for op_w_lat
> OK, so there is no visible trend on op_w_lat, still between
0.001 and 0.002
>
> subop_w_lat seems to have increased since yesterday though! I
see values from
> 0.0004 to as high as 0.001
>
> If some other perf data might be interesting, please let me know..
>
> During OSD restarts, I noticed strange thing - restarts on first
6 machines
> went smooth, but then on another 3, I saw rocksdb logs recovery
on all SSD
> OSDs. but first didn't see any mention of daemon crash in ceph -s
>
> later, crash info appeared, but only about 3 daemons (in total,
at least 20
> of them crashed though)
>
> crash report was similar for all three OSDs:
>
> [root@nrbphav4a ~]# ceph crash info
2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
> {
>      "backtrace": [
>          "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
>          "(BlueStore::_txc_create(BlueStore::Collection*,
BlueStore::OpSequencer*, std::__cxx11::list >*,
boost::intrusive_ptr)+0x413) [0x55a1c9d07c43]",
>


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-08 Thread Zakhar Kirpichenko
Don't mean to hijack the thread, but I may be observing something similar
with 16.2.12: OSD performance noticeably peaks after OSD restart and then
gradually reduces over 10-14 days, while commit and apply latencies
increase across the board.

Non-default settings are:

"bluestore_cache_size_hdd": {
"default": "1073741824",
"mon": "4294967296",
"final": "4294967296"
},
"bluestore_cache_size_ssd": {
"default": "3221225472",
"mon": "4294967296",
"final": "4294967296"
},
...
"osd_memory_cache_min": {
"default": "134217728",
"mon": "2147483648",
"final": "2147483648"
},
"osd_memory_target": {
"default": "4294967296",
"mon": "17179869184",
"final": "17179869184"
},
"osd_scrub_sleep": {
"default": 0,
"mon": 0.10001,
"final": 0.10001
},
"rbd_balance_parent_reads": {
"default": false,
"mon": true,
"final": true
},

All other settings are default, the usage is rather simple Openstack / RBD.

I also noticed that OSD cache usage doesn't increase over time (see my
message "Ceph 16.2.12, bluestore cache doesn't seem to be used much" dated
26 April 2023, which received no comments), despite OSDs are being used
rather heavily and there's plenty of host and OSD cache / target memory
available. It may be worth checking if available memory is being used in a
good way.

/Z

On Mon, 8 May 2023 at 22:35, Igor Fedotov  wrote:

> Hey Nikola,
>
> On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
> > OK, starting collecting those for all OSDs..
> > I have hour samples of all OSDs perf dumps loaded in DB, so I can easily
> examine,
> > sort, whatever..
> >
> You didn't reset the counters every hour, do you? So having average
> subop_w_latency growing that way means the current values were much
> higher than before.
>
> Curious if subop latencies were growing for every OSD or just a subset
> (may be even just a single one) of them?
>
>
> Next time you reach the bad state please do the following if possible:
>
> - reset perf counters for every OSD
>
> -  leave the cluster running for 10 mins and collect perf counters again.
>
> - Then start restarting OSD one-by-one starting with the worst OSD (in
> terms of subop_w_lat from the prev step). Wouldn't be sufficient to
> reset just a few OSDs before the cluster is back to normal?
>
> >> currently values for avgtime are around 0.0003 for subop_w_lat and
> 0.001-0.002
> >> for op_w_lat
> > OK, so there is no visible trend on op_w_lat, still between 0.001 and
> 0.002
> >
> > subop_w_lat seems to have increased since yesterday though! I see values
> from
> > 0.0004 to as high as 0.001
> >
> > If some other perf data might be interesting, please let me know..
> >
> > During OSD restarts, I noticed strange thing - restarts on first 6
> machines
> > went smooth, but then on another 3, I saw rocksdb logs recovery on all
> SSD
> > OSDs. but first didn't see any mention of daemon crash in ceph -s
> >
> > later, crash info appeared, but only about 3 daemons (in total, at least
> 20
> > of them crashed though)
> >
> > crash report was similar for all three OSDs:
> >
> > [root@nrbphav4a ~]# ceph crash info
> 2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
> > {
> >  "backtrace": [
> >  "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
> >  "(BlueStore::_txc_create(BlueStore::Collection*,
> BlueStore::OpSequencer*, std::__cxx11::list std::allocator >*, boost::intrusive_ptr)+0x413)
> [0x55a1c9d07c43]",
> >
> "(BlueStore::queue_transactions(boost::intrusive_ptr&,
> std::vector
> >&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x22b)
> [0x55a1c9d27e9b]",
> >  "(ReplicatedBackend::submit_transaction(hobject_t const&,
> object_stat_sum_t const&, eversion_t const&, std::unique_ptr std::default_delete >&&, eversion_t const&, eversion_t
> const&, std::vector >&&,
> std::optional&, Context*, unsigned long, osd_reqid_t,
> boost::intrusive_ptr)+0x8ad) [0x55a1c9bbcfdd]",
> >  "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*,
> PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
> >
> "(PrimaryLogPG::simple_opc_submit(std::unique_ptr std::default_delete >)+0x57) [0x55a1c99d6777]",
> >
> "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr)+0xb73)
> [0x55a1c99da883]",
> >  "/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
> >  "(CommonSafeTimer::timer_thread()+0x11a)
> [0x55a1c9e226aa]",
> >  "/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
> >  "/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
> >  "/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
> >  ],
> >  "ceph_version": "17.2.6",
> >  "crash_id":
> "2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
> >  

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-08 Thread Igor Fedotov

Hey Nikola,

On 5/8/2023 10:13 PM, Nikola Ciprich wrote:

OK, starting collecting those for all OSDs..
I have hour samples of all OSDs perf dumps loaded in DB, so I can easily 
examine,
sort, whatever..

You didn't reset the counters every hour, did you? So having the average 
subop_w_latency grow that way means the current values were much 
higher than before.


Curious if subop latencies were growing for every OSD or just a subset 
(may be even just a single one) of them?



Next time you reach the bad state please do the following if possible:

- reset perf counters for every OSD

-  leave the cluster running for 10 mins and collect perf counters again.

- Then start restarting OSDs one-by-one, starting with the worst OSD (in 
terms of subop_w_lat from the prev step). Wouldn't it be sufficient to 
restart just a few OSDs to get the cluster back to normal?



currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002
for op_w_lat

OK, so there is no visible trend on op_w_lat, still between 0.001 and 0.002

subop_w_lat seems to have increased since yesterday though! I see values from
0.0004 to as high as 0.001

If some other perf data might be interesting, please let me know..

During OSD restarts, I noticed strange thing - restarts on first 6 machines
went smooth, but then on another 3, I saw rocksdb logs recovery on all SSD
OSDs. but first didn't see any mention of daemon crash in ceph -s

later, crash info appeared, but only about 3 daemons (in total, at least 20
of them crashed though)

crash report was similar for all three OSDs:

[root@nrbphav4a ~]# ceph crash info 
2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
{
 "backtrace": [
 "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
 "(BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, 
std::__cxx11::list >*, 
boost::intrusive_ptr)+0x413) [0x55a1c9d07c43]",
 "(BlueStore::queue_transactions(boost::intrusive_ptr&, 
std::vector >&, 
boost::intrusive_ptr, ThreadPool::TPHandle*)+0x22b) [0x55a1c9d27e9b]",
 "(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr >&&, eversion_t const&, eversion_t const&, std::vector >&&, std::optional&, Context*, unsigned long, osd_reqid_t, 
boost::intrusive_ptr)+0x8ad) [0x55a1c9bbcfdd]",
 "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, 
PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
 "(PrimaryLogPG::simple_opc_submit(std::unique_ptr >)+0x57) [0x55a1c99d6777]",
 "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr)+0xb73) 
[0x55a1c99da883]",
 "/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
 "(CommonSafeTimer::timer_thread()+0x11a) [0x55a1c9e226aa]",
 "/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
 "/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
 "/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
 ],
 "ceph_version": "17.2.6",
 "crash_id": 
"2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
 "entity_name": "osd.98",
 "os_id": "almalinux",
 "os_name": "AlmaLinux",
 "os_version": "9.0 (Emerald Puma)",
 "os_version_id": "9.0",
 "process_name": "ceph-osd",
 "stack_sig": 
"b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
 "timestamp": "2023-05-08T17:45:47.056675Z",
 "utsname_hostname": "nrbphav4h",
 "utsname_machine": "x86_64",
 "utsname_release": "5.15.90lb9.01",
 "utsname_sysname": "Linux",
 "utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
}


I was trying to figure out why this particular 3 nodes could behave differently
and found out from colleagues, that those 3 nodes were added to cluster lately
with direct install of 17.2.5 (others were installed 15.2.16 and later upgraded)

not sure whether this is related to our problem though..

I see very similar crash reported here:https://tracker.ceph.com/issues/56346
so I'm not reporting..

Do you think this might somehow be the cause of the problem? Anything else I 
should
check in perf dumps or elsewhere?


Hmm... don't know yet. Could you please share the last 20K lines prior to 
the crash from e.g. two sample OSDs?


And the crash isn't permanent, OSDs are able to start after the 
second(?) shot, aren't they?



with best regards

nik







--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx





[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-08 Thread Nikola Ciprich
Hello Igor,

so I was checking the performance every day since Tuesday.. every day it seemed
to be the same - ~60-70 kOPS on random writes from a single VM.
Yesterday it finally dropped to 20 kOPS,
today to 10 kOPS. I also tried with a newly created volume; the result (after prefill)
is the same, so it doesn't make any difference..

so I reverted all mentioned options to their defaults and restarted all OSDs.
performance immediately returned to better values (I suppose this is again
caused by the restart only)

The good news is that setting osd_fast_shutdown_timeout to 0 really helped with
OSD crashes during restarts, which speeds things up a lot.. but I have some new
crashes, more on this later..

> > I'd suggest to start monitoring perf counters for your osds.
> > op_w_lat/subop_w_lat ones specifically. I presume they raise eventually,
> > don't they?
> OK, starting collecting those for all OSDs..
I have hourly samples of all OSDs' perf dumps loaded in a DB, so I can easily
examine, sort, whatever..


> 
> currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002
> for op_w_lat
OK, so there is no visible trend on op_w_lat, still between 0.001 and 0.002

subop_w_lat seems to have increased since yesterday though! I see values from
0.0004 to as high as 0.001

If some other perf data might be interesting, please let me know..

During OSD restarts, I noticed a strange thing - restarts on the first 6 machines
went smoothly, but then on another 3, I saw RocksDB log recovery on all SSD
OSDs, yet at first I didn't see any mention of a daemon crash in ceph -s

later, crash info appeared, but only for 3 daemons (in total, at least 20
of them crashed though)

crash report was similar for all three OSDs:

[root@nrbphav4a ~]# ceph crash info 
2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
{
"backtrace": [
"/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
"(BlueStore::_txc_create(BlueStore::Collection*, 
BlueStore::OpSequencer*, std::__cxx11::list 
>*, boost::intrusive_ptr)+0x413) [0x55a1c9d07c43]",

"(BlueStore::queue_transactions(boost::intrusive_ptr&,
 std::vector >&, 
boost::intrusive_ptr, ThreadPool::TPHandle*)+0x22b) 
[0x55a1c9d27e9b]",
"(ReplicatedBackend::submit_transaction(hobject_t const&, 
object_stat_sum_t const&, eversion_t const&, std::unique_ptr >&&, eversion_t const&, eversion_t const&, 
std::vector >&&, 
std::optional&, Context*, unsigned long, osd_reqid_t, 
boost::intrusive_ptr)+0x8ad) [0x55a1c9bbcfdd]",
"(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, 
PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",

"(PrimaryLogPG::simple_opc_submit(std::unique_ptr >)+0x57) [0x55a1c99d6777]",
"(PrimaryLogPG::handle_watch_timeout(std::shared_ptr)+0xb73) 
[0x55a1c99da883]",
"/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
"(CommonSafeTimer::timer_thread()+0x11a) [0x55a1c9e226aa]",
"/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
"/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
"/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
],
"ceph_version": "17.2.6",
"crash_id": 
"2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
"entity_name": "osd.98",
"os_id": "almalinux",
"os_name": "AlmaLinux",
"os_version": "9.0 (Emerald Puma)",
"os_version_id": "9.0",
"process_name": "ceph-osd",
"stack_sig": 
"b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
"timestamp": "2023-05-08T17:45:47.056675Z",
"utsname_hostname": "nrbphav4h",
"utsname_machine": "x86_64",
"utsname_release": "5.15.90lb9.01",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
}


I was trying to figure out why these particular 3 nodes could behave differently
and found out from colleagues that those 3 nodes were added to the cluster later,
with a direct install of 17.2.5 (the others were installed with 15.2.16 and later upgraded)

not sure whether this is related to our problem though..

I see a very similar crash reported here: https://tracker.ceph.com/issues/56346
so I'm not reporting it separately..

Do you think this might somehow be the cause of the problem? Anything else I 
should
check in perf dumps or elsewhere?

with best regards

nik






-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-03 Thread Igor Fedotov



On 5/2/2023 9:02 PM, Nikola Ciprich wrote:


hewever, probably worh noting, historically we're using following OSD options:
ceph config set osd bluestore_rocksdb_options 
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
ceph config set osd bluestore_cache_autotune 0
ceph config set osd bluestore_cache_size_ssd 2G
ceph config set osd bluestore_cache_kv_ratio 0.2
ceph config set osd bluestore_cache_meta_ratio 0.8
ceph config set osd osd_min_pg_log_entries 10
ceph config set osd osd_max_pg_log_entries 10
ceph config set osd osd_pg_log_dups_tracked 10
ceph config set osd osd_pg_log_trim_min 10

so maybe I'll start resetting those to defaults (ie enabling cache autotune etc)
as a first step..


Generally I wouldn't recommend using non-default settings unless there 
are explicit rationales. So yeah, better to revert to the defaults whenever 
possible.
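
As a sketch, assuming the overrides were applied with "ceph config set osd ..." 
as quoted above, dropping them again is just:

# remove the overrides so the compiled-in defaults apply, then restart the OSDs
ceph config rm osd bluestore_rocksdb_options
ceph config rm osd bluestore_cache_autotune
ceph config rm osd bluestore_cache_size_ssd
ceph config rm osd bluestore_cache_kv_ratio
ceph config rm osd bluestore_cache_meta_ratio
ceph config rm osd osd_min_pg_log_entries
ceph config rm osd osd_max_pg_log_entries
ceph config rm osd osd_pg_log_dups_tracked
ceph config rm osd osd_pg_log_trim_min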


I doubt this is a root cause for your issue though..







Thanks,

Igor

On 5/2/2023 11:32 AM, Nikola Ciprich wrote:

Hello dear CEPH users and developers,

we're dealing with strange problems.. we're having 12 node alma linux 9 cluster,
initially installed CEPH 15.2.16, then upgraded to 17.2.5. It's running bunch
of KVM virtual machines accessing volumes using RBD.

everything is working well, but there is strange and for us quite serious issue
   - speed of write operations (both sequential and random) is constantly 
degrading
   drastically to almost unusable numbers (in ~1week it drops from ~70k 4k 
writes/s
   from 1 VM  to ~7k writes/s)

When I restart all OSD daemons, numbers immediately return to normal..

volumes are stored on replicated pool of 4 replicas, on top of 7*12 = 84
INTEL SSDPE2KX080T8 NVMEs.

I've updated cluster to 17.2.6 some time ago, but the problem persists. This is
especially annoying in connection with https://tracker.ceph.com/issues/56896
as restarting OSDs is quite painfull when half of them crash..

I don't see anything suspicious, nodes load is quite low, no logs errors,
network latency and throughput is OK too

Anyone having simimar issue?

I'd like to ask for hints on what should I check further..

we're running lots of 14.2.x and 15.2.x clusters, none showing similar
issue, so I'm suspecting this is something related to quincy

thanks a lot in advance

with best regards

nikola ciprich




--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx


--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Nikola Ciprich
Hello Igor,

On Tue, May 02, 2023 at 05:41:04PM +0300, Igor Fedotov wrote:
> Hi Nikola,
> 
> I'd suggest to start monitoring perf counters for your osds.
> op_w_lat/subop_w_lat ones specifically. I presume they raise eventually,
> don't they?
OK, I've started collecting those for all OSDs..

currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002
for op_w_lat

I guess it'll need some time to find some trend, so I'll check tomorrow


> 
> Does subop_w_lat grow for every OSD or just a subset of them? How large is
> the delta between the best and the worst OSDs after a one week period? How
> many "bad" OSDs are at this point?
I'll see and report

> 
> 
> And some more questions:
> 
> How large are space utilization/fragmentation for your OSDs?
OSD usage is around 16-18%. Fragmentation should not be very bad; this
cluster has only been deployed for a few months.


> 
> Is the same performance drop observed for artificial benchmarks, e.g. 4k
> random writes to a fresh RBD image using fio?
will check again when the slowdown occurs and report


> 
> Is there any RAM utilization growth for OSD processes over time? Or may be
> any suspicious growth in mempool stats?
nope, RAM usage seems to be pretty constant.

however, it's probably worth noting that historically we're using the following OSD options:
ceph config set osd bluestore_rocksdb_options 
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
ceph config set osd bluestore_cache_autotune 0
ceph config set osd bluestore_cache_size_ssd 2G
ceph config set osd bluestore_cache_kv_ratio 0.2
ceph config set osd bluestore_cache_meta_ratio 0.8
ceph config set osd osd_min_pg_log_entries 10
ceph config set osd osd_max_pg_log_entries 10
ceph config set osd osd_pg_log_dups_tracked 10
ceph config set osd osd_pg_log_trim_min 10

so maybe I'll start by resetting those to the defaults (i.e. enabling cache autotune etc.)
as a first step..


> 
> 
> As a blind and brute force approach you might also want to compact RocksDB
> through ceph-kvstore-tool and switch bluestore allocator to bitmap
> (presuming default hybrid one is effective right now). Please do one
> modification at a time to realize what action is actually helpful if any.
will do..

thanks again for your hints

BR

nik


> 
> 
> Thanks,
> 
> Igor
> 
> On 5/2/2023 11:32 AM, Nikola Ciprich wrote:
> > Hello dear CEPH users and developers,
> > 
> > we're dealing with strange problems.. we're having 12 node alma linux 9 
> > cluster,
> > initially installed CEPH 15.2.16, then upgraded to 17.2.5. It's running 
> > bunch
> > of KVM virtual machines accessing volumes using RBD.
> > 
> > everything is working well, but there is strange and for us quite serious 
> > issue
> >   - speed of write operations (both sequential and random) is constantly 
> > degrading
> >   drastically to almost unusable numbers (in ~1week it drops from ~70k 4k 
> > writes/s
> >   from 1 VM  to ~7k writes/s)
> > 
> > When I restart all OSD daemons, numbers immediately return to normal..
> > 
> > volumes are stored on replicated pool of 4 replicas, on top of 7*12 = 84
> > INTEL SSDPE2KX080T8 NVMEs.
> > 
> > I've updated cluster to 17.2.6 some time ago, but the problem persists. 
> > This is
> > especially annoying in connection with https://tracker.ceph.com/issues/56896
> > as restarting OSDs is quite painfull when half of them crash..
> > 
> > I don't see anything suspicious, nodes load is quite low, no logs errors,
> > network latency and throughput is OK too
> > 
> > Anyone having simimar issue?
> > 
> > I'd like to ask for hints on what should I check further..
> > 
> > we're running lots of 14.2.x and 15.2.x clusters, none showing similar
> > issue, so I'm suspecting this is something related to quincy
> > 
> > thanks a lot in advance
> > 
> > with best regards
> > 
> > nikola ciprich
> > 
> > 
> > 
> -- 
> Igor Fedotov
> Ceph Lead Developer
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Gregory Farnum
On Tue, May 2, 2023 at 7:54 AM Igor Fedotov  wrote:
>
>
> On 5/2/2023 11:32 AM, Nikola Ciprich wrote:
> > I've updated cluster to 17.2.6 some time ago, but the problem persists. 
> > This is
> > especially annoying in connection with https://tracker.ceph.com/issues/56896
> > as restarting OSDs is quite painfull when half of them crash..
> > with best regards
> >
> Feel free to set osd_fast_shutdown_timeout to zero to workaround the
> above. IMO this assertion is a nonsence and I don't see any usage of
> this timeout parameter other than just throw an assertion.

This was added by Gabi in
https://github.com/ceph/ceph/commit/9b2a64a5f6ea743b2a4f4c2dbd703248d88b2a96;
presumably he has insight.

I wonder if it's just a debug config so we can see slow shutdowns in
our test runs? In which case it should certainly default to 0 and get
set for those test suites.
-Greg

>
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Igor Fedotov



On 5/2/2023 11:32 AM, Nikola Ciprich wrote:

I've updated cluster to 17.2.6 some time ago, but the problem persists. This is
especially annoying in connection with https://tracker.ceph.com/issues/56896
as restarting OSDs is quite painfull when half of them crash..
with best regards

Feel free to set osd_fast_shutdown_timeout to zero to work around the 
above. IMO this assertion is nonsense, and I don't see any use of this 
timeout parameter other than to throw an assertion.



--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Igor Fedotov

Hi Nikola,

I'd suggest starting to monitor perf counters for your OSDs, the 
op_w_lat/subop_w_lat ones specifically. I presume they rise eventually, 
don't they?


Does subop_w_lat grow for every OSD or just a subset of them? How large 
is the delta between the best and the worst OSDs after a one-week 
period? How many "bad" OSDs are there at this point?
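
A minimal sketch for answering that, assuming the counter shows up as 
osd.subop_w_latency in the perf dump and that jq is available on the OSD hosts:

# rank this host's OSDs by average replicated-write (subop) latency, worst first
for sock in /var/run/ceph/ceph-osd.*.asok; do
    lat=$(ceph daemon "$sock" perf dump | jq -r '.osd.subop_w_latency.avgtime')
    echo "$(basename "$sock" .asok) $lat"
done | sort -k2 -rn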



And some more questions:

How large are space utilization/fragmentation for your OSDs?

Is the same performance drop observed for artificial benchmarks, e.g. 4k 
random writes to a fresh RBD image using fio?
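
Something along these lines, as a sketch (fio's rbd engine; pool and image names 
are placeholders and should point at a dedicated test image):

# 4k random writes against a dedicated test image, client-side via librbd
fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin \
    --pool=<pool> --rbdname=<test-image> \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
    --time_based --runtime=60 --group_reporting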


Is there any RAM utilization growth for OSD processes over time? Or maybe 
any suspicious growth in mempool stats?



As a blind, brute-force approach you might also want to compact 
RocksDB through ceph-kvstore-tool and switch the bluestore allocator to 
bitmap (presuming the default hybrid one is in effect right now). Please do 
one modification at a time so you can tell which action, if any, actually 
helps.
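
A sketch of both actions, assuming the usual OSD data path and that each OSD is 
stopped before the offline compaction:

# offline RocksDB compaction for one (stopped) OSD
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
# switch the allocator from hybrid to bitmap (takes effect on OSD restart)
ceph config set osd bluestore_allocator bitmap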



Thanks,

Igor

On 5/2/2023 11:32 AM, Nikola Ciprich wrote:

Hello dear CEPH users and developers,

we're dealing with strange problems.. we're having 12 node alma linux 9 cluster,
initially installed CEPH 15.2.16, then upgraded to 17.2.5. It's running bunch
of KVM virtual machines accessing volumes using RBD.

everything is working well, but there is strange and for us quite serious issue
  - speed of write operations (both sequential and random) is constantly 
degrading
  drastically to almost unusable numbers (in ~1week it drops from ~70k 4k 
writes/s
  from 1 VM  to ~7k writes/s)

When I restart all OSD daemons, numbers immediately return to normal..

volumes are stored on replicated pool of 4 replicas, on top of 7*12 = 84
INTEL SSDPE2KX080T8 NVMEs.

I've updated cluster to 17.2.6 some time ago, but the problem persists. This is
especially annoying in connection with https://tracker.ceph.com/issues/56896
as restarting OSDs is quite painful when half of them crash..

I don't see anything suspicious, nodes load is quite low, no logs errors,
network latency and throughput is OK too

Anyone having a similar issue?

I'd like to ask for hints on what should I check further..

we're running lots of 14.2.x and 15.2.x clusters, none showing similar
issue, so I'm suspecting this is something related to quincy

thanks a lot in advance

with best regards

nikola ciprich




--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io