[ceph-users] Re: has anyone enabled bdev_enable_discard?
Is there any update on this? Has anyone tested the option and collected performance numbers from before and after enabling it? Is there any good documentation on this option?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: has anyone enabled bdev_enable_discard?
I played with this feature a while ago and recall it had a visible negative impact on user operations due to the need to submit tons of discard operations: effectively, each data overwrite triggers the submission of one or more discards to disk.

I doubt this feature has seen wide use, if any.

Nevertheless, we recently got a PR that reworks some aspects of thread management for this code path, see https://github.com/ceph/ceph/pull/55469. The author said they needed this feature for their cluster, so you might want to ask them about their experience.

W.r.t. documentation, there are really just two options:

- bdev_enable_discard - enables issuing discards to disk
- bdev_async_discard - controls whether discard requests are issued synchronously (along with the release of disk extents) or asynchronously (via a background thread)

Thanks,
Igor

On 01/03/2024 13:06, jst...@proxforge.de wrote:
> Is there any update on this? Did someone test the option and has performance values before and after?
> [...]
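For reference, a minimal way to try the two options Igor describes is a ceph.conf fragment in the [osd] section (a sketch only; OSDs need a restart to pick these up, and the central config is an alternative):

```ini
[osd]
# Issue discards to the underlying device as BlueStore frees extents.
bdev_enable_discard = true
# Hand discards to a background thread instead of blocking the extent-release path.
bdev_async_discard = true
```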
[ceph-users] Re: has anyone enabled bdev_enable_discard?
I have a number of drives in my fleet with old firmware that seems to have discard / TRIM bugs, as in the drives get bricked. Much worse, since they're behind legacy RAID HBAs, many of them can't be updated. YMMV.

> On Mar 1, 2024, at 13:15, Igor Fedotov wrote:
>
> I played with this feature a while ago and recall it had visible negative impact on user operations due to the need to submit tons of discard operations - effectively each data overwrite operation triggers one or more discard operation submission to disk.
> [...]
[ceph-users] Re: has anyone enabled bdev_enable_discard?
I came across an enterprise NVMe drive used for a BlueFS DB whose performance dropped sharply a few months after delivery (I won't mention the brand here, but it was not one of these three: Intel, Samsung, Micron).

Enabling bdev_enable_discard clearly impacted performance, but the option also saved the platform after a few days of discarding. IMHO the most important thing is to validate the behavior once the entire flash media has been written at least once. In any case, this option has the merit of existing.

It seems to me that, ideally, there would not be several bdev_*discard options; the task would simply be asynchronous, with the discard instructions issued during a quieter period of activity (I see no harm if queued instructions are lost across an OSD reboot).

Le ven. 1 mars 2024 à 19:17, Igor Fedotov a écrit :
> I played with this feature a while ago and recall it had visible negative impact on user operations due to the need to submit tons of discard operations - effectively each data overwrite operation triggers one or more discard operation submission to disk.
> [...]
[ceph-users] Re: has anyone enabled bdev_enable_discard?
We've had a specific set of drives for which we've had to enable bdev_enable_discard and bdev_async_discard in order to maintain acceptable performance on block clusters. I wrote the patch that Igor mentioned to try to send more parallel discards to the devices, but these particular drives seem to process them serially (based on the observed discard counts and latency at the device), which is unfortunate. We're also testing new firmware that should alleviate some of the initial concerns about discards not keeping up, which is what prompted the patch in the first place.

Most of our drives do not need discards enabled (and definitely not synchronous ones) to maintain performance, unless we're doing a full-disk fio test or something similar where we're trying to find the drive's cliff profile. We've used OSD device classes to target the options at specific OSDs via the centralized conf, which helps when adding new hosts that may have different drives, so that the options aren't applied globally.

Based on our experience, I wouldn't enable it unless you're seeing cliff-like behaviour as your OSDs run low on free space, or are heavily fragmented. I would also consider bdev_async_discard = true a requirement, so that discards don't block user IO. Keep an eye on the discards being sent to the devices and on the discard latency as well (via node_exporter, for example).

Matt

On 2024-03-02 06:18, David C. wrote:
> I came across an enterprise NVMe used for BlueFS DB whose performance dropped sharply after a few months of delivery (I won't mention the brand here but it was not among these 3: Intel, Samsung, Micron).
> [...]
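The per-device-class targeting Matt describes maps onto the centralized config's CRUSH device-class masks; a hedged sketch (the class name `ssd` is an example, substitute your own):

```shell
# Apply discard settings only to OSDs whose CRUSH device class is "ssd",
# rather than globally; all other OSDs keep the defaults.
ceph config set osd/class:ssd bdev_enable_discard true
ceph config set osd/class:ssd bdev_async_discard true

# Verify what an individual OSD actually resolved the option to.
ceph config show osd.0 bdev_enable_discard
```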
[ceph-users] Re: has anyone enabled bdev_enable_discard?
Could we not consider setting up a "bluefstrim" which could be orchestrated?

This would avoid a continuous stream of discard instructions to the disks during activity. A weekly (or probably even monthly) bluefstrim would likely be enough for the platforms that really need it.

Le sam. 2 mars 2024 à 12:58, Matt Vandermeulen a écrit :
> We've had a specific set of drives that we've had to enable bdev_enable_discard and bdev_async_discard for in order to maintain acceptable performance on block clusters.
> [...]
[ceph-users] Re: has anyone enabled bdev_enable_discard?
Periodic discard was actually attempted in the past: https://github.com/ceph/ceph/pull/20723

A proper implementation would probably need appropriate scheduling/throttling that can be tuned to balance against the impact on client I/O.

Josh

On Sat, Mar 2, 2024 at 6:20 AM David C. wrote:
> Could we not consider setting up a "bluefstrim" which could be orchestrated?
> [...]
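The scheduling/throttling Josh describes could take the shape of a token-bucket pacer in front of the discard queue. A toy sketch, not Ceph code: `submit_discard` is a hypothetical stand-in for the actual device call, and the rate numbers are illustrative only.

```python
import collections
import time

class ThrottledDiscarder:
    """Pace discard submissions with a token bucket so bursts of freed
    extents don't monopolize the device (one token = one discard)."""

    def __init__(self, rate_per_sec, burst, submit_discard, clock=time.monotonic):
        self.rate = rate_per_sec              # sustained discards per second
        self.capacity = burst                 # max discards in one burst
        self.tokens = float(burst)
        self.submit_discard = submit_discard  # hypothetical device hook
        self.clock = clock
        self.last = clock()
        self.queue = collections.deque()      # pending (offset, length) extents

    def enqueue(self, offset, length):
        self.queue.append((offset, length))

    def poll(self):
        """Refill tokens for elapsed time (capped at the burst size) and
        submit as many queued discards as tokens allow; returns the count."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        sent = 0
        while self.queue and self.tokens >= 1.0:
            self.tokens -= 1.0
            self.submit_discard(*self.queue.popleft())
            sent += 1
        return sent
```

Because the refill is capped at `burst`, a flood of overwrites drains into the device at a bounded pace regardless of how many extents pile up in the queue, which is the "balance against client I/O" property Josh is after.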
[ceph-users] Re: has anyone enabled bdev_enable_discard?
On 4/12/21 5:46 PM, Dan van der Ster wrote:
> Hi all,
>
> bdev_enable_discard has been in ceph for several major releases now but it is still off by default.
> Did anyone try it recently -- is it safe to use? And do you have perf numbers before and after enabling?

I have done so on SATA SSDs in a few cases, and it worked.

Did I notice a real difference? Not really.

It's highly debated whether this still makes a difference with modern flash devices. I don't think there is a real conclusion on whether you still need to trim/discard blocks.

Wido
[ceph-users] Re: has anyone enabled bdev_enable_discard?
On Tue, Apr 13, 2021 at 9:00 AM Wido den Hollander wrote:
> I have done so on SATA SSDs in a few cases and: it worked
>
> Did I notice a real difference? Not really.

Thanks, I've enabled it on a test box and am draining data to check that it doesn't crash anything.

> It's highly debated if this still makes a difference with modern flash devices. I don't think there is a real conclusion if you still need to trim/discard blocks.

Do you happen to have any more info on these debates? As you know, we have seen major performance issues on hypervisors that are not running a periodic fstrim; we use similar or identical SATA SSDs for HV local storage and for our block.db's. If it doesn't hurt anything, why wouldn't we enable it by default?

Cheers, Dan
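As an aside, the periodic fstrim Dan mentions for hypervisors is typically util-linux's bundled timer; a sketch (scheduling defaults vary by distro, so check yours):

```shell
# Enable util-linux's periodic trim timer (runs "fstrim -a" on a schedule,
# weekly by default on most distributions).
systemctl enable --now fstrim.timer

# Or trim all mounted filesystems that support discard once, verbosely.
fstrim -av
```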
[ceph-users] Re: has anyone enabled bdev_enable_discard?
On 4/13/21 4:07 AM, Dan van der Ster wrote:
> Do you happen to have any more info on these debates? As you know we have seen major performance issues on hypervisors that are not running a periodic fstrim; we use similar or identical SATA ssds for HV local storage and our block.db's. If it doesn't hurt anything, why wouldn't we enable it by default?

There's some good discussion in the original PR:

https://github.com/ceph/ceph/pull/14727

I suspect the primary concerns with enabling it by default are twofold: (1) the need to maintain a blocklist for buggy firmware implementations, and (2) even "good" firmware can see slowdowns with bursts of trim commands due to the FTL metadata updates they require, per this comment:

https://github.com/ceph/ceph/pull/14727#issuecomment-342399578

The original question of how to decide between online discard, periodic bulk discard, or no discard is still open, imho. I think we probably need more feedback from people with real large deployments (hint hint :D) before we enable online discard by default.

Mark
[ceph-users] Re: has anyone enabled bdev_enable_discard?
On Tue, Apr 13, 2021 at 12:35 PM Mark Nelson wrote:
> There's some good discussion in the original PR:
>
> https://github.com/ceph/ceph/pull/14727
> [...]
> The original issue of how to decide between online discard, periodic bulk discard, or no discard is still an issue imho. I think we probably need to get more feedback from people with real large deployments (hint hint :D) before we enable online discard by default.

Thanks for the links. Further to those, I found the earlier attempt at a periodic discard: https://github.com/ceph/ceph/pull/20723

Igor posted some performance numbers there for both online and periodic discard, neither of which seems very promising. And I didn't find any further work on periodic discard for the bitmap allocator or beyond.

Since the runtime performance impact of this looks unpredictable, maybe a conservative way to resume this work would be to allow discard via the offline bluestore tooling?

Cheers, Dan
[ceph-users] Re: has anyone enabled bdev_enable_discard?
On 13/04/2021 11:07, Dan van der Ster wrote:
> Do you happen to have any more info on these debates? As you know we have seen major performance issues on hypervisors that are not running a periodic fstrim; we use similar or identical SATA ssds for HV local storage and our block.db's. If it doesn't hurt anything, why wouldn't we enable it by default?

These debates are more about whether it really makes sense with modern SSDs, as the performance gain seems limited. With older (SATA) SSDs it might, but with modern DC-grade NVMe drives people doubt it is still needed.

SATA 3.0 also had the issue that TRIM was a blocking command; with SATA 3.1 it became queued and thus non-blocking. With NVMe it is a different story again.

I don't have links or papers for you; it's mainly stories I heard at conferences and such.

Wido
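Whether a given device even advertises discard support (and at what granularity) can be checked in sysfs before touching any Ceph option. A small illustrative helper; the sysfs layout shown is the standard Linux block-queue one, and the `sysfs` parameter exists only so the function can be pointed at a test tree:

```python
from pathlib import Path

def discard_info(dev, sysfs=Path("/sys/block")):
    """Return (supports_discard, granularity_bytes, max_bytes) for a block
    device name like "sda", based on its sysfs queue attributes."""
    q = sysfs / dev / "queue"
    gran = int((q / "discard_granularity").read_text().strip())
    max_b = int((q / "discard_max_bytes").read_text().strip())
    # The kernel reports 0 for both attributes when the device (or its
    # transport, e.g. some RAID HBAs) does not accept discards at all.
    return (gran > 0 and max_b > 0, gran, max_b)
```

A drive behind a legacy RAID HBA, as in Anthony's case, will often show zeros here even when the underlying flash would support TRIM.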