ivandika3 commented on PR #9932: URL: https://github.com/apache/ozone/pull/9932#issuecomment-4071713976
Thanks @chungen0126 for the review.

> Performance Impact: I’m concerned we are sacrificing consistency for a minimal benefit. Is there any production data showing that this change actually optimizes the system?

We have had two performance issues and large incidents because of this behavior:

- We have a large bucket that contains hundreds of millions of keys, and we schedule a lifecycle configuration to delete most of them. This generates millions of tombstones that RocksDB does not compact in a timely manner.
- The OM trash emptier runs periodically and calls getFileStatus on every bucket's `.Trash/` directory. It blocks for around 5 minutes when processing this large bucket, and during this time the whole OM is stuck due to the lock contention mentioned in the description.
- We had a recent issue where one user tried to access `.Trash/` although the trash directory had not been created yet (since no user of the bucket had deleted anything with Hadoop trash). This caused the lock to be held for a few hours, and the RocksDB iterator metrics showed that hundreds of millions of keys were skipped.

I understand that we have a periodic compaction that will compact the large keyTable, but it runs only every few hours, and by that time a large number of tombstones may already have accumulated.

> Consistency Trade-off: Moving createFakeDirIfShould out the lock might compromise the atomicity of the operation. If we proceed with this, we should at least consider making it a configurable option so users can choose between performance and strict consistency.

I understand the concern. The fake dir logic exists because of the limitation of the OBS/LEGACY flat namespace, so we have to contend with it. In any case, this is legacy behavior that should not be used for new buckets. The long-term plan is to migrate to OBS and FSO buckets to prevent these issues.

Please let me know if you have any suggestions for this. I'm OK if the community decides not to go ahead with it.
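To make the trade-off concrete, here is a minimal sketch of the idea, not the actual OM code. The class `FakeDirLookupSketch`, the `strictConsistency` flag, and the string status values are all hypothetical, and a `ConcurrentSkipListMap` stands in for the RocksDB-backed keyTable. The point lookups stay under the bucket lock, while the prefix scan (the step that becomes slow when an iterator has to skip many tombstones) runs outside the lock unless the strict mode is chosen.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: names and structure are assumptions, not Ozone APIs.
class FakeDirLookupSketch {
    // Sorted map standing in for the keyTable (assumption).
    private final ConcurrentSkipListMap<String, String> keyTable =
            new ConcurrentSkipListMap<>();
    private final ReentrantReadWriteLock bucketLock = new ReentrantReadWriteLock();
    // Hypothetical switch: true keeps the prefix scan under the lock.
    private final boolean strictConsistency;

    FakeDirLookupSketch(boolean strictConsistency) {
        this.strictConsistency = strictConsistency;
    }

    void put(String key, String value) {
        bucketLock.writeLock().lock();
        try {
            keyTable.put(key, value);
        } finally {
            bucketLock.writeLock().unlock();
        }
    }

    // Returns "DIR", "FILE", "FAKE_DIR", or empty when nothing matches.
    Optional<String> getFileStatus(String key) {
        String dirKey = key.endsWith("/") ? key : key + "/";
        bucketLock.readLock().lock();
        try {
            // Cheap point lookups always run under the lock.
            if (keyTable.containsKey(dirKey)) {
                return Optional.of("DIR");
            }
            if (keyTable.containsKey(key)) {
                return Optional.of("FILE");
            }
            if (strictConsistency) {
                // Strict mode: the scan sees the same state as the lookups.
                return scanForFakeDir(dirKey);
            }
        } finally {
            bucketLock.readLock().unlock();
        }
        // Relaxed mode: the scan no longer blocks writers, but a concurrent
        // write between the lookups and the scan may be observed.
        return scanForFakeDir(dirKey);
    }

    // Prefix seek: the first key at or after dirKey decides the answer.
    // With RocksDB this is where tombstones would have to be skipped.
    private Optional<String> scanForFakeDir(String dirKey) {
        String next = keyTable.ceilingKey(dirKey);
        if (next != null && next.startsWith(dirKey)) {
            return Optional.of("FAKE_DIR");
        }
        return Optional.empty();
    }
}
```

In this sketch, a lookup of a missing `.Trash` path in relaxed mode holds the read lock only for the two point lookups, which corresponds to the case that caused the long lock hold described above. The cost is that the result no longer reflects a single consistent view of the table, which is the atomicity concern raised in the review.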
