[ https://issues.apache.org/jira/browse/HDDS-11256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870583#comment-17870583 ]
Ethan Rose commented on HDDS-11256:
-----------------------------------

Yes, feel free to add a proposal and design document if you want and others are interested. It's a good discussion to have; I just wanted to bring up key versioning as well since it is related.

{quote}I think this might require OM to know which type of client sends the deletion (e.g. to handle deletion through S3 for FSO bucket).{quote}

In ofs, trash is actually implemented on the client side: when the client is given a delete command, it transforms this into a trash rename command and sends that to the OM/Namenode. This comes from the HDFS implementation, which does it this way as well. If we add key versioning for OBS only to start out, it would not conflict, since OBS buckets cannot be accessed through ofs. If a delete is issued through the S3 gateway to an FSO bucket, the S3 gateway client could still use the trash on OM the same way. This doesn't happen currently, but it is doable.

{quote}Additionally, I think S3 object versioning has not been fully implemented in OM due to various reasons{quote}

Yes, the original implementation, which added each key version to the value proto, bloated the protos way too much. For proper key versioning we would need a new implementation, likely one where each key version is a new RocksDB entry. Again, I think implementing this only for OBS will be simpler and still solve most or even all of the problem. The interaction between key versioning and trash in ofs would likely be too complex for users to manage.

{quote}seems that S3 object versioning is enabled per bucket instead of per cluster{quote}

Yes, both trash and (likely) key versioning would only exist within their respective buckets, because a single move or delete request is confined to a bucket. If there is a use case for global cluster recovery, that is a different use case that trash and key versioning do not solve.
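The "each key version is a new RocksDB entry" idea above could look roughly like the following toy sketch. This is not Ozone's actual schema: the key encoding, names, and the TreeMap standing in for RocksDB's sorted key space are all hypothetical, just to show how per-version entries make deletes non-destructive.

```java
import java.util.TreeMap;

// Hypothetical per-version key layout: every write (including a delete
// marker) is a new entry, so older versions stay recoverable.
public class VersionedKeySketch {
    // Encode the version descending so the newest version sorts first
    // under a prefix scan (Long.MAX_VALUE - version, zero-padded).
    static String dbKey(String bucket, String key, long version) {
        return bucket + "/" + key + "/"
                + String.format("%019d", Long.MAX_VALUE - version);
    }

    public static void main(String[] args) {
        // TreeMap stands in for RocksDB's ordered key space.
        TreeMap<String, String> keyTable = new TreeMap<>();
        keyTable.put(dbKey("obs1", "photo.jpg", 1), "blocks-v1");
        keyTable.put(dbKey("obs1", "photo.jpg", 2), "blocks-v2");

        // A "delete" writes a tombstone version instead of removing data.
        keyTable.put(dbKey("obs1", "photo.jpg", 3), "TOMBSTONE");

        // Latest version = first entry at or after the key's prefix.
        String latest = keyTable.ceilingEntry("obs1/photo.jpg/").getValue();
        System.out.println(latest); // prints TOMBSTONE (newest sorts first)

        // Earlier versions are still present for recovery.
        System.out.println(keyTable.size()); // prints 3
    }
}
```

Under this layout, recovering from an accidental delete is just a matter of reading past the tombstone to the previous version entry.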
I don't think jobs/operations spanning multiple buckets are common, so I'm not sure how much users would benefit from a global recovery option. One other thing to consider is ease of use. This feature would be unique to Ozone while being similar to existing features, like trash, that users are already familiar with, and I think that would be confusing. Contrast this with key versioning for OBS buckets: then S3 users get the delete protection they are used to (versioning, with snapshots as a bonus) and HDFS users get the delete protection they are used to (trash and snapshots).

> OM Key Trash Feature
> --------------------
>
>                 Key: HDDS-11256
>                 URL: https://issues.apache.org/jira/browse/HDDS-11256
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: OM
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> Context: Currently Ozone supports a trash feature with an implementation similar to the HDFS trash feature. However, this only protects against large-scale deletion through the Hadoop client (e.g. "hadoop fs -rm -r"); deletions using -skipTrash or through S3 are not protected by this trash.
> We are currently working on the implementation in our internal cluster. We started with the idea of a DN block trash (similar to HDFS-12996), but realized that this requires changes in all Ozone components and the complexity would be very high.
> The final design implements an OM-based solution that resembles HDDS-2416 (Ozone Trash Feature):
> * There is a separate "Trash Table" that holds the deleted keys
> * There will be a background service (a trash cleanup service) that checks the "Trash Table" for deleted keys older than a certain expiry threshold and moves them to the deletedTable for normal deletion
> * In the event of a large accidental deletion, an admin can call a "recover" request, which will query the trash table and move the entries back to the original keys
> * listTrash commands to list keys under the trash
> However, there are also some planned implementation differences that can be covered in a design document, including:
> * Another table which stores the modificationTime as the RDB key for faster DB traversal
> * The recovery request starts a stateful background service on OM (similar to ContainerBalancer) to handle a large number of keys
> * Enabling the trash at runtime using a new OM request that uses a Ratis transaction to ensure consistency of the OM DB. This will set a flag in the OM metaTable that OM uses to decide whether to apply normal OM deletion or "trash" deletion
> * Runtime reconfigurability of the trash cleanup service parameters
> If the community members are interested and see the need for this feature, I can come up with a more complete design document.
> As I understand it, Ozone already supports snapshots, which can protect against accidental deletion, so there might be a lot of overlap. Due to this overlap, it might not make sense to have this trash feature.
> Edit: There are some disadvantages of using Ozone snapshots as a strategy to handle large-scale deletion, which will be addressed in this feature:
> * Snapshots need to be triggered by the operator (e.g. with cron jobs), but we might extend this to take background snapshots.
> * Snapshots do not capture the most recent set of changes that were just deleted
> ** Some user keys will be unreachable after recovering to the last snapshot

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
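The quoted trash-cleanup design (a modificationTime-keyed table plus an expiry scan) could be sketched roughly as below. Everything here is illustrative, not Ozone's actual schema: a sorted TreeMap stands in for the RocksDB tables, and the method and table names are hypothetical. The point is that keying by modificationTime lets one cleanup pass stop at the first unexpired entry instead of scanning every key.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Sketch of one pass of a trash cleanup service over a table whose
// RDB key is the modificationTime (oldest entries sort first).
public class TrashCleanupSketch {
    // Move expired entries from trashTable to deletedTable; returns the
    // number of entries moved. Since keys are sorted by time, the scan
    // can stop at the first entry that is not yet expired.
    static int cleanupOnce(TreeMap<Long, String> trashTable,
                           TreeMap<Long, String> deletedTable,
                           long now, long expiryMillis) {
        int moved = 0;
        Iterator<Map.Entry<Long, String>> it = trashTable.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, String> e = it.next();
            if (now - e.getKey() < expiryMillis) break; // rest are newer
            deletedTable.put(e.getKey(), e.getValue()); // hand off for normal deletion
            it.remove();
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> trashTable = new TreeMap<>();
        trashTable.put(1200L, "/vol1/bucket1/key-a");
        trashTable.put(3500L, "/vol1/bucket1/key-b");
        trashTable.put(4800L, "/vol1/bucket1/key-c");
        TreeMap<Long, String> deletedTable = new TreeMap<>();

        int moved = cleanupOnce(trashTable, deletedTable, 5000L, 1000L);
        System.out.println(moved + " expired, " + trashTable.size() + " kept");
        // prints: 2 expired, 1 kept
    }
}
```

A "recover" request would be the inverse pass: iterate the trash table and move entries back to the key table instead of to the deletedTable.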