[ 
https://issues.apache.org/jira/browse/HDDS-11256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870583#comment-17870583
 ] 

Ethan Rose commented on HDDS-11256:
-----------------------------------

Yes, feel free to add a proposal and design document if you want and others are 
interested. It's a good discussion to have; I just wanted to bring up key 
versioning as well, since it is related.
{quote}I think this might require OM to know which type of client sends the 
deletion (e.g. to handle deletion through S3 for an FSO bucket).
{quote}
In ofs, trash is actually implemented on the client side: when the client is 
given a delete command, it transforms it into a rename into the trash directory 
and sends that rename to the OM/NameNode. This follows the HDFS implementation, 
which works the same way. If we add key versioning for OBS only to start out, 
it would not conflict, since OBS buckets cannot be accessed through ofs. When a 
delete is issued through the S3 gateway to an FSO bucket, the S3 gateway client 
could still use the trash on OM. This doesn't happen currently, but it is 
doable.
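
To make that flow concrete, here is a minimal sketch of the client-side 
pattern; Trash.moveToAppropriateTrash is from Hadoop's public API (it is what 
the shell uses), while the wrapper class and method around it are only 
illustrative:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

// Illustrative wrapper showing the client-side trash pattern used by the
// Hadoop shell (and therefore by ofs): a "delete" is first offered to the
// Trash, which turns it into a rename under the user's trash directory, so
// the OM/NameNode only ever sees a rename. Only -skipTrash (or trash being
// disabled) falls through to a real delete.
public class ClientSideTrashSketch {

  public static void deleteWithTrash(FileSystem fs, Path path,
      Configuration conf, boolean skipTrash) throws IOException {
    if (!skipTrash) {
      // Hadoop API: moves the path into the appropriate trash root via rename.
      if (Trash.moveToAppropriateTrash(fs, path, conf)) {
        return;
      }
    }
    // Trash disabled or explicitly skipped: send the actual delete request.
    fs.delete(path, true);
  }
}
{code}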
{quote}Additionally, I think S3 object versioning has not been fully 
implemented in OM due to various reasons
{quote}
Yes, the original implementation of adding each key version to the value proto 
bloated the protos way too much. For proper key versioning we would need a new 
implementation, likely one where each key version is a new RocksDB entry. 
Again, I think implementing this for OBS only will be simpler and still solve 
most or even all of the problem. The interaction between key versioning and 
trash in ofs would likely be too complex for users to manage.
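
As a very rough illustration (none of these names come from an actual design), 
a per-version layout could encode the version into the RocksDB key so that all 
versions of an object sort together:
{code:java}
// Illustrative only: one possible keyTable layout if each key version becomes
// its own RocksDB entry instead of being appended to a single value proto.
public final class VersionedKeyLayoutSketch {

  private VersionedKeyLayoutSketch() {
  }

  // e.g. "/vol1/bucket1/object.txt/0000000000000042"
  // Zero-padding the version keeps all versions of one object adjacent and
  // ordered, so "get latest" and "list versions" are short prefix scans.
  public static String rocksDbKey(String volume, String bucket, String key,
      long version) {
    return String.format("/%s/%s/%s/%016d", volume, bucket, key, version);
  }

  // Prefix shared by all versions of one object; a prefix seek over this
  // returns the full version history.
  public static String versionPrefix(String volume, String bucket, String key) {
    return String.format("/%s/%s/%s/", volume, bucket, key);
  }
}
{code}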
{quote}seems that S3 object versioning is enabled per bucket instead of per 
cluster
{quote}
Yes, both trash and (likely) key versioning would only exist within their 
respective buckets, because a single move or delete request is confined to a 
bucket. If there is a use case for global cluster recovery, that would be a 
different use case that neither trash nor key versioning solves. I don't think 
jobs/operations spanning multiple buckets are common, so I'm not sure how much 
users would benefit from a global recovery option.

 

One other thing to consider is ease of use. This feature would be unique to 
Ozone while being similar to existing features, like trash, that users are 
already familiar with, which I think would be confusing. Contrast this with 
key versioning for OBS buckets: S3 users get the delete protection they are 
used to (versioning, with snapshots as a bonus) and HDFS users get the delete 
protection they are used to (trash and snapshots).

> OM Key Trash Feature
> --------------------
>
>                 Key: HDDS-11256
>                 URL: https://issues.apache.org/jira/browse/HDDS-11256
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: OM
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> Context: Ozone currently supports a trash feature with an implementation 
> similar to the HDFS trash feature. However, this only protects large-scale 
> deletions issued through the Hadoop client (e.g. "hadoop fs -rm -r"); 
> deletions using -skipTrash or made through S3 are not protected by this trash.
> We are currently working on the implementation in our internal cluster. We 
> started with the idea of a DN block trash (similar to HDFS-12996), but 
> realized that this would require changes in all Ozone components and the 
> complexity would be very high. The final design implements an OM-based 
> solution that resembles HDDS-2416 (Ozone Trash Feature):
>  * There is a separate "Trash Table" that holds the deleted keys
>  * There will be a background service (a trash cleanup service) that checks 
> the "Trash Table" for deleted keys older than a certain expiry threshold and 
> moves them to the deletedTable for normal deletion
>  * In the event of a large accidental deletion, an admin can issue a 
> "recover" request which will query the trash table and restore the entries 
> to their original keys
>  * A listTrash command to list keys under the trash
> However, there are also some planned implementation differences that can be 
> covered in a design document, which include things like:
>  * Another table that stores the modificationTime as the RDB key for faster 
> DB traversal
>  * A recovery request starts a stateful background service on OM (similar to 
> ContainerBalancer) to handle a large number of keys
>  * Enabling the trash at runtime using a new OM request which uses a Ratis 
> transaction to ensure consistency of the OM DB. This will set a flag in the 
> OM metaTable that the OM uses to decide whether to perform a normal deletion 
> or a "trash" deletion
>  * Runtime reconfigurability of the trash cleanup service parameters
> If the community members are interested and see the need for this feature, I 
> can come up with a more complete design document.
> As I understand it, Ozone already supports snapshots, which can protect 
> against accidental deletion, so there is a lot of overlap. Due to this 
> overlap, it might not make sense to have this trash feature.
> Edit: There are some disadvantages of using Ozone snapshots as a strategy to 
> handle large-scale deletion, which will be addressed by this feature:
>  * Snapshots need to be triggered by the operator (e.g. with cron jobs), 
> though we might extend this to take background snapshots.
>  * A snapshot does not capture the most recent set of changes that were just 
> deleted
>  ** Some user keys will be unreachable after recovering to the last snapshot


