GitHub user davisp commented on the pull request:
https://github.com/apache/couchdb-couch/pull/161#issuecomment-211596768
Thinking through this, I believe we may be conflating two different
features that, while similar, serve different purposes.
Specifically, there's one feature where we rename files into the .delete
directory so that we can get rid of them quickly and also manage cleanup when
the server crashes during a deletion. This feature has been in CouchDB for
years.
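To make the first feature concrete, here is a minimal Python sketch of the "rename into .delete, then remove" pattern plus a startup sweep. It is not CouchDB's Erlang code: the function names `delete_file` and `cleanup_delete_dir` and the `<root>/.delete/<uuid>` layout are hypothetical stand-ins for illustration.

```python
import os
import shutil
import uuid


def delete_dir_for(root_dir: str) -> str:
    # Hypothetical layout: a ".delete" directory directly under the database root.
    return os.path.join(root_dir, ".delete")


def delete_file(root_dir: str, path: str) -> str:
    """Move `path` into <root>/.delete under a random UUID name, then remove it.

    The rename is one cheap operation on the same filesystem, so the file
    leaves its original location immediately even if the unlink below is
    slow or is interrupted by a crash.
    """
    trash = delete_dir_for(root_dir)
    os.makedirs(trash, exist_ok=True)
    target = os.path.join(trash, uuid.uuid4().hex)
    os.rename(path, target)
    os.remove(target)  # a real server might do this asynchronously
    return target


def cleanup_delete_dir(root_dir: str) -> int:
    """On startup, remove anything a crash left behind in <root>/.delete."""
    trash = delete_dir_for(root_dir)
    if not os.path.isdir(trash):
        return 0
    removed = 0
    for name in os.listdir(trash):
        entry = os.path.join(trash, name)
        if os.path.isdir(entry):
            shutil.rmtree(entry)
        else:
            os.remove(entry)
        removed += 1
    return removed
```

Because the UUID name carries no information about the original file, anything left in .delete after a crash can be swept blindly on the next startup.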
The second feature is a small addition Cloudant made to help with
recovering from accidental deletions. It was implemented by simply changing
the delete to rename a .couch file to a name that includes the deletion
timestamp. This lets operators recover a "deleted" database for clients.
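The second feature can be sketched in a few lines of Python. The exact filename scheme used here (`<name>.<YYYYMMDD.HHMMSS>.deleted.couch`) and the function name `rename_on_delete` are assumptions for illustration; the point is only that the file stays in place under a new, timestamped name, so recovery is a rename back.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def rename_on_delete(path: str, now: Optional[datetime] = None) -> str:
    """Rename <name>.couch to <name>.<timestamp>.deleted.couch in the same directory.

    Hypothetical naming scheme, for illustration only.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d.%H%M%S")
    directory, filename = os.path.split(path)
    base = filename[: -len(".couch")] if filename.endswith(".couch") else filename
    target = os.path.join(directory, f"{base}.{stamp}.deleted.couch")
    os.rename(path, target)  # nothing is unlinked; an operator can rename it back
    return target
```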
This PR appears to merge both of these features by renaming files into the
.delete directory with useful names. That makes the actual deletion a bit
subtler: the various renames are munged together, so it's not 100% obvious
which feature we're handling in the various functions named
delete/nuke_dir/delete_file/etc.
As a follow-on, the renaming here also tries to move away from using a UUID
in the .delete directory and instead uses a URL-encoded path of the deleted
file (relative to the root database directory). At first glance this seemed
like a good idea, but after researching filename length limits, it'd be very
easy to break this: a user only needs to create a database with a name longer
than 255 characters on extX filesystems (and shorter on others).
What I think we should consider is having all of our deletions move the file
into the .delete directory, mirroring the original filesystem hierarchy. Our
existing "rename_on_delete" feature would then be relabeled as
"delete_after_rename" with a default of true. This enables a sysadmin approach
to recovering deleted databases and also lets sysadmins institute their own
policies for actual deletion.
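The proposal can be sketched in Python as follows. The function name `move_to_delete_dir` is hypothetical, and `delete_after_rename` is modeled here as a plain boolean argument rather than CouchDB's real config mechanism. Because the .delete tree mirrors the database layout, every path component keeps its original length, which sidesteps the filename-length problem of a single flattened name.

```python
import os


def move_to_delete_dir(root_dir: str, path: str, delete_after_rename: bool = True) -> str:
    """Move `path` to <root>/.delete/<same relative path>, optionally unlinking it.

    With delete_after_rename=True (the proposed default) the file is removed
    right away; with False it is left in place for an operator to recover
    or to delete according to their own policy.
    """
    rel = os.path.relpath(path, root_dir)
    target = os.path.join(root_dir, ".delete", rel)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.rename(path, target)
    if delete_after_rename:
        os.remove(target)
    return target
```

One open question this sketch ignores: on POSIX, `os.rename` silently replaces an existing target, so deleting the same database twice with `delete_after_rename=False` would overwrite the earlier copy. A real implementation would need some disambiguation, such as a timestamp suffix.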
As a last note, I'd also like to see all of the filesystem/deletion logic
moved to couch_file. It's quite awkward to have it split between couch_server
and couch_file. I'd structure the commits so that one moves the existing
logic to couch_file and subsequent commits add the filesystem hierarchy
mirroring.