I think it looks good, Peng Hui. Nice work!

I like the API shape, and the implementation looks pretty small and
easy so far. Bonus points for using the HCA to hopefully get some
performance improvement from smaller keys overall. That was Paul's
original idea all along I believe.

Was wondering about a few minor API nits:

 * Maybe use `timestamp` instead of `deleted_when` since we used
`timestamp` in the rest of the API description?

 * Our db instances have a unique `uuid` (instance id) attribute
internally, we just don't surface it in the API. So when we re-create
a db with the same name it gets a new `uuid`. I could see using that
to identify individual deleted db instances when we restore them, as
opposed to using timestamps:  `/{db}/_restore/{DbUuid}`. However,
because we don't already surface that attribute in the API it would be
a bit more noise too... So I think that argues for keeping timestamp
as the id, but thought I'd mention it and see if others have thoughts
on it anyway.
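
To make the two addressing options concrete, here is a minimal sketch in
Python (purely illustrative; CouchDB itself is Erlang). The `DeletedDb`
record, the helper names, and the sample uuid and timestamp values are all
hypothetical; only the `/{db}/_restore/{DbUuid}` shape and the idea of
using the timestamp as the id come from the discussion above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DeletedDb:
    """Hypothetical record for one soft-deleted db instance."""
    name: str
    uuid: str       # internal instance id, not surfaced in the API today
    timestamp: str  # deletion time


def restore_path_by_timestamp(d: DeletedDb) -> str:
    # Timestamp as the id: nothing new has to be surfaced in the API.
    return f"/{d.name}/_restore/{d.timestamp}"


def restore_path_by_uuid(d: DeletedDb) -> str:
    # Alternative shape: /{db}/_restore/{DbUuid}
    return f"/{d.name}/_restore/{d.uuid}"


def find_deleted(deleted: List[DeletedDb], name: str, ident: str) -> Optional[DeletedDb]:
    # Accept either id. Each one picks out a single deleted instance,
    # assuming no two deletions of the same name share a timestamp.
    for d in deleted:
        if d.name == name and ident in (d.uuid, d.timestamp):
            return d
    return None


# Two deleted instances that shared the name "orders" (made-up values).
deleted = [
    DeletedDb("orders", "a1b2c3", "2020-03-17T10:00:00Z"),
    DeletedDb("orders", "d4e5f6", "2020-03-18T09:30:00Z"),
]
```

Either id resolves to the same instance; the difference is only in what
the user has to look at in the URL.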

Concerning the backup implementation: I think that's still an option!
In other words the soft deletion API can still be the same, and
eventually, once we get backup implemented, soft-deleted instances
could immediately (or transparently in the background) become backups.
Users might just see an extra metadata row in `_deleted_dbs_info`,
something like "backed up to blobstore foo as of 1 day ago ...", so they
know restoring it won't be a single transaction but might take a while.
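
A rough sketch of what such a row might look like, again in Python for
illustration only. Every field name here (`backup`, `blobstore`,
`backed_up_at`) and the `restore_hint` helper are assumptions made up for
this example, not part of the RFC.

```python
from typing import Any, Dict

# Hypothetical rows as `_deleted_dbs_info` might return them.
soft_deleted_row: Dict[str, Any] = {
    "timestamp": "2020-03-18T09:30:00Z",
}
backed_up_row: Dict[str, Any] = {
    "timestamp": "2020-03-17T10:00:00Z",
    # Extra metadata that would appear once the instance has become a backup.
    "backup": {"blobstore": "foo", "backed_up_at": "2020-03-17T10:05:00Z"},
}


def restore_hint(row: Dict[str, Any]) -> str:
    """Tell the user what kind of restore to expect for this row."""
    backup = row.get("backup")
    if backup is None:
        return "soft-deleted: restore is a single transaction"
    return f"backed up to blobstore {backup['blobstore']}: restore may take a while"
```

The restore endpoint itself would stay the same in both cases; only the
metadata tells the user how long to expect it to take.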

FDB backup does have a local `file://</absolute/path/to/base_dir>`
option for URLs [1] so that might be useful in embedded scenarios. And
someone has probably created some sort of local filesystem S3 shim (S0
;-) ) we could adapt perhaps....

Cheers,
-Nick

[1] https://apple.github.io/foundationdb/backups.html#backup-urls

On Wed, Mar 18, 2020 at 5:06 PM Paul Davis <paul.joseph.da...@gmail.com> wrote:
>
> Alex,
>
> The first con I see for that approach is that it's not soft-deletion.
> It's actual deletion with an API for restoration. Which, fair enough,
> is probably a feature we should consider supporting for CouchDB
> installations that are based on FoundationDB.
>
> The second major con is that it relies on CouchDB being based on
> FoundationDB. Part of CouchDB's design philosophy is that the internet
> may or may not exist, and if it does exist that it may or may not be
> reliable. There are lots of deployments of CouchDB that are part of a
> desktop application or POS installation that may see internet only
> periodically if at all so an S3 backup solution is out. There also may
> come a time that there's a flavor of CouchDB that uses LevelDB or
> SQLite or FDBLite (I just made that up, any idea how hard it'd be?)
> for these sorts of embedded deployments where fdbrestore/fdbbackup
> wouldn't be feasible.
>
> Then the last major con I see is the time-to-restore disparity. With
> soft-deletion restoration is a few milliseconds. Streaming from S3
> will obviously depend on the size of the database and obviously be
> orders of magnitude longer.
>
> On the pro side for the soft-delete on FoundationDB is that the first
> draft of the RFC is 108 lines [1]. We obviously can't say for sure how
> big or involved the fdbrestore approach would be but I think we'd all
> agree it'd be bigger.
>
> Paul
>
> [1] https://github.com/apache/couchdb/pull/2666
>
>
> On Wed, Mar 18, 2020 at 2:31 PM Alex Miller
> <alexmil...@apple.com.invalid> wrote:
> >
> > Let me perhaps paint an alternative RFC:
> >
> > 1) `DELETE /{db}`
> >
> > If soft-deletion is enabled, delete the database subspace, and also record 
> > into ?DELETED_DBS the timestamp of the commit and the database subspace 
> > prefix
> >
> > 2) `GET /{db}/_deleted_dbs_info`
> >
> > Return the timestamp (and whatever other info one should record) of deleted 
> > databases.
> >
> > 3) `PUT /{db}/_restore/{deletedTS}`
> >
> > Invoke `fdbrestore -k` to do a key range restricted restore into the 
> > current cluster of the deleted subspace prefix at versionstamp-1.  Wait for 
> > it to complete, and return 200 when completed.
> >
> > And this would all rely on having a continuous backup configured and 
> > running that would hold a minimum of 48 hours of changes.
> >
> >
> > Now, I don’t actually deal with backups often so my memory on current 
> > caveats is a bit fuzzy.  I think there might be a couple complications here 
> > that I’ve missed, like…
> > * There not being key range restricted locking of the database
> > * A key range restore is currently suboptimal in that it doesn’t do obvious 
> > filtering that it could to cut down on the amount of data it reads
> >
> > But, neither of these seem heavily blocking, as they could be tackled 
> > quickly, particularly if you leverage some upstream relationships ;).  
> > Backup and restore has been the general answer to accidental data deletion 
> > (or corruption) on FDB, and I could paint some attractive looking pros of 
> > this approach: backup files are more disk space efficient, soft deleted 
> > data could be offloaded to an S3-compatible store, it would be free if FDB 
> > is already configured to take backups.  I was just curious to hear a bit 
> > more detail on your/Peng’s side of the reasons for preferring to build soft 
> > deletion on top of FDB (and thus have also intentionally withheld more of 
> > the cons of this approach, or the pros of yours).
> >
> > > On Mar 18, 2020, at 11:59, Paul Davis <paul.joseph.da...@gmail.com> wrote:
> > >
> > > Alex,
> > >
> > > All joking aside, soft-deletion's target use case is accidental
> > > deletions. This isn't a replacement for backup/restore which will
> > > still happen for all the usual reasons.
> > >
> > > Paul
> > >
> > > On Wed, Mar 18, 2020 at 1:42 PM Paul Davis <paul.joseph.da...@gmail.com> 
> > > wrote:
> > >>
> > >> On Wed, Mar 18, 2020 at 1:29 PM Alex Miller
> > >> <alexmil...@apple.com.invalid> wrote:
> > >>>
> > >>>
> > >>>> On Mar 18, 2020, at 05:04, jiangph <jiangpeng...@hotmail.com> wrote:
> > >>>>
> > >>>> Instead of automatically and immediately removing data and indexes in
> > >>>> a database after a delete operation, soft-deletion allows restoring the
> > >>>> deleted data to its original state after a “fat finger” or otherwise
> > >>>> undesired delete operation, for up to a defined period, such as 48 hours.
> > >>>>
> > >>>> In CouchDB 3.0, soft-deletion of a database is implemented in [1]. When
> > >>>> soft-deletion is enabled, the .couch file is renamed to a
> > >>>> .<timestamp>.deleted.couch file on deletion, and that file can be renamed
> > >>>> back to .couch to restore it. If a restore is not needed and the
> > >>>> specified period has passed, the .<timestamp>.deleted.couch file can be
> > >>>> deleted to remove the database permanently.
> > >>>>
> > >>>> In CouchDB 4.0, with the introduction of FoundationDB, the data model
> > >>>> and storage have changed. In order to support soft-deletion, we propose
> > >>>> the solution below and then implement it.
> > >>>
> > >>>
> > >>>
> > >>> I’ve sort of hand waved some answers to this in my head, but would you 
> > >>> mind expanding a bit on the advantages of keeping soft-deleted data in 
> > >>> FoundationDB as opposed to actually deleting it and relying on 
> > >>> FoundationDB’s backup and restore to recover it if needed?
> > >>
> > >> From: Panicked User
> > >> To: Customer Support
> > >> Subject: URGENT! EMERGENCY DATABASE RESTORE!
> > >>
> > >> Dear,
> > >>
> > >> I have accidentally deleted my Very Important Database and need to
> > >> have it restored ASAP! Without this mission critical database my
> > >> company is completely offline which is costing $1B an hour!!!!!
> > >>
> > >> Please respond ASAP!
> > >>
> > >> Sincerely,
> > >> Panicky McPanics
> >
