Hi Andrei,

> Now, with large repos that spill over into AWS S3 or Azure storage it's not
> practical to back up terabytes of data each day for various reasons (cost
> and time to backup/restore among them), and I was fiddling with the idea
> of setting up the datastore in such a way that deletion is prevented - I mean
> deletion from the S3 bucket/Azure storage container.


I tried something similar a while ago, using S3 bucket versioning [1].
It allows you to do point-in-time restores without enforcing a "prevent
deletion" policy. Is it suitable for your use case?


> This way backup would be a matter of backing up the segment store - that's
> a matter of a couple of gigs of data, and restore would be pretty much the
> same.


No matter what approach you use for backup/restore, the backup of the
segment store should come first (before the datastore) to avoid any
inconsistencies with binaries referenced from the segment store. It would
also be good if the backup didn't contain the index data, to avoid
possible corruption.
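
Just to illustrate the ordering (the paths, tools and layout below are
assumptions about a typical setup, not your actual one):

    import subprocess

    # Hypothetical paths; adjust to the actual repository layout.
    SEGMENT_STORE = "/path/to/repository/segmentstore"
    BACKUP_ROOT = "/backup/2017-07-03"

    # 1. Segment store first, so every binary it references already exists
    #    when the datastore snapshot is taken afterwards. Index data (kept
    #    outside this directory in this hypothetical layout) is not copied.
    subprocess.run(
        ["rsync", "-a", SEGMENT_STORE + "/", BACKUP_ROOT + "/segmentstore/"],
        check=True,
    )

    # 2. Datastore afterwards, if you back it up at all - here just a plain
    #    S3 sync as an example.
    subprocess.run(
        ["aws", "s3", "sync",
         "s3://my-oak-datastore-bucket", BACKUP_ROOT + "/datastore/"],
        check=True,
    )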

HTH,
Andrei

[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html

2017-07-03 17:47 GMT+03:00 Andrei Kalfas <akal...@adobe.com.invalid>:

> Hi Andrei,
>
> OK, here is the context; apologies for not starting with that first.
>
> I’m working on a PTR (point-in-time restore) proof of concept involving
> large asset repositories. The easy way for small-ish repos would be to
> back up everything every day and when the customer pops by and says “I want
> everything back to day YYY” just restore the file systems from backups and
> that’s it. Now, with large repos that spill over into AWS S3 or Azure storage
> it’s not practical to back up terabytes of data each day for various reasons
> (cost and time to backup/restore among them), and I was fiddling with the
> idea of setting up the datastore in such a way that deletion is prevented - I
> mean deletion from the S3 bucket/Azure storage container. This way backup
> would be a matter of backing up the segment store - that’s a matter of a
> couple of gigs of data, and restore would be pretty much the same. The
> problem with not allowing deletions from the S3 bucket/Azure storage
> container is that one will pack a lot of garbage over time, so I’ll
> need a way to figure out what’s garbage so that I can move that data into
> cheaper storage and eventually delete it completely. This is why I was
> fishing for an easy way to get the inverted list that I mentioned below.
>
> Thanks,
> Andrei
>
>
> > On Jul 3, 2017, at 4:47 PM, Andrei Dulceanu <andrei.dulce...@gmail.com>
> > wrote:
> >
> > Hi Andrei,
> >
> > AFAIK, there isn't currently such an option for the consistency check.
> > What scenario do you have in mind for using it?
> >
> > Regards,
> > Andrei
> >
> > 2017-07-03 16:32 GMT+03:00 Andrei Kalfas <akal...@adobe.com.invalid>:
> >
> >> Hi,
> >>
> >> I’m reading about the consistency check tool that’s available via oak-run
> >> and if I got it right it’s gonna report missing blobs that are referenced.
> >> Is there a way to get the inverted list, i.e. things that are in the
> >> datastore but not referenced from the segment store?
> >>
> >> Thank you,
> >> Andrei
> >>
> >>
>
>
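
PS: If you do end up needing that inverted list, a rough way to approximate
it is a plain set difference between what the bucket/container holds and what
the segment store references. The file names and id format below are
placeholders (however you export those two listings), not oak-run output:

    # referenced.txt: blob ids referenced from the segment store, one per line
    # datastore.txt:  blob ids actually present in the S3 bucket / Azure container
    def load_ids(path):
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    referenced = load_ids("referenced.txt")
    present = load_ids("datastore.txt")

    # Everything sitting in the datastore that nothing in the segment store
    # points to - i.e. the garbage candidates.
    for blob_id in sorted(present - referenced):
        print(blob_id)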
