Hey all, I've put replies to everyone's questions below. Hope they help!
> Do the shard metadata files list all of the segments that make up the backup,
> or only the segments that were uploaded in this incremental update?

Mike: The former - they're intended to hold metadata about all of the
segments that are needed to restore to the given snapshot/commit-point.
So each one is likely to hold metadata about files just uploaded, as
well as ones that were added to the blob store by previous backups. I'll
see if I can make that clearer in the file descriptions.

> leave the old Backup/Restore API as-is, deprecated, and add a new one on
> /v2/cluster/backup

Jan: Ultimately I agree with your concerns about scope, so I'd vote
against trying to cover TRAs, multiple collection backups, etc. in this
effort. That aside though, I agree that the existing v2 backup API is a
bit of a head-scratcher. Why is it /v2/collections instead of
/v2/collections/<collectionName> or a subpath of /v2/cluster? Does it
have something to do with aliases? Or did it end up there mostly by
default?

I'd be open to creating a new v2 backup endpoint (without adding TRA,
etc. compatibility) if there was consensus on that approach to handling
backcompat and on the specific appearance of the API. It would help with
backcompat after all. Though if finding consensus bogs down it may not
be worth the addition.

> I know you've seen SOLR-15051 (Shared storage -- BlobDirectory) ... We both
> want to store checksums and file lengths. ... Your proposal did not discuss
> how these files are GC'ed

David: SIP-12 does address this, though maybe the writeup needs
clarifying. The Delete Backup API includes a "purge" parameter which
triggers GC activity. This probably works about the way you'd expect -
Solr gets the list of UUID-named index files from the blob store, and
then it compares that list to the set of UUIDs referenced by any
shard-metadata file (which requires reading all the shard-metadata
files).
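
For illustration only, here is a minimal Python sketch of the purge check
described above. It is not the SIP-12 implementation. It assumes each
shard-metadata file maps a Lucene file name to a blob-store entry, and
every name in it (blobName, checksum, size, find_orphans, the uuid-*
values) is hypothetical. It also assumes that any blob no shard-metadata
file references counts as an orphan.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical shard-metadata layout: Lucene file name -> blob-store entry.
ShardMetadata = Dict[str, Dict[str, object]]


def referenced_blobs(all_shard_metadata: Iterable[ShardMetadata]) -> Set[str]:
    """Collect every blob name referenced by any shard-metadata file."""
    referenced: Set[str] = set()
    for metadata in all_shard_metadata:
        for entry in metadata.values():
            referenced.add(str(entry["blobName"]))
    return referenced


def find_orphans(blob_store_listing: Iterable[str],
                 all_shard_metadata: Iterable[ShardMetadata]) -> List[str]:
    """Return blobs present in the store that no shard-metadata file references."""
    referenced = referenced_blobs(all_shard_metadata)
    return sorted(name for name in blob_store_listing if name not in referenced)


# First backup: everything is new.
backup_0: ShardMetadata = {
    "_0.cfs": {"blobName": "uuid-a", "checksum": 111, "size": 4096},
    "segments_1": {"blobName": "uuid-b", "checksum": 222, "size": 128},
}
# Second (incremental) backup: lists ALL files needed to restore,
# including uuid-a, which was uploaded by the first backup.
backup_1: ShardMetadata = {
    "_0.cfs": {"blobName": "uuid-a", "checksum": 111, "size": 4096},
    "_1.cfs": {"blobName": "uuid-c", "checksum": 333, "size": 2048},
    "segments_2": {"blobName": "uuid-d", "checksum": 444, "size": 128},
}
# uuid-e stands in for a leftover from an interrupted upload.
blob_store = ["uuid-a", "uuid-b", "uuid-c", "uuid-d", "uuid-e"]

print(find_orphans(blob_store, [backup_0, backup_1]))
print(find_orphans(blob_store, [backup_1]))
```

In this toy data, once the first backup's metadata is gone, uuid-b is no
longer referenced, while uuid-a survives because the second backup still
lists it.
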
Purging this way avoids adding to Solr's ZK state, but does so at the
cost of requiring users to trigger sporadic cleanup manually instead of
detecting orphans automatically like BlobDirectory does (assuming I
understand that correctly). I'm def not saying this is necessarily the
best approach (I like it, though it has downsides for sure), just that
there is a proposed approach that's easy to miss buried in the SIP.

More broadly though - I share your sense that we should consider
alignment. It may end up that Backup/Restore is different enough from
the BlobDirectory use case that it doesn't make sense, but it's at least
worth figuring out. That's about as far as my understanding goes right
now though. I'll read up on BlobDirectory while you absorb SIP-12 and
maybe we can circle back to this shortly.

Best,

Jason

On Sun, Jan 10, 2021 at 7:20 AM Jan Høydahl <jan....@cominvent.com> wrote:
>
> Jason, Shalin and Dat, thanks for the thorough work. This is an example for
> other SIPs to follow!
>
> > I've also amended the backcompat/migration section to mention Jan's
> > suggestion that the "incremental" features be exposed in the v2 API
> > only. Though it's unclear to me whether that's still something people
> > want since it turns out that we'll still have backcompat concerns with
> > the existing v2 backup/restore APIs. So I've held off from
> > removing/replacing the original plan.
>
> Since we already have v2 for the existing backup API, I guess my suggestion
> is not that 'clean' after all.
>
> Another approach would be to leave the old Backup/Restore API as-is,
> deprecated, and add a new one on /v2/cluster/backup, with support for backing
> up multiple collections in one go, or backing up a TRA alias with hundreds of
> concrete "sub" collections. But as I write these words I imagine it probably
> is way outside the scope for this SIP which is large enough. Anyone even
> tried to back up a TRA with today's API?
>
> Jan
>
> > On 5 Jan 2021, at 15:55, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> >
> > Hey, Happy New Year everybody.
> >
> > Some SIP updates based on the discussion above:
> >
> > I added v2 examples for each API to the SIP. Feedback welcome,
> > especially on the v2 APIs that are net-new to this proposal (namely:
> > "list backups" and "delete backup").
> >
> > I've also amended the backcompat/migration section to mention Jan's
> > suggestion that the "incremental" features be exposed in the v2 API
> > only. Though it's unclear to me whether that's still something people
> > want since it turns out that we'll still have backcompat concerns with
> > the existing v2 backup/restore APIs. So I've held off from
> > removing/replacing the original plan.
> >
> > Link for convenience:
> > https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> >
> > Best,
> >
> > Jason
> >
> >
> > On Thu, Dec 24, 2020 at 8:11 AM Jan Høydahl <jan....@cominvent.com> wrote:
> >>
> >> Ok, that’s the one I was looking for, it’s not documented in the backup
> >> chapter of ref-guide :(
> >>
> >> Jan Høydahl
> >>
> >>> On 23 Dec 2020, at 17:10, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> >>>
> >>>>
> >>>> We have a path alias to the old API ... but we don’t have a true v2 API
> >>>> spec for it, do we?
> >>>
> >>> Tbh I'm not yet familiar enough with the v2 APIs to understand the
> >>> distinction you're making. (Do you have a pointer to something that'd
> >>> fill me in?)
> >>>
> >>> To zoom in on "backup" as an example, the v2 API I'm referring to
> >>> looks like: /v2/collections" -d '{ "backup-collection":
> >>> {"collection": "books", "name": "asdf3", "location": "/tmp/foo"}}'.
> >>> And it's included in the v2 "introspect" documentation returned by
> >>> this API: /v2/collections/_introspect?command=backup-collection". To
> >>> me that looked like a v2 API, but maybe path-aliases are also covered
> >>> in the introspect docs.
> >>>
> >>> Jason
> >>>
> >>>> On Wed, Dec 23, 2020 at 10:29 AM Jan Høydahl <jan....@cominvent.com>
> >>>> wrote:
> >>>>
> >>>> Actually, don’t think we do have a v2 Backup/Restore API. We have a path
> >>>> alias to the old API which takes GET ...&action=backup... but we don’t
> >>>> have a true v2 API spec for it, do we? Where is that documented?
> >>>>
> >>>> Jan Høydahl
> >>>>
> >>>>>> On 22 Dec 2020, at 18:04, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> >>>>>
> >>>>> Hey guys,
> >>>>>
> >>>>> Following up to make sure I understand the specifics you're
> >>>>> suggesting. You're proposing that:
> >>>>>
> >>>>> 1. The brand new backup-related APIs (list-backups and delete-backup)
> >>>>> be added in v2-form only.
> >>>>> 2. Tweaks to existing backup-related APIs (create-backup, restore) be
> >>>>> made in V2-form only.
> >>>>> 3. All existing v1 backup-related APIs be deprecated and left
> >>>>> unchanged. Incremental backups will not be possible using the v1 API.
> >>>>>
> >>>>> I'm not against going this route if there's consensus around it. But
> >>>>> I'm not 100% clear on how it means we don't need to worry about
> >>>>> backcompat. Backup and Restore currently exist as both a v1 and a v2
> >>>>> API - I understand how leaving the v1 APIs untouched (other than
> >>>>> deprecation) frees us of some backcompat concerns there, but we would
> >>>>> still need to make tweaks to the v2 backup/restore APIs and would have
> >>>>> to tread just as carefully there in terms of backcompat, afaict.
> >>>>> Unless Solr's backcompatibility guarantees only cover the v1 API and
> >>>>> leave v2 changes to be made freely? I looked around to see if the v2
> >>>>> APIs had any sort of "experimental" designation, but couldn't find
> >>>>> that clearly stated anywhere. Am I missing something?
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Jason
> >>>>>
> >>>>>> On Tue, Dec 22, 2020 at 2:49 AM Noble Paul <noble.p...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> , and implement the new improved version as a V2-api only, and then
> >>>>>>> deprecate the v1 API?
> >>>>>>
> >>>>>> V2 only please
> >>>>>>
> >>>>>>> On Tue, Dec 22, 2020 at 1:34 AM Jason Gerlowski
> >>>>>>> <gerlowsk...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hey Jan, thanks for the review.
> >>>>>>>
> >>>>>>> I hadn't thought about the V2 API in connection to this work. You're
> >>>>>>> right though I think - the SIP proposes net-new APIs, so it should add
> >>>>>>> V2 equivalents at the very least. I'll draft tentative details for
> >>>>>>> these APIs on the SIP and we can refine things from there.
> >>>>>>>
> >>>>>>> I'm more up in the air on your specific suggestion to restrict the SIP
> >>>>>>> changes to these v2 APIs. It is an elegant approach to the
> >>>>>>> backcompat, and it provides a carrot for v2 adoption - both of which I
> >>>>>>> like. But it would let users create snapshot-based backups (and keep
> >>>>>>> us maintaining that code) longer than there's any strict need to. And
> >>>>>>> users are left on the less-efficient format by default. (By contrast,
> >>>>>>> the current SIP has snapshot-backup creation being replaced by
> >>>>>>> incremental-backup creation as soon as the latter is available.) Did
> >>>>>>> you have a particular lifespan in mind for snapshot-based creation if
> >>>>>>> we go with this approach?
> >>>>>>>
> >>>>>>> Jason
> >>>>>>>
> >>>>>>> On Thu, Dec 17, 2020 at 3:54 PM Jan Høydahl <jan....@cominvent.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Much needed! Thanks for initiating this Jason!
> >>>>>>>>
> >>>>>>>> As we want to move away from v1 APIs where an HTTP GET is used for
> >>>>>>>> creation and deletion, would it be an idea to leave the old
> >>>>>>>> backup/restore APIs as-is, and implement the new improved version as
> >>>>>>>> a V2-api only, and then deprecate the v1 API? Then we don't need to
> >>>>>>>> worry about back-compat, and we get a head-start on converting the
> >>>>>>>> COLLECTION API to v2 style.
> >>>>>>>>
> >>>>>>>> Jan
> >>>>>>>>
> >>>>>>>>> On 15 Dec 2020, at 15:48, Jason Gerlowski
> >>>>>>>>> <gerlowsk...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hey all,
> >>>>>>>>>
> >>>>>>>>> This morning I published SIP-12, which proposes an overhaul of Solr's
> >>>>>>>>> backup and restore functionality. While the "headline" improvement in
> >>>>>>>>> this SIP is a change to do backups incrementally, it bundles in a
> >>>>>>>>> number of other improvements as well, including the addition of
> >>>>>>>>> corruption checks, APIs to list and delete backups, and stronger
> >>>>>>>>> integration points with popular object storage APIs.
> >>>>>>>>>
> >>>>>>>>> The SIP can be found here:
> >>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> >>>>>>>>>
> >>>>>>>>> Please read the SIP description and come back here for discussion. As
> >>>>>>>>> the discussion progresses we will update the SIP page with any
> >>>>>>>>> outcomes and eventually move things to a VOTE.
> >>>>>>>>>
> >>>>>>>>> Looking forward to hearing your feedback.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Jason
> >>>>>>>>>
> >>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>>>>>
> >>>>>> --
> >>>>>> -----------------------------------------------------
> >>>>>> Noble Paul