Hey all,

Two follow-ups on recent discussion.

I reviewed the gc/ref-counting part of the BlobDirectory proposal on
SOLR-15051 that David mentioned. We talked about it a bit offline and
agreed that while an automatic gc mechanism is really needed for what
he's trying to do, the requirements of the backup use case are different
enough that SIP-12 can get by with manually-triggered 'purging', mostly
because infrequent static backups produce much less garbage than
continually tracking all files for a (possibly ever-changing) index.

> I'd be open to creating a new v2 backup endpoint (without adding TRA, etc.
> compatibility) if there was consensus on that approach to handling backcompat
> and on the specific appearance of the API

On second thought, I'm going to flip-flop on this. Coming up with a
better v2 API for backup/restore will be easier *after* some of the
questions Jan raised (multi-collection? alias support? etc.) have been
dealt with. I.e., it's tough to decide between /v2/cluster/backups and
/v2/collections/<collectionName>/backups as alternatives until you
figure out whether we currently support multi-collection backup, or
want to in the near future.

If people feel strongly or would veto the vote otherwise, then I'll try
my best. But otherwise I think we're best served waiting until other
stuff settles out to revisit larger v2 backup API changes.

Best,

Jason

On Mon, Jan 11, 2021 at 10:41 AM Jason Gerlowski <gerlowsk...@gmail.com> wrote:
>
> Hey all,
>
> I've put replies to everyone's questions below. Hope they help!
>
> > Do the shard metadata files list all of the segments that make up the
> > backup, or only the segments that were uploaded in this incremental update?
>
> Mike: The former - they're intended to hold metadata about all of the
> segments that are needed to restore to the given
> snapshot/commit-point. So it's likely to hold metadata about files
> just uploaded, as well as ones that were added to the blob by previous
> backups. I'll see if I can make that clearer in the file
> descriptions.
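A rough sketch of the manually triggered purge discussed in this thread, for anyone who wants to see the idea as code: collect every file ID referenced by any shard-metadata file, then treat stored index files outside that set as orphans. This is only an illustration in Python, not Solr's implementation; the function name, the file IDs, and the in-memory dict standing in for the shard-metadata files are all made up for the example.

```python
from typing import Dict, Iterable, Set


def find_orphaned_files(stored_files: Iterable[str],
                        shard_metadata: Dict[str, Iterable[str]]) -> Set[str]:
    """Return stored index files that no shard-metadata file references."""
    referenced: Set[str] = set()
    # Every shard-metadata file has to be read, since any one of them
    # may still reference a given index file.
    for file_ids in shard_metadata.values():
        referenced.update(file_ids)
    # Anything in the store that nobody references is garbage.
    return set(stored_files) - referenced


# Hypothetical repository state: the backup that introduced "uuid-a" has
# been deleted, while two remaining shard-metadata files share "uuid-c".
stored = ["uuid-a", "uuid-b", "uuid-c", "uuid-d"]
metadata = {
    "md_shard1_1": ["uuid-b", "uuid-c"],
    "md_shard2_1": ["uuid-c", "uuid-d"],
}

orphans = find_orphaned_files(stored, metadata)
print(sorted(orphans))
```

Because each shard-metadata file lists everything needed for its restore point, a file is only safe to delete once no remaining metadata file mentions it, which is why the purge has to scan all of them.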
>
> > leave the old Backup/Restore API as-is, deprecated, and add a new one on
> > /v2/cluster/backup
>
> Jan: Ultimately I agree with your concerns about scope, so I'd vote
> against trying to cover TRAs, multiple collection backups, etc. in
> this effort here.
>
> That aside though, I agree that the existing v2 backup API is a bit of
> a headscratcher. Why is it /v2/collections instead of
> /v2/collections/<collectionName> or a subpath of /v2/cluster? Does it
> have something to do with aliases? Or did it end up there mostly by
> default? I'd be open to creating a new v2 backup endpoint (without
> adding TRA, etc. compatibility) if there was consensus on that
> approach to handling backcompat and on the specific appearance of the
> API. It would help with backcompat after all. Though if finding
> consensus bogs down it may not be worth the addition.
>
> > I know you've seen SOLR-15051 (Shared storage -- BlobDirectory) ... We both
> > want to store checksums and file lengths. ... Your proposal did not discuss
> > how these files are GC'ed
>
> David: SIP-12 does address this, though maybe the writeup needs
> clarifying. The Delete Backup API includes a "purge" parameter which
> triggers GC activity. This probably works about the way you'd
> expect - Solr gets the list of UUID-named index files from the blob
> store, and then it compares that list to the set of UUIDs referenced
> by any shard-metadata file (which requires reading all the
> shard-metadata files). This avoids adding to Solr's ZK state, but does
> so at the cost of requiring users to trigger sporadic cleanup manually
> instead of detecting orphans automatically like BlobDirectory does
> (assuming I understand that correctly).
>
> I'm def not saying this is the best approach necessarily. I like it,
> though it has downsides for sure. Just that there is a proposed
> approach that's easy to miss buried in the SIP.
>
> More broadly though - I share your sense that we should consider
> alignment. It may end up that Backup/Restore is different enough from
> the BlobDirectory use case that it doesn't make sense, but it's at
> least worth figuring out. That's about as far as my understanding
> goes right now though. I'll read up on BlobDirectory while you absorb
> SIP-12 and maybe we can circle back to this shortly.
>
> Best,
>
> Jason
>
> On Sun, Jan 10, 2021 at 7:20 AM Jan Høydahl <jan....@cominvent.com> wrote:
> >
> > Jason, Shalin and Dat, thanks for the thorough work. This is an example for
> > other SIPs to follow!
> >
> > > I've also amended the backcompat/migration section to mention Jan's
> > > suggestion that the "incremental" features be exposed in the v2 API
> > > only. Though it's unclear to me whether that's still something people
> > > want since it turns out that we'll still have backcompat concerns with
> > > the existing v2 backup/restore APIs. So I've held off from
> > > removing/replacing the original plan.
> >
> > Since we already have v2 for the existing backup API, I guess my suggestion
> > is not that 'clean' after all.
> >
> > Another approach would be to leave the old Backup/Restore API as-is,
> > deprecated, and add a new one on /v2/cluster/backup, with support for
> > backing up multiple collections in one go, or backing up a TRA alias with
> > hundreds of concrete "sub" collections. But as I write these words I
> > imagine it probably is way outside the scope for this SIP which is large
> > enough. Anyone even tried to back up a TRA with today's API?
> >
> > Jan
> >
> > > On 5 Jan 2021, at 15:55, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> > >
> > > Hey, Happy New Year everybody.
> > >
> > > Some SIP updates based on the discussion above:
> > >
> > > I added v2 examples for each API to the SIP. Feedback welcome,
> > > especially on the v2 APIs that are net-new to this proposal (namely:
> > > "list backups" and "delete backup").
> > >
> > > I've also amended the backcompat/migration section to mention Jan's
> > > suggestion that the "incremental" features be exposed in the v2 API
> > > only. Though it's unclear to me whether that's still something people
> > > want since it turns out that we'll still have backcompat concerns with
> > > the existing v2 backup/restore APIs. So I've held off from
> > > removing/replacing the original plan.
> > >
> > > Link for convenience:
> > > https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> > >
> > > Best,
> > >
> > > Jason
> > >
> > >
> > > On Thu, Dec 24, 2020 at 8:11 AM Jan Høydahl <jan....@cominvent.com> wrote:
> > >>
> > >> Ok, that’s the one I was looking for, it’s not documented in the backup
> > >> chapter of ref-guide :(
> > >>
> > >> Jan Høydahl
> > >>
> > >>> On 23 Dec 2020, at 17:10, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> > >>>
> > >>>>
> > >>>> We have a path alias to the old API ... but we don’t have a true v2
> > >>>> API spec for it, do we?
> > >>>
> > >>> Tbh I'm not yet familiar enough with the v2 APIs to understand the
> > >>> distinction you're making. (Do you have a pointer to something that'd
> > >>> fill me in?)
> > >>>
> > >>> To zoom in on "backup" as an example, the v2 API I'm referring to
> > >>> looks like: /v2/collections" -d '{ "backup-collection":
> > >>> {"collection": "books", "name": "asdf3", "location": "/tmp/foo"}}'.
> > >>> And it's included in the v2 "introspect" documentation returned by
> > >>> this API: /v2/collections/_introspect?command=backup-collection". To
> > >>> me that looked like a v2 API, but maybe path-aliases are also covered
> > >>> in the introspect docs.
> > >>>
> > >>> Jason
> > >>>
> > >>>> On Wed, Dec 23, 2020 at 10:29 AM Jan Høydahl <jan....@cominvent.com>
> > >>>> wrote:
> > >>>>
> > >>>> Actually, I don’t think we do have a v2 Backup/Restore API. We have a
> > >>>> path alias to the old API which takes GET ...&action=backup... but we
> > >>>> don’t have a true v2 API spec for it, do we? Where is that documented?
> > >>>>
> > >>>> Jan Høydahl
> > >>>>
> > >>>>>> On 22 Dec 2020, at 18:04, Jason Gerlowski
> > >>>>>> <gerlowsk...@gmail.com> wrote:
> > >>>>>
> > >>>>> Hey guys,
> > >>>>>
> > >>>>> Following up to make sure I understand the specifics you're
> > >>>>> suggesting. You're proposing that:
> > >>>>>
> > >>>>> 1. The brand new backup-related APIs (list-backups and delete-backup)
> > >>>>> be added in v2-form only.
> > >>>>> 2. Tweaks to existing backup-related APIs (create-backup, restore) be
> > >>>>> made in v2-form only.
> > >>>>> 3. All existing v1 backup-related APIs be deprecated and left
> > >>>>> unchanged. Incremental backups will not be possible using the v1 API.
> > >>>>>
> > >>>>> I'm not against going this route if there's consensus around it. But
> > >>>>> I'm not 100% clear on how it means we don't need to worry about
> > >>>>> backcompat. Backup and Restore currently exist as both a v1 and a v2
> > >>>>> API - I understand how leaving the v1 APIs untouched (other than
> > >>>>> deprecation) frees us of some backcompat concerns there, but we would
> > >>>>> still need to make tweaks to the v2 backup/restore APIs and would have
> > >>>>> to tread just as carefully there in terms of backcompat, afaict.
> > >>>>> Unless Solr's backcompatibility guarantees only cover the v1 API and
> > >>>>> leave v2 changes to be made freely? I looked around to see if the v2
> > >>>>> APIs had any sort of "experimental" designation, but couldn't find
> > >>>>> that clearly stated anywhere. Am I missing something?
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> Jason
> > >>>>>
> > >>>>>> On Tue, Dec 22, 2020 at 2:49 AM Noble Paul <noble.p...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> , and implement the new improved version as a V2-api only, and then
> > >>>>>>> deprecate the v1 API?
> > >>>>>>
> > >>>>>> V2 only please
> > >>>>>>
> > >>>>>>> On Tue, Dec 22, 2020 at 1:34 AM Jason Gerlowski
> > >>>>>>> <gerlowsk...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Hey Jan, thanks for the review.
> > >>>>>>>
> > >>>>>>> I hadn't thought about the V2 API in connection to this work.
> > >>>>>>> You're right though I think - the SIP proposes net-new APIs, so
> > >>>>>>> it should add V2 equivalents at the very least. I'll draft
> > >>>>>>> tentative details for these APIs on the SIP and we can refine
> > >>>>>>> things from there.
> > >>>>>>>
> > >>>>>>> I'm more up in the air on your specific suggestion to restrict the
> > >>>>>>> SIP changes to these v2 APIs. It is an elegant approach to the
> > >>>>>>> backcompat, and it provides a carrot for v2 adoption - both of
> > >>>>>>> which I like. But it would let users create snapshot-based backups
> > >>>>>>> (and keep us maintaining that code) longer than there's any strict
> > >>>>>>> need to. And users are left on the less-efficient format by
> > >>>>>>> default. (By contrast, the current SIP has snapshot-backup creation
> > >>>>>>> being replaced by incremental-backup creation as soon as the latter
> > >>>>>>> is available.) Did you have a particular lifespan in mind for
> > >>>>>>> snapshot-based creation if we go with this approach?
> > >>>>>>>
> > >>>>>>> Jason
> > >>>>>>>
> > >>>>>>> On Thu, Dec 17, 2020 at 3:54 PM Jan Høydahl <jan....@cominvent.com>
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Much needed! Thanks for initiating this Jason!
> > >>>>>>>>
> > >>>>>>>> As we want to move away from v1 APIs where an HTTP GET is used for
> > >>>>>>>> creation and deletion, would it be an idea to leave the old
> > >>>>>>>> backup/restore APIs as-is, and implement the new improved version
> > >>>>>>>> as a V2-api only, and then deprecate the v1 API? Then we don't
> > >>>>>>>> need to worry about back-compat, and we get a head-start on
> > >>>>>>>> converting the COLLECTION API to v2 style.
> > >>>>>>>>
> > >>>>>>>> Jan
> > >>>>>>>>
> > >>>>>>>>> On 15 Dec 2020, at 15:48, Jason Gerlowski
> > >>>>>>>>> <gerlowsk...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hey all,
> > >>>>>>>>>
> > >>>>>>>>> This morning I published SIP-12, which proposes an overhaul of
> > >>>>>>>>> Solr's backup and restore functionality. While the "headline"
> > >>>>>>>>> improvement in this SIP is a change to do backups incrementally,
> > >>>>>>>>> it bundles in a number of other improvements as well, including
> > >>>>>>>>> the addition of corruption checks, APIs to list and delete
> > >>>>>>>>> backups, and stronger integration points with popular object
> > >>>>>>>>> storage APIs.
> > >>>>>>>>>
> > >>>>>>>>> The SIP can be found here:
> > >>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> > >>>>>>>>>
> > >>>>>>>>> Please read the SIP description and come back here for
> > >>>>>>>>> discussion. As the discussion progresses we will update the SIP
> > >>>>>>>>> page with any outcomes and eventually move things to a VOTE.
> > >>>>>>>>>
> > >>>>>>>>> Looking forward to hearing your feedback.
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Jason
> > >>>>>>>>>
> > >>>>>>>>> ---------------------------------------------------------------------
> > >>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
> > >>>>>>
> > >>>>>> --
> > >>>>>> -----------------------------------------------------
> > >>>>>> Noble Paul