I think we should embrace lazy consensus -- no formal vote. Announce your intention to proceed with lazy consensus in two business days.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Fri, Jan 15, 2021 at 9:32 AM Jason Gerlowski <[email protected]> wrote:

> Jan - I've modified the file-structure page to include support for
> storing multiple collections within the same "location" per your
> suggestion.
>
> I've also clarified the file-structure page to state that _all_ files
> required to restore a shard are recorded in the shard-metadata file
> (regardless of whether the file was uploaded by the current backup or
> a past one).
>
> This DISCUSS thread has been open for a full month now, and it seems
> like the feedback is winding down. Pending any objections, I'd like
> to move out of the DISCUSS phase and start implementing. The "SIP
> Process" page in Confluence mentions holding a VOTE thread for the
> final proposal, but I can't find any examples of that actually being
> done. Is that a required part of the process, or is the process page
> out of date? IMO a VOTE seems slightly redundant, unless success on a
> VOTE means that individual PRs can't be -1'd on design grounds?
>
> Either way I'm happy to do whatever the process requires here. I'll
> plan on starting the next step on Monday, whatever that needs to be.
>
> Best,
>
> Jason
>
> On Thu, Jan 14, 2021 at 1:10 PM Jason Gerlowski <[email protected]> wrote:
> >
> > Yeah, that's a good point. I think the only change we'd need to make
> > to the file-structure to support multi-collection down the road would
> > be to introduce a top-level directory for each collection. I'll
> > experiment with that and tweak the described file structure to handle
> > that (assuming the testing pans out).
> >
> > Best,
> >
> > Jason
> >
> > On Thu, Jan 14, 2021 at 5:10 AM Jan Høydahl <[email protected]> wrote:
> > >
> > > > It's tough to decide between /v2/cluster/backups and
> > > > /v2/collections/<collectionName>/backups as alternatives until you
> > > > figure out whether we currently support multi-collection backup, or
> > > > want to in the near future.
> > >
> > > I suppose multi-collection / TRA support could be expanded on later
> > > by supporting e.g. collection=<regex> or alias=my-tra.
> > >
> > > However, the file layout chosen here dictates whether one named backup
> > > will be capable of storing more than one collection in the future, so
> > > that's perhaps something to consider. But if it gets too complicated we
> > > should just delay it and redesign the storage structure once again when
> > > we cross that bridge. I'll not veto the current suggestion.
> > >
> > > Jan
> > >
> > > > On 12 Jan 2021, at 17:53, Jason Gerlowski <[email protected]> wrote:
> > > >
> > > > Hey all,
> > > >
> > > > Two follow ups on recent discussion.
> > > >
> > > > I reviewed the gc/ref-counting part of the BlobDirectory proposal on
> > > > SOLR-15051 that David mentioned. We talked about it a bit offline and
> > > > agreed that while an automatic gc mechanism is really needed for what
> > > > he's trying to do, the requirements of the backup usecase are
> > > > different enough that SIP-12 can get by with manually-triggered
> > > > 'purging'. Mostly because infrequent static backups produce much less
> > > > garbage than continually tracking all files for a (possibly
> > > > ever-changing) index.
> > > >
> > > >> I'd be open to creating a new v2 backup endpoint (without adding TRA, etc. compatibility) if there was consensus on that approach to handling backcompat and on the specific appearance of the API
> > > >
> > > > On second thought, I'm going to flip-flop on this.
> > > > Coming up with a better v2 API for backup/restore will be easier
> > > > *after* some of the questions Jan raised (multi-collection? alias
> > > > support? etc.) have been dealt with. i.e. It's tough to decide
> > > > between /v2/cluster/backups and
> > > > /v2/collections/<collectionName>/backups as alternatives until you
> > > > figure out whether we currently support multi-collection backup, or
> > > > want to in the near future. If people feel strongly or would veto the
> > > > vote otherwise, then I'll try my best. But otherwise I think we're
> > > > best served waiting until other stuff settles out to revisit larger v2
> > > > backup API changes.
> > > >
> > > > Best,
> > > >
> > > > Jason
> > > >
> > > > On Mon, Jan 11, 2021 at 10:41 AM Jason Gerlowski <[email protected]> wrote:
> > > >>
> > > >> Hey all,
> > > >>
> > > >> I've put replies to everyone's questions below. Hope they help!
> > > >>
> > > >>> Do the shard metadata files list all of the segments that make up the backup, or only the segments that were uploaded in this incremental update?
> > > >>
> > > >> Mike: The former - they're intended to hold metadata about all of the
> > > >> segments that are needed to restore to the given
> > > >> snapshot/commit-point. So it's likely to hold metadata about files
> > > >> just uploaded, as well as ones that were added to the blob by previous
> > > >> backups. I'll see if I can make that clearer in the file
> > > >> descriptions.
> > > >>
> > > >>> leave the old Backup/Restore API as-is, deprecated, and add a new one on /v2/cluster/backup
> > > >>
> > > >> Jan: Ultimately I agree with your concerns about scope, so I'd vote
> > > >> against trying to cover TRAs, multiple collection backups, etc. in
> > > >> this effort here.
> > > >>
> > > >> That aside though, I agree that the existing v2 backup API is a bit of
> > > >> a headscratcher.
> > > >> Why is it /v2/collections instead of
> > > >> /v2/collections/<collectionName> or a subpath of /v2/cluster? Does it
> > > >> have something to do with aliases? Or did it end up there mostly by
> > > >> default? I'd be open to creating a new v2 backup endpoint (without
> > > >> adding TRA, etc. compatibility) if there was consensus on that
> > > >> approach to handling backcompat and on the specific appearance of the
> > > >> API. It would help with backcompat after all. Though if finding
> > > >> consensus bogs down it may not be worth the addition.
> > > >>
> > > >>> I know you've seen SOLR-15051 (Shared storage -- BlobDirectory) ... We both want to store checksums and file lengths. ... Your proposal did not discuss how these files are GC'ed
> > > >>
> > > >> David: SIP-12 does address this, though maybe the writeup needs
> > > >> clarifying. The Delete Backup API includes a "purge" parameter which
> > > >> triggers GC activity. This probably works about the way you'd expect
> > > >> - Solr gets the list of UUID-named index files from the blob store,
> > > >> and then it compares that list to the set of UUID's referenced by any
> > > >> shard-metadata file (which requires reading all the shard-metadata
> > > >> files). This avoids adding to Solr's ZK state, but does so at the
> > > >> cost of requiring users to trigger sporadic cleanup manually instead
> > > >> of detecting orphans automatically like BlobDirectory does (assuming I
> > > >> understand that correctly).
> > > >>
> > > >> I'm def not saying this is the best approach necessarily. I like it,
> > > >> though it has downsides for sure. Just that there is a proposed
> > > >> approach that's easy to miss buried in the SIP.
> > > >>
> > > >> More broadly though - I share your sense that we should consider
> > > >> alignment.
> > > >> It may end up that Backup/Restore is different enough from
> > > >> the BlobDirectory usecase that it doesn't make sense, but it's at
> > > >> least worth figuring out. That's about as far as my understanding
> > > >> goes right now though. I'll read up on BlobDirectory while you absorb
> > > >> SIP-12 and maybe we can circle back to this shortly.
> > > >>
> > > >> Best,
> > > >>
> > > >> Jason
> > > >>
> > > >> On Sun, Jan 10, 2021 at 7:20 AM Jan Høydahl <[email protected]> wrote:
> > > >>>
> > > >>> Jason, Shalin and Dat, thanks for the thorough work. This is an example for other SIPs to follow!
> > > >>>
> > > >>>> I've also amended the backcompat/migration section to mention Jan's
> > > >>>> suggestion that the "incremental" features be exposed in the v2 API
> > > >>>> only. Though it's unclear to me whether that's still something people
> > > >>>> want since it turns out that we'll still have backcompat concerns with
> > > >>>> the existing v2 backup/restore APIs. So I've held off from
> > > >>>> removing/replacing the original plan.
> > > >>>
> > > >>> Since we already have v2 for the existing backup API, I guess my suggestion is not that 'clean' after all.
> > > >>>
> > > >>> Another approach would be to leave the old Backup/Restore API as-is, deprecated, and add a new one on /v2/cluster/backup, with support for backing up multiple collections in one go, or backup a TRA alias with hundreds of concrete "sub" collections. But as I write these words I imagine it probably is way outside the scope for this SIP which is large enough. Anyone even tried to backup a TRA with today's API?
> > > >>>
> > > >>> Jan
> > > >>>
> > > >>>> On 5 Jan 2021, at 15:55, Jason Gerlowski <[email protected]> wrote:
> > > >>>>
> > > >>>> Hey, Happy New Year everybody.
> > > >>>>
> > > >>>> Some SIP updates based on the discussion above:
> > > >>>>
> > > >>>> I added v2 examples for each API to the SIP.
> > > >>>> Feedback welcome,
> > > >>>> especially on the v2 APIs that are net-new to this proposal (namely:
> > > >>>> "list backups" and "delete backup").
> > > >>>>
> > > >>>> I've also amended the backcompat/migration section to mention Jan's
> > > >>>> suggestion that the "incremental" features be exposed in the v2 API
> > > >>>> only. Though it's unclear to me whether that's still something people
> > > >>>> want since it turns out that we'll still have backcompat concerns with
> > > >>>> the existing v2 backup/restore APIs. So I've held off from
> > > >>>> removing/replacing the original plan.
> > > >>>>
> > > >>>> Link for convenience:
> > > >>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> > > >>>>
> > > >>>> Best,
> > > >>>>
> > > >>>> Jason
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Dec 24, 2020 at 8:11 AM Jan Høydahl <[email protected]> wrote:
> > > >>>>>
> > > >>>>> Ok, that’s the one I was looking for, it’s not documented in the backup chapter of ref-guide :(
> > > >>>>>
> > > >>>>> Jan Høydahl
> > > >>>>>
> > > >>>>>> On 23 Dec 2020, at 17:10, Jason Gerlowski <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>> We have a path alias to the old API ... but we don’t have a true v2 API spec for it, do we?
> > > >>>>>>
> > > >>>>>> Tbh I'm not yet familiar enough with the v2 APIs to understand the
> > > >>>>>> distinction you're making. (Do you have a pointer to something that'd
> > > >>>>>> fill me in?)
> > > >>>>>>
> > > >>>>>> To zoom in on "backup" as an example, the v2 API I'm referring to
> > > >>>>>> looks like: /v2/collections" -d '{ "backup-collection":
> > > >>>>>> {"collection": "books", "name": "asdf3", "location": "/tmp/foo"}}'.
> > > >>>>>> And it's included in the v2 "introspect" documentation returned by
> > > >>>>>> this API: /v2/collections/_introspect?command=backup-collection".
> > > >>>>>> To me that looked like a v2 API, but maybe path-aliases are also
> > > >>>>>> covered in the introspect docs.
> > > >>>>>>
> > > >>>>>> Jason
> > > >>>>>>
> > > >>>>>>> On Wed, Dec 23, 2020 at 10:29 AM Jan Høydahl <[email protected]> wrote:
> > > >>>>>>>
> > > >>>>>>> Actually, I don’t think we do have a v2 Backup/Restore API. We have a path alias to the old API which takes GET ...&action=backup... but we don’t have a true v2 API spec for it, do we? Where is that documented?
> > > >>>>>>>
> > > >>>>>>> Jan Høydahl
> > > >>>>>>>
> > > >>>>>>>>> On 22 Dec 2020, at 18:04, Jason Gerlowski <[email protected]> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Hey guys,
> > > >>>>>>>>
> > > >>>>>>>> Following up to make sure I understand the specifics you're
> > > >>>>>>>> suggesting. You're proposing that:
> > > >>>>>>>>
> > > >>>>>>>> 1. The brand new backup-related APIs (list-backups and delete-backup)
> > > >>>>>>>> be added in v2-form only.
> > > >>>>>>>> 2. Tweaks to existing backup-related APIs (create-backup, restore) be
> > > >>>>>>>> made in V2-form only.
> > > >>>>>>>> 3. All existing v1 backup-related APIs be deprecated and left
> > > >>>>>>>> unchanged. Incremental backups will not be possible using the v1 API.
> > > >>>>>>>>
> > > >>>>>>>> I'm not against going this route if there's consensus around it. But
> > > >>>>>>>> I'm not 100% clear on how it means we don't need to worry about
> > > >>>>>>>> backcompat. Backup and Restore currently exist as both a v1 and a v2
> > > >>>>>>>> API - I understand how leaving the v1 APIs untouched (other than
> > > >>>>>>>> deprecation) frees us of some backcompat concerns there, but we would
> > > >>>>>>>> still need to make tweaks to the v2 backup/restore APIs and would have
> > > >>>>>>>> to tread just as carefully there in terms of backcompat, afaict.
> > > >>>>>>>> Unless Solr's backcompatibility guarantees only cover the v1 API and
> > > >>>>>>>> leave v2 changes to be made freely?
> > > >>>>>>>> I looked around to see if the v2
> > > >>>>>>>> APIs had any sort of "experimental" designation, but couldn't find
> > > >>>>>>>> that clearly stated anywhere. Am I missing something?
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>>
> > > >>>>>>>> Jason
> > > >>>>>>>>
> > > >>>>>>>>> On Tue, Dec 22, 2020 at 2:49 AM Noble Paul <[email protected]> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> , and implement the new improved version as a V2-api only, and then deprecate the v1 API?
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> V2 only please
> > > >>>>>>>>>
> > > >>>>>>>>>> On Tue, Dec 22, 2020 at 1:34 AM Jason Gerlowski <[email protected]> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> Hey Jan, thanks for the review.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I hadn't thought about the V2 API in connection to this work. You're
> > > >>>>>>>>>> right though I think - the SIP proposes net-new APIs, so it should add
> > > >>>>>>>>>> V2 equivalents at the very least. I'll draft tentative details for
> > > >>>>>>>>>> these APIs on the SIP and we can refine things from there.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I'm more up in the air on your specific suggestion to restrict the SIP
> > > >>>>>>>>>> changes to these v2 APIs. It is an elegant approach to the
> > > >>>>>>>>>> backcompat, and it provides a carrot for v2 adoption - both of which I
> > > >>>>>>>>>> like. But it would let users create snapshot-based backups (and keep
> > > >>>>>>>>>> us maintaining that code) longer than there's any strict need to. And
> > > >>>>>>>>>> users are left on the less-efficient format by default. (By contrast,
> > > >>>>>>>>>> the current SIP has snapshot-backup creation being replaced by
> > > >>>>>>>>>> incremental-backup creation as soon as the latter is available.) Did
> > > >>>>>>>>>> you have a particular lifespan in mind for snapshot-based creation if
> > > >>>>>>>>>> we go with this approach?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Jason
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Thu, Dec 17, 2020 at 3:54 PM Jan Høydahl <[email protected]> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Much needed! Thanks for initiating this Jason!
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> As we want to move away from v1 APIs where an HTTP GET is used for creation and deletion, would it be an idea to leave the old backup/restore APIs as-is, and implement the new improved version as a V2-api only, and then deprecate the v1 API? Then we don't need to worry about back-compat, and we get a head-start on converting the COLLECTION API to v2 style.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Jan
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> On 15 Dec 2020, at 15:48, Jason Gerlowski <[email protected]> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Hey all,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> This morning I published SIP-12, which proposes an overhaul of Solr's
> > > >>>>>>>>>>>> backup and restore functionality. While the "headline" improvement in
> > > >>>>>>>>>>>> this SIP is a change to do backups incrementally, it bundles in a
> > > >>>>>>>>>>>> number of other improvements as well, including the addition of
> > > >>>>>>>>>>>> corruption checks, APIs to list and delete backups, and stronger
> > > >>>>>>>>>>>> integration points with popular object storage APIs.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> The SIP can be found here:
> > > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Please read the SIP description and come back here for discussion. As
> > > >>>>>>>>>>>> the discussion progresses we will update the SIP page with any
> > > >>>>>>>>>>>> outcomes and eventually move things to a VOTE.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Looking forward to hearing your feedback.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Jason
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> ---------------------------------------------------------------------
> > > >>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>>>>>>>> For additional commands, e-mail: [email protected]
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> -----------------------------------------------------
> > > >>>>>>>>> Noble Paul
