> Announce your intention to proceed with lazy consensus in two business days.
I intend to proceed then. :) I'm going to file JIRAs for the SIP today
and start pushing up PRs.

Based on how this discussion went I also plan on updating the "SIP"
process page to say that lazy consensus can be used over a strict VOTE
if no one objects.

Thanks everyone for the attention and feedback.

Jason

On Fri, Jan 15, 2021 at 10:43 AM David Smiley <dsmi...@apache.org> wrote:
>
> I think embrace lazy consensus -- no formal vote. Announce your intention to
> proceed with lazy consensus in two business days.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Jan 15, 2021 at 9:32 AM Jason Gerlowski <gerlowsk...@gmail.com> wrote:
>>
>> Jan - I've modified the file-structure page to include support for
>> storing multiple collections within the same "location" per your
>> suggestion.
>>
>> I've also clarified the file-structure page to state that _all_ files
>> required to restore a shard are recorded in the shard-metadata file
>> (regardless of whether the file was uploaded by the current backup or
>> a past one).
>>
>> This DISCUSS thread has been open for a full month now, and it seems
>> like the feedback is winding down. Pending any objections, I'd like
>> to move out of the DISCUSS phase and start implementing. The "SIP
>> Process" page in Confluence mentions holding a VOTE thread for the
>> final proposal, but I can't find any examples of that actually being
>> done. Is that a required part of the process, or is the process page
>> out of date? IMO a VOTE seems slightly redundant, unless success on a
>> VOTE means that individual PRs can't be -1'd on design grounds?
>>
>> Either way I'm happy to do whatever the process requires here. I'll
>> plan on starting the next step on Monday, whatever that needs to be.
>>
>> Best,
>>
>> Jason
>>
>> On Thu, Jan 14, 2021 at 1:10 PM Jason Gerlowski <gerlowsk...@gmail.com>
>> wrote:
>> >
>> > Yeah, that's a good point.
>> > I think the only change we'd need to make
>> > to the file-structure to support multi-collection down the road would
>> > be to introduce a top-level directory for each collection. I'll
>> > experiment with that and tweak the described file structure to handle
>> > that (assuming the testing pans out).
>> >
>> > Best,
>> >
>> > Jason
>> >
>> > On Thu, Jan 14, 2021 at 5:10 AM Jan Høydahl <jan....@cominvent.com> wrote:
>> > >
>> > > > It's tough to decide between /v2/cluster/backups and
>> > > > /v2/collections/<collectionName>/backups as alternatives until you
>> > > > figure out whether we currently support multi-collection backup, or
>> > > > want to in the near future.
>> > >
>> > > I suppose multi-collection / TRA support could be expanded on later
>> > > by supporting e.g. collection=<regex> or alias=my-tra.
>> > >
>> > > However, the file layout chosen here dictates whether one named backup
>> > > will be capable of storing more than one collection in the future, so
>> > > that's perhaps something to consider. But if it gets too complicated we
>> > > should just delay it and redesign the storage structure once again when
>> > > we cross that bridge. I'll not veto the current suggestion.
>> > >
>> > > Jan
>> > >
>> > > > On 12 Jan 2021, at 17:53, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
>> > > >
>> > > > Hey all,
>> > > >
>> > > > Two follow ups on recent discussion.
>> > > >
>> > > > I reviewed the gc/ref-counting part of the BlobDirectory proposal on
>> > > > SOLR-15051 that David mentioned. We talked about it a bit offline and
>> > > > agreed that while an automatic gc mechanism is really needed for what
>> > > > he's trying to do, the requirements of the backup usecase are
>> > > > different enough that SIP-12 can get by with manually-triggered
>> > > > 'purging'. Mostly because infrequent static backups produce much less
>> > > > garbage than continually tracking all files for a (possibly
>> > > > ever-changing) index.
>> > > >
>> > > >> I'd be open to creating a new v2 backup endpoint (without adding TRA,
>> > > >> etc. compatibility) if there was consensus on that approach to
>> > > >> handling backcompat and on the specific appearance of the API
>> > > >
>> > > > On second thought, I'm going to flip-flop on this. Coming up with a
>> > > > better v2 API for backup/restore will be easier *after* some of the
>> > > > questions Jan raised (multi-collection? alias support? etc.) have been
>> > > > dealt with. i.e. It's tough to decide between /v2/cluster/backups and
>> > > > /v2/collections/<collectionName>/backups as alternatives until you
>> > > > figure out whether we currently support multi-collection backup, or
>> > > > want to in the near future. If people feel strongly or would veto the
>> > > > vote otherwise, then I'll try my best. But otherwise I think we're
>> > > > best served waiting until other stuff settles out to revisit larger v2
>> > > > backup API changes.
>> > > >
>> > > > Best,
>> > > >
>> > > > Jason
>> > > >
>> > > > On Mon, Jan 11, 2021 at 10:41 AM Jason Gerlowski
>> > > > <gerlowsk...@gmail.com> wrote:
>> > > >>
>> > > >> Hey all,
>> > > >>
>> > > >> I've put replies to everyone's questions below. Hope they help!
>> > > >>
>> > > >>> Do the shard metadata files list all of the segments that make up
>> > > >>> the backup, or only the segments that were uploaded in this
>> > > >>> incremental update?
>> > > >>
>> > > >> Mike: The former - they're intended to hold metadata about all of the
>> > > >> segments that are needed to restore to the given
>> > > >> snapshot/commit-point. So it's likely to hold metadata about files
>> > > >> just uploaded, as well as ones that were added to the blob by previous
>> > > >> backups. I'll see if I can make that clearer in the file
>> > > >> descriptions.
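To illustrate the answer to Mike above: a rough sketch of how an incremental backup might upload only new files while the shard-metadata file still lists every file needed for a restore. This is not taken from SIP-12 or from Solr's code. The function name, the dict layout, and the idea of detecting already-stored files by checksum are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple


def plan_incremental_backup(
    index_files: Dict[str, str],
    stored_checksums: Set[str],
) -> Tuple[Set[str], Dict[str, str]]:
    """Hypothetical helper: decide what to upload and what to record.

    index_files maps each file name in the commit point to its checksum.
    stored_checksums holds checksums already uploaded by earlier backups.
    """
    # Only files the backup location does not already hold get uploaded.
    to_upload = {
        name
        for name, checksum in index_files.items()
        if checksum not in stored_checksums
    }
    # The shard metadata records *all* files, new and previously uploaded.
    shard_metadata = dict(index_files)
    return to_upload, shard_metadata


# Illustrative data only: file names and checksums are made up.
commit_point = {"_0.cfs": "c1", "_1.cfs": "c2", "segments_2": "c3"}
to_upload, shard_metadata = plan_incremental_backup(commit_point, {"c1"})
print(sorted(to_upload))       # files uploaded by this backup
print(sorted(shard_metadata))  # files required to restore
```

Here the file with checksum `c1` is skipped during upload but still appears in the metadata, which is the distinction Mike was asking about.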
>> > > >>
>> > > >>> leave the old Backup/Restore API as-is, deprecated, and add a new
>> > > >>> one on /v2/cluster/backup
>> > > >>
>> > > >> Jan: Ultimately I agree with your concerns about scope, so I'd vote
>> > > >> against trying to cover TRAs, multiple collection backups, etc. in
>> > > >> this effort here.
>> > > >>
>> > > >> That aside though, I agree that the existing v2 backup API is a bit of
>> > > >> a headscratcher. Why is it /v2/collections instead of
>> > > >> /v2/collections/<collectionName> or a subpath of /v2/cluster? Does it
>> > > >> have something to do with aliases? Or did it end up there mostly by
>> > > >> default? I'd be open to creating a new v2 backup endpoint (without
>> > > >> adding TRA, etc. compatibility) if there was consensus on that
>> > > >> approach to handling backcompat and on the specific appearance of the
>> > > >> API. It would help with backcompat after all. Though if finding
>> > > >> consensus bogs down it may not be worth the addition.
>> > > >>
>> > > >>> I know you've seen SOLR-15051 (Shared storage -- BlobDirectory) ...
>> > > >>> We both want to store checksums and file lengths. ... Your proposal
>> > > >>> did not discuss how these files are GC'ed
>> > > >>
>> > > >> David: SIP-12 does address this, though maybe the writeup needs
>> > > >> clarifying. The Delete Backup API includes a "purge" parameter which
>> > > >> triggers GC activity. This probably works about the way you'd expect
>> > > >> - Solr gets the list of UUID-named index files from the blob store,
>> > > >> and then it compares that list to the set of UUIDs referenced by any
>> > > >> shard-metadata file (which requires reading all the shard-metadata
>> > > >> files). This avoids adding to Solr's ZK state, but does so at the
>> > > >> cost of requiring users to trigger sporadic cleanup manually instead
>> > > >> of detecting orphans automatically like BlobDirectory does (assuming I
>> > > >> understand that correctly).
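The purge comparison described above amounts to a set difference. A minimal sketch, assuming in-memory stand-ins for the blob-store listing and the parsed shard-metadata files. The function and variable names are hypothetical and do not come from SIP-12 or Solr.

```python
from typing import Dict, Iterable, Set


def find_orphaned_files(
    stored_files: Iterable[str],
    shard_metadata: Dict[str, Iterable[str]],
) -> Set[str]:
    """Return stored index files that no shard-metadata file references.

    stored_files is the listing of UUID-named files in the backup location.
    shard_metadata maps each metadata file to the UUIDs it references.
    """
    referenced: Set[str] = set()
    # Every shard-metadata file has to be read to build the referenced set.
    for uuids in shard_metadata.values():
        referenced.update(uuids)
    # Anything stored but never referenced is garbage a purge could delete.
    return set(stored_files) - referenced


# Illustrative data only: the UUIDs and metadata keys are made up.
stored = ["uuid-a", "uuid-b", "uuid-c", "uuid-d"]
metadata = {
    "backup0/shard1": ["uuid-a", "uuid-b"],
    "backup1/shard1": ["uuid-b", "uuid-c"],
}
orphans = find_orphaned_files(stored, metadata)
print(sorted(orphans))
```

Nothing here tracks references continuously, which matches the trade-off in the message: no extra ZK state, at the cost of a full scan whenever a purge is requested.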
>> > > >>
>> > > >> I'm def not saying this is the best approach necessarily. I like it,
>> > > >> though it has downsides for sure. Just that there is a proposed
>> > > >> approach that's easy to miss buried in the SIP.
>> > > >>
>> > > >> More broadly though - I share your sense that we should consider
>> > > >> alignment. It may end up that Backup/Restore is different enough from
>> > > >> the BlobDirectory usecase that it doesn't make sense, but it's at
>> > > >> least worth figuring out. That's about as far as my understanding
>> > > >> goes right now though. I'll read up on BlobDirectory while you absorb
>> > > >> SIP-12 and maybe we can circle back to this shortly.
>> > > >>
>> > > >> Best,
>> > > >>
>> > > >> Jason
>> > > >>
>> > > >> On Sun, Jan 10, 2021 at 7:20 AM Jan Høydahl <jan....@cominvent.com>
>> > > >> wrote:
>> > > >>>
>> > > >>> Jason, Shalin and Dat, thanks for the thorough work. This is an
>> > > >>> example for other SIPs to follow!
>> > > >>>
>> > > >>>> I've also amended the backcompat/migration section to mention Jan's
>> > > >>>> suggestion that the "incremental" features be exposed in the v2 API
>> > > >>>> only. Though it's unclear to me whether that's still something
>> > > >>>> people want since it turns out that we'll still have backcompat
>> > > >>>> concerns with the existing v2 backup/restore APIs. So I've held off
>> > > >>>> from removing/replacing the original plan.
>> > > >>>
>> > > >>> Since we already have v2 for the existing backup API, I guess my
>> > > >>> suggestion is not that 'clean' after all.
>> > > >>>
>> > > >>> Another approach would be to leave the old Backup/Restore API as-is,
>> > > >>> deprecated, and add a new one on /v2/cluster/backup, with support
>> > > >>> for backing up multiple collections in one go, or back up a TRA alias
>> > > >>> with hundreds of concrete "sub" collections.
>> > > >>> But as I write these
>> > > >>> words I imagine it probably is way outside the scope for this SIP
>> > > >>> which is large enough. Has anyone even tried to back up a TRA with
>> > > >>> today's API?
>> > > >>>
>> > > >>> Jan
>> > > >>>
>> > > >>>> On 5 Jan 2021, at 15:55, Jason Gerlowski
>> > > >>>> <gerlowsk...@gmail.com> wrote:
>> > > >>>>
>> > > >>>> Hey, Happy New Year everybody.
>> > > >>>>
>> > > >>>> Some SIP updates based on the discussion above:
>> > > >>>>
>> > > >>>> I added v2 examples for each API to the SIP. Feedback welcome,
>> > > >>>> especially on the v2 APIs that are net-new to this proposal (namely:
>> > > >>>> "list backups" and "delete backup").
>> > > >>>>
>> > > >>>> I've also amended the backcompat/migration section to mention Jan's
>> > > >>>> suggestion that the "incremental" features be exposed in the v2 API
>> > > >>>> only. Though it's unclear to me whether that's still something
>> > > >>>> people want since it turns out that we'll still have backcompat
>> > > >>>> concerns with the existing v2 backup/restore APIs. So I've held off
>> > > >>>> from removing/replacing the original plan.
>> > > >>>>
>> > > >>>> Link for convenience:
>> > > >>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
>> > > >>>>
>> > > >>>> Best,
>> > > >>>>
>> > > >>>> Jason
>> > > >>>>
>> > > >>>>
>> > > >>>> On Thu, Dec 24, 2020 at 8:11 AM Jan Høydahl <jan....@cominvent.com>
>> > > >>>> wrote:
>> > > >>>>>
>> > > >>>>> Ok, that’s the one I was looking for, it’s not documented in the
>> > > >>>>> backup chapter of ref-guide :(
>> > > >>>>>
>> > > >>>>> Jan Høydahl
>> > > >>>>>
>> > > >>>>>> On 23 Dec 2020, at 17:10, Jason Gerlowski
>> > > >>>>>> <gerlowsk...@gmail.com> wrote:
>> > > >>>>>>
>> > > >>>>>>
>> > > >>>>>>>
>> > > >>>>>>> We have a path alias to the old API ... but we don’t have a true
>> > > >>>>>>> v2 API spec for it, do we?
>> > > >>>>>>
>> > > >>>>>> Tbh I'm not yet familiar enough with the v2 APIs to understand the
>> > > >>>>>> distinction you're making. (Do you have a pointer to something
>> > > >>>>>> that'd fill me in?)
>> > > >>>>>>
>> > > >>>>>> To zoom in on "backup" as an example, the v2 API I'm referring to
>> > > >>>>>> looks like: /v2/collections" -d '{ "backup-collection":
>> > > >>>>>> {"collection": "books", "name": "asdf3", "location":
>> > > >>>>>> "/tmp/foo"}}'.
>> > > >>>>>> And it's included in the v2 "introspect" documentation returned by
>> > > >>>>>> this API: /v2/collections/_introspect?command=backup-collection".
>> > > >>>>>> To me that looked like a v2 API, but maybe path-aliases are also
>> > > >>>>>> covered in the introspect docs.
>> > > >>>>>>
>> > > >>>>>> Jason
>> > > >>>>>>
>> > > >>>>>>> On Wed, Dec 23, 2020 at 10:29 AM Jan Høydahl
>> > > >>>>>>> <jan....@cominvent.com> wrote:
>> > > >>>>>>>
>> > > >>>>>>> Actually, don’t think we do have a v2 Backup/Restore API. We
>> > > >>>>>>> have a path alias to the old API which takes GET
>> > > >>>>>>> ...&action=backup... but we don’t have a true v2 API spec for
>> > > >>>>>>> it, do we? Where is that documented?
>> > > >>>>>>>
>> > > >>>>>>> Jan Høydahl
>> > > >>>>>>>
>> > > >>>>>>>>> On 22 Dec 2020, at 18:04, Jason Gerlowski
>> > > >>>>>>>>> <gerlowsk...@gmail.com> wrote:
>> > > >>>>>>>>
>> > > >>>>>>>> Hey guys,
>> > > >>>>>>>>
>> > > >>>>>>>> Following up to make sure I understand the specifics you're
>> > > >>>>>>>> suggesting. You're proposing that:
>> > > >>>>>>>>
>> > > >>>>>>>> 1. The brand new backup-related APIs (list-backups and delete-backup)
>> > > >>>>>>>> be added in v2-form only.
>> > > >>>>>>>> 2. Tweaks to existing backup-related APIs (create-backup, restore) be
>> > > >>>>>>>> made in V2-form only.
>> > > >>>>>>>> 3. All existing v1 backup-related APIs be deprecated and left
>> > > >>>>>>>> unchanged.
>> > > >>>>>>>> Incremental backups will not be possible using the
>> > > >>>>>>>> v1 API.
>> > > >>>>>>>>
>> > > >>>>>>>> I'm not against going this route if there's consensus around it. But
>> > > >>>>>>>> I'm not 100% clear on how it means we don't need to worry about
>> > > >>>>>>>> backcompat. Backup and Restore currently exist as both a v1 and a v2
>> > > >>>>>>>> API - I understand how leaving the v1 APIs untouched (other than
>> > > >>>>>>>> deprecation) frees us of some backcompat concerns there, but we would
>> > > >>>>>>>> still need to make tweaks to the v2 backup/restore APIs and would have
>> > > >>>>>>>> to tread just as carefully there in terms of backcompat, afaict.
>> > > >>>>>>>> Unless Solr's backcompatibility guarantees only cover the v1 API and
>> > > >>>>>>>> leave v2 changes to be made freely? I looked around to see if the v2
>> > > >>>>>>>> APIs had any sort of "experimental" designation, but couldn't find
>> > > >>>>>>>> that clearly stated anywhere. Am I missing something?
>> > > >>>>>>>>
>> > > >>>>>>>> Best,
>> > > >>>>>>>>
>> > > >>>>>>>> Jason
>> > > >>>>>>>>
>> > > >>>>>>>>> On Tue, Dec 22, 2020 at 2:49 AM Noble Paul
>> > > >>>>>>>>> <noble.p...@gmail.com> wrote:
>> > > >>>>>>>>>
>> > > >>>>>>>>>> , and implement the new improved version as a V2-api only,
>> > > >>>>>>>>>> and then deprecate the v1 API?
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> V2 only please
>> > > >>>>>>>>>
>> > > >>>>>>>>>> On Tue, Dec 22, 2020 at 1:34 AM Jason Gerlowski
>> > > >>>>>>>>>> <gerlowsk...@gmail.com> wrote:
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Hey Jan, thanks for the review.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> I hadn't thought about the V2 API in connection to this work.
>> > > >>>>>>>>>> You're
>> > > >>>>>>>>>> right though I think - the SIP proposes net-new APIs, so it should add
>> > > >>>>>>>>>> V2 equivalents at the very least. I'll draft tentative details for
>> > > >>>>>>>>>> these APIs on the SIP and we can refine things from there.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> I'm more up in the air on your specific suggestion to restrict the SIP
>> > > >>>>>>>>>> changes to these v2 APIs. It is an elegant approach to the
>> > > >>>>>>>>>> backcompat, and it provides a carrot for v2 adoption - both of which I
>> > > >>>>>>>>>> like. But it would let users create snapshot-based backups (and keep
>> > > >>>>>>>>>> us maintaining that code) longer than there's any strict need to. And
>> > > >>>>>>>>>> users are left on the less-efficient format by default. (By contrast,
>> > > >>>>>>>>>> the current SIP has snapshot-backup creation being replaced by
>> > > >>>>>>>>>> incremental-backup creation as soon as the latter is available.) Did
>> > > >>>>>>>>>> you have a particular lifespan in mind for snapshot-based creation if
>> > > >>>>>>>>>> we go with this approach?
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Jason
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> On Thu, Dec 17, 2020 at 3:54 PM Jan Høydahl
>> > > >>>>>>>>>> <jan....@cominvent.com> wrote:
>> > > >>>>>>>>>>>
>> > > >>>>>>>>>>> Much needed! Thanks for initiating this Jason!
>> > > >>>>>>>>>>>
>> > > >>>>>>>>>>> As we want to move away from v1 APIs where an HTTP GET is
>> > > >>>>>>>>>>> used for creation and deletion, would it be an idea to leave
>> > > >>>>>>>>>>> the old backup/restore APIs as-is, and implement the new
>> > > >>>>>>>>>>> improved version as a V2-api only, and then deprecate the v1
>> > > >>>>>>>>>>> API?
>> > > >>>>>>>>>>> Then we don't need to worry about back-compat, and we
>> > > >>>>>>>>>>> get a head-start on converting the COLLECTION API to v2
>> > > >>>>>>>>>>> style.
>> > > >>>>>>>>>>>
>> > > >>>>>>>>>>> Jan
>> > > >>>>>>>>>>>
>> > > >>>>>>>>>>>> On 15 Dec 2020, at 15:48, Jason Gerlowski
>> > > >>>>>>>>>>>> <gerlowsk...@gmail.com> wrote:
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> Hey all,
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> This morning I published SIP-12, which proposes an overhaul of Solr's
>> > > >>>>>>>>>>>> backup and restore functionality. While the "headline" improvement in
>> > > >>>>>>>>>>>> this SIP is a change to do backups incrementally, it bundles in a
>> > > >>>>>>>>>>>> number of other improvements as well, including the addition of
>> > > >>>>>>>>>>>> corruption checks, APIs to list and delete backups, and stronger
>> > > >>>>>>>>>>>> integration points with popular object storage APIs.
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> The SIP can be found here:
>> > > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> Please read the SIP description and come back here for discussion. As
>> > > >>>>>>>>>>>> the discussion progresses we will update the SIP page with any
>> > > >>>>>>>>>>>> outcomes and eventually move things to a VOTE.
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> Looking forward to hearing your feedback.
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> Best,
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> Jason
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> ---------------------------------------------------------------------
>> > > >>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > > >>>>>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> --
>> > > >>>>>>>>> -----------------------------------------------------
>> > > >>>>>>>>> Noble Paul
>> > > >>>>>>>>>