I think we should embrace lazy consensus -- no formal vote.  Announce your
intention to proceed with lazy consensus in two business days.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jan 15, 2021 at 9:32 AM Jason Gerlowski <[email protected]>
wrote:

> Jan - I've modified the file-structure page to include support for
> storing multiple collections within the same "location" per your
> suggestion.
>
> I've also clarified the file-structure page to state that _all_ files
> required to restore a shard are recorded in the shard-metadata file
> (regardless of whether the file was uploaded by the current backup or
> a past one).
>
> This DISCUSS thread has been open for a full month now, and it seems
> like the feedback is winding down.  Pending any objections, I'd like
> to move out of the DISCUSS phase and start implementing.  The "SIP
> Process" page in Confluence mentions holding a VOTE thread for the
> final proposal, but I can't find any examples of that actually being
> done.  Is that a required part of the process, or is the process page
> out of date?  IMO a VOTE seems slightly redundant, unless success on a
> VOTE means that individual PRs can't be -1'd on design grounds?
>
> Either way I'm happy to do whatever the process requires here.  I'll
> plan on starting the next step on Monday, whatever that needs to be.
>
> Best,
>
> Jason
>
> On Thu, Jan 14, 2021 at 1:10 PM Jason Gerlowski <[email protected]>
> wrote:
> >
> > Yeah, that's a good point.  I think the only change we'd need to make
> > to the file-structure to support multi-collection down the road would
> > > be to introduce a top-level directory for each collection.  I'll
> > experiment with that and tweak the described file structure to handle
> > that (assuming the testing pans out).
> >
> > Best,
> >
> > Jason
> >
> > > On Thu, Jan 14, 2021 at 5:10 AM Jan Høydahl <[email protected]> wrote:
> > >
> > > > It's tough to decide between /v2/cluster/backups and
> > > > /v2/collections/<collectionName>/backups as alternatives until you
> > > > figure out whether we currently support multi-collection backup, or
> > > > want to in the near future.
> > >
> > > > I suppose multi-collection / TRA support could be expanded on later
> > > > by supporting e.g. collection=<regex> or alias=my-tra.
> > >
> > > > However, the file layout chosen here dictates whether one named backup
> > > > will be capable of storing more than one collection in the future, so
> > > > that's perhaps something to consider. But if it gets too complicated we
> > > > should just delay it and redesign the storage structure once again when
> > > > we cross that bridge. I'll not veto the current suggestion.
> > >
> > > Jan
> > >
> > > > > On 12 Jan 2021, at 17:53, Jason Gerlowski <[email protected]> wrote:
> > > >
> > > > Hey all,
> > > >
> > > > Two follow ups on recent discussion.
> > > >
> > > > > I reviewed the gc/ref-counting part of the BlobDirectory proposal on
> > > > > SOLR-15051 that David mentioned.  We talked about it a bit offline and
> > > > > agreed that while an automatic gc mechanism is really needed for what
> > > > > he's trying to do, the requirements of the backup usecase are
> > > > > different enough that SIP-12 can get by with manually-triggered
> > > > > 'purging'.  Mostly because infrequent static backups produce much less
> > > > > garbage than continually tracking all files for a (possibly
> > > > > ever-changing) index.
> > > >
> > > > >> I'd be open to creating a new v2 backup endpoint (without adding
> > > > >> TRA, etc. compatibility) if there was consensus on that approach to
> > > > >> handling backcompat and on the specific appearance of the API
> > > >
> > > > > On second thought, I'm going to flip-flop on this.  Coming up with a
> > > > > better v2 API for backup/restore will be easier *after* some of the
> > > > > questions Jan raised (multi-collection? alias support? etc.) have been
> > > > > dealt with.  i.e. It's tough to decide between /v2/cluster/backups and
> > > > > /v2/collections/<collectionName>/backups as alternatives until you
> > > > > figure out whether we currently support multi-collection backup, or
> > > > > want to in the near future.  If people feel strongly or would veto the
> > > > > vote otherwise, then I'll try my best.  But otherwise I think we're
> > > > > best served waiting until other stuff settles out to revisit larger v2
> > > > > backup API changes.
> > > >
> > > > Best,
> > > >
> > > > Jason
> > > >
> > > > > On Mon, Jan 11, 2021 at 10:41 AM Jason Gerlowski <[email protected]> wrote:
> > > >>
> > > >> Hey all,
> > > >>
> > > >> I've put replies to everyone's questions below.  Hope they help!
> > > >>
> > > > >>> Do the shard metadata files list all of the segments that make up
> > > > >>> the backup, or only the segments that were uploaded in this incremental
> > > > >>> update?
> > > > >>
> > > > >> Mike: The former - they're intended to hold metadata about all of the
> > > > >> segments that are needed to restore to the given
> > > > >> snapshot/commit-point.  So it's likely to hold metadata about files
> > > > >> just uploaded, as well as ones that were added to the blob by previous
> > > > >> backups.  I'll see if I can make that clearer in the file
> > > > >> descriptions.
> > > >>
> > > > >>> leave the old Backup/Restore API as-is, deprecated, and add a new
> > > > >>> one on /v2/cluster/backup
> > > >>
> > > >> Jan: Ultimately I agree with your concerns about scope, so I'd vote
> > > >> against trying to cover TRAs, multiple collection backups, etc. in
> > > >> this effort here.
> > > >>
> > > > >> That aside though, I agree that the existing v2 backup API is a bit of
> > > > >> a headscratcher.  Why is it /v2/collections instead of
> > > > >> /v2/collections/<collectionName> or a subpath of /v2/cluster?  Does it
> > > > >> have something to do with aliases?  Or did it end up there mostly by
> > > > >> default?  I'd be open to creating a new v2 backup endpoint (without
> > > > >> adding TRA, etc. compatibility) if there was consensus on that
> > > > >> approach to handling backcompat and on the specific appearance of the
> > > > >> API.  It would help with backcompat after all.  Though if finding
> > > > >> consensus bogs down it may not be worth the addition.
> > > >>
> > > > >>> I know you've seen SOLR-15051 (Shared storage -- BlobDirectory)
> > > > >>> ... We both want to store checksums and file lengths. ... Your proposal did
> > > > >>> not discuss how these files are GC'ed
> > > >>
> > > > >> David: SIP-12 does address this, though maybe the writeup needs
> > > > >> clarifying.  The Delete Backup API includes a "purge" parameter which
> > > > >> triggers GC activity.  This probably works about the way you'd expect
> > > > >> - Solr gets the list of UUID-named index files from the blob store,
> > > > >> and then it compares that list to the set of UUIDs referenced by any
> > > > >> shard-metadata file (which requires reading all the shard-metadata
> > > > >> files).  This avoids adding to Solr's ZK state, but does so at the
> > > > >> cost of requiring users to trigger sporadic cleanup manually instead
> > > > >> of detecting orphans automatically like BlobDirectory does (assuming I
> > > > >> understand that correctly).
> > > >>
> > > > >> I'm def not saying this is the best approach necessarily.  I like it,
> > > > >> though it has downsides for sure.  Just that there is a proposed
> > > > >> approach that's easy to miss buried in the SIP.
> > > > >>
> > > > >> More broadly though - I share your sense that we should consider
> > > > >> alignment.  It may end up that Backup/Restore is different enough from
> > > > >> the BlobDirectory usecase that it doesn't make sense, but it's at
> > > > >> least worth figuring out.  That's about as far as my understanding
> > > > >> goes right now though.  I'll read up on BlobDirectory while you absorb
> > > > >> SIP-12 and maybe we can circle back to this shortly.
> > > >>
> > > >> Best,
> > > >>
> > > >> Jason
> > > >>
> > > > >> On Sun, Jan 10, 2021 at 7:20 AM Jan Høydahl <[email protected]> wrote:
> > > >>>
> > > >>> Jason, Shalin and Dat, thanks for the thorough work. This is an
> example for other SIPs to follow!
> > > >>>
> > > >>>> I've also amended the backcompat/migration section to mention
> Jan's
> > > >>>> suggestion that the "incremental" features be exposed in the v2
> API
> > > >>>> only.  Though it's unclear to me whether that's still something
> people
> > > >>>> want since it turns out that we'll still have backcompat concerns
> with
> > > >>>> the existing v2 backup/restore APIs.  So I've held off from
> > > >>>> removing/replacing the original plan.
> > > >>>
> > > >>> Since we already have v2 for the existing backup API, I guess my
> suggestion is not that 'clean' after all.
> > > >>>
> > > >>> Another approach would be to leave the old Backup/Restore API
> as-is, deprecated, and add a new one on /v2/cluster/backup, with support
> for backing up multiple collections in one go, or backup a TRA alias with
> hundreds of concrete "sub" collections. But as I write these words I
> imagine it probably is way outside the scope for this SIP which is large
> enough. Anyone even tried to backup a TRA with today's API?
> > > >>>
> > > >>> Jan
> > > >>>
> > > > >>>> On 5 Jan 2021, at 15:55, Jason Gerlowski <[email protected]> wrote:
> > > >>>>
> > > >>>> Hey, Happy New Year everybody.
> > > >>>>
> > > >>>> Some SIP updates based on the discussion above:
> > > >>>>
> > > >>>> I added v2 examples for each API to the SIP.  Feedback welcome,
> > > >>>> especially on the v2 APIs that are net-new to this proposal
> (namely:
> > > >>>> "list backups" and "delete backup").
> > > >>>>
> > > >>>> I've also amended the backcompat/migration section to mention
> Jan's
> > > >>>> suggestion that the "incremental" features be exposed in the v2
> API
> > > >>>> only.  Though it's unclear to me whether that's still something
> people
> > > >>>> want since it turns out that we'll still have backcompat concerns
> with
> > > >>>> the existing v2 backup/restore APIs.  So I've held off from
> > > >>>> removing/replacing the original plan.
> > > >>>>
> > > >>>> Link for convenience:
> > > >>>>
> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> > > >>>>
> > > >>>> Best,
> > > >>>>
> > > >>>> Jason
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Dec 24, 2020 at 8:11 AM Jan Høydahl <
> [email protected]> wrote:
> > > >>>>>
> > > >>>>> Ok, that’s the one I was looking for, it’s not documented in the
> backup chapter of ref-guide :(
> > > >>>>>
> > > >>>>> Jan Høydahl
> > > >>>>>
> > > > >>>>>> On 23 Dec 2020, at 17:10, Jason Gerlowski <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>> 
> > > >>>>>>>
> > > >>>>>>> We have a path alias to the old API ... but we don’t have a
> true v2 API spec for it, do we?
> > > >>>>>>
> > > >>>>>> Tbh I'm not yet familiar enough with the v2 APIs to understand
> the
> > > >>>>>> distinction you're making.  (Do you have a pointer to something
> that'd
> > > >>>>>> fill me in?)
> > > >>>>>>
> > > >>>>>> To zoom in on "backup" as an example, the v2 API I'm referring
> to
> > > >>>>>> looks like:  /v2/collections" -d '{ "backup-collection":
> > > >>>>>> {"collection": "books", "name": "asdf3", "location":
> "/tmp/foo"}}'.
> > > >>>>>> And it's included in the v2 "introspect" documentation returned
> by
> > > >>>>>> this API:
> /v2/collections/_introspect?command=backup-collection".  To
> > > >>>>>> me that looked like a v2 API, but maybe path-aliases are also
> covered
> > > >>>>>> in the introspect docs.
> > > >>>>>>
> > > >>>>>> Jason
> > > >>>>>>
> > > >>>>>>> On Wed, Dec 23, 2020 at 10:29 AM Jan Høydahl <
> [email protected]> wrote:
> > > >>>>>>>
> > > >>>>>>> Actually, don’t think we do have a v2 Backup/Restore API. We
> have a path alias to the old API which takes GET ...&action=backup... but
> we don’t have a true v2 API spec for it, do we? Where is that documented?
> > > >>>>>>>
> > > >>>>>>> Jan Høydahl
> > > >>>>>>>
> > > > >>>>>>>>> On 22 Dec 2020, at 18:04, Jason Gerlowski <[email protected]> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Hey guys,
> > > >>>>>>>>
> > > >>>>>>>> Following up to make sure I understand the specifics you're
> > > >>>>>>>> suggesting.  You're proposing that:
> > > >>>>>>>>
> > > >>>>>>>> 1. The brand new backup-related APIs (list-backups and
> delete-backup)
> > > >>>>>>>> be added in v2-form only.
> > > >>>>>>>> 2. Tweaks to existing backup-related APIs (create-backup,
> restore) be
> > > >>>>>>>> made in V2-form only.
> > > >>>>>>>> 3. All existing v1 backup-related APIs be deprecated and left
> > > >>>>>>>> unchanged.  Incremental backups will not be possible using
> the v1 API.
> > > >>>>>>>>
> > > >>>>>>>> I'm not against going this route if there's consensus around
> it.  But
> > > >>>>>>>> I'm not 100% clear on how it means we don't need to worry
> about
> > > >>>>>>>> backcompat.  Backup and Restore currently exist as both a v1
> and a v2
> > > >>>>>>>> API - I understand how leaving the v1 APIs untouched (other
> than
> > > >>>>>>>> deprecation) frees us of some backcompat concerns there, but
> we would
> > > >>>>>>>> still need to make tweaks to the v2 backup/restore APIs and
> would have
> > > >>>>>>>> to tread just as carefully there in terms of backcompat,
> afaict.
> > > >>>>>>>> Unless Solr's backcompatibility guarantees only cover the v1
> API and
> > > >>>>>>>> leave v2 changes to be made freely?  I looked around to see
> if the v2
> > > >>>>>>>> APIs had any sort of "experimental" designation, but couldn't
> find
> > > >>>>>>>> that clearly stated anywhere.  Am I missing something?
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>>
> > > >>>>>>>> Jason
> > > >>>>>>>>
> > > >>>>>>>>> On Tue, Dec 22, 2020 at 2:49 AM Noble Paul <
> [email protected]> wrote:
> > > >>>>>>>>>
> > > > >>>>>>>>>> , and implement the new improved version as a V2-api only,
> > > > >>>>>>>>>> and then deprecate the v1 API?
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> V2 only please
> > > >>>>>>>>>
> > > >>>>>>>>>> On Tue, Dec 22, 2020 at 1:34 AM Jason Gerlowski <
> [email protected]> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> Hey Jan, thanks for the review.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I hadn't thought about the V2 API in connection to this
> work.  You're
> > > >>>>>>>>>> right though I think - the SIP proposes net-new APIs, so it
> should add
> > > >>>>>>>>>> V2 equivalents at the very least.  I'll draft tentative
> details for
> > > >>>>>>>>>> these APIs on the SIP and we can refine things from there.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I'm more up in the air on your specific suggestion to
> restrict the SIP
> > > >>>>>>>>>> changes to these v2 APIs.  It is an elegant approach to the
> > > >>>>>>>>>> backcompat, and it provides a carrot for v2 adoption - both
> of which I
> > > >>>>>>>>>> like.  But it would let users create snapshot-based backups
> (and keep
> > > >>>>>>>>>> us maintaining that code) longer than there's any strict
> need to.  And
> > > >>>>>>>>>> users are left on the less-efficient format by default.
> (By contrast,
> > > >>>>>>>>>> the current SIP has snapshot-backup creation being replaced
> by
> > > >>>>>>>>>> incremental-backup creation as soon as the latter is
> available.).  Did
> > > >>>>>>>>>> you have a particular lifespan in mind for snapshot-based
> creation if
> > > >>>>>>>>>> we go with this approach?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Jason
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Thu, Dec 17, 2020 at 3:54 PM Jan Høydahl <
> [email protected]> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Much needed! Thanks for initiating this Jason!
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>> As we want to move away from v1 APIs where a HTTP GET is
> > > > >>>>>>>>>>> used for creation and deletion, would it be an idea to leave the old
> > > > >>>>>>>>>>> backup/restore APIs as-is, and implement the new improved version as a
> > > > >>>>>>>>>>> V2-api only, and then deprecate the v1 API? Then we don't need to worry
> > > > >>>>>>>>>>> about back-compat, and we get a head-start on converting the COLLECTION API
> > > > >>>>>>>>>>> to v2 style.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Jan
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> On 15 Dec 2020, at 15:48, Jason Gerlowski <[email protected]> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Hey all,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> This morning I published SIP-12, which proposes an
> overhaul of Solr's
> > > >>>>>>>>>>>> backup and restore functionality.  While the "headline"
> improvement in
> > > >>>>>>>>>>>> this SIP is a change to do backups incrementally, it
> bundles in a
> > > >>>>>>>>>>>> number of other improvements as well, including the
> addition of
> > > >>>>>>>>>>>> corruption checks, APIs to list and delete backups, and
> stronger
> > > >>>>>>>>>>>> integration points with popular object storage APIs.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> The SIP can be found here:
> > > >>>>>>>>>>>>
> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Please read the SIP description and come back here for
> discussion.  As
> > > >>>>>>>>>>>> the discussion progresses we will update the SIP page
> with any
> > > >>>>>>>>>>>> outcomes and eventually move things to a VOTE.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Looking forward to hearing your feedback.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Jason
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> ---------------------------------------------------------------------
> > > >>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>>>>>>>> For additional commands, e-mail:
> [email protected]
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> ---------------------------------------------------------------------
> > > >>>>>>>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>>>>>>> For additional commands, e-mail:
> [email protected]
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> ---------------------------------------------------------------------
> > > >>>>>>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>>>>>> For additional commands, e-mail: [email protected]
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> -----------------------------------------------------
> > > >>>>>>>>> Noble Paul
> > > >>>>>>>>>
> > > >>>>>>>>>
> ---------------------------------------------------------------------
> > > >>>>>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>>>>> For additional commands, e-mail: [email protected]
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> ---------------------------------------------------------------------
> > > >>>>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>>>> For additional commands, e-mail: [email protected]
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> ---------------------------------------------------------------------
> > > >>>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>>> For additional commands, e-mail: [email protected]
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> ---------------------------------------------------------------------
> > > >>>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>>> For additional commands, e-mail: [email protected]
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> ---------------------------------------------------------------------
> > > >>>>> To unsubscribe, e-mail: [email protected]
> > > >>>>> For additional commands, e-mail: [email protected]
> > > >>>>>
> > > >>>>
> > > >>>>
> ---------------------------------------------------------------------
> > > >>>> To unsubscribe, e-mail: [email protected]
> > > >>>> For additional commands, e-mail: [email protected]
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> ---------------------------------------------------------------------
> > > >>> To unsubscribe, e-mail: [email protected]
> > > >>> For additional commands, e-mail: [email protected]
> > > >>>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
> > > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
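
For anyone skimming the quoted purge discussion above: Jason describes the
"purge" option on the Delete Backup API as listing the UUID-named index files
in the blob store and comparing them with the UUIDs referenced by any
shard-metadata file. A minimal sketch of that comparison follows. It is not
Solr's implementation; the function name, the sample file names, and the
in-memory dict standing in for the shard-metadata files are all hypothetical,
chosen only to illustrate the set difference.

```python
from typing import Dict, Iterable, Set


def find_orphaned_files(
    stored_files: Iterable[str],
    shard_metadata: Dict[str, Iterable[str]],
) -> Set[str]:
    """Return stored index files that no shard-metadata file references.

    stored_files:   names of the UUID-named index files found in the store
    shard_metadata: hypothetical mapping of metadata-file name to the
                    index-file names it lists (every file needed to restore
                    that shard, whichever backup uploaded it)
    """
    referenced: Set[str] = set()
    # Every shard-metadata file has to be read, as noted in the thread.
    for files in shard_metadata.values():
        referenced.update(files)
    # Anything stored but unreferenced is a purge candidate.
    return set(stored_files) - referenced


# Hypothetical example data: two backup points for one shard.
stored = ["uuid-a", "uuid-b", "uuid-c"]
metadata = {
    "md_shard1_0.json": ["uuid-a"],
    "md_shard1_1.json": ["uuid-a", "uuid-b"],
}
orphans = find_orphaned_files(stored, metadata)
print(sorted(orphans))
```

Because each metadata file lists all files needed for its restore point,
a file shared by several backups stays referenced until the last metadata
file naming it is deleted.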
