Hey all,

Two follow-ups on the recent discussion.

I reviewed the gc/ref-counting part of the BlobDirectory proposal on
SOLR-15051 that David mentioned.  We talked about it a bit offline and
agreed that while an automatic gc mechanism is really needed for what
he's trying to do, the requirements of the backup use case are
different enough that SIP-12 can get by with manually-triggered
'purging', mostly because infrequent static backups produce much less
garbage than continually tracking all files for a (possibly
ever-changing) index.
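
For anyone skimming, the manually-triggered purge boils down to a set
difference: list the index files in the blob store, collect every file
referenced by any shard-metadata file, and treat whatever is left over
as garbage.  A rough sketch of that idea in Python follows.  The
function name, argument names, and file names are made up for
illustration; none of them come from the SIP or from Solr's code.

```python
from typing import Dict, Iterable, Set


def find_orphaned_files(
    stored_files: Iterable[str],
    shard_metadata: Dict[str, Iterable[str]],
) -> Set[str]:
    """Return stored index files that no shard-metadata file references.

    stored_files:   names of the index files found in the blob store
    shard_metadata: maps each shard-metadata file name to the index
                    files it references (hypothetical representation)
    """
    referenced: Set[str] = set()
    # Every shard-metadata file has to be read, since any one of them
    # may be the last remaining reference to a given index file.
    for files in shard_metadata.values():
        referenced.update(files)
    return set(stored_files) - referenced


# Hypothetical example: two backups of one shard share "uuid-a".
stored = ["uuid-a", "uuid-b", "uuid-c"]
metadata = {
    "md_shard1_0.json": ["uuid-a"],
    "md_shard1_1.json": ["uuid-a", "uuid-b"],
}
print(sorted(find_orphaned_files(stored, metadata)))
```

Nothing needs to be tracked between purges, which is the appeal for
backups.  The cost is that each purge reads every shard-metadata file.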

> I'd be open to creating a new v2 backup endpoint (without adding TRA, etc. 
> compatibility) if there was consensus on that approach to handling backcompat 
> and on the specific appearance of the API

On second thought, I'm going to flip-flop on this.  Coming up with a
better v2 API for backup/restore will be easier *after* some of the
questions Jan raised (multi-collection? alias support? etc.) have been
dealt with.  For example, it's tough to decide between
/v2/cluster/backups and /v2/collections/<collectionName>/backups until
we figure out whether we currently support multi-collection backup, or
want to in the near future.  If people feel strongly or would
otherwise veto the vote, then I'll try my best.  But otherwise I think
we're best served waiting until other stuff settles out before
revisiting larger v2 backup API changes.

Best,

Jason

On Mon, Jan 11, 2021 at 10:41 AM Jason Gerlowski <gerlowsk...@gmail.com> wrote:
>
> Hey all,
>
> I've put replies to everyone's questions below.  Hope they help!
>
> > Do the shard metadata files list all of the segments that make up the 
> > backup, or only the segments that were uploaded in this incremental update?
>
> Mike: The former - they're intended to hold metadata about all of the
> segments that are needed to restore to the given
> snapshot/commit-point.  So it's likely to hold metadata about files
> just uploaded, as well as ones that were added to the blob by previous
> backups.  I'll see if I can make that clearer in the file
> descriptions.
>
> > leave the old Backup/Restore API as-is, deprecated, and add a new one on 
> > /v2/cluster/backup
>
> Jan: Ultimately I agree with your concerns about scope, so I'd vote
> against trying to cover TRAs, multiple collection backups, etc. in
> this effort here.
>
> That aside though, I agree that the existing v2 backup API is a bit of
> a headscratcher.  Why is it /v2/collections instead of
> /v2/collections/<collectionName> or a subpath of /v2/cluster?  Does it
> have something to do with aliases?  Or did it end up there mostly by
> default?  I'd be open to creating a new v2 backup endpoint (without
> adding TRA, etc. compatibility) if there was consensus on that
> approach to handling backcompat and on the specific appearance of the
> API.  It would help with backcompat after all.  Though if finding
> consensus bogs down it may not be worth the addition.
>
> > I know you've seen SOLR-15051 (Shared storage -- BlobDirectory) ... We both 
> > want to store checksums and file lengths. ... Your proposal did not discuss 
> > how these files are GC'ed
>
> David: SIP-12 does address this, though maybe the writeup needs
> clarifying.  The Delete Backup API includes a "purge" parameter which
> triggers GC activity.  This probably works about the way you'd expect:
> Solr gets the list of UUID-named index files from the blob store, and
> then it compares that list to the set of UUIDs referenced by any
> shard-metadata file (which requires reading all the shard-metadata
> files).  Files that no shard-metadata file references are orphans.
> This avoids adding to Solr's ZK state, but does so at the cost of
> requiring users to trigger sporadic cleanup manually instead of
> detecting orphans automatically like BlobDirectory does (assuming I
> understand that correctly).
>
> I'm def not saying this is the best approach necessarily.  I like it,
> though it has downsides for sure.  Just that there is a proposed
> approach that's easy to miss buried in the SIP.
>
> More broadly though - I share your sense that we should consider
> alignment.  It may end up that Backup/Restore is different enough from
> the BlobDirectory usecase that it doesn't make sense, but it's at
> least worth figuring out.  That's about as far as my understanding
> goes right now though.  I'll read up on BlobDirectory while you absorb
> SIP-12 and maybe we can circle back to this shortly.
>
> Best,
>
> Jason
>
> On Sun, Jan 10, 2021 at 7:20 AM Jan Høydahl <jan....@cominvent.com> wrote:
> >
> > Jason, Shalin and Dat, thanks for the thorough work. This is an example for 
> > other SIPs to follow!
> >
> > > I've also amended the backcompat/migration section to mention Jan's
> > > suggestion that the "incremental" features be exposed in the v2 API
> > > only.  Though it's unclear to me whether that's still something people
> > > want since it turns out that we'll still have backcompat concerns with
> > > the existing v2 backup/restore APIs.  So I've held off from
> > > removing/replacing the original plan.
> >
> > Since we already have v2 for the existing backup API, I guess my suggestion 
> > is not that 'clean' after all.
> >
> > Another approach would be to leave the old Backup/Restore API as-is, 
> > deprecated, and add a new one on /v2/cluster/backup, with support for 
> > backing up multiple collections in one go, or backing up a TRA with 
> > hundreds of concrete "sub" collections. But as I write these words I 
> > imagine it is probably way outside the scope of this SIP, which is large 
> > enough. Has anyone even tried to back up a TRA with today's API?
> >
> > Jan
> >
> > > On 5 Jan 2021, at 15:55, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> > >
> > > Hey, Happy New Year everybody.
> > >
> > > Some SIP updates based on the discussion above:
> > >
> > > I added v2 examples for each API to the SIP.  Feedback welcome,
> > > especially on the v2 APIs that are net-new to this proposal (namely:
> > > "list backups" and "delete backup").
> > >
> > > I've also amended the backcompat/migration section to mention Jan's
> > > suggestion that the "incremental" features be exposed in the v2 API
> > > only.  Though it's unclear to me whether that's still something people
> > > want since it turns out that we'll still have backcompat concerns with
> > > the existing v2 backup/restore APIs.  So I've held off from
> > > removing/replacing the original plan.
> > >
> > > Link for convenience:
> > > https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> > >
> > > Best,
> > >
> > > Jason
> > >
> > >
> > > On Thu, Dec 24, 2020 at 8:11 AM Jan Høydahl <jan....@cominvent.com> wrote:
> > >>
> > >> Ok, that’s the one I was looking for, it’s not documented in the backup 
> > >> chapter of ref-guide :(
> > >>
> > >> Jan Høydahl
> > >>
> > >>> On 23 Dec 2020, at 17:10, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> > >>>
> > >>> 
> > >>>>
> > >>>> We have a path alias to the old API ... but we don’t have a true v2 
> > >>>> API spec for it, do we?
> > >>>
> > >>> Tbh I'm not yet familiar enough with the v2 APIs to understand the
> > >>> distinction you're making.  (Do you have a pointer to something that'd
> > >>> fill me in?)
> > >>>
> > >>> To zoom in on "backup" as an example, the v2 API I'm referring to
> > >>> looks like:  /v2/collections -d '{ "backup-collection":
> > >>> {"collection": "books", "name": "asdf3", "location": "/tmp/foo"}}'.
> > >>> And it's included in the v2 "introspect" documentation returned by
> > >>> this API: /v2/collections/_introspect?command=backup-collection.  To
> > >>> me that looked like a v2 API, but maybe path-aliases are also covered
> > >>> in the introspect docs.
> > >>>
> > >>> Jason
> > >>>
> > >>>> On Wed, Dec 23, 2020 at 10:29 AM Jan Høydahl <jan....@cominvent.com> 
> > >>>> wrote:
> > >>>>
> > >>>> Actually, don’t think we do have a v2 Backup/Restore API. We have a 
> > >>>> path alias to the old API which takes GET ...&action=backup... but we 
> > >>>> don’t have a true v2 API spec for it, do we? Where is that documented?
> > >>>>
> > >>>> Jan Høydahl
> > >>>>
> > >>>>>> On 22 Dec 2020, at 18:04, Jason Gerlowski 
> > >>>>>> <gerlowsk...@gmail.com> wrote:
> > >>>>>
> > >>>>> Hey guys,
> > >>>>>
> > >>>>> Following up to make sure I understand the specifics you're
> > >>>>> suggesting.  You're proposing that:
> > >>>>>
> > >>>>> 1. The brand new backup-related APIs (list-backups and delete-backup)
> > >>>>> be added in v2-form only.
> > >>>>> 2. Tweaks to existing backup-related APIs (create-backup, restore) be
> > >>>>> made in V2-form only.
> > >>>>> 3. All existing v1 backup-related APIs be deprecated and left
> > >>>>> unchanged.  Incremental backups will not be possible using the v1 API.
> > >>>>>
> > >>>>> I'm not against going this route if there's consensus around it.  But
> > >>>>> I'm not 100% clear on how it means we don't need to worry about
> > >>>>> backcompat.  Backup and Restore currently exist as both a v1 and a v2
> > >>>>> API - I understand how leaving the v1 APIs untouched (other than
> > >>>>> deprecation) frees us of some backcompat concerns there, but we would
> > >>>>> still need to make tweaks to the v2 backup/restore APIs and would have
> > >>>>> to tread just as carefully there in terms of backcompat, afaict.
> > >>>>> Unless Solr's backcompatibility guarantees only cover the v1 API and
> > >>>>> leave v2 changes to be made freely?  I looked around to see if the v2
> > >>>>> APIs had any sort of "experimental" designation, but couldn't find
> > >>>>> that clearly stated anywhere.  Am I missing something?
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> Jason
> > >>>>>
> > >>>>>> On Tue, Dec 22, 2020 at 2:49 AM Noble Paul <noble.p...@gmail.com> 
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> , and implement the new improved version as a V2-api only, and then 
> > >>>>>>> deprecate the v1 API?
> > >>>>>>
> > >>>>>>
> > >>>>>> V2 only please
> > >>>>>>
> > >>>>>>> On Tue, Dec 22, 2020 at 1:34 AM Jason Gerlowski 
> > >>>>>>> <gerlowsk...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Hey Jan, thanks for the review.
> > >>>>>>>
> > >>>>>>> I hadn't thought about the V2 API in connection to this work.
> > >>>>>>> You're right though I think - the SIP proposes net-new APIs, so it
> > >>>>>>> should add V2 equivalents at the very least.  I'll draft tentative
> > >>>>>>> details for these APIs on the SIP and we can refine things from
> > >>>>>>> there.
> > >>>>>>>
> > >>>>>>> I'm more up in the air on your specific suggestion to restrict the
> > >>>>>>> SIP changes to these v2 APIs.  It is an elegant approach to the
> > >>>>>>> backcompat, and it provides a carrot for v2 adoption - both of
> > >>>>>>> which I like.  But it would let users create snapshot-based backups
> > >>>>>>> (and keep us maintaining that code) longer than there's any strict
> > >>>>>>> need to.  And users are left on the less-efficient format by
> > >>>>>>> default.  (By contrast, the current SIP has snapshot-backup
> > >>>>>>> creation being replaced by incremental-backup creation as soon as
> > >>>>>>> the latter is available.)  Did you have a particular lifespan in
> > >>>>>>> mind for snapshot-based creation if we go with this approach?
> > >>>>>>>
> > >>>>>>> Jason
> > >>>>>>>
> > >>>>>>> On Thu, Dec 17, 2020 at 3:54 PM Jan Høydahl <jan....@cominvent.com> 
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Much needed! Thanks for initiating this Jason!
> > >>>>>>>>
> > >>>>>>>> As we want to move away from v1 APIs where an HTTP GET is used for 
> > >>>>>>>> creation and deletion, would it be an idea to leave the old 
> > >>>>>>>> backup/restore APIs as-is, and implement the new improved version 
> > >>>>>>>> as a V2-api only, and then deprecate the v1 API? Then we don't 
> > >>>>>>>> need to worry about back-compat, and we get a head-start on 
> > >>>>>>>> converting the COLLECTION API to v2 style.
> > >>>>>>>>
> > >>>>>>>> Jan
> > >>>>>>>>
> > >>>>>>>>> On 15 Dec 2020, at 15:48, Jason Gerlowski 
> > >>>>>>>>> <gerlowsk...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hey all,
> > >>>>>>>>>
> > >>>>>>>>> This morning I published SIP-12, which proposes an overhaul of
> > >>>>>>>>> Solr's backup and restore functionality.  While the "headline"
> > >>>>>>>>> improvement in this SIP is a change to do backups incrementally,
> > >>>>>>>>> it bundles in a number of other improvements as well, including
> > >>>>>>>>> the addition of corruption checks, APIs to list and delete
> > >>>>>>>>> backups, and stronger integration points with popular object
> > >>>>>>>>> storage APIs.
> > >>>>>>>>>
> > >>>>>>>>> The SIP can be found here:
> > >>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-12%3A+Incremental+Backup+and+Restore
> > >>>>>>>>>
> > >>>>>>>>> Please read the SIP description and come back here for
> > >>>>>>>>> discussion.  As the discussion progresses we will update the SIP
> > >>>>>>>>> page with any outcomes and eventually move things to a VOTE.
> > >>>>>>>>>
> > >>>>>>>>> Looking forward to hearing your feedback.
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Jason
> > >>>>>>>>>
> > >>>>>>>>> ---------------------------------------------------------------------
> > >>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
> > >>>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> -----------------------------------------------------
> > >>>>>> Noble Paul

