Fixing Inefficient Solr Operator Backups

Jason Gerlowski Fri, 30 Jul 2021 12:15:23 -0700

Hey all,

Over the last week or two I've been getting familiar with our new
operator, and noticed that the way it runs backups misses out on the
"incremental" efficiency improvements added recently as part of
SIP-12.  For backups to be done incrementally, an ongoing backup has
to be able to "see" the files stored by previous backups so that it
knows which index files to skip over.  Our current operator support
does a few things that prevent this in practice:

- the operator "rm -rf"s all files at the backup location before
starting each new backup
- the operator requests each backup at a unique name/location
- the operator compresses the backup file tree after finishing each backup

Everything will still work; the backups just won't be nearly as
efficient as they could be for many common use cases.
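
To make the "see the previous files" point concrete, here's a toy
sketch in Python.  It is not how Solr implements this; the function
name, the file names, and the idea of matching on file name plus
checksum are all assumptions made purely for illustration.

```python
from typing import Dict, List


def files_to_copy(index_files: Dict[str, str], backed_up: Dict[str, str]) -> List[str]:
    """Return the names of index files the backup location does not already hold.

    Both arguments map a file name to a checksum (hypothetical model). A file
    is skipped only when the same name with the same checksum is already stored.
    """
    return sorted(
        name
        for name, checksum in index_files.items()
        if backed_up.get(name) != checksum
    )


# Hypothetical index contents at the time of the second backup.
current_index = {"_0.cfs": "a1", "_1.cfs": "b2", "_2.cfs": "c3"}
# What the first backup left behind at the shared location.
previous_backup = {"_0.cfs": "a1", "_1.cfs": "b2"}

# Shared location kept intact: only the new segment file needs copying.
incremental = files_to_copy(current_index, previous_backup)

# Location wiped, compressed away, or unique per backup: nothing from the
# earlier backup is visible, so every file gets copied again.
full = files_to_copy(current_index, {})
```

Any of the three operator behaviors listed above puts us in the
second case on every run.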

There are a few ways we could address this.

In one approach, we could leave 'solrbackup' mostly untouched. For
"incremental" situations, we would create a new resource-type
('solrbackupschedule'? 'solrbackuprepeating'?) that's explicitly
geared towards repeated backups of the same collections and knows to
store these all in the same location.  Conceivably it could also have
other useful ops features like cron-job-like scheduling of backups.
'solrbackupschedule' would then be our solution for users who want to
do recurring or repeated backups, and 'solrbackup' could be
repositioned in the docs as the solution for those doing an ad-hoc,
standalone backup.

Another approach would be to focus instead on adding configuration
options to 'solrbackup' that would make it suitable for incremental
backups: enable/disable backup compression, cleaning/retaining the
"location" prior to doing a backup, an override for the backup
location, etc.  'solrbackup' would remain the option for anyone doing
any sort of backup.  (Of course, we could still add a
'solrbackupschedule' resource-type as a layer on top of this if the
idea of cron-like backup triggering is appealing; it would then be
implemented in terms of managing 'solrbackup' sub-resources that
perform the actual "work".)
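
As a rough sketch of what those options might look like (Python used
only as neutral pseudocode; every field name here is hypothetical and
none of them exist in the operator today), the defaults could mirror
the current behavior so existing 'solrbackup' resources keep working
unchanged:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupOptions:
    """Hypothetical knobs for 'solrbackup'; names are illustrative only."""

    # Defaults mirror what the operator does today, per the list above.
    compress: bool = True                    # compress the file tree afterwards
    clean_location: bool = True              # "rm -rf" the location first
    location_override: Optional[str] = None  # None = unique location per backup


def supports_incremental(opts: BackupOptions) -> bool:
    """An incremental backup needs to see the earlier files, uncompressed,
    at a location shared with previous backups."""
    return (
        not opts.compress
        and not opts.clean_location
        and opts.location_override is not None
    )


legacy = BackupOptions()
shared = BackupOptions(
    compress=False,
    clean_location=False,
    location_override="/backups/shared",  # hypothetical path
)
```

Here `supports_incremental(legacy)` is false and
`supports_incremental(shared)` is true, which is also a hint at how
easy it would be for a user to set only one or two of the three and
silently get full backups.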

There are tradeoffs for both approaches IMO.

The first approach is simplest in terms of backcompat.  It may also
prove simplest in handling discrepancies between Solr versions
(incremental backups are only supported in Solr 8.9+).  But it leaves
a potential use-case gap: users may take backups frequently enough to
benefit from "incrementality", but without any sort of defined
schedule or set periodicity like a 'solrbackupschedule' resource might
require.  It also risks duplicating code, as both 'solrbackup' and
'solrbackupschedule' would involve similar actions.

OTOH, the second approach is more flexible ('solrbackup' would become
suitable for any common backup use case), and 'solrbackupschedule', if
created, has a really nice conceptual separation, being implemented as
a layer on top of 'solrbackup'.  But it pays for all this by making
'solrbackup' more complex and harder for a non-Solr-SME to "get right"
out of the box, and by opening some backcompat questions/challenges.
Lastly, it'd require us to think carefully about how cleanup and
resource-deletion work, since this approach will allow multiple
'solrbackup' resources to share a backup "location".

Anyone have any thoughts or preferences between those two options?  Or
some third approach I missed?  Or even general context around why our
operator backup support looks the way it does?  Really appreciate any
input!

Best,

Jason

