On 2019-11-08 09:13, ellie timoney wrote:
> I'm not sure if I'm just not understanding, but if the chunk offsets were to
> remain the same, then there's no benefit to compaction? A (say) 2 GB file full
> of zeroes between small chunks is still the same 2 GB on disk as one that's
> never been compacted at all!

That's true.  I suppose I'm imagining a threshold where, if the file hits, say, 20% wasted space, I can "defrag" the file and recover the lost space, on the understanding that the next sync will have to copy the entire file again.

But you mentioned:

> And if you don't use the compaction feature, you might as well skip the backups
> system entirely, and have your backup server just be a normal replica that
> doesn't accept client traffic (maybe with a very long cyr_expire -D time?), and
> then you shut it down on schedule for safe block/file system backups to your
> offsite location.
... and that seems a more reasonable approach.  I didn't know if copying the filesystem of a (paused) Cyrus replica was a supported way of backing up, but now I do.  Is there a list of which database and index files I need to copy apart from the files inside the partition structure?
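In case it helps frame the question, my working assumption (and it is only an assumption, so please correct me) is that the set to copy is basically everything referenced by these imapd.conf settings, all captured from the same paused moment so the databases stay consistent with the spool:

    # Trees I'm assuming need to be copied together while Cyrus is stopped
    # (paths are just the ones from my own config, not recommendations)
    configdirectory: /var/lib/imap       # mailboxes.db, annotations, quota,
                                         # per-user seen/subscription state
    partition-default: /var/spool/imap   # mailbox dirs: cyrus.index,
                                         # cyrus.cache, cyrus.header, messages
    sievedir: /usr/sieve                 # sieve scripts, if kept outside
                                         # configdirectory

If there are databases that live outside those trees in a standard install, that's exactly the list I'm hoping exists somewhere.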
> This setting might be helpful:
>
>     backup_compact_work_threshold: 1
>         The number of chunks that must obviously need compaction before the
>         compact tool will go ahead with the compaction.  If set to less than
>         one, the value is treated as being one.
>
> If you set your backup_compact_min/max_sizes to a size that's
> comfortable/practical for your block backup algorithm, but then set a very lax
> backup_compact_work_threshold, you might be able to find a sweet spot where
> you're getting the benefits of compaction eventually, but are not constantly
> changing every block in the file (until you do).  The default (1) is basically
> for compaction to occur as soon as there's something to compact out, just
> because the default had to be something, and without experiential data any
> other value would just be a hat rabbit.  But this sounds like a case where a
> big number would play nicer.
>
> I guess I'd try to target a minimum size of 1 disk block per chunk, and a
> maximum of (fair dice roll) 4 disk blocks?  But you'd need some experimentation
> to figure out ballpark numbers, and won't be able to tune it to exact block
> sizes, because the configured thresholds are the uncompressed data size, not
> the compressed chunk size on disk.

Thanks, I saw that setting but didn't really think through how it would help me.  I'll experiment with it and report back.
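For the record, here's roughly where I plan to start, on the assumption (please correct me if I have this wrong) that the options are spelled backup_compact_minsize / backup_compact_maxsize and take uncompressed sizes in kB, if I'm reading imapd.conf(5) right:

    # Starting point only -- the numbers are guesses to be tuned by experiment.
    # Assuming 4 kB filesystem blocks and (another guess) roughly 4:1
    # compression on mail data, a chunk that compresses down to ~1 block is
    # about 16 kB uncompressed, and ~4 blocks is about 64 kB uncompressed.
    backup_compact_minsize: 16
    backup_compact_maxsize: 64
    # A deliberately lax threshold, so compaction only happens once a decent
    # number of chunks are obviously wasteful, rather than rewriting (and
    # re-uploading) the whole file on every run.
    backup_compact_work_threshold: 32

The idea being that most runs leave the file's blocks alone, and every so often one run pays the cost of a full rewrite to claw the space back.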

--
Deborah Pickett
System Administrator
Polyfoam Australia Pty Ltd